Fast creation of a set (was: Strange machine-level exception)

new topic     » topic index » view thread      » older message » newer message

Derek Parnell wrote:

> Ricardo Forno wrote:
> > 
> > Hi Juergen.
> > I did not try your program, but apart from this, I have a suggestion to 
> > improve its speed.
> >  From the point of view of computational complexity, your code is
> > O(n*n).
> > This can be improved by first sorting the sequence, which is O(log(n)*n),
> > and then examining it linearly eliminating the equal elements, which is
> > O(n), giving a joint complexity O(log(n)*n).
> > This improvement will be more noticeable as the sequence length increases.
> 
> Actually, I tried that too because it sounds sensible. However when I did
> that,
> pre-sorting it makes it run about 3X slower and causes the resulting set to
> be ordered differently from the input set, which might be significant to some
> applications.

[code snipped]

Hi,

I'm sorry for the delay in replying. I firstly wanted to hunt that
machine-level bug. Now Pete has surrounded it.

Ricardo, thanks a lot for your suggestion, it works fine for me. This
is a cool trick which I hopefully will remember in similar situations. smile

Mathematically speaking, a set is a "collection of objects in which
order has no significance". <http://mathworld.wolfram.com/Set.html>
So no application should rely on the assumtion that the elements in
a set are in a particular order, shold it?

In the following I've tested
a) my old function using find()
b) my new function using find_from()
c) the function that Derek posted according to Ricardo's suggestion

As expected, b) is faster than a), and c) is considerably faster than b).

Derek, what data did you use for teting? Maybe using (nested) sequences
makes the difference? Just a guess.
without type_check

global function create1_set (sequence s)
   -- create a set (= remove duplicate entries)
   -- by Juergen
   integer p

   if length(s) < 2 then
      return s
   end if

   p = 1
   for i = 2 to length(s) do
      if find(s[i], s[1..p]) = 0 then
         p += 1
         s[p] = s[i]
      end if
   end for
   return s[1..p]
end function

global function create2_set (sequence s)
   -- create a set (= remove duplicate entries)
   -- by Juergen
   integer p

   if length(s) < 2 then
      return s
   end if

   p = length(s)
   for i = length(s)-1 to 1 by -1 do
      if find_from(s[i], s, i+1) = 0 then
         p -= 1
         s[p] = s[i]
      end if
   end for
   return s[p..$]
end function

include sort.e

global function create3_set (sequence s)
   -- create a set (= remove duplicate entries)
   -- by Derek, after Ricardo's suggestion 
   integer p

   if length(s) < 2 then
      return s
   end if

   s = sort(s)
   p = 1
   for i = 2 to length(s) do
      if not equal(s[p], s[i]) then
         p += 1
         if p != i then
            s[p] = s[i]
         end if
      end if
   end for
   return s[1..p]
end function

------------------------------------------------------------------------
-- Tests

constant
   n = 50000,
   tests = 2

sequence s, set1, set2, set3
atom t, t1, t2, t3
s = repeat(0, n)
t1 = 0
t2 = 0
t3 = 0

for i = 1 to tests do
   -- create random sequence
   for k = 1 to n do
      s[k] = rand(n)
   end for
   
   t = time()
   set1 = create1_set(s)
   t1 += time()-t
   
   t = time()
   set2 = create2_set(s)
   t2 += time()-t
   
   t = time()
   set3 = create3_set(s)
   t3 += time()-t
end for

puts(1, "Average elapsed time:\n")
? t1/tests                    -- e.g. 4.19 sec.
? t2/tests                    -- e.g. 2.13 sec.
? t3/tests                    -- e.g. 0.06 sec.

if getc(0) then end if

Regards,
   Juergen

new topic     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu