1. Fast creation of a set (was: Strange machine-level exception)

Derek Parnell wrote:

> Ricardo Forno wrote:
> > 
> > Hi Juergen.
> > I did not try your program, but apart from this, I have a suggestion to 
> > improve its speed.
> >  From the point of view of computational complexity, your code is
> > O(n*n).
> > This can be improved by first sorting the sequence, which is O(log(n)*n),
> > and then examining it linearly eliminating the equal elements, which is
> > O(n), giving a joint complexity O(log(n)*n).
> > This improvement will be more noticeable as the sequence length increases.
> 
> Actually, I tried that too because it sounds sensible. However when I did
> that,
> pre-sorting it makes it run about 3X slower and causes the resulting set to
> be ordered differently from the input set, which might be significant to some
> applications.

[code snipped]

Hi,

I'm sorry for the delay in replying. I firstly wanted to hunt that
machine-level bug. Now Pete has surrounded it.

Ricardo, thanks a lot for your suggestion, it works fine for me. This
is a cool trick which I hopefully will remember in similar situations. smile

Mathematically speaking, a set is a "collection of objects in which
order has no significance". <http://mathworld.wolfram.com/Set.html>
So no application should rely on the assumtion that the elements in
a set are in a particular order, shold it?

In the following I've tested
a) my old function using find()
b) my new function using find_from()
c) the function that Derek posted according to Ricardo's suggestion

As expected, b) is faster than a), and c) is considerably faster than b).

Derek, what data did you use for teting? Maybe using (nested) sequences
makes the difference? Just a guess.
without type_check

global function create1_set (sequence s)
   -- create a set (= remove duplicate entries)
   -- by Juergen
   integer p

   if length(s) < 2 then
      return s
   end if

   p = 1
   for i = 2 to length(s) do
      if find(s[i], s[1..p]) = 0 then
         p += 1
         s[p] = s[i]
      end if
   end for
   return s[1..p]
end function

global function create2_set (sequence s)
   -- create a set (= remove duplicate entries)
   -- by Juergen
   integer p

   if length(s) < 2 then
      return s
   end if

   p = length(s)
   for i = length(s)-1 to 1 by -1 do
      if find_from(s[i], s, i+1) = 0 then
         p -= 1
         s[p] = s[i]
      end if
   end for
   return s[p..$]
end function

include sort.e

global function create3_set (sequence s)
   -- create a set (= remove duplicate entries)
   -- by Derek, after Ricardo's suggestion 
   integer p

   if length(s) < 2 then
      return s
   end if

   s = sort(s)
   p = 1
   for i = 2 to length(s) do
      if not equal(s[p], s[i]) then
         p += 1
         if p != i then
            s[p] = s[i]
         end if
      end if
   end for
   return s[1..p]
end function

------------------------------------------------------------------------
-- Tests

constant
   n = 50000,
   tests = 2

sequence s, set1, set2, set3
atom t, t1, t2, t3
s = repeat(0, n)
t1 = 0
t2 = 0
t3 = 0

for i = 1 to tests do
   -- create random sequence
   for k = 1 to n do
      s[k] = rand(n)
   end for
   
   t = time()
   set1 = create1_set(s)
   t1 += time()-t
   
   t = time()
   set2 = create2_set(s)
   t2 += time()-t
   
   t = time()
   set3 = create3_set(s)
   t3 += time()-t
end for

puts(1, "Average elapsed time:\n")
? t1/tests                    -- e.g. 4.19 sec.
? t2/tests                    -- e.g. 2.13 sec.
? t3/tests                    -- e.g. 0.06 sec.

if getc(0) then end if

Regards,
   Juergen

new topic     » topic index » view message » categorize

2. Re: Fast creation of a set (was: Strange machine-level exception)

Juergen Luethje wrote:
> 
<snipped/>

> 
> Mathematically speaking, a set is a "collection of objects in which
> order has no significance". <<a
> href="http://mathworld.wolfram.com/Set.html">http://mathworld.wolfram.com/Set.html</a>>
> So no application should rely on the assumtion that the elements in
> a set are in a particular order, shold it?
> 

This is true. However, any bijection between a set and an ordered set gives the
source set an ordering.
Hence, an application may define any sort of order on a set if it is convenient
in any way. After all, there is an order relation on all Euphoria objects,
defined by:
x < y <===> compare(x,y)=-1

And since all sets we deal with in computer world are finite, hence mappable to
some starting section of |N, there is always a natural way to order any set we
encounter.
It is easy to see that any set of objects can be considered part of a set of
cardinality aleph_n, with {\it n} the maximum sequence nesting level in said
collection of objects. And there is a natural order relation over such a set,
built recursively starting from the natural ordering of integers. (Over |R (n=1),
this is the ordering of reals by their continued fraction).

CChris

new topic     » goto parent     » topic index » view message » categorize

3. Re: Fast creation of a set (was: Strange machine-level exception)

Juergen Luethje wrote:

> Derek, what data did you use for teting? Maybe using (nested) sequences
> makes the difference? Just a guess.

I don't know. I just ran your test and my version is by far the fastest now, so
I must have done something weird before.

-- 
Derek Parnell
Melbourne, Australia
Skype name: derek.j.parnell

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu