OpenEuphoria: Forum: fast integer sort

1. fast integer sort

Posted by Mike <michael at IGRIN.CO.NZ> Dec 30, 1998
479 views

Recently, I had a look at the integer sorting function bucket_sort() which
is in c:\euphoria\demo\allsorts.ex to see if I could use that to make a fast
character string sorting routine. *Big Mistake*. Although the routine did
work , there was a speed difference (compared to sort() ) of about tenfold,
in Rob's favour. Ouch!
Ok. What do I do now. Maybe I should study bucket_sort() and try to make it
faster (Yeah, Sure!).
Surprisingly, it worked. Of course, if Rob had really wanted to he could
easily
have optimized his code.
Both routines will sort (relatively small) integers. If any sort elements
are < 1 then the speed increase factor is about 1.2 but for elements > 0
then the factor jumps to a whopping 1.8

--Cut  here-------------------------------------------------------------
--Time for Rob was 152.976705
--Time for mike was 126.980664

--Speed fraction is :  1.204724 (elements less than 1 were used for this
test)

include machine.e
tick_rate(1000)

--Function callously copied from allsorts.ex
function bucket_sort(sequence s, integer min_value, integer max_value)
-- Sort s into ascending order. No elements are compared.
-- The values of s must be integers from min_value to max_value.
    sequence count, sorted
    integer value, k, offset
    count = repeat(0, max_value-min_value+1)
    offset = min_value - 1
    -- count the number of occurrences of each integer value:
for i = 1 to length(s) do
value = s[i] - offset
count[value] = count[value]+1
    end for
    sorted = repeat(0, length(s))
    k = 1
    -- make the resulting sorted sequence
    for i = 1 to length(count) do
for j = 0 to count[i]-1 do
     sorted[k] = i + offset
     k = k + 1
end for
    end for
    return sorted
end function


--My own routine loosely based on above
function basic_sort3(sequence s,integer min,integer max)
integer c,k,disp,m
sequence list,result
if min<1 then
  disp=min-1 s=s-disp m=1
else
  disp=0 m=min
end if
result=repeat(0,length(s))
list=repeat(0,max - disp)
for i= 1 to length(s) do
  c=s[i]
  list[c]=list[c]+1
end for
k=1
for i = m to length(list) do
  if list[i] then
   c=list[i]
   result[k..k+c-1] = i
   k=k + c
  end if
end for
if min<1 then return result + disp end if
return result
end function


-- setup test sequences & timing variables
atom t1,t2,f
f=1000
sequence q,r

q=repeat(0,100000) + rand(100)
r=repeat(0,length(q)) for i = 1 to length(r) do r[i]=q[i] end for

t1=time()
q=bucket_sort(q,1,100)
t1=time()-t1

t2=time()
r=basic_sort3(q,1,100)
t2=time()-t2

printf(1,"Time for Rob was %f \nTime for mike was %f\n\nSpeed fraction is :
%f",{t1*f, t2*f, t1/t2})

new topic » topic index » view message » categorize

2. Re: fast integer sort

Posted by Robert Craig <rds at EMAIL.MSN.COM> Dec 30, 1998
494 views

Mike wrote:
> Ok. What do I do now. Maybe I should study bucket_sort() and
> try to make it faster (Yeah, Sure!).
> Surprisingly, it worked. Of course, if Rob had really wanted to he
> could easily have optimized his code.

I think you over-optimized your routine - it doesn't sort anymore.

> Recently, I had a look at the integer sorting function bucket_sort() which
> is in c:\euphoria\demo\allsorts.ex to see if I could use that to make
> a fast
> character string sorting routine. *Big Mistake*. Although the routine did
> work , there was a speed difference (compared to sort() ) of
> about tenfold, in Rob's favour. Ouch!

It *is* possible to adapt bucket sort to efficiently handle strings.
Art Adamson came up this idea a long time ago (See Archive).
I recently wrote my own bucket sort for strings to see how fast I could
make it. With a bit of tuning I came up with something twice as fast as
the standard Euphoria sort() in sort.e. Art's was also faster than sort()
as I recall. Here's my code:

-- Sort strings using "buckets"
-- Based on an idea by Art Adamson

include sort.e
include machine.e
tick_rate(200)

constant NSTRINGS = 20000  -- number of random strings to sort
constant STRLEN = 8        -- length of strings
constant NBUCKETS = floor(NSTRINGS/10)  -- number of buckets to use
constant MAX = 255*256+255+1  -- maximum value for 2 characters
constant SCALE = NBUCKETS/MAX

sequence strings, result
sequence bucket
strings = rand(repeat(repeat(256, STRLEN), NSTRINGS))-1
bucket = repeat({}, NBUCKETS)

function string_sort(sequence strings)
-- "bucket sort" for strings
    integer c1, c2, b

    -- put each string into the appropriate bucket
    for i = 1 to length(strings) do
        c1 = strings[i][1]
        c2 = strings[i][2]
        b = 1+floor((c1*256+c2)*SCALE)
        bucket[b] = append(bucket[b], strings[i])
    end for

    -- sort each bucket, and concatenate the buckets
    result = {}
    for i = 1 to length(bucket) do
        result = result & sort(bucket[i]) -- use regular sort
    end for
    return result
end function

procedure stats()
-- sanity check on result
    printf(1, "length of result: %d\n", length(result))
    puts(1, "first string: ")
    ? result[1]
    puts(1, "last string: ")
    ? result[NSTRINGS]
    puts(1, '\n')
end procedure

atom t
t = time()
result = sort(strings)
printf(1, "regular sort: %.2f\n", time()-t)
stats()

t = time()
result = string_sort(strings)
printf(1, "bucket sort: %.2f\n", time()-t)
stats()

The basic idea is that it's faster to sort a bunch of small
sequences, than it is to sort one big sequence.

I originally tried it with 256 buckets, where each string
is placed into a bucket based on the value of its first
character. That wasn't as fast as using the first
two characters, which implies 256*256=65536 buckets.
However 65536 is  a lot of buckets if you don't have many
strings, so I scale it to NSTRINGS/10 which seems to run
faster anyway.

Note: this was only tested on *random* strings.
If the strings were not random, bucket sort would not
do as well, since there would likely be a very uneven distribution
of strings across the buckets.

Regards,
     Rob Craig
     Rapid Deployment Software
     http://members.aol.com/FilesEu/

new topic » goto parent » topic index » view message » categorize

3. Re: fast integer sort

Posted by Mike <michael at IGRIN.CO.NZ> Dec 31, 1998
460 views
Last edited Jan 01, 1999

>Mike wrote:
>> Ok. What do I do now. Maybe I should study bucket_sort() and
>> try to make it faster (Yeah, Sure!).
>> Surprisingly, it worked. Of course, if Rob had really wanted to he
>> could easily have optimized his code.

>Rob wrote:
>I think you over-optimized your routine - it doesn't sort anymore.
>
 Oh dear! Try this..

--replace this line
q=repeat(0,100000) + rand(100)

--With these
q=repeat(0,100000)
for i = 1 to length(q) do
    q[i]=rand(100)
end for

michael at igrin.co.nz

new topic » goto parent » topic index » view message » categorize

4. Re: fast integer sort

Posted by Robert Craig <rds at EMAIL.MSN.COM> Dec 31, 1998
475 views
Last edited Jan 01, 1999

Mike wrote:
> Oh dear! Try this..

OK, I tried it. Your bucket sort is faster, although
in some cases yours will create extra unnecessary buckets.

Your loop at the end is faster than mine. I'm going to steal your
idea, and change demo\allsorts.ex to use a slice as follows:

-- make the resulting sorted sequence (faster way)
k = 1
for i = 1 to length(count) do
     c = count[i]
     if c then
         sorted[k..k+c-1] = i + offset
         k = k + c
     end if
end for

Using a slice (instead of the subscript loop I had before)
is especially helpful when there are lots of duplicates,
(c is large) like in your test case, but it also seems to be faster
 even when c is small, because the c=0 case is handled
efficiently by the if-statement.

Thanks,
     Rob Craig
     Rapid Deployment Software
     http://members.aol.com/FilesEu/

OpenEuphoria

1. fast integer sort

2. Re: fast integer sort

3. Re: fast integer sort

4. Re: fast integer sort

Search

Include:

Quick Links

User menu

Misc Menu