1. fast integer sort
Recently, I had a look at the integer sorting function bucket_sort() which
is in c:\euphoria\demo\allsorts.ex to see if I could use that to make a fast
character string sorting routine. *Big Mistake*. Although the routine did
work , there was a speed difference (compared to sort() ) of about tenfold,
in Rob's favour. Ouch!
Ok. What do I do now. Maybe I should study bucket_sort() and try to make it
faster (Yeah, Sure!).
Surprisingly, it worked. Of course, if Rob had really wanted to he could
easily
have optimized his code.
Both routines will sort (relatively small) integers. If any sort elements
are < 1 then the speed increase factor is about 1.2 but for elements > 0
then the factor jumps to a whopping 1.8
--Cut here-------------------------------------------------------------
--Time for Rob was 152.976705
--Time for mike was 126.980664
--Speed fraction is : 1.204724 (elements less than 1 were used for this
test)
include machine.e
tick_rate(1000)
--Function callously copied from allsorts.ex
function bucket_sort(sequence s, integer min_value, integer max_value)
-- Sort s into ascending order. No elements are compared.
-- The values of s must be integers from min_value to max_value.
sequence count, sorted
integer value, k, offset
count = repeat(0, max_value-min_value+1)
offset = min_value - 1
-- count the number of occurrences of each integer value:
for i = 1 to length(s) do
value = s[i] - offset
count[value] = count[value]+1
end for
sorted = repeat(0, length(s))
k = 1
-- make the resulting sorted sequence
for i = 1 to length(count) do
for j = 0 to count[i]-1 do
sorted[k] = i + offset
k = k + 1
end for
end for
return sorted
end function
--My own routine loosely based on above
function basic_sort3(sequence s,integer min,integer max)
integer c,k,disp,m
sequence list,result
if min<1 then
disp=min-1 s=s-disp m=1
else
disp=0 m=min
end if
result=repeat(0,length(s))
list=repeat(0,max - disp)
for i= 1 to length(s) do
c=s[i]
list[c]=list[c]+1
end for
k=1
for i = m to length(list) do
if list[i] then
c=list[i]
result[k..k+c-1] = i
k=k + c
end if
end for
if min<1 then return result + disp end if
return result
end function
-- setup test sequences & timing variables
atom t1,t2,f
f=1000
sequence q,r
q=repeat(0,100000) + rand(100)
r=repeat(0,length(q)) for i = 1 to length(r) do r[i]=q[i] end for
t1=time()
q=bucket_sort(q,1,100)
t1=time()-t1
t2=time()
r=basic_sort3(q,1,100)
t2=time()-t2
printf(1,"Time for Rob was %f \nTime for mike was %f\n\nSpeed fraction is :
%f",{t1*f, t2*f, t1/t2})
2. Re: fast integer sort
Mike wrote:
> Ok. What do I do now. Maybe I should study bucket_sort() and
> try to make it faster (Yeah, Sure!).
> Surprisingly, it worked. Of course, if Rob had really wanted to he
> could easily have optimized his code.
I think you over-optimized your routine - it doesn't sort anymore.
> Recently, I had a look at the integer sorting function bucket_sort() which
> is in c:\euphoria\demo\allsorts.ex to see if I could use that to make
> a fast
> character string sorting routine. *Big Mistake*. Although the routine did
> work , there was a speed difference (compared to sort() ) of
> about tenfold, in Rob's favour. Ouch!
It *is* possible to adapt bucket sort to efficiently handle strings.
Art Adamson came up this idea a long time ago (See Archive).
I recently wrote my own bucket sort for strings to see how fast I could
make it. With a bit of tuning I came up with something twice as fast as
the standard Euphoria sort() in sort.e. Art's was also faster than sort()
as I recall. Here's my code:
-- Sort strings using "buckets"
-- Based on an idea by Art Adamson
include sort.e
include machine.e
tick_rate(200)
constant NSTRINGS = 20000 -- number of random strings to sort
constant STRLEN = 8 -- length of strings
constant NBUCKETS = floor(NSTRINGS/10) -- number of buckets to use
constant MAX = 255*256+255+1 -- maximum value for 2 characters
constant SCALE = NBUCKETS/MAX
sequence strings, result
sequence bucket
strings = rand(repeat(repeat(256, STRLEN), NSTRINGS))-1
bucket = repeat({}, NBUCKETS)
function string_sort(sequence strings)
-- "bucket sort" for strings
integer c1, c2, b
-- put each string into the appropriate bucket
for i = 1 to length(strings) do
c1 = strings[i][1]
c2 = strings[i][2]
b = 1+floor((c1*256+c2)*SCALE)
bucket[b] = append(bucket[b], strings[i])
end for
-- sort each bucket, and concatenate the buckets
result = {}
for i = 1 to length(bucket) do
result = result & sort(bucket[i]) -- use regular sort
end for
return result
end function
procedure stats()
-- sanity check on result
printf(1, "length of result: %d\n", length(result))
puts(1, "first string: ")
? result[1]
puts(1, "last string: ")
? result[NSTRINGS]
puts(1, '\n')
end procedure
atom t
t = time()
result = sort(strings)
printf(1, "regular sort: %.2f\n", time()-t)
stats()
t = time()
result = string_sort(strings)
printf(1, "bucket sort: %.2f\n", time()-t)
stats()
The basic idea is that it's faster to sort a bunch of small
sequences, than it is to sort one big sequence.
I originally tried it with 256 buckets, where each string
is placed into a bucket based on the value of its first
character. That wasn't as fast as using the first
two characters, which implies 256*256=65536 buckets.
However 65536 is a lot of buckets if you don't have many
strings, so I scale it to NSTRINGS/10 which seems to run
faster anyway.
Note: this was only tested on *random* strings.
If the strings were not random, bucket sort would not
do as well, since there would likely be a very uneven distribution
of strings across the buckets.
Regards,
Rob Craig
Rapid Deployment Software
http://members.aol.com/FilesEu/
3. Re: fast integer sort
- Posted by Mike <michael at IGRIN.CO.NZ>
Dec 31, 1998
-
Last edited Jan 01, 1999
>Mike wrote:
>> Ok. What do I do now. Maybe I should study bucket_sort() and
>> try to make it faster (Yeah, Sure!).
>> Surprisingly, it worked. Of course, if Rob had really wanted to he
>> could easily have optimized his code.
>Rob wrote:
>I think you over-optimized your routine - it doesn't sort anymore.
>
Oh dear! Try this..
--replace this line
q=repeat(0,100000) + rand(100)
--With these
q=repeat(0,100000)
for i = 1 to length(q) do
q[i]=rand(100)
end for
michael at igrin.co.nz
4. Re: fast integer sort
- Posted by Robert Craig <rds at EMAIL.MSN.COM>
Dec 31, 1998
-
Last edited Jan 01, 1999
Mike wrote:
> Oh dear! Try this..
OK, I tried it. Your bucket sort is faster, although
in some cases yours will create extra unnecessary buckets.
Your loop at the end is faster than mine. I'm going to steal your
idea, and change demo\allsorts.ex to use a slice as follows:
-- make the resulting sorted sequence (faster way)
k = 1
for i = 1 to length(count) do
c = count[i]
if c then
sorted[k..k+c-1] = i + offset
k = k + c
end if
end for
Using a slice (instead of the subscript loop I had before)
is especially helpful when there are lots of duplicates,
(c is large) like in your test case, but it also seems to be faster
even when c is small, because the c=0 case is handled
efficiently by the if-statement.
Thanks,
Rob Craig
Rapid Deployment Software
http://members.aol.com/FilesEu/