1. fast integer sort
- Posted by Mike <michael at IGRIN.CO.NZ> Dec 30, 1998
- 334 views
Recently, I had a look at the integer sorting function bucket_sort() which is in c:\euphoria\demo\allsorts.ex to see if I could use that to make a fast character string sorting routine. *Big Mistake*. Although the routine did work , there was a speed difference (compared to sort() ) of about tenfold, in Rob's favour. Ouch! Ok. What do I do now. Maybe I should study bucket_sort() and try to make it faster (Yeah, Sure!). Surprisingly, it worked. Of course, if Rob had really wanted to he could easily have optimized his code. Both routines will sort (relatively small) integers. If any sort elements are < 1 then the speed increase factor is about 1.2 but for elements > 0 then the factor jumps to a whopping 1.8 --Cut here------------------------------------------------------------- --Time for Rob was 152.976705 --Time for mike was 126.980664 --Speed fraction is : 1.204724 (elements less than 1 were used for this test) include machine.e tick_rate(1000) --Function callously copied from allsorts.ex function bucket_sort(sequence s, integer min_value, integer max_value) -- Sort s into ascending order. No elements are compared. -- The values of s must be integers from min_value to max_value. sequence count, sorted integer value, k, offset count = repeat(0, max_value-min_value+1) offset = min_value - 1 -- count the number of occurrences of each integer value: for i = 1 to length(s) do value = s[i] - offset count[value] = count[value]+1 end for sorted = repeat(0, length(s)) k = 1 -- make the resulting sorted sequence for i = 1 to length(count) do for j = 0 to count[i]-1 do sorted[k] = i + offset k = k + 1 end for end for return sorted end function --My own routine loosely based on above function basic_sort3(sequence s,integer min,integer max) integer c,k,disp,m sequence list,result if min<1 then disp=min-1 s=s-disp m=1 else disp=0 m=min end if result=repeat(0,length(s)) list=repeat(0,max - disp) for i= 1 to length(s) do c=s[i] list[c]=list[c]+1 end for k=1 for i = m to length(list) do if list[i] then c=list[i] result[k..k+c-1] = i k=k + c end if end for if min<1 then return result + disp end if return result end function -- setup test sequences & timing variables atom t1,t2,f f=1000 sequence q,r q=repeat(0,100000) + rand(100) r=repeat(0,length(q)) for i = 1 to length(r) do r[i]=q[i] end for t1=time() q=bucket_sort(q,1,100) t1=time()-t1 t2=time() r=basic_sort3(q,1,100) t2=time()-t2 printf(1,"Time for Rob was %f \nTime for mike was %f\n\nSpeed fraction is : %f",{t1*f, t2*f, t1/t2})
2. Re: fast integer sort
- Posted by Robert Craig <rds at EMAIL.MSN.COM> Dec 30, 1998
- 338 views
Mike wrote: > Ok. What do I do now. Maybe I should study bucket_sort() and > try to make it faster (Yeah, Sure!). > Surprisingly, it worked. Of course, if Rob had really wanted to he > could easily have optimized his code. I think you over-optimized your routine - it doesn't sort anymore. > Recently, I had a look at the integer sorting function bucket_sort() which > is in c:\euphoria\demo\allsorts.ex to see if I could use that to make > a fast > character string sorting routine. *Big Mistake*. Although the routine did > work , there was a speed difference (compared to sort() ) of > about tenfold, in Rob's favour. Ouch! It *is* possible to adapt bucket sort to efficiently handle strings. Art Adamson came up this idea a long time ago (See Archive). I recently wrote my own bucket sort for strings to see how fast I could make it. With a bit of tuning I came up with something twice as fast as the standard Euphoria sort() in sort.e. Art's was also faster than sort() as I recall. Here's my code: -- Sort strings using "buckets" -- Based on an idea by Art Adamson include sort.e include machine.e tick_rate(200) constant NSTRINGS = 20000 -- number of random strings to sort constant STRLEN = 8 -- length of strings constant NBUCKETS = floor(NSTRINGS/10) -- number of buckets to use constant MAX = 255*256+255+1 -- maximum value for 2 characters constant SCALE = NBUCKETS/MAX sequence strings, result sequence bucket strings = rand(repeat(repeat(256, STRLEN), NSTRINGS))-1 bucket = repeat({}, NBUCKETS) function string_sort(sequence strings) -- "bucket sort" for strings integer c1, c2, b -- put each string into the appropriate bucket for i = 1 to length(strings) do c1 = strings[i][1] c2 = strings[i][2] b = 1+floor((c1*256+c2)*SCALE) bucket[b] = append(bucket[b], strings[i]) end for -- sort each bucket, and concatenate the buckets result = {} for i = 1 to length(bucket) do result = result & sort(bucket[i]) -- use regular sort end for return result end function procedure stats() -- sanity check on result printf(1, "length of result: %d\n", length(result)) puts(1, "first string: ") ? result[1] puts(1, "last string: ") ? result[NSTRINGS] puts(1, '\n') end procedure atom t t = time() result = sort(strings) printf(1, "regular sort: %.2f\n", time()-t) stats() t = time() result = string_sort(strings) printf(1, "bucket sort: %.2f\n", time()-t) stats() The basic idea is that it's faster to sort a bunch of small sequences, than it is to sort one big sequence. I originally tried it with 256 buckets, where each string is placed into a bucket based on the value of its first character. That wasn't as fast as using the first two characters, which implies 256*256=65536 buckets. However 65536 is a lot of buckets if you don't have many strings, so I scale it to NSTRINGS/10 which seems to run faster anyway. Note: this was only tested on *random* strings. If the strings were not random, bucket sort would not do as well, since there would likely be a very uneven distribution of strings across the buckets. Regards, Rob Craig Rapid Deployment Software http://members.aol.com/FilesEu/
3. Re: fast integer sort
- Posted by Mike <michael at IGRIN.CO.NZ> Dec 31, 1998
- 321 views
- Last edited Jan 01, 1999
>Mike wrote: >> Ok. What do I do now. Maybe I should study bucket_sort() and >> try to make it faster (Yeah, Sure!). >> Surprisingly, it worked. Of course, if Rob had really wanted to he >> could easily have optimized his code. >Rob wrote: >I think you over-optimized your routine - it doesn't sort anymore. > Oh dear! Try this.. --replace this line q=repeat(0,100000) + rand(100) --With these q=repeat(0,100000) for i = 1 to length(q) do q[i]=rand(100) end for michael at igrin.co.nz
4. Re: fast integer sort
- Posted by Robert Craig <rds at EMAIL.MSN.COM> Dec 31, 1998
- 335 views
- Last edited Jan 01, 1999
Mike wrote: > Oh dear! Try this.. OK, I tried it. Your bucket sort is faster, although in some cases yours will create extra unnecessary buckets. Your loop at the end is faster than mine. I'm going to steal your idea, and change demo\allsorts.ex to use a slice as follows: -- make the resulting sorted sequence (faster way) k = 1 for i = 1 to length(count) do c = count[i] if c then sorted[k..k+c-1] = i + offset k = k + c end if end for Using a slice (instead of the subscript loop I had before) is especially helpful when there are lots of duplicates, (c is large) like in your test case, but it also seems to be faster even when c is small, because the c=0 case is handled efficiently by the if-statement. Thanks, Rob Craig Rapid Deployment Software http://members.aol.com/FilesEu/