1. Challenge for Sort programs.
- Posted by rforno at tutopia.com Jun 03, 2002
- 450 views
Hi all: I was attempting to improve the performance of the Shell sort that comes with Euphoria. One of the ways is substituting the divisor 3 by some other, especially for non-integers in the range 2.5 - 3.5. 2.6, 2.7 and 3.4 have proven to be good.. Some other tactics have been tried, with no great success. As you know, Shell sort is based on Insertion sort, one of the simple ones, with times O(n ^ 2), instead of O(n log(n)) as the efficient ones. Then I started seeking to improve the simple sorts or develop new simple sorting methods, with the aim to apply then to Shell sort replacing Insertion sort in it. My attempts failed with modified versions of Insertion sort, but I succeed in programming double_sel_sort (a modification of normal Selection sort working from both ends) and new_sort, a novel approach (also a modification of Selection sort) which takes advantage of the previous-to-best key to reduce the search range. These sorts are marginally faster than Insertion sort if the number of items is not too low. But they should not be used to replace Insertion sort inside Shell sort, because they are slower than it when the sequence is nearly sorted. Shell sort takes advantage of this fact especially in its last pass, identical to Insertion sort, precisely because the sequence is then nearly sorted. So, I tried Bubble sort, that is the fastest one for nearly sorted sequences, and the generated a new_shell_sort based on Bubble sort. It is not faster, but rather slower than the original Shell sort. Well, after this rant, you will find the code for all these sorts, plus two slightly improved versions of Bubble sort, and Selection sort, one of the basic ones (Rob: why is it not included in Allsorts.ex?). The challenge to find faster sorts remain open. I hope to have contributed some ideas on this subject. ------------------------------------------------------- function new_shell_sort(sequence x) -- Shell sort based on bubble sort integer gap, last, limit, flip object temp last = length(x) gap = floor(last / 3) + 1 while 1 do for j = 1 to gap do flip = last while flip do limit = flip flip = 0 for i = j to limit - gap by gap do temp = x[i + gap] if compare(temp, x[i]) < 0 then x[i + gap] = x[i] x[i] = temp flip = i end if end for end while end for if gap = 1 then return x else gap = floor(gap / 3) + 1 end if end while end function function new_sort(sequence x) integer len object min, temp integer first, second len = length(x) if len <= 1 then return x end if first = 1 second = 1 min = x[1] for i = 1 to len - 1 do for j = first + 1 to len do if compare(x[j], min) < 0 then second = first first = j min = x[j] end if end for if first = second then x[first] = x[i] x[i] = min first = i + 1 second = first min = x[first] else temp = x[second] x[first] = x[i] x[i] = min min = temp if second = i then second = first else first = second end if end if end for return x end function function new_bubble_sort2(sequence x) object temp, temp1 integer flip, limit, j flip = length(x) while flip do limit = flip flip = 0 for i = 1 to limit - 1 do j = i + 1 temp = x[j] temp1 = x[i] if compare(temp, temp1) < 0 then x[j] = temp1 x[i] = temp flip = i end if end for end while return x end function function new_bubble_sort(sequence x) object temp integer flip, limit, j flip = length(x) while flip do limit = flip flip = 0 for i = 1 to limit - 1 do j = i + 1 temp = x[j] if compare(temp, x[i]) < 0 then x[j] = x[i] x[i] = temp flip = i end if end for end while return x end function function double_sel_sort(sequence x) object min, max integer p, q, i, len len = length(x) i = 1 while len > i do p = i min = x[p] q = i max = min for j = i + 1 to len do if compare(x[j], min) < 0 then p = j min = x[p] elsif compare(x[j], max) > 0 then q = j max = x[q] end if end for x[p] = x[i] x[i] = min if q = i then x[p] = x[len] else x[q] = x[len] end if x[len] = max i += 1 len -= 1 end while return x end function function selection_sort(sequence x) object temp integer k, len len = length(x) for i = 1 to len - 1 do k = i temp = x[k] for j = i + 1 to len do if compare(x[j], temp) < 0 then k = j temp = x[k] end if end for x[k] = x[i] x[i] = temp end for return x end function
2. Re: Challenge for Sort programs.
- Posted by rforno at tutopia.com Jun 05, 2002
- 418 views
Quicksort is undoubtedly one of the fastest sorts. Knuth proves this is the best one under MIX (a virtual assembly program he invented). But you can check the relative performances of several sorts included in \euphoria\demo\allsorts.ex and verify this is not the case here, at least with its implementation under Euphoria. Moreover, quicksort has a drawback: if applied to data that is already sorted or nearly sorted, it is one of the slowest, even slower than the inefficient ones (this last trait can be corrected selecting the first element at random each time). Then, with the exception of some eclectic sorts in allsorts.ex, Shell sort is the fastest. Heapsort (not in this file) would be one of the candidates, but someone has to code it in Euphoria. I already attained better performance than Shell sort with SortUp, included in my package General Functions. But it has also some drawbacks: 1) It uses much more storage; 2) Its code is much longer and complex; 3) Its speed advantage starts to show at about 500 elements. I has, however, one big advantage: for already sorted or nearly sorted data, either in ascending or descending order, is much faster than any one... except the disregarded bubble sort. So, what I am trying to attain is a sort faster than the one provided as standard with Euphoria (Shell sort) and that does not have the drawbacks of the ones I mentioned before. Regards. ----- Original Message ----- From: "Don Phillips" <Graebel at hotmail.com> To: "EUforum" <EUforum at topica.com> Sent: Monday, June 03, 2002 10:48 PM Subject: RE: Challenge for Sort programs. > > > rforno at tutopia.com wrote: > > Hi all: > > I was attempting to improve the performance of the Shell sort that comes > > with Euphoria. One of the ways is substituting the divisor 3 by some > > other, > > especially for non-integers in the range 2.5 - 3.5. 2.6, 2.7 and 3.4 > > have > > proven to be good.. Some other tactics have been tried, with no great > > success. > > As you know, Shell sort is based on Insertion sort, one of the simple > > ones, > > with times O(n ^ 2), instead of O(n log(n)) as the efficient ones. > > Then I started seeking to improve the simple sorts or develop new simple > > sorting methods, with the aim to apply then to Shell sort replacing > > Insertion sort in it. My attempts failed with modified versions of > > Insertion > > sort, but I succeed in programming double_sel_sort (a modification of > > normal > > Selection sort working from both ends) and new_sort, a novel approach > > (also > > a modification of Selection sort) which takes advantage of the > > previous-to-best key to reduce the search range. These sorts are > > marginally > > faster than Insertion sort if the number of items is not too low. > > But they should not be used to replace Insertion sort inside Shell sort, > > because they are slower than it when the sequence is nearly sorted. > > Shell > > sort takes advantage of this fact especially in its last pass, identical > > to > > Insertion sort, precisely because the sequence is then nearly sorted. > > So, I tried Bubble sort, that is the fastest one for nearly sorted > > sequences, and the generated a new_shell_sort based on Bubble sort. It > > is > > not faster, but rather slower than the original Shell sort. > > Well, after this rant, you will find the code for all these sorts, plus > > two > > slightly improved versions of Bubble sort, and Selection sort, one of > > the > > basic ones (Rob: why is it not included in Allsorts.ex?). > > The challenge to find faster sorts remain open. I hope to have > > contributed > > some ideas on this subject. > > <SNIP> > > Actually I have a question. Are you simply trying to improve the speed > of the sort? Or trying to improve the performance of the shell sort > only? > > Seems to me the best way to improve its speed is to use a different algo > altogether. Like a quicksort: > > function QuickSort( sequence Data ) > sequence Left > sequence Right > integer Pivot > integer NextVal > > -- initialize > Left = {} > Right = {} > Pivot = Data[ 1 ] > > -- partition both halves > for ListIter = 2 to length(Data) do > NextVal = Data[ ListIter ] > if NextVal < Pivot then > Left &= NextVal > else > Right &= NextVal > end if > end for > > -- sort partions > if length(Left) > 1 then Left = QuickSort(Left) end if > if length(Right) > 1 then Right = QuickSort(Right) end if > > -- join back > return( Left & Pivot & Right ) > end function > > > I have not tested this for speed but it should be very compariable to > just about anything else out there... > > > >
3. Re: Challenge for Sort programs.
- Posted by rforno at tutopia.com Jun 05, 2002
- 416 views
This is a multi-part message in MIME format. ------=_NextPart_000_0010_01C20CA5.3F1568A0 charset="iso-8859-1" Hi, Karl! I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for input {3, 1, 2}. Obviously some fix has to be applied. I am attaching a little program that allows full testing of sort routines. Anyway, your quick_sort routine uses a method similar to the ones in allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better than standard Shell sort and standard quick_sort. Maybe these types of sort should be the ones to be used to get more performance, but I was trying to develop a "monolithic" sort with better performance than other monolithic ones. Shell_sort using either insertion or bubble technique is, to my understanding, this type of sort. Bubble technique proved (up to this point) to be worse than insertion technique. I was also trying that the new sort routines are "stable". "Stable" means that, if transformed into an Index sorts as opposed to a Data sort, for equal keys the original order is preserved. This is important if you sort two "parallel" sequences by the key of one of them. Regards. ----- Original Message ----- From: <kbochert at ix.netcom.com> To: "EUforum" <EUforum at topica.com> Sent: Tuesday, June 04, 2002 3:48 AM Subject: RE: Challenge for Sort programs. You wrote on 6/3/02 3:47:47 PM: > >Hi all: >I was attempting to improve the performance of the Shell sort that comes >with Euphoria. <snip> >Well, after this rant, you will find the code for all these sorts, plus two >slightly improved versions of Bubble sort, and Selection sort, one of the >basic ones (Rob: why is it not included in Allsorts.ex?). >The challenge to find faster sorts remain open. I hope to have contributed >some ideas on this subject. This got me thinking about sorts and how the fastest of all sorts is the partition sort, also known as quicksort. I wrote one for Euphoria, and although it does out-perform the others, its advantage is not as great as I might have expected. I just wrote it with no concern for Euphoria optimizations, so there might be room for considerable improvement. I have also not tested it on a variety of data (of course). I am attaching the code for the quicksort, a test harness and the code from your message. Karl Bochert ------=_NextPart_000_0010_01C20CA5.3F1568A0 Content-Type: text/plain; name="test.txt" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="test.txt" include genfunc.e procedure test_sort(integer n) sequence x, s, set integer quant for size = 1 to n do quant = 1 for j = 1 to size do quant *= j end for set = IndexGenerator(size) for which = 1 to quant do s = Permutation(set, size, which) --if equal(s, {3, 1, 2}) then --trace(1) --end if x = q_sort(s) if not equal(x, set) then puts(1, "Error\n") ? s ? x abort(1) end if end for end for end procedure --trace(1) test_sort(8) ------=_NextPart_000_0010_01C20CA5.3F1568A0--
4. Re: Challenge for Sort programs.
- Posted by kbochert at ix.netcom.com Jun 06, 2002
- 444 views
-------Phoenix-Boundary-07081998- You wrote on 6/5/02 10:25:38 AM: > >Hi, Karl! >I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for >input {3, 1, 2}. Obviously some fix has to be applied. This sequence works fine. I suspect you are using the first version I sent which did indeed have a bug. I have since sent a fixed version. > I am attaching a >little program that allows full testing of sort routines. >Anyway, your quick_sort routine uses a method similar to the ones in >allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better >than standard Shell sort and standard quick_sort. Maybe these types of sort >should be the ones to be used to get more performance, but I was trying to >develop a "monolithic" sort with better performance than other monolithic >ones. Shell_sort using either insertion or bubble technique is, to my >understanding, this type of sort. Bubble technique proved (up to this >point) to be worse than insertion technique. The general principle is that the speed of a sort routine depends on the average distance that an element moves during an exchange. The quicksort is very fast because it begins at opposite ends of the array to find exchages as far apart as possible. Bubble, shell and insertion tend to only move elements short distances and therefore must be slow. I tested the sorts in allsorts.e. Of all of them, great_sort clearly appeared to be the fastest, by about 10% ( this makes it 10% faster thane sort() in the Euphoria library). I tested my qsort against great_sort, and it appeared much poorer when using the test in allsorts.e. Using my 'short sequences' test, it appeared competitive, ranging from about 5% slower on small sequences to about 5% faster on large (above 10,000 elements). The large difference between the tests is a mystery. great_sort can be improved by 5-10% by setting 'hybrid_limit' to 12. I rewrote qsort using lessons learned from great_sort, and the result is noticably improved -- 10% faster than great_sort for 200 elements, 20% faster at 1600 elements. Still tests at half the speed of great_sort on the integers test from allsorts though! >I was also trying that the new sort routines are "stable". "Stable" means >that, if transformed into an Index sorts as opposed to a Data sort, for >equal keys the original order is preserved. This is important if you sort >two "parallel" sequences by the key of one of them. great_sort seems to NOT be 'stable'. I wonder if this quality is incompatible with partition sorts=3F -------Phoenix-Boundary-07081998---
5. Re: Challenge for Sort programs.
- Posted by rforno at tutopia.com Jun 07, 2002
- 439 views
I know that bubble sort, insertion sort and others are not efficient, taking O(n ^2) time as opposed to O(n * log(n)) time for the efficient ones like Shell sort, quick sort, heap sort and merge sort, but in Shell sort you use an inefficient technique in each step, finishing with a gap of 1 unit where the inefficient sub-sort is applied, because these inefficient techniques work well with nearly-sorted data. Of the inefficient techniques, the only ones I know that benefit from nearly-sorted data are insertion sort and bubble sort. Standard Shell sort uses insertion sort, so I tried instead bubble sort with poorer results. I would not try for example selection sort, because it is indifferent to the previous sorting state of the data. Regards. ----- Original Message ----- From: <kbochert at ix.netcom.com> To: "EUforum" <EUforum at topica.com> Sent: Thursday, June 06, 2002 11:19 AM Subject: Re: Challenge for Sort programs. You wrote on 6/5/02 10:25:38 AM: > >Hi, Karl! >I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for >input {3, 1, 2}. Obviously some fix has to be applied. This sequence works fine. I suspect you are using the first version I sent which did indeed have a bug. I have since sent a fixed version. > I am attaching a >little program that allows full testing of sort routines. >Anyway, your quick_sort routine uses a method similar to the ones in >allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better >than standard Shell sort and standard quick_sort. Maybe these types of sort >should be the ones to be used to get more performance, but I was trying to >develop a "monolithic" sort with better performance than other monolithic >ones. Shell_sort using either insertion or bubble technique is, to my >understanding, this type of sort. Bubble technique proved (up to this >point) to be worse than insertion technique. The general principle is that the speed of a sort routine depends on the average distance that an element moves during an exchange. The quicksort is very fast because it begins at opposite ends of the array to find exchages as far apart as possible. Bubble, shell and insertion tend to only move elements short distances and therefore must be slow. I tested the sorts in allsorts.e. Of all of them, great_sort clearly appeared to be the fastest, by about 10% ( this makes it 10% faster thane sort() in the Euphoria library). I tested my qsort against great_sort, and it appeared much poorer when using the test in allsorts.e. Using my 'short sequences' test, it appeared competitive, ranging from about 5% slower on small sequences to about 5% faster on large (above 10,000 elements). The large difference between the tests is a mystery. great_sort can be improved by 5-10% by setting 'hybrid_limit' to 12. I rewrote qsort using lessons learned from great_sort, and the result is noticably improved -- 10% faster than great_sort for 200 elements, 20% faster at 1600 elements. Still tests at half the speed of great_sort on the integers test from allsorts though! >I was also trying that the new sort routines are "stable". "Stable" means >that, if transformed into an Index sorts as opposed to a Data sort, for >equal keys the original order is preserved. This is important if you sort >two "parallel" sequences by the key of one of them. great_sort seems to NOT be 'stable'. I wonder if this quality is incompatible with partition sorts?
6. Re: Challenge for Sort programs.
- Posted by a.tammer at hetnet.nl Jun 08, 2002
- 434 views
Hi Sorters, won't enter in this competition. Only real fast sort know to me : binary sort. Will be possible on whatever data for AFAIK why do other sort-algorithms then? IF taking simplest and fastest approach THEN got more time to do other things ENDIF a@t