1. Challenge for Sort programs.
- Posted by rforno at tutopia.com
Jun 03, 2002
Hi all:
I was attempting to improve the performance of the Shell sort that comes
with Euphoria. One of the ways is substituting the divisor 3 by some other,
especially for non-integers in the range 2.5 - 3.5. 2.6, 2.7 and 3.4 have
proven to be good.. Some other tactics have been tried, with no great
success.
As you know, Shell sort is based on Insertion sort, one of the simple ones,
with times O(n ^ 2), instead of O(n log(n)) as the efficient ones.
Then I started seeking to improve the simple sorts or develop new simple
sorting methods, with the aim to apply then to Shell sort replacing
Insertion sort in it. My attempts failed with modified versions of Insertion
sort, but I succeed in programming double_sel_sort (a modification of normal
Selection sort working from both ends) and new_sort, a novel approach (also
a modification of Selection sort) which takes advantage of the
previous-to-best key to reduce the search range. These sorts are marginally
faster than Insertion sort if the number of items is not too low.
But they should not be used to replace Insertion sort inside Shell sort,
because they are slower than it when the sequence is nearly sorted. Shell
sort takes advantage of this fact especially in its last pass, identical to
Insertion sort, precisely because the sequence is then nearly sorted.
So, I tried Bubble sort, that is the fastest one for nearly sorted
sequences, and the generated a new_shell_sort based on Bubble sort. It is
not faster, but rather slower than the original Shell sort.
Well, after this rant, you will find the code for all these sorts, plus two
slightly improved versions of Bubble sort, and Selection sort, one of the
basic ones (Rob: why is it not included in Allsorts.ex?).
The challenge to find faster sorts remain open. I hope to have contributed
some ideas on this subject.
-------------------------------------------------------
function new_shell_sort(sequence x)
-- Shell sort based on bubble sort
integer gap, last, limit, flip
object temp
last = length(x)
gap = floor(last / 3) + 1
while 1 do
for j = 1 to gap do
flip = last
while flip do
limit = flip
flip = 0
for i = j to limit - gap by gap do
temp = x[i + gap]
if compare(temp, x[i]) < 0 then
x[i + gap] = x[i]
x[i] = temp
flip = i
end if
end for
end while
end for
if gap = 1 then
return x
else
gap = floor(gap / 3) + 1
end if
end while
end function
function new_sort(sequence x)
integer len
object min, temp
integer first, second
len = length(x)
if len <= 1 then
return x
end if
first = 1
second = 1
min = x[1]
for i = 1 to len - 1 do
for j = first + 1 to len do
if compare(x[j], min) < 0 then
second = first
first = j
min = x[j]
end if
end for
if first = second then
x[first] = x[i]
x[i] = min
first = i + 1
second = first
min = x[first]
else
temp = x[second]
x[first] = x[i]
x[i] = min
min = temp
if second = i then
second = first
else
first = second
end if
end if
end for
return x
end function
function new_bubble_sort2(sequence x)
object temp, temp1
integer flip, limit, j
flip = length(x)
while flip do
limit = flip
flip = 0
for i = 1 to limit - 1 do
j = i + 1
temp = x[j]
temp1 = x[i]
if compare(temp, temp1) < 0 then
x[j] = temp1
x[i] = temp
flip = i
end if
end for
end while
return x
end function
function new_bubble_sort(sequence x)
object temp
integer flip, limit, j
flip = length(x)
while flip do
limit = flip
flip = 0
for i = 1 to limit - 1 do
j = i + 1
temp = x[j]
if compare(temp, x[i]) < 0 then
x[j] = x[i]
x[i] = temp
flip = i
end if
end for
end while
return x
end function
function double_sel_sort(sequence x)
object min, max
integer p, q, i, len
len = length(x)
i = 1
while len > i do
p = i
min = x[p]
q = i
max = min
for j = i + 1 to len do
if compare(x[j], min) < 0 then
p = j
min = x[p]
elsif compare(x[j], max) > 0 then
q = j
max = x[q]
end if
end for
x[p] = x[i]
x[i] = min
if q = i then
x[p] = x[len]
else
x[q] = x[len]
end if
x[len] = max
i += 1
len -= 1
end while
return x
end function
function selection_sort(sequence x)
object temp
integer k, len
len = length(x)
for i = 1 to len - 1 do
k = i
temp = x[k]
for j = i + 1 to len do
if compare(x[j], temp) < 0 then
k = j
temp = x[k]
end if
end for
x[k] = x[i]
x[i] = temp
end for
return x
end function
2. Re: Challenge for Sort programs.
- Posted by rforno at tutopia.com
Jun 05, 2002
Quicksort is undoubtedly one of the fastest sorts. Knuth proves this is the
best one under MIX (a virtual assembly program he invented).
But you can check the relative performances of several sorts included in
\euphoria\demo\allsorts.ex and verify this is not the case here, at least
with its implementation under Euphoria. Moreover, quicksort has a drawback:
if applied to data that is already sorted or nearly sorted, it is one of the
slowest, even slower than the inefficient ones (this last trait can be
corrected selecting the first element at random each time).
Then, with the exception of some eclectic sorts in allsorts.ex, Shell sort
is the fastest. Heapsort (not in this file) would be one of the candidates,
but someone has to code it in Euphoria.
I already attained better performance than Shell sort with SortUp, included
in my package General Functions. But it has also some drawbacks: 1) It uses
much more storage; 2) Its code is much longer and complex; 3) Its speed
advantage starts to show at about 500 elements. I has, however, one big
advantage: for already sorted or nearly sorted data, either in ascending or
descending order, is much faster than any one... except the disregarded
bubble sort.
So, what I am trying to attain is a sort faster than the one provided as
standard with Euphoria (Shell sort) and that does not have the drawbacks of
the ones I mentioned before.
Regards.
----- Original Message -----
From: "Don Phillips" <Graebel at hotmail.com>
To: "EUforum" <EUforum at topica.com>
Sent: Monday, June 03, 2002 10:48 PM
Subject: RE: Challenge for Sort programs.
>
>
> rforno at tutopia.com wrote:
> > Hi all:
> > I was attempting to improve the performance of the Shell sort that comes
> > with Euphoria. One of the ways is substituting the divisor 3 by some
> > other,
> > especially for non-integers in the range 2.5 - 3.5. 2.6, 2.7 and 3.4
> > have
> > proven to be good.. Some other tactics have been tried, with no great
> > success.
> > As you know, Shell sort is based on Insertion sort, one of the simple
> > ones,
> > with times O(n ^ 2), instead of O(n log(n)) as the efficient ones.
> > Then I started seeking to improve the simple sorts or develop new simple
> > sorting methods, with the aim to apply then to Shell sort replacing
> > Insertion sort in it. My attempts failed with modified versions of
> > Insertion
> > sort, but I succeed in programming double_sel_sort (a modification of
> > normal
> > Selection sort working from both ends) and new_sort, a novel approach
> > (also
> > a modification of Selection sort) which takes advantage of the
> > previous-to-best key to reduce the search range. These sorts are
> > marginally
> > faster than Insertion sort if the number of items is not too low.
> > But they should not be used to replace Insertion sort inside Shell sort,
> > because they are slower than it when the sequence is nearly sorted.
> > Shell
> > sort takes advantage of this fact especially in its last pass, identical
> > to
> > Insertion sort, precisely because the sequence is then nearly sorted.
> > So, I tried Bubble sort, that is the fastest one for nearly sorted
> > sequences, and the generated a new_shell_sort based on Bubble sort. It
> > is
> > not faster, but rather slower than the original Shell sort.
> > Well, after this rant, you will find the code for all these sorts, plus
> > two
> > slightly improved versions of Bubble sort, and Selection sort, one of
> > the
> > basic ones (Rob: why is it not included in Allsorts.ex?).
> > The challenge to find faster sorts remain open. I hope to have
> > contributed
> > some ideas on this subject.
>
> <SNIP>
>
> Actually I have a question. Are you simply trying to improve the speed
> of the sort? Or trying to improve the performance of the shell sort
> only?
>
> Seems to me the best way to improve its speed is to use a different algo
> altogether. Like a quicksort:
>
> function QuickSort( sequence Data )
> sequence Left
> sequence Right
> integer Pivot
> integer NextVal
>
> -- initialize
> Left = {}
> Right = {}
> Pivot = Data[ 1 ]
>
> -- partition both halves
> for ListIter = 2 to length(Data) do
> NextVal = Data[ ListIter ]
> if NextVal < Pivot then
> Left &= NextVal
> else
> Right &= NextVal
> end if
> end for
>
> -- sort partions
> if length(Left) > 1 then Left = QuickSort(Left) end if
> if length(Right) > 1 then Right = QuickSort(Right) end if
>
> -- join back
> return( Left & Pivot & Right )
> end function
>
>
> I have not tested this for speed but it should be very compariable to
> just about anything else out there...
>
>
>
>
3. Re: Challenge for Sort programs.
- Posted by rforno at tutopia.com
Jun 05, 2002
This is a multi-part message in MIME format.
------=_NextPart_000_0010_01C20CA5.3F1568A0
charset="iso-8859-1"
Hi, Karl!
I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for
input {3, 1, 2}. Obviously some fix has to be applied. I am attaching a
little program that allows full testing of sort routines.
Anyway, your quick_sort routine uses a method similar to the ones in
allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better
than standard Shell sort and standard quick_sort. Maybe these types of sort
should be the ones to be used to get more performance, but I was trying to
develop a "monolithic" sort with better performance than other monolithic
ones. Shell_sort using either insertion or bubble technique is, to my
understanding, this type of sort. Bubble technique proved (up to this point)
to be worse than insertion technique.
I was also trying that the new sort routines are "stable". "Stable" means
that, if transformed into an Index sorts as opposed to a Data sort, for
equal keys the original order is preserved. This is important if you sort
two "parallel" sequences by the key of one of them.
Regards.
----- Original Message -----
From: <kbochert at ix.netcom.com>
To: "EUforum" <EUforum at topica.com>
Sent: Tuesday, June 04, 2002 3:48 AM
Subject: RE: Challenge for Sort programs.
You wrote on 6/3/02 3:47:47 PM:
>
>Hi all:
>I was attempting to improve the performance of the Shell sort that comes
>with Euphoria.
<snip>
>Well, after this rant, you will find the code for all these sorts, plus two
>slightly improved versions of Bubble sort, and Selection sort, one of the
>basic ones (Rob: why is it not included in Allsorts.ex?).
>The challenge to find faster sorts remain open. I hope to have contributed
>some ideas on this subject.
This got me thinking about sorts and how the fastest of all sorts is the
partition sort, also known as quicksort. I wrote one for Euphoria, and
although
it does out-perform the others, its advantage is not as great as I might
have expected. I just wrote it with no concern for Euphoria optimizations,
so
there might be room for considerable improvement. I have also not tested it
on
a variety of data (of course). I am attaching the code
for the quicksort, a test harness and the code from your message.
Karl Bochert
------=_NextPart_000_0010_01C20CA5.3F1568A0
Content-Type: text/plain;
name="test.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="test.txt"
include genfunc.e
procedure test_sort(integer n)
sequence x, s, set
integer quant
for size = 1 to n do
quant = 1
for j = 1 to size do
quant *= j
end for
set = IndexGenerator(size)
for which = 1 to quant do
s = Permutation(set, size, which)
--if equal(s, {3, 1, 2}) then
--trace(1)
--end if
x = q_sort(s)
if not equal(x, set) then
puts(1, "Error\n")
? s
? x
abort(1)
end if
end for
end for
end procedure
--trace(1)
test_sort(8)
------=_NextPart_000_0010_01C20CA5.3F1568A0--
4. Re: Challenge for Sort programs.
-------Phoenix-Boundary-07081998-
You wrote on 6/5/02 10:25:38 AM:
>
>Hi, Karl!
>I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for
>input {3, 1, 2}. Obviously some fix has to be applied.
This sequence works fine. I suspect you are using the first version I
sent which did indeed have a bug. I have since sent a fixed version.
> I am attaching a
>little program that allows full testing of sort routines.
>Anyway, your quick_sort routine uses a method similar to the ones in
>allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better
>than standard Shell sort and standard quick_sort. Maybe these types of sort
>should be the ones to be used to get more performance, but I was trying to
>develop a "monolithic" sort with better performance than other monolithic
>ones. Shell_sort using either insertion or bubble technique is, to my
>understanding, this type of sort. Bubble technique proved (up to this
>point) to be worse than insertion technique.
The general principle is that the speed of a sort routine depends on the
average distance that an element moves during an exchange. The quicksort
is very fast because it begins at opposite ends of the array to find
exchages as far apart as possible. Bubble, shell and insertion tend to
only move elements short distances and therefore must be slow.
I tested the sorts in allsorts.e.
Of all of them, great_sort clearly appeared to be the fastest,
by about 10% ( this makes it 10% faster thane sort() in the
Euphoria library).
I tested my qsort against great_sort, and it appeared much poorer
when using the test in allsorts.e. Using my 'short sequences'
test, it appeared competitive, ranging from about 5% slower on small
sequences to about 5% faster on large (above 10,000 elements).
The large difference between the tests is a mystery.
great_sort can be improved by 5-10% by setting 'hybrid_limit' to 12.
I rewrote qsort using lessons learned from great_sort, and the result
is noticably improved -- 10% faster than great_sort for 200 elements,
20% faster at 1600 elements. Still tests at half the speed of
great_sort on the integers test from allsorts though!
>I was also trying that the new sort routines are "stable". "Stable" means
>that, if transformed into an Index sorts as opposed to a Data sort, for
>equal keys the original order is preserved. This is important if you sort
>two "parallel" sequences by the key of one of them.
great_sort seems to NOT be 'stable'. I wonder if this quality is
incompatible with partition sorts=3F
-------Phoenix-Boundary-07081998---
5. Re: Challenge for Sort programs.
- Posted by rforno at tutopia.com
Jun 07, 2002
I know that bubble sort, insertion sort and others are not efficient, taking
O(n ^2) time as opposed to O(n * log(n)) time for the efficient ones like
Shell sort, quick sort, heap sort and merge sort, but in Shell sort you use
an inefficient technique in each step, finishing with a gap of 1 unit where
the inefficient sub-sort is applied, because these inefficient techniques
work well with nearly-sorted data. Of the inefficient techniques, the only
ones I know that benefit from nearly-sorted data are insertion sort and
bubble sort. Standard Shell sort uses insertion sort, so I tried instead
bubble sort with poorer results. I would not try for example selection sort,
because it is indifferent to the previous sorting state of the data.
Regards.
----- Original Message -----
From: <kbochert at ix.netcom.com>
To: "EUforum" <EUforum at topica.com>
Sent: Thursday, June 06, 2002 11:19 AM
Subject: Re: Challenge for Sort programs.
You wrote on 6/5/02 10:25:38 AM:
>
>Hi, Karl!
>I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for
>input {3, 1, 2}. Obviously some fix has to be applied.
This sequence works fine. I suspect you are using the first version I
sent which did indeed have a bug. I have since sent a fixed version.
> I am attaching a
>little program that allows full testing of sort routines.
>Anyway, your quick_sort routine uses a method similar to the ones in
>allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better
>than standard Shell sort and standard quick_sort. Maybe these types of sort
>should be the ones to be used to get more performance, but I was trying to
>develop a "monolithic" sort with better performance than other monolithic
>ones. Shell_sort using either insertion or bubble technique is, to my
>understanding, this type of sort. Bubble technique proved (up to this
>point) to be worse than insertion technique.
The general principle is that the speed of a sort routine depends on the
average distance that an element moves during an exchange. The quicksort
is very fast because it begins at opposite ends of the array to find
exchages as far apart as possible. Bubble, shell and insertion tend to
only move elements short distances and therefore must be slow.
I tested the sorts in allsorts.e.
Of all of them, great_sort clearly appeared to be the fastest,
by about 10% ( this makes it 10% faster thane sort() in the
Euphoria library).
I tested my qsort against great_sort, and it appeared much poorer
when using the test in allsorts.e. Using my 'short sequences'
test, it appeared competitive, ranging from about 5% slower on small
sequences to about 5% faster on large (above 10,000 elements).
The large difference between the tests is a mystery.
great_sort can be improved by 5-10% by setting 'hybrid_limit' to 12.
I rewrote qsort using lessons learned from great_sort, and the result
is noticably improved -- 10% faster than great_sort for 200 elements,
20% faster at 1600 elements. Still tests at half the speed of
great_sort on the integers test from allsorts though!
>I was also trying that the new sort routines are "stable". "Stable" means
>that, if transformed into an Index sorts as opposed to a Data sort, for
>equal keys the original order is preserved. This is important if you sort
>two "parallel" sequences by the key of one of them.
great_sort seems to NOT be 'stable'. I wonder if this quality is
incompatible with partition sorts?
6. Re: Challenge for Sort programs.
- Posted by a.tammer at hetnet.nl
Jun 08, 2002
Hi Sorters,
won't enter in this competition.
Only real fast sort know to me :
binary sort.
Will be possible on whatever data
for AFAIK
why do other sort-algorithms then?
IF
taking simplest and fastest approach
THEN
got more time to do other things
ENDIF
a@t