1. Challenge for Sort programs.

Hi all:
I was attempting to improve the performance of the Shell sort that comes
with Euphoria. One of the ways is substituting the divisor 3 by some other,
especially for non-integers in the range 2.5 - 3.5. 2.6, 2.7 and 3.4 have
proven to be good.. Some other tactics have been tried, with no great
success.
As you know, Shell sort is based on Insertion sort, one of the simple ones,
with times O(n ^ 2), instead of O(n log(n)) as the efficient ones.
Then I started seeking to improve the simple sorts or develop new simple
sorting methods, with the aim to apply then to Shell sort replacing
Insertion sort in it. My attempts failed with modified versions of Insertion
sort, but I succeed in programming double_sel_sort (a modification of normal
Selection sort working from both ends) and new_sort, a novel approach (also
a modification of Selection sort) which takes advantage of the
previous-to-best key to reduce the search range. These sorts are marginally
faster than Insertion sort if the number of items is not too low.
But they should not be used to replace Insertion sort inside Shell sort,
because they are slower than it when the sequence is nearly sorted. Shell
sort takes advantage of this fact especially in its last pass, identical to
Insertion sort, precisely because the sequence is then nearly sorted.
So, I tried Bubble sort, that is the fastest one for nearly sorted
sequences, and the generated a new_shell_sort based on Bubble sort. It is
not faster, but rather slower than the original Shell sort.
Well, after this rant, you will find the code for all these sorts, plus two
slightly improved versions of Bubble sort, and Selection sort, one of the
basic ones (Rob: why is it not included in Allsorts.ex?).
The challenge to find faster sorts remain open. I hope to have contributed
some ideas on this subject.

-------------------------------------------------------

function new_shell_sort(sequence x)
-- Shell sort based on bubble sort
    integer gap, last, limit, flip
    object temp
    last = length(x)
    gap = floor(last / 3) + 1
    while 1 do
        for j = 1 to gap do
            flip = last
            while flip do
                limit = flip
                flip = 0
                for i = j to limit - gap by gap do
                    temp = x[i + gap]
                    if compare(temp, x[i]) < 0 then
                        x[i + gap] = x[i]
                        x[i] = temp
                        flip = i
                    end if
                end for
            end while
        end for
        if gap = 1 then
            return x
        else
            gap = floor(gap / 3) + 1
        end if
    end while
end function

function new_sort(sequence x)
    integer len
    object min, temp
    integer first, second
    len = length(x)
    if len <= 1 then
        return x
    end if
    first = 1
    second = 1
    min = x[1]
    for i = 1 to len - 1 do
        for j = first + 1 to len do
            if compare(x[j], min) < 0 then
                second = first
                first = j
                min = x[j]
            end if
        end for
        if first = second then
            x[first] = x[i]
            x[i] = min
            first = i + 1
            second = first
            min = x[first]
        else
            temp = x[second]
            x[first] = x[i]
            x[i] = min
            min = temp
            if second = i then
                second = first
            else
                first = second
            end if
        end if
    end for
    return x
end function

function new_bubble_sort2(sequence x)
    object temp, temp1
    integer flip, limit, j
    flip = length(x)
    while flip do
        limit = flip
        flip = 0
        for i = 1 to limit - 1 do
            j = i + 1
            temp = x[j]
            temp1 = x[i]
            if compare(temp, temp1) < 0 then
                x[j] = temp1
                x[i] = temp
                flip = i
            end if
        end for
    end while
    return x
end function

function new_bubble_sort(sequence x)
    object temp
    integer flip, limit, j
    flip = length(x)
    while flip do
        limit = flip
        flip = 0
        for i = 1 to limit - 1 do
            j = i + 1
            temp = x[j]
            if compare(temp, x[i]) < 0 then
                x[j] = x[i]
                x[i] = temp
                flip = i
            end if
        end for
    end while
    return x
end function

function double_sel_sort(sequence x)
    object min, max
    integer p, q, i, len
    len = length(x)
    i = 1
    while len > i do
        p = i
        min = x[p]
        q = i
        max = min
        for j = i + 1 to len do
            if compare(x[j], min) < 0 then
                p = j
                min = x[p]
            elsif compare(x[j], max) > 0 then
                q = j
                max = x[q]
            end if
        end for
        x[p] = x[i]
        x[i] = min
        if q = i then
            x[p] = x[len]
        else
            x[q] = x[len]
        end if
        x[len] = max
        i += 1
        len -= 1
    end while
    return x
end function

function selection_sort(sequence x)
    object temp
    integer k, len
    len = length(x)
    for i = 1 to len - 1 do
        k = i
        temp = x[k]
        for j = i + 1 to len do
            if compare(x[j], temp) < 0 then
                k = j
                temp = x[k]
            end if
        end for
        x[k] = x[i]
        x[i] = temp
    end for
    return x
end function

new topic     » topic index » view message » categorize

2. Re: Challenge for Sort programs.

Quicksort is undoubtedly one of the fastest sorts. Knuth proves this is the
best one under MIX (a virtual assembly program he invented).
But you can check the relative performances of several sorts included in
\euphoria\demo\allsorts.ex and verify this is not the case here, at least
with its implementation under Euphoria. Moreover, quicksort has a drawback:
if applied to data that is already sorted or nearly sorted, it is one of the
slowest, even slower than the inefficient ones (this last trait can be
corrected selecting the first element at random each time).
Then, with the exception of some eclectic sorts in allsorts.ex, Shell sort
is the fastest. Heapsort (not in this file) would be one of the candidates,
but someone has to code it in Euphoria.
I already attained better performance than Shell sort with SortUp, included
in my package General Functions. But it has also some drawbacks: 1) It uses
much more storage; 2) Its code is much longer and complex; 3) Its speed
advantage starts to show at about 500 elements. I has, however, one big
advantage: for already sorted or nearly sorted data, either in ascending or
descending order, is much faster than any one... except the disregarded
bubble sort.
So, what I am trying to attain is a sort faster than the one provided as
standard with Euphoria (Shell sort) and that does not have the drawbacks of
the ones I mentioned before.
Regards.
----- Original Message -----
From: "Don Phillips" <Graebel at hotmail.com>
To: "EUforum" <EUforum at topica.com>
Sent: Monday, June 03, 2002 10:48 PM
Subject: RE: Challenge for Sort programs.


>
>
> rforno at tutopia.com wrote:
> > Hi all:
> > I was attempting to improve the performance of the Shell sort that comes
> > with Euphoria. One of the ways is substituting the divisor 3 by some
> > other,
> > especially for non-integers in the range 2.5 - 3.5. 2.6, 2.7 and 3.4
> > have
> > proven to be good.. Some other tactics have been tried, with no great
> > success.
> > As you know, Shell sort is based on Insertion sort, one of the simple
> > ones,
> > with times O(n ^ 2), instead of O(n log(n)) as the efficient ones.
> > Then I started seeking to improve the simple sorts or develop new simple
> > sorting methods, with the aim to apply then to Shell sort replacing
> > Insertion sort in it. My attempts failed with modified versions of
> > Insertion
> > sort, but I succeed in programming double_sel_sort (a modification of
> > normal
> > Selection sort working from both ends) and new_sort, a novel approach
> > (also
> > a modification of Selection sort) which takes advantage of the
> > previous-to-best key to reduce the search range. These sorts are
> > marginally
> > faster than Insertion sort if the number of items is not too low.
> > But they should not be used to replace Insertion sort inside Shell sort,
> > because they are slower than it when the sequence is nearly sorted.
> > Shell
> > sort takes advantage of this fact especially in its last pass, identical
> > to
> > Insertion sort, precisely because the sequence is then nearly sorted.
> > So, I tried Bubble sort, that is the fastest one for nearly sorted
> > sequences, and the generated a new_shell_sort based on Bubble sort. It
> > is
> > not faster, but rather slower than the original Shell sort.
> > Well, after this rant, you will find the code for all these sorts, plus
> > two
> > slightly improved versions of Bubble sort, and Selection sort, one of
> > the
> > basic ones (Rob: why is it not included in Allsorts.ex?).
> > The challenge to find faster sorts remain open. I hope to have
> > contributed
> > some ideas on this subject.
>
> <SNIP>
>
> Actually I have a question.  Are you simply trying to improve the speed
> of the sort?  Or trying to improve the performance of the shell sort
> only?
>
> Seems to me the best way to improve its speed is to use a different algo
> altogether.  Like a quicksort:
>
> function QuickSort( sequence Data )
> sequence Left
> sequence Right
> integer Pivot
> integer NextVal
>
> -- initialize
> Left = {}
> Right = {}
> Pivot = Data[ 1 ]
>
> -- partition both halves
> for ListIter = 2 to length(Data) do
> NextVal = Data[ ListIter ]
> if NextVal < Pivot then
> Left &= NextVal
> else
> Right &= NextVal
> end if
> end for
>
> -- sort partions
> if length(Left) > 1 then Left = QuickSort(Left) end if
> if length(Right) > 1 then Right = QuickSort(Right) end if
>
> -- join back
> return( Left & Pivot & Right )
> end function
>
>
> I have not tested this for speed but it should be very compariable to
> just about anything else out there...
>
>
>
>

new topic     » goto parent     » topic index » view message » categorize

3. Re: Challenge for Sort programs.

This is a multi-part message in MIME format.

------=_NextPart_000_0010_01C20CA5.3F1568A0
	charset="iso-8859-1"

Hi, Karl!
I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for
input {3, 1, 2}. Obviously some fix has to be applied. I am attaching a
little program that allows full testing of sort routines.
Anyway, your quick_sort routine uses a method similar to the ones in
allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better
than standard Shell sort and standard quick_sort. Maybe these types of sort
should be the ones to be used to get more performance, but I was trying to
develop a "monolithic" sort with better performance than other monolithic
ones. Shell_sort using either insertion or bubble technique is, to my
understanding, this type of sort. Bubble technique proved (up to this point)
to be worse than insertion technique.
I was also trying that the new sort routines are "stable". "Stable" means
that, if transformed into an Index sorts as opposed to a Data sort, for
equal keys the original order is preserved. This is important if you sort
two "parallel" sequences by the key of one of them.
Regards.
----- Original Message -----
From: <kbochert at ix.netcom.com>
To: "EUforum" <EUforum at topica.com>
Sent: Tuesday, June 04, 2002 3:48 AM
Subject: RE: Challenge for Sort programs.



You wrote on 6/3/02 3:47:47 PM:

>
>Hi all:
>I was attempting to improve the performance of the Shell sort that comes
>with Euphoria.

<snip>

>Well, after this rant, you will find the code for all these sorts, plus two
>slightly improved versions of Bubble sort, and Selection sort, one of the
>basic ones (Rob: why is it not included in Allsorts.ex?).
>The challenge to find faster sorts remain open. I hope to have contributed
>some ideas on this subject.

This got me thinking about sorts and how the fastest of all sorts is the
partition sort, also known as quicksort. I wrote one for Euphoria, and
although
it does out-perform the others, its advantage is not as great as I might
have expected. I just wrote it with no concern for Euphoria optimizations,
so
there might be room for considerable improvement. I have also not tested it
on
a variety of data (of course). I am attaching the code
for the quicksort, a test harness and the code from your message.

Karl Bochert





------=_NextPart_000_0010_01C20CA5.3F1568A0
Content-Type: text/plain;
	name="test.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="test.txt"

include genfunc.e

procedure test_sort(integer n)
    sequence x, s, set
    integer quant
    for size = 1 to n do
	quant = 1
	for j = 1 to size do
	    quant *= j
	end for
	set = IndexGenerator(size)
	for which = 1 to quant do
	    s = Permutation(set, size, which)
	    --if equal(s, {3, 1, 2}) then
		--trace(1)
	    --end if
	    x = q_sort(s)
	    if not equal(x, set) then
		puts(1, "Error\n")
		? s
		? x
		abort(1)
	    end if
	end for
    end for
end procedure

--trace(1)
test_sort(8)


------=_NextPart_000_0010_01C20CA5.3F1568A0--

new topic     » goto parent     » topic index » view message » categorize

4. Re: Challenge for Sort programs.

-------Phoenix-Boundary-07081998-

You wrote on 6/5/02 10:25:38 AM:

>
>Hi, Karl!
>I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for
>input {3, 1, 2}. Obviously some fix has to be applied.

This sequence works fine. I suspect you are using the first version I
sent which did indeed have a bug. I have since sent a fixed version.

> I am attaching a
>little program that allows full testing of sort routines.
>Anyway, your quick_sort routine uses a method similar to the ones in
>allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better
>than standard Shell sort and standard quick_sort. Maybe these types of sort
>should be the ones to be used to get more performance, but I was trying to
>develop a "monolithic" sort with better performance than other monolithic
>ones. Shell_sort using either insertion or bubble technique is, to my
>understanding, this type of sort. Bubble technique proved (up to this 
>point) to be worse than insertion technique.

The general principle is that the speed of a sort routine depends on the
average distance that an element moves during an exchange. The quicksort
is very fast because it begins at opposite ends of the array to find
exchages as far apart as possible. Bubble, shell and insertion tend to
only move elements short distances and therefore must be slow.

I tested the sorts in allsorts.e.
Of all of them, great_sort clearly appeared to be the fastest,
by about 10% ( this makes it 10% faster thane sort() in the
Euphoria library).

I tested my qsort against great_sort, and it appeared much poorer
when using the test in allsorts.e. Using my 'short sequences'
test, it appeared competitive, ranging from about 5% slower on small
sequences to about 5% faster on large (above 10,000 elements).
The large difference between the tests is a mystery.

great_sort can be improved by 5-10% by setting 'hybrid_limit' to 12.

I rewrote qsort using lessons learned from great_sort, and the result
is noticably improved -- 10% faster than great_sort for 200 elements,
20% faster at 1600 elements. Still tests at half the speed of
great_sort on the integers test from allsorts though!

>I was also trying that the new sort routines are "stable". "Stable" means
>that, if transformed into an Index sorts as opposed to a Data sort, for
>equal keys the original order is preserved. This is important if you sort
>two "parallel" sequences by the key of one of them.

great_sort seems to NOT be 'stable'. I wonder if this quality is
incompatible with partition sorts=3F


-------Phoenix-Boundary-07081998---

new topic     » goto parent     » topic index » view message » categorize

5. Re: Challenge for Sort programs.

I know that bubble sort, insertion sort and others are not efficient, taking
O(n ^2) time as opposed to O(n * log(n)) time for the efficient ones like
Shell sort, quick sort, heap sort and merge sort, but in Shell sort you use
an inefficient technique in each step, finishing with a gap of 1 unit where
the inefficient sub-sort is applied, because these inefficient techniques
work well with nearly-sorted data. Of the inefficient techniques, the only
ones I know that benefit from nearly-sorted data are insertion sort and
bubble sort. Standard Shell sort uses insertion sort, so I tried instead
bubble sort with poorer results. I would not try for example selection sort,
because it is indifferent to the previous sorting state of the data.
Regards.
----- Original Message -----
From: <kbochert at ix.netcom.com>
To: "EUforum" <EUforum at topica.com>
Sent: Thursday, June 06, 2002 11:19 AM
Subject: Re: Challenge for Sort programs.



You wrote on 6/5/02 10:25:38 AM:

>
>Hi, Karl!
>I tested your quick_sort, and found it gives the wrong result {1, 3, 2} for
>input {3, 1, 2}. Obviously some fix has to be applied.

This sequence works fine. I suspect you are using the first version I
sent which did indeed have a bug. I have since sent a fixed version.

> I am attaching a
>little program that allows full testing of sort routines.
>Anyway, your quick_sort routine uses a method similar to the ones in
>allsorts.ex (hybrid_sort, great_sort, best_sort), that are usually better
>than standard Shell sort and standard quick_sort. Maybe these types of sort
>should be the ones to be used to get more performance, but I was trying to
>develop a "monolithic" sort with better performance than other monolithic
>ones. Shell_sort using either insertion or bubble technique is, to my
>understanding, this type of sort. Bubble technique proved (up to this
>point) to be worse than insertion technique.

The general principle is that the speed of a sort routine depends on the
average distance that an element moves during an exchange. The quicksort
is very fast because it begins at opposite ends of the array to find
exchages as far apart as possible. Bubble, shell and insertion tend to
only move elements short distances and therefore must be slow.

I tested the sorts in allsorts.e.
Of all of them, great_sort clearly appeared to be the fastest,
by about 10% ( this makes it 10% faster thane sort() in the
Euphoria library).

I tested my qsort against great_sort, and it appeared much poorer
when using the test in allsorts.e. Using my 'short sequences'
test, it appeared competitive, ranging from about 5% slower on small
sequences to about 5% faster on large (above 10,000 elements).
The large difference between the tests is a mystery.

great_sort can be improved by 5-10% by setting 'hybrid_limit' to 12.

I rewrote qsort using lessons learned from great_sort, and the result
is noticably improved -- 10% faster than great_sort for 200 elements,
20% faster at 1600 elements. Still tests at half the speed of
great_sort on the integers test from allsorts though!

>I was also trying that the new sort routines are "stable". "Stable" means
>that, if transformed into an Index sorts as opposed to a Data sort, for
>equal keys the original order is preserved. This is important if you sort
>two "parallel" sequences by the key of one of them.

great_sort seems to NOT be 'stable'. I wonder if this quality is
incompatible with partition sorts?

new topic     » goto parent     » topic index » view message » categorize

6. Re: Challenge for Sort programs.

Hi Sorters,

won't enter in this competition.
Only real fast sort know to me :
binary sort.
Will be possible on whatever data
for AFAIK
why do other sort-algorithms then?
IF
  taking simplest and fastest approach
THEN
  got more time to do other things
ENDIF

a@t

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu