1. sort constants confusing

http://openeuphoria.org/docs/std_sort.html#_2964_sorting

I could not work out why there were 4 sort options. It reads, at best, like you have to use ASCENDING for sort() and NORMAL_ORDER for custom_sort().

As soon as I looked at the source, the penny dropped instantly.

The documentation would be much clearer if it said...

The constants NORMAL_ORDER and REVERSE_ORDER are just aliases for ASCENDING and DESCENDING respectively, and behave identically.

Better yet, just bin two of them (I don't much care which).

Pete

new topic     » topic index » view message » categorize

2. Re: sort constants confusing

Agreed. Why they exist is a mystery. In any case I would have expected normal/abnormal or forward/reverse pairs instead.

new topic     » goto parent     » topic index » view message » categorize

3. Re: sort constants confusing

DerekParnell said...

Agreed. Why they exist is a mystery. In any case I would have expected normal/abnormal or forward/reverse pairs instead.

Stranger still is this sheepish note in the documentation for sort(), custom_sort() and sort_columns():

"This function uses the "Shell" sort algorithm. This sort is not "stable", i.e. elements that are considered equal might change position relative to each other"

Really? Does \std still have unstable sort routines after all these years or was it fixed and someone just needs to update the documentation? I thought we had thrashed this out in 2014? In fact, it was indicated as far back as 2011 but nobody did anything about it.

Spock

new topic     » goto parent     » topic index » view message » categorize

4. Re: sort constants confusing

petelomax said...

I could not work out why there were 4 sort options. It reads, at best, like you have to use ASCENDING for sort() and NORMAL_ORDER for custom_sort().

As soon as I looked at the source, the penny dropped instantly.

The constants NORMAL_ORDER and REVERSE_ORDER are just aliases for ASCENDING and DESCENDING respectively, and behave identically.

DerekParnell said...

Agreed. Why they exist is a mystery. In any case I would have expected normal/abnormal or forward/reverse pairs instead.

I traced the history of it.

ASCENDING/DESCENDING was CChris's idea - before that the code only sorted in ascending - http://scm.openeuphoria.org/hg/euphoria/annotate/d63e0d759325/include/std/sort.e#16

But he didn't invent the term - the initial commit for the file has the term 'ascending' in it - http://scm.openeuphoria.org/hg/euphoria/annotate/076e4e3c0f12/include/std/sort.e

I checked my copy of sort.e from RDS Eu 2.3 and it uses the same term. So ASCENDING goes back to RobCraig.

Derek was the one who added the custom_sort() aliases - and the misleading comments to go with it - http://scm.openeuphoria.org/hg/euphoria/annotate/1debb13c40cf/include/std/sort.e#24

Derek, why did you add a normal/reverse pair instead of your expected forward/reverse pair ?

new topic     » goto parent     » topic index » view message » categorize

5. Re: sort constants confusing

jimcbrown said...

Derek, why did you add a normal/reverse pair instead of your expected forward/reverse pair ?

No idea now. Probably stupidity but that is not the salient lesson here, in my opinion.

The better question would be to ask why did such poor code get to stay in the distribution? I suspect that the answer to that relates to the lack of peer review before release packaging.

new topic     » goto parent     » topic index » view message » categorize

6. Re: sort constants confusing

suggested current
rising ascending, normal_order
falling descending, reverse_order

Can we standardize on rising and falling? These are the simplest English words I can suggest.

_tom

new topic     » goto parent     » topic index » view message » categorize

7. Re: sort constants confusing

_tom said...

Can we standardize on rising and falling?

Let's not. There is nothing whatsoever wrong with any of the terms/constant names as given, the problem is just the implication that they are somehow different.

Having four constants implies (to my tiny brain) that there are four separate meanings, when in fact there are only two, which is as clear as can be in the source, but not in the documentation.

It is fair to say that someone would only ever notice this if they had not just looked at the source file.

Assuming that it is not practical to change eu2doc (or whatever it is called) to stop it from ripping out the "=" parts,
I think the simplest solution would just be to change, in std/sort.e itself:

--** The normal sort order used by the custom comparison routine. 
NORMAL_ORDER = ASCENDING, 

to

--** The normal sort order used by the custom comparison routine. (same as ASCENDING) 
NORMAL_ORDER = ASCENDING, 

and

--** Reverses the sense of the order returned by a custom comparison routine. 
REVERSE_ORDER = DESCENDING 

to

--** Reverses the sense of the order returned by a custom comparison routine. (same as DESCENDING) 
REVERSE_ORDER = DESCENDING 

I retract my suggestion to "bin two of em"

Pete

new topic     » goto parent     » topic index » view message » categorize

8. Re: sort constants confusing

Simplify and add lightness. For an even smaller brain, ascending is more complex then rising; and such with descending vs falling. Nothing wrong with a total rewrite and rethink of the library.

We must include msort.e http://www.rapideuphoria.com/msort.zip in the standard library; possibly as a separate .e file.

Then, a clean way to present:

  • qsort (quick an dirty unstable sort)
  • insertion sort; tree/heap management (to maintain a sorted list)
  • column sorting
  • ????

I just downloaded all of the sort files from the Archive; this will take time to "sort out."

_tom

new topic     » goto parent     » topic index » view message » categorize

9. Re: sort constants confusing

_tom said...

I just downloaded all of the sort files from the Archive; this will take time to "sort out."

From March this year: http://rosettacode.org/wiki/Compare_sorting_algorithms%27_performance#Phix

new topic     » goto parent     » topic index » view message » categorize

10. Re: sort constants confusing

Thanks Pete, now it will take even longer to "sort out."

There is another dimension to this stuff where an "unfashionable" algorithm is better on a small data-set (or some specific dataset) than a "fashionable" algorithm.

_tom

new topic     » goto parent     » topic index » view message » categorize

11. Re: sort constants confusing

_tom said...

Thanks Pete, now it will take even longer to "sort out."

There is another dimension to this stuff where an "unfashionable" algorithm is better on a small data-set (or some specific dataset) than a "fashionable" algorithm.

_tom

The central routine in msort.zip first checks the size of the data set and then selects one of two options:

n < 200 (call fast insertion sort)
else (call merge sort supplemented with fast insertion sort)

So whatever n there is the sort performance is always good, always stable. And without the stability property sorting by column is not guaranteed to be correct - an important goal in the real-world. Quicksort and Shellsort [Edit: Heapsort too] are not naturally stable sorts.

It would be interesting to compare the performance of the Phix code on Rosetta versus msort but.. I couldn't get the former to work (something about blah blah Visual .. 2013 blah blah) so I gave up.

Spock

new topic     » goto parent     » topic index » view message » categorize

12. Re: sort constants confusing

Spock said...

I couldn't get the former to work (something about blah blah Visual .. 2013 blah blah) so I gave up.

Yup. That would be msvcr120.dll not found, needed to run IUP. Sorry about that, not much I can do better.
It does at least suggest https://www.microsoft.com/en-us/download/details.aspx?id=40784 (a 6MB download)

Pete

new topic     » goto parent     » topic index » view message » categorize

13. Re: sort constants confusing

petelomax said...
Spock said...

I couldn't get the former to work (something about blah blah Visual .. 2013 blah blah) so I gave up.

Yup. That would be msvcr120.dll not found, needed to run IUP. Sorry about that, not much I can do better.
It does at least suggest https://www.microsoft.com/en-us/download/details.aspx?id=40784 (a 6MB download)

Pete

I installed it but the demo is still not running. Copying the dll to the various sub-folders of Phix didn't work either. sad

Spock

new topic     » goto parent     » topic index » view message » categorize

14. Re: sort constants confusing

_tom said...
suggested current
rising ascending, normal_order
falling descending, reverse_order

Can we standardize on rising and falling? These are the simplest English words I can suggest.


I hope not. I've been giving this a bit more thought and I'm starting to think that normal / reverse are actually good choices.

In any given language, there is a collating order for characters (graphemes) that is accepted as the 'normal' way to do it, and the most common variation to the normal sequence is to 'reverse' it.

new topic     » goto parent     » topic index » view message » categorize

15. Re: sort constants confusing

DerekParnell said...
_tom said...
suggested current
rising ascending, normal_order
falling descending, reverse_order

Can we standardize on rising and falling? These are the simplest English words I can suggest.


I hope not. I've been giving this a bit more thought and I'm starting to think that normal / reverse are actually good choices.

In any given language, there is a collating order for characters (graphemes) that is accepted as the 'normal' way to do it, and the most common variation to the normal sequence is to 'reverse' it.

OK.

We can standardize on NATURAL and REVERSE; the other choices will be there just there to help define "natural." We naturally write from left to right, hence small to large. But, we are accustomed to writing numbers themselves in their "natural" Arabic form!

  • Output is in natural order which is "sorted from small to large (ascending or rising)."
  • The optional output is reverse order which is "sorted from large to small (descending or falling)."

_tom

new topic     » goto parent     » topic index » view message » categorize

16. Re: sort constants confusing

_tom said...
DerekParnell said...
_tom said...
suggested current suggested current
rising ascending, normal_order [quote _tom] [quote DerekParnell] [quote _tom] rising
ascending, normal_order
falling
descending, reverse_order

Can we standardize on rising and falling? These are the simplest English words I can suggest. [/quote]


I hope not. I've been giving this a bit more thought and I'm starting to think that normal / reverse are actually good choices.

In any given language, there is a collating order for characters (graphemes) that is accepted as the 'normal' way to do it, and the most common variation to the normal sequence is to 'reverse' it.

[/quote]

OK.

We can standardize on NATURAL and REVERSE; the other choices will be there just there to help define "natural." We naturally write from left to right, hence small to large. But, we are accustomed to writing numbers themselves in their "natural" Arabic form!

  • Output is in natural order which is "sorted from small to large (ascending or rising)."
  • The optional output is reverse order which is "sorted from large to small (descending or falling)."

_tom

[/quote]

falling descending, reverse_order

Can we standardize on rising and falling? These are the simplest English words I can suggest.


I hope not. I've been giving this a bit more thought and I'm starting to think that normal / reverse are actually good choices.

In any given language, there is a collating order for characters (graphemes) that is accepted as the 'normal' way to do it, and the most common variation to the normal sequence is to 'reverse' it.

OK.

We can standardize on NATURAL and REVERSE; the other choices will be there just there to help define "natural." We naturally write from left to right, hence small to large. But, we are accustomed to writing numbers themselves in their "natural" Arabic form!

  • Output is in natural order which is "sorted from small to large (ascending or rising)."
  • The optional output is reverse order which is "sorted from large to small (descending or falling)."

_tom

natural is not a good choice. usually refers to a sort in numeric order try sorting 1, 2, 10, 20 you can't do it.

the normal ASCHII order is 1,10,2,20 there is a natural sort in the archive

https://en.wikipedia.org/wiki/Natural_sort_order

new topic     » goto parent     » topic index » view message » categorize

17. Re: sort constants confusing

DerekParnell said...
_tom said...

Can we standardize on rising and falling? These are the simplest English words I can suggest.

In any given language, there is a collating order for characters (graphemes) that is accepted as the 'normal' way to do it, and the most common variation to the normal sequence is to 'reverse' it.

True. The normal order for sorting is ascending. Sort({3,1,5,2,4}) ought to result in {1,2,3,4,5} without any need to explicitly specify the order.

The algorithm(s) in msort.zip all produce ascending order results. To have it descending I simply reverse the result, eg:

s = reverse( sort(s, n) ) -- n is column ['coz I have lots of column sorting processes] 

However, because of the stability principle, this is not strictly the same as, eg:

s = sort(s, n, REVERSE) 

Nevertheless, YAGNI and Occam's razor

So in the end that would lead to, eg:

s = sort(s) 
s = sort_reverse(s) 

An optional parameter could give the desired column sorting.

Spock

new topic     » goto parent     » topic index » view message » categorize

18. Re: sort constants confusing

I like the following forms of the sort command:

a=sort(b, UP) 
b=sort(a, DN) -- or DOWN 
a=sort(b, TO_MAX) 
b=sort(a, TO_MIN) 

They are just self explained, IMHO.

Regards,


kinz

new topic     » goto parent     » topic index » view message » categorize

19. Re: sort constants confusing

Base on the input from above...

My suggestion:

  • leave the std/sort.e alone for compatibility reasons
    eventually deprecate it?
  • in the demo folder: provide a set of sort algorithms (from allsorts and rosetta code) for those that have special requirements
  • for stable sorting add: http://www.rapideuphoria.com/msort.zip from Mike
  • for natural sorting add: http://www.rapideuphoria.com/nat_sort.zip from Andy Serpa
  • The new sorting.e library will work on OE and Phix.

My working sorting definitions are:

stable

  • as found in msort but not in std/sort.e

rank

  • as output by the compare function (larger or smaller) or custom_compare

direction

  • UP or DN (default small to large )

order

  • the resulting sorted sequence
    • default is numerical (mathematical number line)
    • ASCII order for historical reasons
    • unicode order is what UTF8 gives you
    • natural order (as in nat_sort.zip)
    • user defined for things like sorting playing cards

_tom

new topic     » goto parent     » topic index » view message » categorize

20. Re: sort constants confusing

_tom said...

Base on the input from above...

My suggestion: .. for stable sorting add: http://www.rapideuphoria.com/msort.zip from Mike

Msort is also highly performant.

Spock

new topic     » goto parent     » topic index » view message » categorize

21. Re: sort constants confusing

A demo of the new sorting include is: http://openeuphoria.org/pastey/293.wc

It provides stable sorting, column sorting, index sorting, along with number and natural order options. All sorts are in the up direction.

After changing a few identifiers this works on OE and Phix.

_tom


Forked into: Natural sort comparison

new topic     » goto parent     » topic index » view message » categorize

22. Re: sort constants confusing

Tom wrote:

_tom said...

My working sorting definitions are:

Let's see.

_tom said...

stable

  • as found in msort but not in std/sort.e

"Stable" doesn't seem to be one of the definitions at all.
Unstable is just bad. So "stable" is good? Or what?
So, maybe, msort is just one of the specific sortings?
And requires more specific definition?

_tom said...

rank

  • as output by the compare function (larger or smaller) or custom_compare

Try to find rank in The Archive of RDS.
I found just "Frank Dowling".
Just 2 entries.
So, rank is not the Euphoria language word at all.
BTW, for example, I am the retired captain of the second rank
of Russian Navy, former submariner. Maybe....??? smile

_tom said...

direction

  • UP or DN (default small to large )

In classic EU (earlier 4), it is ORDER,
not direction. Direction is extra parameter.
Why?
EU works with numbers and only with numbers
and their values.
ERM reads:
"There aren't really any "characters" in Euphoria, just numbers (atoms)."

So, EU has just 2 kinds of sortings:
usual (just for numbers)
and
special (or specific) for the user defined types
(strings encoded as ascii, utf, koi8-r, unicode, etc etc).

Main EU sorting is usual, just for numbers, in two orders,
just "up" and "down".

_tom said...

order

  • the resulting sorted sequence
    • default is numerical (mathematical number line)
    • ASCII order for historical reasons
    • unicode order is what UTF8 gives you
    • natural order (as in nat_sort.zip)
    • user defined for things like sorting playing cards

No. After "default", i.e. usual, these are specific sortings, not orders.

Regards


kinz

new topic     » goto parent     » topic index » view message » categorize

23. Re: sort constants confusing

The English language is not helping at all. We are using words with fuzzy meanings. The word "natural" seems to be a particular problem when used to describe sorting.

The solution is going to be that a few words will be chosen and then defined for use within Euphoria--ignoring the use (and misuse) of the same words elsewhere.

The idea of "stable sorting" is important. Download http://www.rapideuphoria.com/msort.zip. Spock has explained "stable" clearly in the docs that come with Mike Sort.

Here is another idea for your consideration:

To sort values is "to order values into numerical order from small to large."

To collate values is "to order values into a logical order from small to large."

  • We sort numbers

There are no arguments as to what number value comes next. Some people call this "natural order". Sorting "up" from small to large is also "natural."

the eu:compare function will sort number values without problem

  • we collate alphanumeric values

Collated data does not have to be in number order. When ascii encoded characters are sorted in numerical order the alphabetic order may not agree with everyone's idea of the correct order. Thus we want a "custom sort" that puts characters, into what is for characters, a "natural order". Some call this "lexical order" to distinguish it from number sorting.

We need a custom algorithm to collate values. Collating must consider ascii and utf encoding for characters.

The problem is if I just use "natural" you can not be sure what I mean unless you look at a definition I create for exclusive use in OE.

_tom

new topic     » goto parent     » topic index » view message » categorize

24. Re: sort constants confusing

_tom said...

The English language is not helping at all...

Yes, Russian is much better, but English is not that bad too.
Do not try to define stable, natural, logical, ascii etc orders.
We have just two orders - UP and DOWN (DN), period.
Instead, describe the sortings using as many words as you need.
And dont worry be happy smile


Regards


kinz

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu