Re: Phix:Sort Columns Not Working Like Expected

new topic     » goto parent     » topic index » view thread      » older message » newer message
Spock said...

I downloaded the latest version of Phix to examine the sort routines. It appears that sort_columns() is a wrapper for custom_sort() which uses Shellsort which is acknowledged in the source as being unstable.

For Orac I dumped RDS' Shellsort in favor of a Merge/Insertion sort hybrid (See Timsort) to guarantee stability. Without stability it is risky to do multi-column sorting as the result might not always be 100% correct. For the business programs used in our office correctness is absolutely essential.

Given the popularity of Phix I recommend the same or equivalent be done to it. The above hybrid is reasonably performant.

Spock

I think the best thing we've got so far in terms of stable sorts is pihism_sort(): https://openeuphoria.org/forum/131271.wc#131305 (also part of demo/rosetta/CompareSortingAlgorithms.exw)

I'm open to offers, but not about to embark on anything regarding stable sorting on me lonesome.

Pete

PS: To summarise that drdobbs link, if you have say {x,y} of {{3,4},{4,4},{5,4}} and sort on x,y all is fine. Likewise if you sort on y,x. However (what he did not explain very clearly) if you sort on x, then entirely after the fact re-sort the whole thing again on (just) y, there is an expectation the x-ordering would be preserved. I will concede my shrug of "well sort it by y,x on that second pass then" is fine for unbreakable business logic rules, but doesn't really hold when it is more of a GUI-interaction expectation, for instance. Note however that you can always make a tagsort stable by using the item indexes at the end of a custom comparator, and I have just changed the (private to sort.e) standard routine for that as follows:

function tagsort(integer i, integer j, sequence data) 
-- standard function for a standard tagsort 
--  return compare(data[i],data[j]) 
    integer c = compare(data[i],data[j]) 
    if c=0 then c = compare(i,j) end if 
    return c 
end function 

Also, I am not convinced that the return 0 at the end of column_compare() in sort.e does not make sort_columns() stable anyway: at least I cannot reproduce an issue along the lines of my understanding as just outlined above.

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu