Re: Natural sort comparison
- Posted by petelomax Jul 21, 2017
- 2119 views
My preferred sort API might be (this is all draft, there may be niggles)
sequence res = sort(sequence s, sequence options={})
where s is the data to be sorted and options is an even-length sequence with odd elements being recognised literal strings and even elements actual settings, eg
?sort(s,{"direction",-1,"compare",routine_id("my_compare")})
The available options would include at least:
even element | odd element |
---|---|
"compare" | routine_id("my_compare") where my_compare is a function that accepts two elements of s |
"direction" | +/-1 (look ma, no names to argue over) |
"preproc" | routine_id("my_preprocessor"), a function that accepts (idx, si), or a column set as per "return" |
"return" | -2, or {0..length(s[i])} ie: all, or a set of columns (aka retset) |
As noted by Mike, "preproc" could be significantly more efficient than "compare"; it would be an error to have both.
In fact we might drop "compare" and only offer "preproc", and I now think we should.
This interface could/should effectively make custom_sort obsolete (albeit fiddly to convert compare->preproc).
I would use -2 to indicate "preproc not set", rather than -1 which could conflict with/hide errors from routine_id failures.
A preproc/retset element of 0 means "index".
I would also make it thread-safe, meaning you cannot have the file-level SortType/Column/Keys/CompareRtn.
The innards might look a bit like this:
keys = s if preproc!=-2 then for i=1 to length(keys) do if integer(preproc) then keys[i] = call_func(preproc,{i,keys[i]}) else keys[i] = get_columns(keys[i],i,preproc) end if end for end if keys = actual_sort(keys) if retset!=-2 then for i=1 to length(keys) do keys[i] = get_columns(keys[i],i,retset) end for end if return keys -- **NB**
Note that if [parts of] the original s need to be returned, preproc must include them in keys, and "return" specified to extract them from and throw away the other working storage that formed part of keys[i].
A stable sort could be achieved by incorporating the index into keys.
I haven't attempted any of this at all, just a few ideas.
Pete