1. RE: Nat_sort problems

Tony Steward wrote:
> 
> 
> Hello all,
> When I use nat_sort on the following sequence My program crashes with 
> this error
> 
> C:\EUPHORIA\ExInclude\\\\nat_sort.e:171 in function nat_compare() 
> type_check failure, x2num is 1234567890 
> 

The number simply got too big -- outside integer range.  I never 
considered that.  Just change the declarations for "x1num" & "x2num" to 
atom (from integer) in nat_compare() and it should take care of it.  
Change them also in nat_compare_str().

Also, a reminder that nat_sort_str() is faster than plain nat_sort() if 
you know before-hand that all elements to be sorted are character 
strings (which is the usual case for most apps).


-- Andy

new topic     » topic index » view message » categorize

2. RE: Nat_sort problems

Derek Parnell wrote:
> 
> 
> Tony,
> changing x2num, x1num to integers doesn't work either because atoms 
> can't hold very large numbers either. 
> 
> Another technique will be required for this problem. 
> 
> There are big-number libraries available so you might have to use these 
> but I suspect a different approach to the natural sort algorithm might 
> be be faster. Just not sure yet.
> 

How big are these numbers gonna be?  nat_sort.e is made for sorting 
filenames & such, not doing string math.  It ought to work for anything 
within atom range (after redeclaring x1num & x2num as atoms), although 
it won't work with representations (in the original string) like "1e20". 
 However, it should work fine with strings like "100000000000000000000". 
 The original version just left the strings as strings and padded the 
shorter one with leading zeros for comparison, but it was much slower, 
at least the way I did it back then...

new topic     » goto parent     » topic index » view message » categorize

3. RE: Nat_sort problems

Derek Parnell wrote:
> 
> 
> On Mon, 11 Aug 2003 02:07:43 +0100 (08/11/03 11:07:43)
> , Pete Lomax <petelomax at blueyonder.co.uk> wrote:
> 
> >
> > On Sat,  9 Aug 2003 12:09:03 +0000, Tony Steward
> > <tsteward at dodo.com.au> wrote:
> >
> >> Hello all,
> >> When I use nat_sort on the following sequence My program crashes with 
> >> this error
> >>
> >> C:\EUPHORIA\ExInclude\\\\nat_sort.e:171 in function nat_compare() 
> >> type_check failure, x2num is 1234567890 Below is the actual data order 
> >> contains.
> >>
> >> order = {"104/RING",
> >> 		"123456789012345",
> > ^ there lies the root.
> > Change the definition of x1num and x2num to atom, it works fine.
> > That's just a quick fix, though:
> 
> Changing it to an atom doesn't always work. Sure it no longer crashes, 
> but 
> it doesn't aways sort correctly either.
> 
> For example, "123456789012345" and "123456789012399" sort to the same 
> location.
> 

They do?  Works fine with my system.

? nat_compare("123456789012345","123456789012345")
? nat_compare("123456789012399","123456789012399")
? nat_compare("123456789012399","123456789012345")
? nat_compare("123456789012345","123456789012399")

Output:

0
0
1
-1


? nat_compare_str("123456789012345","123456789012345")
? nat_compare_str("123456789012399","123456789012399")
? nat_compare_str("123456789012399","123456789012345")
? nat_compare_str("123456789012345","123456789012399")

0
0
1
-1

All correct.  Sorting of those strings is correct also.


Maybe I fixed a bug and forget to send an update?  If you have a 
specific sequence that doesn't seem to be working, let's see it.  Here 
is my complete version of nat_sort.e (after changing x1num & x2num to 
atoms):


---------------------------------------------------------------------

-----------------------------------------------------------------------
-- <Release notes snipped>

include sort.e

-- compares any two objects using the natural order algorithm
global function nat_compare(object x1, object x2)
integer x1pt, x2pt, x1len, x2len, number, comp, string
atom x1num, x2num
object x1char, x2char

    if sequence(x1) and sequence(x2) then

        x1len = length(x1)
        x2len = length(x2)

        -- if both are strings
        -- inline string type-checking for speed
        string = 1
        for i = 1 to x1len do
            x1char = x1[i]
            if not integer(x1char)
            or x1char > 255
            or (x1char < ' ' and not find(x1char,"\t\n\r")) then
                -- not a string
                string = 0
                exit
            end if
        end for

        if string then
            string = 1
            for i = 1 to x2len do
                x2char = x2[i]
                if not integer(x2char)
                or x2char > 255
                or (x2char < ' ' and not find(x2char,"\t\n\r")) then
                    -- not a string
                    string = 0
                    exit
                end if
            end for
        end if

        if not string then
            -- recurse
            for i = 1 to x1len do
                if i > x2len then
                    return 1
                end if
                comp = nat_compare(x1[i],x2[i])
                if comp != 0 then
                    return comp
                end if
            end for
            return compare(x1len,x2len)
        end if

        -- both are strings
        x1pt = 1
        x2pt = 1

        -- number flag
        number = 0

        while x1pt <= x1len do

            if number then
                -- build numbers
                x1num = x1char - '0'
                while x1pt < x1len do
                    x1pt += 1
                    x1char = x1[x1pt]
                    if (x1char >= '0' and x1char <= '9') then
                        x1num = (x1num * 10) + (x1char - '0')
                    else
                        x1pt -= 1
                        exit
                    end if
                end while

                x2num = x2char - '0'
                while x2pt < x2len do
                    x2pt += 1
                    x2char = x2[x2pt]
                    if (x2char >= '0' and x2char <= '9') then
                        x2num = (x2num * 10) + (x2char - '0')
                    else
                        x2pt -= 1
                        exit
                    end if
                end while

                -- compare as atoms
                comp = compare(x1num,x2num)
                if comp != 0 then
                    return comp
                end if

                -- still equal, keep going
                number = 0
                x1pt += 1
                x2pt += 1

            else

                if x2pt > x2len then
                    -- equal up to point x2 stops
                    -- x1 is longer, therefore greater
                    return 1
                end if

                x1char = x1[x1pt]
                x2char = x2[x2pt]

                if
                (x1char >= '0' and x1char <= '9')
                and
                (x2char >= '0' and x2char <= '9')
                then
                    -- first digit of number found, set flag
                    number = 1
                else
                    comp = compare(x1char,x2char)
                    if comp != 0 then
                        return comp
                    end if
                    x1pt += 1
                    x2pt += 1
                end if
            end if
        end while
        -- equal up to point x1 stops, longer (if either) is greater
        return compare(x1len,x2len)
    else
        -- one or both top-level elements are atoms
        -- compare normally
        return compare(x1,x2)
    end if

end function




-- compares two character strings using natural ordering
-- (eliminates string type checking, recursion, etc. = faster)
global function nat_compare_str(sequence x1, sequence x2)
integer x1pt, x2pt, x1len, x2len, number, comp
atom x1num, x2num
object x1char, x2char

    x1len = length(x1)
    x2len = length(x2)

    x1pt = 1
    x2pt = 1

    -- number flag
    number = 0

    while x1pt <= x1len do

        if number then
            -- build numbers
            x1num = x1char - '0'
            while x1pt < x1len do
                x1pt += 1
                x1char = x1[x1pt]
                if (x1char >= '0' and x1char <= '9') then
                    x1num = (x1num * 10) + (x1char - '0')
                else
                    x1pt -= 1
                    exit
                end if
            end while

            x2num = x2char - '0'
            while x2pt < x2len do
                x2pt += 1
                x2char = x2[x2pt]
                if (x2char >= '0' and x2char <= '9') then
                    x2num = (x2num * 10) + (x2char - '0')
                else
                    x2pt -= 1
                    exit
                end if
            end while

            -- compare as atoms
            comp = compare(x1num,x2num)
            if comp != 0 then
                return comp
            end if

            -- still equal, keep going
            number = 0
            x1pt += 1
            x2pt += 1

        else

            if x2pt > x2len then
                -- equal up to point x2 stops
                -- x1 is longer, therefore greater
                return 1
            end if

            x1char = x1[x1pt]
            x2char = x2[x2pt]

            if
            (x1char >= '0' and x1char <= '9')
            and
            (x2char >= '0' and x2char <= '9')
            then
                -- first digit of number found, set flag
                number = 1
            else
                comp = compare(x1char,x2char)
                if comp != 0 then
                    return comp
                end if
                x1pt += 1
                x2pt += 1
            end if
        end if
    end while
    -- equal up to point x1 stops, longer (if either) is greater
    return compare(x1len,x2len)
end function


-- for sorting any sequence of any depth
global function nat_sort(sequence s)
    return custom_sort(routine_id("nat_compare"),s)
end function

-- for sorting sequences where all elements are character strings
global function nat_sort_str(sequence s)
    return custom_sort(routine_id("nat_compare_str"),s)
end function

----------------------------------------------------------------------

new topic     » goto parent     » topic index » view message » categorize

4. RE: Nat_sort problems

> 
> You are correct Andy. I made a mistake when I did a quick test. I 
> displayed the results incorrectly without looking at it very closely.
> 
> Sorry about that. 
> 

While we are talking about nat_sort, I offer the following very slight 
optimization:

Declare a constant at the top of the file:

constant WHITESPACE = "\t\n\r"

and then replace the two instances of "\t\n\r" in nat_compare() with 
WHITESPACE.  I believe Rob once mentioned the fact the such inline use 
of sequences is not free -- they have to be recreated each time -- so if 
that is true that ought to speed up the string-checking by a very tiny 
amount.  (String-checking is mainly why nat_compare() is slower than 
nat_compare_str().)

-- Andy

new topic     » goto parent     » topic index » view message » categorize

5. RE: Nat_sort problems

Robert Craig wrote:
> 
> 
> Andy Serpa wrote:
> > While we are talking about nat_sort, I offer the following very slight 
> > optimization:
> > 
> > Declare a constant at the top of the file:
> > 
> > constant WHITESPACE = "\t\n\r"
> > 
> > and then replace the two instances of "\t\n\r" in nat_compare() with 
> > WHITESPACE.  I believe Rob once mentioned the fact the such inline use 
> > of sequences is not free -- they have to be recreated each time ...
> 
> That doesn't apply to strings "...", just more general
> sequences that you construct with {....}
> 
> String sequences are created only once,
> when the program is parsed (loaded into memory).
> After that, a pointer refers to the sequence.
> 
> Of course from a readability/maintainability point of view
> using a named constant is better than repeating the same
> value in many places. You might also save memory and
> improve the hit rate of the data cache,
> but now we're getting into really trivial issues.
> 

Aha... is is actually the string designation (use of quotes) that makes 
the difference, or just any "byte-sized" sequence of integers?  When you 
first mentioned it, it was to do with peek(), using {addr,len} where 
you've got a 32-bit address and an integer together...

new topic     » goto parent     » topic index » view message » categorize

6. RE: Nat_sort problems

Robert Craig wrote:
> 
> 
> Andy Serpa wrote:
> > Aha... is is actually the string designation (use of quotes) that makes 
> > the difference, or just any "byte-sized" sequence of integers?  When you 
> > 
> > first mentioned it, it was to do with peek(), using {addr,len} where 
> > you've got a 32-bit address and an integer together...
> 
> At the moment, I save a single copy of each string sequence "..."
> at parse time, but I never save a copy of {...} expressions
> - they are recreated each time. Pete Lomax was right.
> I could do the same for {...} expressions where each element
> is a constant value, but I don't. I may do it in the future.
> 

Not complaining... just like to know all the tricks...

new topic     » goto parent     » topic index » view message » categorize

7. RE: Nat_sort problems

Hello all,
Thanks for all the input. I had changed x2num and x1num to atoms before
posting my problem, only I didn't realize they were re-declared in a
second function.

I have now made them atoms in both functions and all is well, thankyou.

I never expected one of my customers to enter data in this way, and it
will probably never happen again (I hope).

So problem solved

Thanks
Tony Steward

> -----Original Message-----
> From: Andy Serpa [mailto:ac at onehorseshy.com]
> Sent: Monday, 11 August 2003 5:25 PM
> To: EUforum
> Subject: RE: Nat_sort problems
> 
> 
> Derek Parnell wrote:
> >
> >
> > Tony,
> > changing x2num, x1num to integers doesn't work either because atoms
> > can't hold very large numbers either.
> >
> > Another technique will be required for this problem.
> >
> > There are big-number libraries available so you might have to use
these
> > but I suspect a different approach to the natural sort algorithm
might
> > be be faster. Just not sure yet.
> >
> 
> How big are these numbers gonna be?  nat_sort.e is made for sorting
> filenames & such, not doing string math.  It ought to work for
anything
> within atom range (after redeclaring x1num & x2num as atoms), although
> it won't work with representations (in the original string) like
"1e20".
>  However, it should work fine with strings like
"100000000000000000000".
>  The original version just left the strings as strings and padded the
> shorter one with leading zeros for comparison, but it was much slower,
> at least the way I did it back then...
> 
> --^----------------------------------------------------------------
> This email was sent to: tsteward at dodo.com.au
> 
> 
> TOPICA - Start your own email discussion group. FREE!
>

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu