Re: data analysis

On 17 Jan 2001, at 14:02, Kat wrote:

> On 17 Jan 2001, at 0:55, Kat wrote:
>
> > David and Graeme, all i can say is wow, and thanks. David's presentation of
> > the data is going to be easier to use first, but in the following test,
> > David's didn't resync properly, and i think that's because it always tries
> > to resync on s2. In Sweigsdunka vs Zweigsdanka, it sync'd and found "weigsd",
> > but then didn't resync properly on the "nka". It took the first 'a' from
> > Zweigsdanka and went looking for it in Sweigsdunka, not finding it until
> > the end of the word. It thereby missed the common "nka". Swapping the words
> > around didn't help the results of David's code at all, but messed up the
> > results of Graeme's code in a new way. Is it possible to force which word is
> > the primary sync in your code, David, in a way i can spec while it's running?
> > Mostly, i'd be looking for the result with the fewest differences.
> > /me is still studying the code....
>
> Changing MaxGap to 2 made it resync faster, but i'm not sure yet that
> passing maxgap to diff() along with the words is the right answer.
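
To show why the smaller MaxGap helps on these two words, here is a minimal sketch of just the gap test, taken at the first mismatch after "weigsd". The hard-coded position 8 and the printed messages are mine, for illustration only; the two find() calls and the "< MaxGap" test are the same ones used in the code further down.

```euphoria
-- Gap test at the first mismatch after "weigsd" (position 8 in both words).
sequence s1, s2
integer at1, at2, sync1, sync2

s1 = "Sweigsdunka"
s2 = "Zweigsdanka"
at1 = 8   -- s1[8] is 'u'
at2 = 8   -- s2[8] is 'a'

-- find() on a slice returns an offset relative to the slice,
-- so 1 means "right here" and 0 means "not found"
sync2 = find( s1[at1], s2[at2..length(s2)] )  -- 'u' in "anka"
sync1 = find( s2[at2], s1[at1..length(s1)] )  -- 'a' in "unka"

for MaxGap = 2 to 6 do
    if sync1 > 0 and sync1 < MaxGap then
        printf( 1, "MaxGap %d: resync on the last 'a', skipping \"unk\"\n", MaxGap )
    else
        printf( 1, "MaxGap %d: no resync, emit [u,a] and move on to \"nka\"\n", MaxGap )
    end if
end for
```

Here 'u' is never found and 'a' is found at offset 4, so a MaxGap of 2 to 4 rejects the long jump and lets "nka" line up, while 5 and 6 accept it and lose the common "nka".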

So i passed both a min and a max gap to it, and diff() now returns a list of
unique results. That way, an analysis of the returns gives the most results
without too much info overload. I still don't see how to change MaxGap in the
middle of a word without causing problems, such as making the passed specs
and the results more complex. Maybe it isn't needed, so i'll put it on the
back burner for now. I found that the range MinG = 2 to MaxG = 6 works best
in the following code on the words i tested:

-- almost entirely David Cuny's code


global function diff( sequence s1, sequence s2, integer MinG, integer MaxG )

    integer at1, at2, sync1, sync2
    sequence result, bigresult

    bigresult = ""


    for MaxGap = MinG to MaxG do
    result = ""
    at1 = 0
    at2 = 0

    -- process until the end of one string
    while 1 do

        -- move ahead
        at1 += 1
        at2 += 1

        -- past end of one string?
        if at1 > length( s1 )
        or at2 > length( s2 ) then
            exit
        end if

        -- same?
        if s1[at1] = s2[at2] then
            result &= s1[at1]
        else
            -- attempt to resync
            while 1 do

                -- find closest sync point
                sync2 = find( s1[at1], s2[at2..length(s2)] )

                -- too far?
                if sync2 > 0 and sync2 < MaxGap then
                    sync2 += at2 - 1
                else
                    sync2 = 9999
                end if


                -- find closest sync
                sync1 = find( s2[at2], s1[at1..length(s1)] )

                -- too far?
                if sync1 > 0 and sync1 < MaxGap then
                    sync1 += at1 - 1
                else
                    sync1 = 9999
                end if


                -- evaluate sync
                if sync1 = 9999
                and sync2 = 9999 then
                    -- no sync
                    result &= sprintf( "[%s,%s]", {s1[at1],s2[at2]} )

                    -- at end?
                    if at1 = length( s1 )
                    or at2 = length( s2 ) then
                        exit
                    end if

                    -- skip
                    at1 += 1
                    at2 += 1


                elsif sync1 < sync2 then
                    -- match on sync1
                    for i = at1 to sync1-1 do
                        result &= sprintf( "[%s,]", {s1[i]} )
                    end for

                    -- sync
                    at1 = sync1
                    result &= s1[at1]

                    -- leave loop
                    exit


                else
                    -- match on sync2
                    for i = at2 to sync2-1 do
                        result &= sprintf( "[,%s]", {s2[i]})
                    end for

                    -- sync
                    at2 = sync2
                    result &= s2[at2]

                    -- leave loop
                    exit


                end if

            end while

        end if

    end while



    -- remainder?
    if at1 <= length( s1 ) then
        for i = at1 to length(s1) do
            result &= sprintf( "[%s,]", {s1[i]} )
        end for
    elsif at2 <= length( s2 ) then
        for i = at2 to length(s2) do
            result &= sprintf( "[,%s]", {s2[i]} )
        end for

    end if


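    -- keep only results not already in the list. The "\n" on both sides
    -- makes match() compare whole lines, so a result that is merely a
    -- substring of an earlier one is not dropped, and an empty result
    -- never reaches match() as an empty sequence.
    -- i used "\n" as the separator just to make it easy to puts() it
    if ( match( "\n" & result & "\n", bigresult & "\n" ) = 0 ) then
        bigresult &= "\n" & result
    end if

    end for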

    return bigresult

end function -- diff( sequence s1, sequence s2, integer MinG, integer MaxG )
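
Since what i'm mostly after is the result with the fewest differences, here is a rough sketch of one way to pick it out of the returned list. fewest_diffs() is a hypothetical helper, not part of David's code. It assumes that counting the '[' characters is a fair measure of "differences" and that the words themselves never contain a '['. The sample string is what i get by tracing diff() by hand on the two test words with MinG = 2 and MaxG = 6, not output captured from a real run.

```euphoria
-- Hypothetical helper: return the line of diff()'s output that has the
-- fewest bracketed differences. Assumes the words contain no '['.
function fewest_diffs( sequence bigresult )
    sequence best, line
    integer bestcount, count, c

    best = ""
    bestcount = -1      -- -1 means "nothing chosen yet"
    line = ""
    count = 0

    -- the extra "\n" pushes the last line through the same code path
    bigresult &= "\n"
    for i = 1 to length( bigresult ) do
        c = bigresult[i]
        if c = '\n' then
            -- end of a line: keep it if it is the best non-empty one so far
            if length( line ) > 0
            and ( bestcount = -1 or count < bestcount ) then
                best = line
                bestcount = count
            end if
            line = ""
            count = 0
        else
            line &= c
            if c = '[' then
                count += 1
            end if
        end if
    end for

    return best
end function

-- hand-traced sample input, see the note above
puts( 1, fewest_diffs(
    "\n[S,Z]weigsd[u,a]nka\n[S,Z]weigsd[u,][n,][k,]a[,n][,k][,a]" ) & "\n" )
```

On a tie it keeps the first line it saw, which is the one from the smallest MaxGap.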
