Re: data analysis
On 17 Jan 2001, at 14:02, Kat wrote:
> On 17 Jan 2001, at 0:55, Kat wrote:
>
> > David and Graeme, all i can say is wow, and thanks. David's presentation of
> > the data is going to be easier to use first, but in the following test,
> > David's didn't resync properly, and i think that's because it always tries
> > to resync on s2. In Sweigsdunka vs Zweigsdanka, it sync'd and found "weigsd",
> > but then didn't resync properly on the "nka". It took the first 'a' from
> > Zweigsdanka and went looking for it in Sweigsdunka, not finding it until
> > the end of the word. It thereby missed the common "nka". Swapping the words
> > around didn't help David's code's results at all, but messed up Graeme's
> > code's results in a new way. Is it possible to force which word is the
> > primary sync in your code, David, in a way i can spec while it's running?
> > Mostly, i'd be looking for the result with the fewest number of differences.
> > /me is still studying the code....
>
> Changing MaxGap to 2 made it resync faster, but i'm not sure yet that
> passing MaxGap to diff() along with the words is the right answer.
So i passed both the min and max gap to it. diff() now tries every MaxGap
from MinG to MaxG and returns a list of the unique results, so an analysis
of the returns gives the most results without too much of an info overload.
I still don't see how to change MaxGap in the middle of a word without
causing problems, such as making the passed specs and results more complex.
Maybe that isn't needed, so i'll put it on the back burner for now. I found
that the range MinG = 2 to MaxG = 6 works best in the following code on the
words i tested:
-- almost entirely David Cuny's code
global function diff( sequence s1, sequence s2, integer MinG, integer MaxG )
    integer at1, at2, sync1, sync2
    sequence result, bigresult
    bigresult = ""
    for MaxGap = MinG to MaxG do
        result = ""
        at1 = 0
        at2 = 0
        -- process until the end of one string
        while 1 do
            -- move ahead
            at1 += 1
            at2 += 1
            -- past end of one string?
            if at1 > length( s1 )
            or at2 > length( s2 ) then
                exit
            end if
            -- same?
            if s1[at1] = s2[at2] then
                result &= s1[at1]
            else
                -- attempt to resync
                while 1 do
                    -- find closest sync point
                    sync2 = find( s1[at1], s2[at2..length(s2)] )
                    -- too far?
                    if sync2 > 0 and sync2 < MaxGap then
                        sync2 += at2 - 1
                    else
                        sync2 = 9999
                    end if
                    -- find closest sync
                    sync1 = find( s2[at2], s1[at1..length(s1)] )
                    -- too far?
                    if sync1 > 0 and sync1 < MaxGap then
                        sync1 += at1 - 1
                    else
                        sync1 = 9999
                    end if
                    -- evaluate sync
                    if sync1 = 9999
                    and sync2 = 9999 then
                        -- no sync
                        result &= sprintf( "[%s,%s]", {s1[at1],s2[at2]} )
                        -- at end?
                        if at1 = length( s1 )
                        or at2 = length( s2 ) then
                            exit
                        end if
                        -- skip
                        at1 += 1
                        at2 += 1
                    elsif sync1 < sync2 then
                        -- match on sync1
                        for i = at1 to sync1-1 do
                            result &= sprintf( "[%s,]", {s1[i]} )
                        end for
                        -- sync
                        at1 = sync1
                        result &= s1[at1]
                        -- leave loop
                        exit
                    else
                        -- match on sync2
                        for i = at2 to sync2-1 do
                            result &= sprintf( "[,%s]", {s2[i]})
                        end for
                        -- sync
                        at2 = sync2
                        result &= s2[at2]
                        -- leave loop
                        exit
                    end if
                end while
            end if
        end while
        -- remainder?
        if at1 <= length( s1 ) then
            for i = at1 to length(s1) do
                result &= sprintf( "[%s,]", {s1[i]} )
            end for
        elsif at2 <= length( s2 ) then
            for i = at2 to length(s2) do
                result &= sprintf( "[,%s]", {s2[i]} )
            end for
        end if
        if ( match(result,bigresult) = 0 ) then
            bigresult &= "\n" & result
            -- i used "\n" just to make it easy to puts() it
        end if
    end for
    return bigresult
end function -- diff( sequence s1, sequence s2, integer MinG, integer MaxG )
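
Since diff() hands back every unique result joined with "\n", one way to get
"the result with the fewest number of differences" might be to split that
string back into lines and count the "[" markers in each. The sketch below is
only an illustration under some assumptions: count_diffs() and fewest_diffs()
are hypothetical helper names, not part of David's or Graeme's code, the
sample string is made up by hand and is not real diff() output, and the count
will be off if the words themselves contain a '[' character.

```euphoria
-- hypothetical helper: count the "[" markers in one result line
function count_diffs( sequence line )
    integer n
    n = 0
    for i = 1 to length( line ) do
        if line[i] = '[' then
            n += 1
        end if
    end for
    return n
end function

-- hypothetical helper: return the line of bigresult with the fewest "["
-- markers; the first such line wins a tie, and empty lines are ignored
function fewest_diffs( sequence bigresult )
    sequence best, line
    integer bestcount, n
    best = ""
    bestcount = -1
    line = ""
    bigresult &= "\n"   -- sentinel so the last line gets checked too
    for i = 1 to length( bigresult ) do
        if bigresult[i] = '\n' then
            if length( line ) > 0 then
                n = count_diffs( line )
                if bestcount = -1 or n < bestcount then
                    best = line
                    bestcount = n
                end if
            end if
            line = ""
        else
            line &= bigresult[i]
        end if
    end for
    return best
end function

-- hand-written sample in the same "\n"-joined layout that diff() returns
sequence sample
sample = "\nab[c,d]e\nab[c,][,d]e"
puts( 1, fewest_diffs( sample ) & "\n" )   -- prints ab[c,d]e

-- with diff() included, the call would look something like:
-- puts( 1, fewest_diffs( diff( "Sweigsdunka", "Zweigsdanka", 2, 6 ) ) & "\n" )
```

Counting "[" treats a paired difference like [c,d] as one difference and a
one-sided gap like [c,] as one as well; a different weighting may suit some
word lists better.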