1. data analysis
- Posted by Grape_ Vine_ <g__vine at HOTMAIL.COM> Jan 15, 2001
- 501 views
- Last edited Jan 16, 2001
I am trying to look at a set amount of data and figure out a repeting data pattern, any idea where i can get some routiens that already do this? None of mine work =( J Reeves Grape Vine ICQ# 13728824 _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com
2. Re: data analysis
- Posted by Graeme <graemeburke at CROSSWINDS.NET> Jan 16, 2001
- 450 views
At 08:51 PM 15/01/01 -0800, you wrote: >I am trying to look at a set amount of data and figure out a repeting data >pattern, any idea where i can get some routiens that already do this? None >of mine work =( > > > >J Reeves >Grape Vine if I understand the question correctly, this should help. It copies dat to s then searches s for characters. if it isn't a pattern, the found character is erased in s (by adding one to it) then it searches again. dat is preserved to test for patterns. This could be optimised by shortening s for subsequent searches. It still needs some checking, passing it the wrong sequence will crash it with a 'slice ends past seq end' error, but it should get you of on the right foot. Graeme function get_pattern(sequence dat) integer f sequence s,test s=dat s[1]+=1 while 1 do f=find(dat[1],s) if not f then return "No Pattern" end if if equal(dat[1..f-1],dat[f..f*2-2]) then test=dat[1..f-1] for x=1 to floor(length(dat)/length(test))+1 do test&=dat[1..f-1] end for if equal(test[1..length(dat)],dat) then return dat[1..f-1] end if s[f]+=1 else s[f]+=1 end if end while end function puts(1,get_pattern("asdfasdfasdfasdf")&"\n") puts(1,get_pattern("aaajaaajaaajaaaj")&"\n") puts(1,get_pattern("efvineifvuhdsdfsdf")&"\n") ----------------------------------------------------
3. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 16, 2001
- 446 views
Speaking of such, i got started on Metaphone again, and have the same complaint i had about Soundex: they both throw away too much data. So i am writing other codes into the Metaphone code, to present a bit more of the data, analyzed differently. And fine-tuning the Metaphone code. Testing on irc now, when i am happier with it, i'll send it to RDS. If i can't use it, i prolly won't send it. What i want to do is have Tiggr compare a few yrs of irc logs to a 130K word dictionary, and Metaphonize those words not in the dictionary, and see if she can find a match in the Metaphonized dictionary. Any suggestions? Kat
4. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 16, 2001
- 459 views
Graeme, since you did that amazing pattern finding code, i was wondering what your thoughts would be on this problem... given: <any> !m pittsbourgh <[Tiggr]> pittsbourgh = PITSBERG CVCCCVCC CVCVC 11312 <any> !m pittsburg <[Tiggr]> pittsburg = PITSBERG CVCCCVCC CVCVC 11312 <any> !m pittsburgh <[Tiggr]> pittsburgh = PITSBERG CVCCCVCC CVCVC 11312 <any> !m pittsberg <[Tiggr]> pittsberg = PITSBERG CVCCCVCC CVCVC 11312 there is still one piece of relavant data missing: word comparisons, for the patterns, for in the above instances, "pittsbourgh" vs "pittsburgh" might return (s=pittsb,d=o,s=urgh) or some representation containing the data that "pittsb" is the same in both, "o" is different, and "urgh" is the same in both. The location within the word is not so important at the moment; and i have made the decision that the first word is the longer of the two, just to standardize. As you can see in the above printout, i currently have 3 pieces of info: the Metaphoneized word (essentially a rough usa phonetic spelling), then the placement of consonants and vowels, and finally that placement with redundancies removed, and i'd like to be able to pass any of the three to the comparison code. The above example is possibly easy, since there is a big section of each word at the very start that is the same, my problem is in the resync of the comparison after that 'o'. Is this possible without the puter being bogged down all day in the comparison? Kat
5. Re: data analysis
- Posted by Graeme <graemeburke at CROSSWINDS.NET> Jan 17, 2001
- 462 views
At 02:14 PM 16/01/01 -0800, you wrote: >Kat wrote: > >> Graeme, since you did that amazing >> pattern finding code, i was wondering >> what your thoughts would be on this problem... > >I'm obviously not Graeme, but I couldn't resist. This routine will compare >two strings, returning the differences between them. For example: > > return result > >end function > Bah! Got me. mine's still about half a hour away. :) (Just woke up), I have a feeling it might be a fair bit quicker than that one, though. Graeme. ----------------------------------------------------
6. Re: data analysis
- Posted by "Cuny, David at DSS" <David.Cuny at DSS.CA.GOV> Jan 16, 2001
- 462 views
Kat wrote: > Graeme, since you did that amazing > pattern finding code, i was wondering > what your thoughts would be on this problem... I'm obviously not Graeme, but I couldn't resist. This routine will compare two strings, returning the differences between them. For example: diff( "abczz", "dbnz" ) will return: "[a,d]b[c,n]z[z,]" The notation is a bit funky, but you can adjust it as you see fit: 1. 'x' means that 'x' was found in both strings. 2. '[x,]' means that 'x' was found in the first string, but not the second. 3. '[,x]' means that 'x' was found in the second string, but not the first. 4. [x,y] means that 'x' was found in the first string, but not the second and 'y' was found in the second string but not the first. -- David Cuny -- compare two strings, return differences function diff( sequence s1, sequence s2 ) integer at1, at2, len, sync1, sync2 sequence result result = "" -- find shortest string if length( s1 ) > length( s2 ) then len = length( s1 ) else len = length( s2 ) end if at1 = 0 at2 = 0 -- process until the end of one string while 1 do -- move ahead at1 += 1 at2 += 1 -- past end of one string? if at1 > length( s1 ) or at2 > length( s2 ) then exit end if -- same? if s1[at1] = s2[at2] then result &= s1[at1] else -- attempt to resync while 1 do -- find closest sync sync1 = 9999 for i = at1+1 to length(s1) do if s2[at2] = s1[i] then sync1 = i exit end if end for sync2 = 9999 for i = at2+1 to length(s2) do if s1[at1] = s2[i] then sync2 = i exit end if end for -- result of sync if sync1 = 9999 and sync2 = 9999 then -- no sync result &= sprintf( "[%s,%s]", {s1[at1],s2[at2]} ) -- at end? if at1 = length( s1 ) or at2 = length( s2 ) then exit end if -- skip at1 += 1 at2 += 1 elsif sync1 < sync2 then -- match on sync1 for i = at1 to sync1-1 do result &= sprintf( "[%s,]", {s1[i]} ) end for -- sync at1 = sync1 result &= s1[at1] -- leave loop exit else -- match on sync2 for i = at2 to sync2-1 do result &= sprintf( "[,%s]", {s2[i]}) end for -- sync at2 = sync2 result &= s2[at2] -- leave loop exit end if end while end if end while -- remainder? if at1 <= length( s1 ) then for i = at1 to length(s1) do result &= sprintf( "[%s,]", {s1[i]} ) end for elsif at2 <= length( s2 ) then for i = at2 to length(s2) do result &= sprintf( "[,%s]", {s2[i]} ) end for end if return result end function
7. Re: data analysis
- Posted by Grape_ Vine_ <g__vine at HOTMAIL.COM> Jan 16, 2001
- 459 views
thanx for the help guys, Ill try them and see if one/both help. Mainly what i am doing it bit pattern matching tho it is in sequence's of 1/0 X 8 J Reeves Grape Vine ICQ# 13728824 _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com
8. Re: data analysis
- Posted by Graeme <graemeburke at CROSSWINDS.NET> Jan 17, 2001
- 458 views
--=====================_979651966==_ At 08:44 AM 17/01/01 +1100, you wrote: >At 02:14 PM 16/01/01 -0800, you wrote: >>Kat wrote: >> >>> Graeme, since you did that amazing >>> pattern finding code, i was wondering >>> what your thoughts would be on this problem... >> >>I'm obviously not Graeme, but I couldn't resist. This routine will compare >>two strings, returning the differences between them. For example: >> > >> return result >> >>end function >> > >Bah! Got me. mine's still about half a hour away. :) >(Just woke up), I have a feeling it might be a fair >bit quicker than that one, though. > >Graeme. OK so here it is: I've tested David's and my own solution with the following test data: {"this","that"}, {"actually","actaully"}, {"Pittsborough","Pittsburg"}, {"Pittsberg","Pittsburg"}, {"shello","hellos"}, {"fhfellos","shello"} results as follows ( p=pass , f=fail ) DC GB test1 p p test2 p p test3 p p test4 f p test5 p f test6 f f Depending on how harsh you're feeling, test3 could also be considered a failure for both. ('r' not matched) Benchmarking indicates that my routine is about 4.5 x faster than David's (on a P3) ( hopefully this will prompt David to write a routine using match() that is faster and better than both of the above ) routines & test prog attached Graeme --=====================_979651966==_ x-mac-type="705A4950"; x-mac-creator="705A4950"
9. Re: data analysis
- Posted by "Cuny, David at DSS" <David.Cuny at DSS.CA.GOV> Jan 16, 2001
- 463 views
Graeme wrote: > results as follows Thanks, I've made a couple of corrections: 1. Replaced the buggy code ('at1+1' should have read 'at+1') with find(). This corrected a bug, and made my code only about half as slow as Graeme's. Thanks for the hint! 2. Added a MaxGap value that rejects matches past a certain length. This allows matches text like the final example's. There are better ways to decide this, but it was quick and dirty. Thanks! -- David Cuny constant MaxGap = 6 global function diff( sequence s1, sequence s2 ) integer at1, at2, len, sync1, sync2 sequence result result = "" -- find shortest string if length( s1 ) > length( s2 ) then len = length( s1 ) else len = length( s2 ) end if at1 = 0 at2 = 0 -- process until the end of one string while 1 do -- move ahead at1 += 1 at2 += 1 -- past end of one string? if at1 > length( s1 ) or at2 > length( s2 ) then exit end if -- same? if s1[at1] = s2[at2] then result &= s1[at1] else -- attempt to resync while 1 do -- find closest sync sync1 = find( s2[at2], s1[at1..length(s1)] ) -- too far? if sync1 > 0 and sync1 < MaxGap then sync1 += at1 - 1 else sync1 = 9999 end if find closest sync point sync2 = find( s1[at1], s2[at2..length(s2)] ) -- too far? if sync2 > 0 and sync2 < MaxGap then sync2 += at2 - 1 else sync2 = 9999 end if -- evaluate sync if sync1 = 9999 and sync2 = 9999 then -- no sync result &= sprintf( "[%s,%s]", {s1[at1],s2[at2]} ) -- at end? if at1 = length( s1 ) or at2 = length( s2 ) then exit end if -- skip at1 += 1 at2 += 1 elsif sync1 < sync2 then -- match on sync1 for i = at1 to sync1-1 do result &= sprintf( "[%s,]", {s1[i]} ) end for -- sync at1 = sync1 result &= s1[at1] -- leave loop exit else -- match on sync2 for i = at2 to sync2-1 do result &= sprintf( "[,%s]", {s2[i]}) end for -- sync at2 = sync2 result &= s2[at2] -- leave loop exit end if end while end if end while -- remainder? if at1 <= length( s1 ) then for i = at1 to length(s1) do result &= sprintf( "[%s,]", {s1[i]} ) end for elsif at2 <= length( s2 ) then for i = at2 to length(s2) do result &= sprintf( "[,%s]", {s2[i]} ) end for end if return result end function
10. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 452 views
David and Graeme, all i can say is wow, and thanks. David's presentation of the data is going to be easier to use first, but in the following test, David's didn't resync properly, and i think cause it is it trying to resync always on s2. In Sweigsdunka vs Zweigsdanka, it sync'd and found "weigsd", but then didn't resync properly on the "nka". It took the first 'a' from Zweigsdanka and went looking for it in Sweigsdunka, not finding it untill the end of the word. It thereby missed the common "nka". Swaping the words around didn't help David's code's results at all, but messed up Graeme's code's results in a new way. Is it possible to force which word is the primary sync in your code, David, in a way i can spec while it's running? Mostly, i'd be looking for the result with the fewest number of differences. /me is still studying the code.... David, a question, why do you find shortest string at the start, and then not do anything with len? Kat
11. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 510 views
On 17 Jan 2001, at 0:55, Kat wrote: > David and Graeme, all i can say is wow, and thanks. David's presentation > of the data is going to be easier to use first, but in the following test, > David's didn't resync properly, and i think cause it is it trying to resync > always on s2. In Sweigsdunka vs Zweigsdanka, it sync'd and found "weigsd", > but then didn't resync properly on the "nka". It took the first 'a' from > Zweigsdanka and went looking for it in Sweigsdunka, not finding it untill the > end of the word. It thereby missed the common "nka". Swaping the words around > didn't help David's code's results at all, but messed up Graeme's code's > results in a new way. Is it possible to force which word is the primary sync > in your code, David, in a way i can spec while it's running? Mostly, i'd be > looking for the result with the fewest number of differences. /me is still > studying the code.... Changing MaxGap to 2 made it resync faster, but i'm not sure yet that passing maxgap to diff() along with the words is the right answer yet.. Kat
12. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 481 views
On 17 Jan 2001, at 14:02, Kat wrote: > On 17 Jan 2001, at 0:55, Kat wrote: > > > David and Graeme, all i can say is wow, and thanks. David's presentation of > > the data is going to be easier to use first, but in the following test, > > David's didn't resync properly, and i think cause it is it trying to resync > > always on s2. In Sweigsdunka vs Zweigsdanka, it sync'd and found "weigsd", > > but then didn't resync properly on the "nka". It took the first 'a' from > > Zweigsdanka and went looking for it in Sweigsdunka, not finding it untill > > the end of the word. It thereby missed the common "nka". Swaping the words > > around didn't help David's code's results at all, but messed up Graeme's > > code's results in a new way. Is it possible to force which word is the > > primary sync in your code, David, in a way i can spec while it's running? > > Mostly, i'd be looking for the result with the fewest number of differences. > > /me is still studying the code.... > > Changing MaxGap to 2 made it resync faster, but i'm not sure yet that > passing maxgap to diff() along with the words is the right answer yet.. Ok, passing different MaxGaps to diff() gets results, now to see if i can change MaxGap every time sync is lost..... Kat
13. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 476 views
- Last edited Jan 18, 2001
On 17 Jan 2001, at 14:02, Kat wrote: > On 17 Jan 2001, at 0:55, Kat wrote: > > > David and Graeme, all i can say is wow, and thanks. David's presentation of > > the data is going to be easier to use first, but in the following test, > > David's didn't resync properly, and i think cause it is it trying to resync > > always on s2. In Sweigsdunka vs Zweigsdanka, it sync'd and found "weigsd", > > but then didn't resync properly on the "nka". It took the first 'a' from > > Zweigsdanka and went looking for it in Sweigsdunka, not finding it untill > > the end of the word. It thereby missed the common "nka". Swaping the words > > around didn't help David's code's results at all, but messed up Graeme's > > code's results in a new way. Is it possible to force which word is the > > primary sync in your code, David, in a way i can spec while it's running? > > Mostly, i'd be looking for the result with the fewest number of differences. > > /me is still studying the code.... > > Changing MaxGap to 2 made it resync faster, but i'm not sure yet that > passing maxgap to diff() along with the words is the right answer yet.. So i passed both min and max gap to it, and diff() returns a list of unique results, this way an analysis of the returns gives max results without too much of an info overload. I still don't see how to change the MaxGap in the middle of a word, without causing problems, such as making the passed specs and results more complex, and maybe this is not needed, so i'll put it on the back burner for now. I found the range of MinG= 2 to MaxG = 6 works best in the following code on the words i tested: -- almost entirely David Cuny's code global function diff( sequence s1, sequence s2, integer MinG, integer MaxG ) integer at1, at2, sync1, sync2 sequence result, bigresult bigresult = "" for MaxGap = MinG to MaxG do result = "" at1 = 0 at2 = 0 -- process until the end of one string while 1 do -- move ahead at1 += 1 at2 += 1 -- past end of one string? if at1 > length( s1 ) or at2 > length( s2 ) then exit end if -- same? if s1[at1] = s2[at2] then result &= s1[at1] else -- attempt to resync while 1 do -- find closest sync point sync2 = find( s1[at1], s2[at2..length(s2)] ) -- too far? if sync2 > 0 and sync2 < MaxGap then sync2 += at2 - 1 else sync2 = 9999 end if -- find closest sync sync1 = find( s2[at2], s1[at1..length(s1)] ) -- too far? if sync1 > 0 and sync1 < MaxGap then sync1 += at1 - 1 else sync1 = 9999 end if -- evaluate sync if sync1 = 9999 and sync2 = 9999 then -- no sync result &= sprintf( "[%s,%s]", {s1[at1],s2[at2]} ) -- at end? if at1 = length( s1 ) or at2 = length( s2 ) then exit end if -- skip at1 += 1 at2 += 1 elsif sync1 < sync2 then -- match on sync1 for i = at1 to sync1-1 do result &= sprintf( "[%s,]", {s1[i]} ) end for -- sync at1 = sync1 result &= s1[at1] -- leave loop exit else -- match on sync2 for i = at2 to sync2-1 do result &= sprintf( "[,%s]", {s2[i]}) end for -- sync at2 = sync2 result &= s2[at2] -- leave loop exit end if end while end if end while -- remainder? if at1 <= length( s1 ) then for i = at1 to length(s1) do result &= sprintf( "[%s,]", {s1[i]} ) end for elsif at2 <= length( s2 ) then for i = at2 to length(s2) do result &= sprintf( "[,%s]", {s2[i]} ) end for end if if ( match(result,bigresult) = 0 ) then bigresult &= "\n" & result -- i used "\n" just to make it easy to puts() it end if end for return bigresult end function -- diff( sequence s1, sequence s2, integer MinG, integer MaxG )
14. Re: data analysis
- Posted by David Cuny <dcuny at LANSET.COM> Jan 17, 2001
- 465 views
- Last edited Jan 18, 2001
Here's an updated version of the pattern matcher. The output is changed a bit (hopefully it's more clear), and I've added some more features. It now checks for swapped letters, and prefers removal over resyncing. Given the test data: "this", "that" "actually", "actaully" "Pittsborough", "Pittsburg" "Pittsberg", "Pittsburg" "shello", "hellos" "fhfellos", "shello" "Sweigsdunka", "Zweigsdanka" It produces: th[is,at] act[ua,au]lly Pittsb[o,u]r[ou,]g[h,] Pittsb[e,u]rg [s,]hello[,s] [f,s]h[f,]ello[s,] [S,Z]weigsd[u,a]nka -- David Cuny constant MaxGap = 6 global function diff( sequence s1, sequence s2 ) integer at1, at2, sync1, sync2, diff1, diff2, best sequence result,left,right result = "" left = "" right = "" at1 = 0 at2 = 0 -- process until the end of one string while 1 do -- move ahead at1 += 1 at2 += 1 -- past end of one string? if at1 > length( s1 ) or at2 > length( s2 ) then exit end if -- same? if s1[at1] = s2[at2] then if length( left ) or length( right ) then result &= sprintf( "[%s,%s]", {left,right}) left = "" right = "" end if result &= s1[at1] else -- attempt to resync while 1 do -- swapped letters? if at1 < length( s1 ) and at2 < length( s2 ) and s1[at1] = s2[at2+1] and s1[at1+1] = s2[at2] then left &= s1[at1..at1+1] right &= s2[at2..at2+1] at1 += 1 at2 += 1 exit end if -- find closest sync diff1 = find( s2[at2], s1[at1..length(s1)] ) -- too far? if diff1 > 0 and diff1 < MaxGap then sync1 = at1 + diff1 - 1 else sync1 = 9999 diff1 = 9999 end if -- find closest sync point diff2 = find( s1[at1], s2[at2..length(s2)] ) -- too far? if diff2 > 0 and diff2 < MaxGap then sync2 = at2 + diff2 - 1 else sync2 = 9999 diff2 = 9999 end if -- better to remove chars? if sync1 != 9999 or sync2 != 9999 then if diff1 < diff2 then best = diff1 else best = diff2 end if for i = 1 to best do if at1+i > length( s1 ) or at2+i > length( s2 ) then exit end if if s1[at1+i-1] = s2[at2+i-1] then -- better match diff1 = i diff2 = i sync1 = at1+i-1 sync2 = at2+i-1 exit end if end for end if -- evaluate sync if sync1 = 9999 and sync2 = 9999 then -- no sync left &= s1[at1] right &= s2[at2] -- at end? if at1 = length( s1 ) or at2 = length( s2 ) then exit end if -- skip at1 += 1 at2 += 1 -- on a match? if s1[at1] = s2[at2] then if length( left ) or length( right ) then result &= sprintf( "[%s,%s]", {left,right}) left = "" right = "" end if result &= s1[at1] exit end if elsif diff1 = diff2 and s1[sync1] = s2[sync2] then -- match to both for i = 0 to diff1-2 do left &= s1[at1+i] right &= s2[at2+i] end for -- sync at1 = sync1 at2 = sync2 if length( left ) or length( right ) then result &= sprintf( "[%s,%s]", {left,right}) left = "" right = "" end if result &= s1[at1] -- leave loop exit elsif diff1 <= diff2 then -- match on sync1 for i = at1 to sync1-1 do left &= s1[i] end for -- sync at1 = sync1 if length( left ) or length( right ) then result &= sprintf( "[%s,%s]", {left,right}) left = "" right = "" end if result &= s1[at1] -- leave loop exit else -- match on sync2 for i = at2 to sync2-1 do right &= s2[i] end for -- sync at2 = sync2 if length( left ) or length( right ) then result &= sprintf( "[%s,%s]", {left,right}) left = "" right = "" end if result &= s2[at2] -- leave loop exit end if end while end if end while -- remainder? if at1 <= length( s1 ) then for i = at1 to length(s1) do left &= s1[i] end for elsif at2 <= length( s2 ) then for i = at2 to length(s2) do right &= s2[i] end for end if if length( left ) or length( right ) then result &= sprintf( "[%s,%s]", {left,right}) end if return result end function
15. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 469 views
- Last edited Jan 18, 2001
On 17 Jan 2001, at 19:15, David Cuny wrote: > Here's an updated version of the pattern matcher. The output is changed a bit > (hopefully it's more clear), and I've added some more features. It now checks > for swapped letters, and prefers removal over resyncing. > > Given the test data: > > "this", "that" > "actually", "actaully" > "Pittsborough", "Pittsburg" > "Pittsberg", "Pittsburg" > "shello", "hellos" > "fhfellos", "shello" > "Sweigsdunka", "Zweigsdanka" > > > It produces: > > th[is,at] > act[ua,au]lly > Pittsb[o,u]r[ou,]g[h,] > Pittsb[e,u]rg > [s,]hello[,s] > [f,s]h[f,]ello[s,] > [S,Z]weigsd[u,a]nka Ok, that breaks the feeding of MinG and MaxG to it. I still don't understand the resyncing, but maybe i will one day. Did you see how the code i posted returns a list of results? Comments? Kat
16. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 476 views
- Last edited Jan 18, 2001
On 17 Jan 2001, at 19:15, David Cuny wrote: > Here's an updated version of the pattern matcher. The output is changed a bit > (hopefully it's more clear), and I've added some more features. It now checks > for swapped letters, and prefers removal over resyncing. > > Given the test data: > "Pittsborough", "Pittsburg" > It produces: > Pittsb[o,u]r[ou,]g[h,] I can't get it to sync on the 'u' now. Everything else is quite workable! Kat
17. Re: data analysis
- Posted by David Cuny <dcuny at LANSET.COM> Jan 17, 2001
- 464 views
- Last edited Jan 18, 2001
Kat wrote: > I still don't understand the resyncing, > but maybe i will one day. Think of the two strings as queues. The algorithm tries to make sure that the first two items in the queue are the same. If they are, it's a match, and they can be removed from the queue. For example: [C]amero [C]odas Both strings start with 'C', so they are removed from the queue. If the characters are different, the algorithm looks in both queues to see which has the smallest gap: amero = has 'o' in 5th position odas = has 'a' in 3rd position In this case, the gap of 3 is the smallest, so two characters are removed from the second string to get the two in 'sync': amero [od]as Finally, if there is no match with the prior test, both characters are simply removed from the head of each queue: [m]ero [a]s Eventually, both queues are empty, or one of the strings has stuff left over. That was the first version of the algorithm, anyway. I added a couple of additional tests. For one thing, it looks for reversed letters: [ca]t [ac]t For another, if there is a gap, it looks to see if there if it can remove characters from the queue instead. For example: clack dreck The solution of removing the first letters from both queues: [cla]ck [dre]ck is preferred to removing them from only one queue, using the 'sync' routine outlined above: clack [dre]ck <-- less good solution > Did you see how the code i > posted returns a list of results? Changing MaxGap on the fly seems to me to be a bad idea, but if it works better for you, go for it. -- David Cuny
18. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 17, 2001
- 473 views
- Last edited Jan 18, 2001
On 17 Jan 2001, at 20:30, David Cuny wrote: > > Did you see how the code i > > posted returns a list of results? > > Changing MaxGap on the fly seems to me to be a bad idea, but if it works > better for you, go for it. Why is it a bad idea? What i was trying to do was to hit all possible resync points. Kat
19. Re: data analysis
- Posted by David Cuny <dcuny at LANSET.COM> Jan 17, 2001
- 493 views
- Last edited Jan 18, 2001
Kat wrote: > Why is [changing MaxGap] a bad idea? > What i was trying to do was to hit all > possible resync points. It's not, actually. For example, David Cope uses a similar pattern matcher in his EMI program, only it works with musical pitches instead of letters. In his application, the parameters are self tuning: he feeds it music by a composer, and it automatically adjusts parameters of the pattern matcher until the music it regenerates matches the same statistics as the source material. Something else you might want to consider - I recall reading that pattern matching for speech recognition wasn't all that great, until someone decided to use Markov chains to 'guess' what the next word might be. The utterance would be first compared to that set, and if there were no good candidates, a brute force match would be done. Perhaps something similar might work with Tiggr? -- David Cuny
20. Re: data analysis
- Posted by Kat <gertie at PELL.NET> Jan 18, 2001
- 466 views
On 17 Jan 2001, at 22:15, David Cuny wrote: > Kat wrote: > > > Why is [changing MaxGap] a bad idea? > > What i was trying to do was to hit all > > possible resync points. > > It's not, actually. For example, David Cope uses a similar pattern matcher in > his EMI program, only it works with musical pitches instead of letters. In his > application, the parameters are self tuning: he feeds it music by a composer, > and it automatically adjusts parameters of the pattern matcher until the music > it regenerates matches the same statistics as the source material. > > Something else you might want to consider - I recall reading that pattern > matching for speech recognition wasn't all that great, until someone decided > to use Markov chains to 'guess' what the next word might be. The utterance > would be first compared to that set, and if there were no good candidates, a > brute force match would be done. Perhaps something similar might work with > Tiggr? I considered that many yrs ago, but the research i have run across since has shown me that prediction works only in knowledge/theme domains the script has knowledge of. The trick is to either know all the domains under discussion (which is how most of the Turing tests held in Australia are done: the script's authors get to choose the domain for that script's interaction with the judges), or don't use much prediction. Since i use variables for nearly everything, turning on/off the prediction on the fly isn't a problem, and some word pre/postdiction is already coded into the database, as well as syntactic pre/post"diction". Even the domain is encoded for each word on a per-use basis, altho a *lot* of data is still not entered. What i have been concentrating on mostly, since i discovered harddrives no longer cost $1000/megabyte, is to collect the data as sets within a domain, with text concerning relavant words in that domain, for Tiggr or Gertie to use as pre/postdictors in that domain, as well as associative words. If she didn't have "nephrology" listed in the dictionary as medical, a broad search would find it in the knowledgebase in the grouping with other medical terms, and she could update the dictionary herself. At least i *hope* she has these eureka moments, cause i *sure* don't want to finish that job for her! I don't know why some things are so difficult for me to understand. It took me years to learn checkers, but i won my first chess game the same day i first saw the chess board. I'm wierd like that. So i am really glad to have this list here to help me out in difficult coding. Thanks again David. Graeme, i simply haven't gotten to your code yet, but i bet i use David's 2nd and 3rd posting with your's in parallel threads. Kat
21. Re: data analysis
- Posted by Graeme <graemeburke at CROSSWINDS.NET> Jan 18, 2001
- 467 views
- Last edited Jan 19, 2001
>I don't know why some things are so difficult for me to understand. It took >me years to learn checkers, but i won my first chess game the same day i >first saw the chess board. I'm wierd like that. So i am really glad to have >this list here to help me out in difficult coding. Thanks again David. >Graeme, i simply haven't gotten to your code yet, but i bet i use David's >2nd and 3rd posting with your's in parallel threads. Hey, I'm just kickin' back and enjoying this one. Feeling a bit guilty that I havn't done anything else with my little routine, but I've got pleanty to do and competeing with David on stuff like this is a bit of a pointless exercise. If I feel I can contribute I will, but it looks like you guys have it well in hand. Graeme. (BTW: The chess/checkers thing sounds exactly like me - there's a scary thought for you ----------------------------------------------------