Re: data analysis
- Posted by David Cuny <dcuny at LANSET.COM> Jan 17, 2001
- 436 views
Kat wrote: > I still don't understand the resyncing, > but maybe i will one day. Think of the two strings as queues. The algorithm tries to make sure that the first two items in the queue are the same. If they are, it's a match, and they can be removed from the queue. For example: [C]amero [C]odas Both strings start with 'C', so they are removed from the queue. If the characters are different, the algorithm looks in both queues to see which has the smallest gap: amero = has 'o' in 5th position odas = has 'a' in 3rd position In this case, the gap of 3 is the smallest, so two characters are removed from the second string to get the two in 'sync': amero [od]as Finally, if there is no match with the prior test, both characters are simply removed from the head of each queue: [m]ero [a]s Eventually, both queues are empty, or one of the strings has stuff left over. That was the first version of the algorithm, anyway. I added a couple of additional tests. For one thing, it looks for reversed letters: [ca]t [ac]t For another, if there is a gap, it looks to see if there if it can remove characters from the queue instead. For example: clack dreck The solution of removing the first letters from both queues: [cla]ck [dre]ck is preferred to removing them from only one queue, using the 'sync' routine outlined above: clack [dre]ck <-- less good solution > Did you see how the code i > posted returns a list of results? Changing MaxGap on the fly seems to me to be a bad idea, but if it works better for you, go for it. -- David Cuny