Re: Text File Comparison
- Posted by rforno at tutopia.com Feb 16, 2002
- 484 views
Problem is I am too lazy... Well, the idea behind the algorithm is simple. Assume you have found the kth matching pair. Then, starting at this point, you search for the (k+1)th matching pair. From all the matching pairs you find, you should select the one for which the sum of distances between the element in sequence s and the starting point in this sequence, and the corresponding difference in sequence t, is minimum. The complications arise because, to decrease the time spent, one does not want to test all possibilities, but instead wants to reject some because "a priori" they cannot give a better minimum distance. This is the origin of variables n and topi. Please find below a new version of the algorithm. I did not test it with the files you used, because I don't have them. However, I am confident this time the algorithm shall work ;) --Program to show the differences between two files --Author: R. M. Forno - Version 0.2 - 2002/02/16 function diff(sequence s, sequence t) integer i, lens, lent, max, min, a, b, z, n, m, topi sequence r, x lens = length(s) lent = length(t) max = lens + lent r = {} i = 1 z = 1 while i <= lens do min = max topi = lens + 1 m = i while m < topi do x = s[m] --Speedwise n = min - m + i + z - 1 if n > lent then n = lent end if for j = z to n do --Search for minimum slack if equal(x, t[j]) then min = m - i + j - z --Update minimum slack if min + i < topi then topi = min + i --Update upper limit for m end if a = m b = j exit end if end for m += 1 end while if min < max then for k = i to a - 1 do r = append(r, {1, k}) end for for k = z to b - 1 do r = append(r, {-1, k}) end for r = append(r, {0, a}) i = a + 1 --Update starting points z = b + 1 else exit end if end while for k = i to lens do --Last part r = append(r, {1, k}) end for for k = z to lent do r = append(r, {-1, k}) end for return r end function function read_in(sequence fn) sequence s object x integer f s = {} f = open(fn, "r") if f < 0 then puts(2, "Error - cannot open " & fn) abort(1) end if while 1 do x = gets(f) if atom(x) then exit end if s = append(s, x) end while return s end function procedure out_diff(sequence r, sequence a, sequence b) sequence x integer n, z for i = 1 to length(r) do x = r[i] n = x[1] z = x[2] if n < 0 then puts(1, "< " & b[z]) elsif n > 0 then puts(1, "> " & a[z]) else puts(1, " " & a[z]) end if end for end procedure --Example of usage sequence a1, a2, r a1 = read_in("file1") a2 = read_in("file2") r = diff(a1, a2) out_diff(r, a1, a2) ----- Original Message ----- From: <petelomax at blueyonder.co.uk> To: "EUforum" <EUforum at topica.com> Sent: Thursday, February 14, 2002 11:50 PM Subject: Re: Text File Comparison On Tue, 12 Feb 2002 23:22:45 -0300, you wrote: >>I was going to ask you what max,min, and n were actually for and what >>you were after with "minimum slack". >Regarding the names of my variables, I did not take much care for them to be >significative... :) That's helpful. After general success, I decided to 'cane' your code to see how stable it is, so I compared the stamped, comment stripped version of win32lib (285K) and the fulltext version (673K) while i <= lens do min = max topi = lens + 1 m = i while m < topi do x = s[m] --Speedwise -- ^^^ subscript out of bounds n = min - m + i + z - 1 if n > lent then n = lent end if for j = z to n do --Search for minimum slack if equal(x, t[j]) then min = m - i + j - z --Update minimum slack if min + i < topi then --<<< topi = min + i --Update upper limit for m end if --<<< a = m b = j exit end if end for m += 1 end while I've put the above to stop if bombing but as you won't explain what the code is trying to do I don't know, for example, if I should not update a & b at that point or reset min... Pete