Re: Text File Comparison
- Posted by rforno at tutopia.com Feb 10, 2002
Please find below a corrected and improved version of my previous algorithm on the subject.

--Program to show the differences between two files
--Author: R. M. Forno - Version 0.1 - 2002/02/10

function diff(sequence s, sequence t)
    integer i, lens, lent, max, min, a, b, z, n, m, topi
    sequence r, x
    lens = length(s)
    lent = length(t)
    max = lens + lent
    r = {}
    i = 1
    z = 1
    while i <= lens do
        min = max
        topi = lens + 1
        m = i
        while m < topi do
            x = s[m] --Speedwise
            n = min - m + i + z - 1
            if n > lent then
                n = lent
            end if
            for j = z to n do --Search for minimum slack
                if equal(x, t[j]) then
                    min = m - i + j - z --Update minimum slack
                    topi = min + i --Update upper limit for m
                    if topi > lens + 1 then --Never scan past the end of s
                        topi = lens + 1
                    end if
                    a = m
                    b = j
                    exit
                end if
            end for
            m += 1
        end while
        if min < max then
            for k = i to a - 1 do
                r = append(r, {1, k})
            end for
            for k = z to b - 1 do
                r = append(r, {-1, k})
            end for
            r = append(r, {0, a})
            i = a + 1 --Update starting points
            z = b + 1
        else
            exit
        end if
    end while
    for k = i to lens do --Last part
        r = append(r, {1, k})
    end for
    for k = z to lent do
        r = append(r, {-1, k})
    end for
    return r
end function

function read_in(sequence fn)
    sequence s
    object x
    integer f
    s = {}
    f = open(fn, "r")
    if f < 0 then
        puts(2, "Error - cannot open " & fn & "\n")
        abort(1)
    end if
    while 1 do
        x = gets(f)
        if atom(x) then --gets() returns -1 at end of file
            exit
        end if
        s = append(s, x)
    end while
    close(f)
    return s
end function

procedure out_diff(sequence r, sequence a, sequence b)
    sequence x
    integer n, z
    for i = 1 to length(r) do
        x = r[i]
        n = x[1]
        z = x[2]
        if n < 0 then --Line found only in b
            puts(1, "< " & b[z])
        elsif n > 0 then --Line found only in a
            puts(1, "> " & a[z])
        else --Line common to both
            puts(1, " " & a[z])
        end if
    end for
end procedure

--Example of usage
sequence a1, a2, r
a1 = read_in("file1")
a2 = read_in("file2")
r = diff(a1, a2)
out_diff(r, a1, a2)

----- Original Message -----
From: <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Sent: Friday, February 08, 2002 9:40 PM
Subject: Text File Comparison

Looking for a source file comparison utility - must be written in Euphoria or a source I can translate.
Thinking out loud, it seems non-trivial to report the smallest possible number of changed lines, which is what I want. At the moment I'm struggling with the DOS fc utility, but I'd like an output similar to:

  function fred()
  sequence result
> integer i
  result={}
< for i = 1 to 10
< if skip[i]=0 then
> i=1
> while i <= 10
> if skip[i]>0 then
> i+=skip[i]
> else
  result&=i
> i+=1
  end if
< end for
> end while
  return result
  end function

whereby ">" lines have been added and "<" lines removed.

Hopefully someone out there in the Linux world has the source of "diff" (I think that is its name), which I suspect handles this a lot better than I could starting from scratch. Using fc I get a lot of false realigns on "end if", causing the output to be much larger than it ought to be. Raw performance is unlikely to be an issue.

Pete
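
On the "smallest possible number of changed lines" requirement in the quoted message: one textbook way to get that is to compute a longest common subsequence (LCS) of the two files by dynamic programming and report everything outside it as changed. The sketch below is that standard technique, not the diff() routine posted above. The name lcs_diff is made up for illustration. It is assumed to receive the same two sequences of lines as diff() and to return the same {flag, index} pairs, so out_diff() should be able to print its result unchanged. It needs a table of (lines in file 1 + 1) by (lines in file 2 + 1) integers, which is probably acceptable when raw performance is not an issue.

```euphoria
--Sketch only: LCS-based alternative, not the algorithm posted above.
--lcs_diff is a hypothetical name; it returns the same {flag, index}
--pairs as diff(): 0 = common line, 1 = only in s, -1 = only in t.
function lcs_diff(sequence s, sequence t)
    integer lens, lent, i, j
    sequence tab, row, r
    lens = length(s)
    lent = length(t)
    --tab[p][q] = length of the LCS of s[p..lens] and t[q..lent]
    tab = repeat(repeat(0, lent + 1), lens + 1)
    for p = lens to 1 by -1 do
        row = tab[p]
        for q = lent to 1 by -1 do
            if equal(s[p], t[q]) then
                row[q] = tab[p + 1][q + 1] + 1
            elsif tab[p + 1][q] >= row[q + 1] then
                row[q] = tab[p + 1][q]
            else
                row[q] = row[q + 1]
            end if
        end for
        tab[p] = row
    end for
    --Walk the table from the top, emitting entries in file order
    r = {}
    i = 1
    j = 1
    while i <= lens and j <= lent do
        if equal(s[i], t[j]) then
            r = append(r, {0, i})
            i += 1
            j += 1
        elsif tab[i + 1][j] >= tab[i][j + 1] then
            r = append(r, {1, i}) --Line only in s
            i += 1
        else
            r = append(r, {-1, j}) --Line only in t
            j += 1
        end if
    end while
    for k = i to lens do --Last part
        r = append(r, {1, k})
    end for
    for k = j to lent do
        r = append(r, {-1, k})
    end for
    return r
end function
```

To try it, replace the call r = diff(a1, a2) in the usage example with r = lcs_diff(a1, a2) and leave the out_diff() call as it is.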