Re: Text File Comparison
- Posted by rforno at tutopia.com
Feb 10, 2002
Please find below a corrected and improved version of my previous algorithm
on the subject.
--Program to show the differences between two files
--Author: R. M. Forno - Version 0.1 - 2002/02/10
function diff(sequence s, sequence t)
    -- Returns a list of {kind, index} pairs:
    --   {0, k}  s[k] is common to both, {1, k}  s[k] is only in s,
    --   {-1, k} t[k] is only in t
    integer i, lens, lent, max, min, a, b, z, n, m, topi
    sequence r, x
    lens = length(s)
    lent = length(t)
    max = lens + lent
    r = {}
    i = 1
    z = 1
    while i <= lens do
        min = max
        topi = lens + 1
        m = i
        while m < topi do
            x = s[m] --Speedwise
            n = min - m + i + z - 1
            if n > lent then
                n = lent
            end if
            for j = z to n do --Search for minimum slack
                if equal(x, t[j]) then
                    min = m - i + j - z --Update minimum slack
                    topi = min + i --Update upper limit for m
                    if topi > lens + 1 then
                        topi = lens + 1 --Never read past the end of s
                    end if
                    a = m
                    b = j
                    exit
                end if
            end for
            m += 1
        end while
        if min < max then
            for k = i to a - 1 do --Lines of s skipped before the match
                r = append(r, {1, k})
            end for
            for k = z to b - 1 do --Lines of t skipped before the match
                r = append(r, {-1, k})
            end for
            r = append(r, {0, a})
            i = a + 1 --Update starting points
            z = b + 1
        else
            exit --No further matching lines
        end if
    end while
    for k = i to lens do --Last part
        r = append(r, {1, k})
    end for
    for k = z to lent do
        r = append(r, {-1, k})
    end for
    return r
end function

function read_in(sequence fn)
    -- Reads the text file fn and returns its lines as a sequence of strings
    sequence s
    object x
    integer f
    s = {}
    f = open(fn, "r")
    if f < 0 then
        puts(2, "Error - cannot open " & fn & "\n")
        abort(1)
    end if
    while 1 do
        x = gets(f)
        if atom(x) then --gets() returns -1 at end of file
            exit
        end if
        s = append(s, x)
    end while
    close(f)
    return s
end function
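
procedure out_diff(sequence r, sequence a, sequence b)
    -- Prints the result r of diff(a, b): "< " marks lines found only in b,
    -- "> " marks lines found only in a, two spaces mark common lines
    sequence x
    integer n, z
    for i = 1 to length(r) do
        x = r[i]
        n = x[1]
        z = x[2]
        if n < 0 then
            puts(1, "< " & b[z])
        elsif n > 0 then
            puts(1, "> " & a[z])
        else
            puts(1, "  " & a[z])
        end if
    end for
end procedure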
--Example of usage
sequence a1, a2, r
a1 = read_in("file1")
a2 = read_in("file2")
r = diff(a1, a2)
out_diff(r, a1, a2)
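
To make the result format concrete: tracing diff() by hand for s = {"a", "b", "c"} and t = {"a", "x", "c"} gives the list used below. The helper count_kinds is a hypothetical name, not part of the program above; it is only a small sketch of how the {kind, index} pairs can be read.

```euphoria
-- Hypothetical helper (not in the original program): counts how many
-- entries of a diff() result are common, only in s, or only in t.
function count_kinds(sequence r)
    integer common, only_first, only_second, kind
    common = 0
    only_first = 0
    only_second = 0
    for i = 1 to length(r) do
        kind = r[i][1]
        if kind = 0 then
            common += 1
        elsif kind > 0 then
            only_first += 1
        else
            only_second += 1
        end if
    end for
    return {common, only_first, only_second}
end function

-- Hand-traced result of diff({"a", "b", "c"}, {"a", "x", "c"})
sequence sample
sample = {{0, 1}, {1, 2}, {-1, 2}, {0, 3}}
? count_kinds(sample) -- 2 common, 1 only in s, 1 only in t
```

Passed to out_diff, the same list would print "a" and "c" as common lines, "b" with a ">" marker and "x" with a "<" marker.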
----- Original Message -----
From: <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Sent: Friday, February 08, 2002 9:40 PM
Subject: Text File Comparison
Looking for a source file comparison utility - must be written in
Euphoria or a source I can translate.
Thinking out loud, it seems non-trivial to report the smallest
possible number of changed lines, which is what I want.
At the moment I'm struggling with the DOS fc utility, but I'd like
output similar to:
  function fred()
  sequence result
> integer i
    result={}
<   for i = 1 to 10
<     if skip[i]=0 then
>   i=1
>   while i <= 10
>     if skip[i]>0 then
>       i+=skip[i]
>     else
        result&=i
>       i+=1
      end if
<   end for
>   end while
    return result
  end function
whereby ">" lines have been added & "<" removed.
Hopefully someone out there in the Linux world has the source of
"diff" (I think that is its name), which I suspect handles this a lot
better than I could starting from scratch.
Using fc I get a lot of false realigns on "end if", causing the output
to be much larger than it ought to be. Raw performance is unlikely to
be an issue.
Pete
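
On the "smallest possible number of changed lines" requirement: one standard way to guarantee it is to build a longest-common-subsequence (LCS) table and walk it, since every line outside the LCS must be reported as added or removed. The following is only a sketch of that technique, not the method used by any particular diff program. The name lcs_diff is hypothetical, and the result layout is an assumption chosen to match the {kind, index} pairs described above. It needs time and memory proportional to length(s) * length(t), which seems acceptable given that raw performance is not a concern here.

```euphoria
-- Sketch: line diff via an LCS table. Result entries are
-- {0, k} = s[k] is common, {1, k} = s[k] only in s, {-1, k} = t[k] only in t.
function lcs_diff(sequence s, sequence t)
    integer ls, lt, i, j
    sequence c, r
    ls = length(s)
    lt = length(t)
    -- c[p][q] = LCS length of s[p..ls] and t[q..lt];
    -- the extra last row and column stay zero
    c = repeat(repeat(0, lt + 1), ls + 1)
    for p = ls to 1 by -1 do
        for q = lt to 1 by -1 do
            if equal(s[p], t[q]) then
                c[p][q] = c[p+1][q+1] + 1
            elsif c[p+1][q] >= c[p][q+1] then
                c[p][q] = c[p+1][q]
            else
                c[p][q] = c[p][q+1]
            end if
        end for
    end for
    -- Walk the table from the top, always keeping a longest common part
    r = {}
    i = 1
    j = 1
    while i <= ls and j <= lt do
        if equal(s[i], t[j]) then
            r = append(r, {0, i})
            i += 1
            j += 1
        elsif c[i+1][j] >= c[i][j+1] then
            r = append(r, {1, i})
            i += 1
        else
            r = append(r, {-1, j})
            j += 1
        end if
    end while
    for k = i to ls do --Leftover lines of s
        r = append(r, {1, k})
    end for
    for k = j to lt do --Leftover lines of t
        r = append(r, {-1, k})
    end for
    return r
end function

sequence d
d = lcs_diff({"a", "b", "c"}, {"a", "x", "c"})
-- d is {{0,1},{1,2},{-1,2},{0,3}}
? d
```

Because the table prefers a longest common part over the nearest match, a stray "end if" should not pull the alignment away from a longer run of matching lines.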