1. Text File Comparison
Looking for a source file comparison utility - must be written in
Euphoria or a source I can translate.
Thinking out loud, it seems non-trivial to report the smallest
possible number of changed lines, which is what I want.
At the moment I'm struggling with DOS fc utility but I'd like an
output similar to:
function fred()
sequence result
>integer i
result={}
< for i = 1 to 10
< if skip[i]=0 then
> i=1
> while i <= 10
> if skip[i]>0 then
> i+=skip[i]
> else
result&=i
> i+=1
end if
< end for
> end while
return result
end procedure
whereby ">" lines have been added & "<" removed.
Hopefully someone out there in the Linux world has the source of
"diff" I think it is which I suspect handles this alot better than I
could starting from scratch.
Using fc I get alot of false realigns on "end if" causing the output
to be much larger than it ought to be. Raw performance is unlikely to
be an issue.
Pete
2. Re: Text File Comparison
> Looking for a source file comparison utility - must be written in
> Euphoria or a source I can translate.
> Thinking out loud, it seems non-trivial to report the smallest
> possible number of changed lines, which is what I want.
Goto www.gnu.org and grab the source of diff : "diffutils-2.7.tar.gz" , 305 kB
It's in C. I tried to traslate it to Euphoria but was too difficult for me.
"The algorithm is described in "An O(ND) Difference Algorithm and its
Variations", Eugene
W. Myers, Algorithmica Vol. 1 No. 2, 1986, pp. 251--266; and in "A File
Comparison
Program", Webb Miller and Eugene W. Myers, Software--Practice and Experience
Vol. 15 No.
11, 1985, pp. 1025--1040. The algorithm was independently discovered as
described in
"Algorithms for Approximate String Matching", E. Ukkonen, Information and
Control Vol. 64,
1985, pp. 100--118."
Martin
3. Re: Text File Comparison
- Posted by rforno at tutopia.com
Feb 09, 2002
I've found this to be an interestin challenge, and consequently I developed
the following algorithm. I don't know if it is the same one mentioned by
Martin Stachon, or if it is optimal, but I made some tests and it works OK.
However, it is not fully tested.
--Program to show the differences between two files
--Author: R. M. Forno - Version 0.0 - 2002/02/09
function diff(sequence s, sequence t)
integer i, lens, lent, max, min, a, b, z, n
sequence r, x
lens = length(s)
lent = length(t)
max = lens + lent - 1
r = {}
i = 1
z = 1
while i <= lens do
min = max
for m = i to lens do
if m - i >= min then
exit
end if
x = s[m] --Speedwise
n = min - m + i + z --Speedwise
for j = z to lent do --Search for minimum slack
if j >= n then
exit
end if
if equal(x, t[j]) then
min = m - i + j - z --Update minimum slack
a = m
b = j
exit
end if
end for
end for
if min < max then
for k = i to a - 1 do
r = append(r, {1, k})
end for
for k = z to b - 1 do
r = append(r, {-1, k})
end for
r = append(r, {0, a})
i = a + 1 --Update starting points
z = b + 1
else
for k = i to lens do --Last part
r = append(r, {1, k})
end for
for k = z to lent do
r = append(r, {-1, k})
end for
exit
end if
end while
return r
end function
function read_in(sequence fn)
sequence s
object x
integer f
s = {}
f = open(fn, "r")
if f < 0 then
puts(2, "Error - cannot open " & fn)
abort(1)
end if
while 1 do
x = gets(f)
if atom(x) then
exit
end if
s = append(s, x)
end while
return s
end function
procedure out_diff(sequence r, sequence a, sequence b)
sequence x
integer n, z
for i = 1 to length(r) do
x = r[i]
n = x[1]
z = x[2]
if n < 0 then
puts(1, "< " & b[z])
elsif n > 0 then
puts(1, "> " & a[z])
else
puts(1, " " & a[z])
end if
end for
end procedure
--Example of usage
sequence a1, a2, r
a1 = read_in("file1")
a2 = read_in("file2")
r = diff(a1, a2)
out_diff(r, a1, a2)
----- Original Message -----
From: <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Sent: Friday, February 08, 2002 9:40 PM
Subject: Text File Comparison
Looking for a source file comparison utility - must be written in
Euphoria or a source I can translate.
Thinking out loud, it seems non-trivial to report the smallest
possible number of changed lines, which is what I want.
At the moment I'm struggling with DOS fc utility but I'd like an
output similar to:
function fred()
sequence result
>integer i
result={}
< for i = 1 to 10
< if skip[i]=0 then
> i=1
> while i <= 10
> if skip[i]>0 then
> i+=skip[i]
> else
result&=i
> i+=1
end if
< end for
> end while
return result
end procedure
whereby ">" lines have been added & "<" removed.
Hopefully someone out there in the Linux world has the source of
"diff" I think it is which I suspect handles this alot better than I
could starting from scratch.
Using fc I get alot of false realigns on "end if" causing the output
to be much larger than it ought to be. Raw performance is unlikely to
be an issue.
Pete
4. Re: Text File Comparison
- Posted by rforno at tutopia.com
Feb 10, 2002
Please find below a corrected and improved version of my previous algorithm
on the subject.
--Program to show the differences between two files
--Author: R. M. Forno - Version 0.1 - 2002/02/10
function diff(sequence s, sequence t)
integer i, lens, lent, max, min, a, b, z, n, m, topi
sequence r, x
lens = length(s)
lent = length(t)
max = lens + lent
r = {}
i = 1
z = 1
while i <= lens do
min = max
topi = lens + 1
m = i
while m < topi do
x = s[m] --Speedwise
n = min - m + i + z - 1
if n > lent then
n = lent
end if
for j = z to n do --Search for minimum slack
if equal(x, t[j]) then
min = m - i + j - z --Update minimum slack
topi = min + i --Update upper limit for m
a = m
b = j
exit
end if
end for
m += 1
end while
if min < max then
for k = i to a - 1 do
r = append(r, {1, k})
end for
for k = z to b - 1 do
r = append(r, {-1, k})
end for
r = append(r, {0, a})
i = a + 1 --Update starting points
z = b + 1
else
exit
end if
end while
for k = i to lens do --Last part
r = append(r, {1, k})
end for
for k = z to lent do
r = append(r, {-1, k})
end for
return r
end function
function read_in(sequence fn)
sequence s
object x
integer f
s = {}
f = open(fn, "r")
if f < 0 then
puts(2, "Error - cannot open " & fn)
abort(1)
end if
while 1 do
x = gets(f)
if atom(x) then
exit
end if
s = append(s, x)
end while
return s
end function
procedure out_diff(sequence r, sequence a, sequence b)
sequence x
integer n, z
for i = 1 to length(r) do
x = r[i]
n = x[1]
z = x[2]
if n < 0 then
puts(1, "< " & b[z])
elsif n > 0 then
puts(1, "> " & a[z])
else
puts(1, " " & a[z])
end if
end for
end procedure
--Example of usage
sequence a1, a2, r
a1 = read_in("file1")
a2 = read_in("file2")
r = diff(a1, a2)
out_diff(r, a1, a2)
----- Original Message -----
From: <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Sent: Friday, February 08, 2002 9:40 PM
Subject: Text File Comparison
Looking for a source file comparison utility - must be written in
Euphoria or a source I can translate.
Thinking out loud, it seems non-trivial to report the smallest
possible number of changed lines, which is what I want.
At the moment I'm struggling with DOS fc utility but I'd like an
output similar to:
function fred()
sequence result
>integer i
result={}
< for i = 1 to 10
< if skip[i]=0 then
> i=1
> while i <= 10
> if skip[i]>0 then
> i+=skip[i]
> else
result&=i
> i+=1
end if
< end for
> end while
return result
end procedure
whereby ">" lines have been added & "<" removed.
Hopefully someone out there in the Linux world has the source of
"diff" I think it is which I suspect handles this alot better than I
could starting from scratch.
Using fc I get alot of false realigns on "end if" causing the output
to be much larger than it ought to be. Raw performance is unlikely to
be an issue.
Pete
5. Re: Text File Comparison
On Sun, 10 Feb 2002 22:39:21 -0300, you wrote:
>
>Please find below a corrected and improved version of my previous algorithm
>on the subject.
I'm intrigued enough to show my embarrassed ignorance.
I was going to ask you what max,min, and n were actually for and what
you were after with "minimum slack". I've just been sidewiped with a
rather nasty flu-type bug which means I've managed little this past
week but as far as I've tested your algorithm works well. Now of
course I'm also intrigued as to what bugs you just fixed.
Pete
6. Re: Text File Comparison
- Posted by rforno at tutopia.com
Feb 12, 2002
My first version did not show the non-matching items of the second sequence
after the last match.
Regarding the names of my variables, I did not take much care for them to be
significative... :)
----- Original Message -----
From: <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Subject: Re: Text File Comparison
On Sun, 10 Feb 2002 22:39:21 -0300, you wrote:
>
>Please find below a corrected and improved version of my previous algorithm
>on the subject.
I'm intrigued enough to show my embarrassed ignorance.
I was going to ask you what max,min, and n were actually for and what
you were after with "minimum slack". I've just been sidewiped with a
rather nasty flu-type bug which means I've managed little this past
week but as far as I've tested your algorithm works well. Now of
course I'm also intrigued as to what bugs you just fixed.
Pete
7. Re: Text File Comparison
On Tue, 12 Feb 2002 23:22:45 -0300, you wrote:
>>I was going to ask you what max,min, and n were actually for and what
>>you were after with "minimum slack".
>Regarding the names of my variables, I did not take much care for them to be
>significative... :)
That's helpful.
After general success, I decided to 'cane' your code to see how stable
it is, so I compared the stamped, comment stripped version of win32lib
(285K) and the fulltext version (673K)
while i <= lens do
min = max
topi = lens + 1
m = i
while m < topi do
x = s[m] --Speedwise
-- ^^^ subscript out of bounds
n = min - m + i + z - 1
if n > lent then
n = lent
end if
for j = z to n do --Search for minimum slack
if equal(x, t[j]) then
min = m - i + j - z --Update minimum slack
if min + i < topi then --<<<
topi = min + i --Update upper limit for m
end if --<<<
a = m
b = j
exit
end if
end for
m += 1
end while
I've put the above to stop if bombing but as you won't explain what
the code is trying to do I don't know, for example, if I should not
update a & b at that point or reset min...
Pete
8. Re: Text File Comparison
- Posted by rforno at tutopia.com
Feb 16, 2002
Problem is I am too lazy...
Well, the idea behind the algorithm is simple. Assume you have found the kth
matching pair. Then, starting at this point, you search for the (k+1)th
matching pair. From all the matching pairs you find, you should select the
one for which the sum of distances between the element in sequence s and the
starting point in this sequence, and the corresponding difference in
sequence t, is minimum. The complications arise because, to decrease the
time spent, one does not want to test all possibilities, but instead wants
to reject some because "a priori" they cannot give a better minimum
distance. This is the origin of variables n and topi.
Please find below a new version of the algorithm. I did not test it with the
files you used, because I don't have them. However, I am confident this time
the algorithm shall work ;)
--Program to show the differences between two files
--Author: R. M. Forno - Version 0.2 - 2002/02/16
function diff(sequence s, sequence t)
integer i, lens, lent, max, min, a, b, z, n, m, topi
sequence r, x
lens = length(s)
lent = length(t)
max = lens + lent
r = {}
i = 1
z = 1
while i <= lens do
min = max
topi = lens + 1
m = i
while m < topi do
x = s[m] --Speedwise
n = min - m + i + z - 1
if n > lent then
n = lent
end if
for j = z to n do --Search for minimum slack
if equal(x, t[j]) then
min = m - i + j - z --Update minimum slack
if min + i < topi then
topi = min + i --Update upper limit for m
end if
a = m
b = j
exit
end if
end for
m += 1
end while
if min < max then
for k = i to a - 1 do
r = append(r, {1, k})
end for
for k = z to b - 1 do
r = append(r, {-1, k})
end for
r = append(r, {0, a})
i = a + 1 --Update starting points
z = b + 1
else
exit
end if
end while
for k = i to lens do --Last part
r = append(r, {1, k})
end for
for k = z to lent do
r = append(r, {-1, k})
end for
return r
end function
function read_in(sequence fn)
sequence s
object x
integer f
s = {}
f = open(fn, "r")
if f < 0 then
puts(2, "Error - cannot open " & fn)
abort(1)
end if
while 1 do
x = gets(f)
if atom(x) then
exit
end if
s = append(s, x)
end while
return s
end function
procedure out_diff(sequence r, sequence a, sequence b)
sequence x
integer n, z
for i = 1 to length(r) do
x = r[i]
n = x[1]
z = x[2]
if n < 0 then
puts(1, "< " & b[z])
elsif n > 0 then
puts(1, "> " & a[z])
else
puts(1, " " & a[z])
end if
end for
end procedure
--Example of usage
sequence a1, a2, r
a1 = read_in("file1")
a2 = read_in("file2")
r = diff(a1, a2)
out_diff(r, a1, a2)
----- Original Message -----
From: <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Sent: Thursday, February 14, 2002 11:50 PM
Subject: Re: Text File Comparison
On Tue, 12 Feb 2002 23:22:45 -0300, you wrote:
>>I was going to ask you what max,min, and n were actually for and what
>>you were after with "minimum slack".
>Regarding the names of my variables, I did not take much care for them to
be
>significative... :)
That's helpful.
After general success, I decided to 'cane' your code to see how stable
it is, so I compared the stamped, comment stripped version of win32lib
(285K) and the fulltext version (673K)
while i <= lens do
min = max
topi = lens + 1
m = i
while m < topi do
x = s[m] --Speedwise
-- ^^^ subscript out of bounds
n = min - m + i + z - 1
if n > lent then
n = lent
end if
for j = z to n do --Search for minimum slack
if equal(x, t[j]) then
min = m - i + j - z --Update minimum slack
if min + i < topi then --<<<
topi = min + i --Update upper limit for m
end if --<<<
a = m
b = j
exit
end if
end for
m += 1
end while
I've put the above to stop if bombing but as you won't explain what
the code is trying to do I don't know, for example, if I should not
update a & b at that point or reset min...
Pete