Re: Text File Comparison

new topic     » goto parent     » topic index » view thread      » older message » newer message

Problem is I am too lazy...
Well, the idea behind the algorithm is simple. Assume you have found the kth
matching pair. Then, starting at this point, you search for the (k+1)th
matching pair. From all the matching pairs you find, you should select the
one for which the sum of distances between the element in sequence s and the
starting point in this sequence, and the corresponding difference in
sequence t, is minimum. The complications arise because, to decrease the
time spent, one does not want to test all possibilities, but instead wants
to reject some because "a priori" they cannot give a better minimum
distance. This is the origin of variables n and topi.
Please find below a new version of the algorithm. I did not test it with the
files you used, because I don't have them. However, I am confident this time
the algorithm shall work ;)

--Program to show the differences between two files
--Author: R. M. Forno - Version 0.2 - 2002/02/16

function diff(sequence s, sequence t)
    integer i, lens, lent, max, min, a, b, z, n, m, topi
    sequence r, x
    lens = length(s)
    lent = length(t)
    max = lens + lent
    r = {}
    i = 1
    z = 1
    while i <= lens do
 min = max
 topi = lens + 1
 m = i
 while m < topi do
     x = s[m] --Speedwise
     n = min - m + i + z - 1
     if n > lent then
  n = lent
     end if
     for j = z to n do --Search for minimum slack
  if equal(x, t[j]) then
      min = m - i + j - z --Update minimum slack
      if min + i < topi then
   topi = min + i --Update upper limit for m
      end if
      a = m
      b = j
      exit
  end if
     end for
     m += 1
 end while
 if min < max then
     for k = i to a - 1 do
  r = append(r, {1, k})
     end for
     for k = z to b - 1 do
  r = append(r, {-1, k})
     end for
     r = append(r, {0, a})
     i = a + 1 --Update starting points
     z = b + 1
 else
     exit
 end if
    end while
    for k = i to lens do --Last part
 r = append(r, {1, k})
    end for
    for k = z to lent do
 r = append(r, {-1, k})
    end for
    return r
end function

function read_in(sequence fn)
    sequence s
    object x
    integer f
    s = {}
    f = open(fn, "r")
    if f < 0 then
 puts(2, "Error - cannot open " & fn)
 abort(1)
    end if
    while 1 do
 x = gets(f)
 if atom(x) then
     exit
 end if
 s = append(s, x)
    end while
    return s
end function

procedure out_diff(sequence r, sequence a, sequence b)
    sequence x
    integer n, z
    for i = 1 to length(r) do
 x = r[i]
 n = x[1]
 z = x[2]
 if n < 0 then
     puts(1, "< " & b[z])
 elsif n > 0 then
     puts(1, "> " & a[z])
 else
     puts(1, "  " & a[z])
 end if
    end for
end procedure

--Example of usage

sequence a1, a2, r
a1 = read_in("file1")
a2 = read_in("file2")
r = diff(a1, a2)
out_diff(r, a1, a2)

----- Original Message -----
From: <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Sent: Thursday, February 14, 2002 11:50 PM
Subject: Re: Text File Comparison



On Tue, 12 Feb 2002 23:22:45 -0300, you wrote:

>>I was going to ask you what max,min, and n were actually for and what
>>you were after with "minimum slack".
>Regarding the names of my variables, I did not take much care for them to
be
>significative... :)
That's helpful.

After general success, I decided to 'cane' your code to see how stable
it is, so I compared the stamped, comment stripped version of win32lib
(285K) and the fulltext version (673K)

while i <= lens do
min = max
topi = lens + 1
m = i
while m < topi do
x = s[m] --Speedwise
--                                   ^^^ subscript out of bounds
n = min - m + i + z - 1
if n > lent then
n = lent
end if
for j = z to n do --Search for minimum slack
if equal(x, t[j]) then
min = m - i + j - z --Update minimum slack
if min + i < topi then --<<<
topi = min + i --Update upper limit for m
end if --<<<
a = m
b = j
exit
end if
end for
m += 1
end while

I've put the above to stop if bombing but as you won't explain what
the code is trying to do I don't know, for example, if I should not
update a & b at that point or reset min...

Pete

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu