1. RE: Check if files equal
- Posted by Derek Parnell <Derek.Parnell at SYD.RABOBANK.COM> Jul 07, 2002
- 398 views
Here is a routine that might be useful to someone...

---------------
include file.e

-- Returns true (1) if the two files contain the same data,
-- otherwise it returns false (0).
-- Parameters:
--   1: sequence : The path and name of a file
--   2: sequence : The path and name of another file
function fileEqual(sequence pFileA, sequence pFileB)
    integer lhA, lhB
    integer lcA, lcB
    object ldA, ldB

    -- First check that they exist and that they are the same size.
    ldA = dir(pFileA)
    ldB = dir(pFileB)
    if atom(ldA) or atom(ldB) or ldA[1][D_SIZE] != ldB[1][D_SIZE] then
        return 0
    end if

    -- Now compare each byte, starting from the first byte.
    lhA = open(pFileA, "rb")
    lhB = open(pFileB, "rb")
    -- Give up if either file cannot be opened.
    if lhA = -1 or lhB = -1 then
        if lhA != -1 then close(lhA) end if
        if lhB != -1 then close(lhB) end if
        return 0
    end if

    lcA = 0
    lcB = 0
    -- Stop comparing as soon as a mismatch or EOF is found.
    while lcA = lcB and lcA != -1 do
        lcA = getc(lhA)
        lcB = getc(lhB)
    end while
    close(lhA)
    close(lhB)

    -- If we end up with EOF in both files, they must be equal.
    return (lcA = -1) and (lcB = -1)
end function
-------------
Derek.

==================================================================
The information contained in this message may be confidential and is intended to be exclusively for the addressee. Should you receive this message unintentionally, please do not use the contents herein and notify the sender immediately by return e-mail.
==================================================================
2. RE: Check if files equal
- Posted by Derek Parnell <Derek.Parnell at SYD.RABOBANK.COM> Jul 07, 2002
- 375 views
Hi all,

in the original question put to the list by Tone, he mentioned 500MB+ files. I'm guessing that these are databases, given the size. If so, checking the bytes from the front of the files is probably a good idea, because a lot of database systems keep pointers and stamps near the front of the database file. Thus, even if the file sizes haven't changed, a short scan will probably find a changed stamp or pointer.

If you get about 25% through the files and haven't found a mismatch, you might take a risk that they are the same - or you might like to check the last 25% too. Just a thought.

---------
Derek.
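The "first 25%, then the last 25%" idea could be sketched roughly as follows. This is only an illustration under assumptions: the names rangeEqual and probablyEqual are made up for this example, and a result of 1 means "no mismatch found in the sampled ranges", not proof that the files are identical.

```euphoria
include file.e   -- dir(), D_SIZE, seek()

-- Hypothetical helper: compare pCount bytes of two open files,
-- starting at byte offset pStart. Returns 1 on match, 0 otherwise.
function rangeEqual(integer pHandleA, integer pHandleB, atom pStart, atom pCount)
    integer lcA, lcB
    atom lLeft

    if seek(pHandleA, pStart) != 0 or seek(pHandleB, pStart) != 0 then
        return 0
    end if
    lLeft = pCount
    while lLeft > 0 do
        lcA = getc(pHandleA)
        lcB = getc(pHandleB)
        if lcA != lcB then return 0 end if
        lLeft = lLeft - 1
    end while
    return 1
end function

-- Hypothetical wrapper: sizes must match, then the first quarter and
-- the last quarter of both files are compared. The middle is NOT read.
function probablyEqual(sequence pFileA, sequence pFileB)
    object ldA, ldB
    integer lhA, lhB, lResult
    atom lSize, lQuarter

    ldA = dir(pFileA)
    ldB = dir(pFileB)
    if atom(ldA) or atom(ldB) then return 0 end if
    if ldA[1][D_SIZE] != ldB[1][D_SIZE] then return 0 end if
    lSize = ldA[1][D_SIZE]
    lQuarter = floor(lSize / 4)

    lhA = open(pFileA, "rb")
    lhB = open(pFileB, "rb")
    if lhA = -1 or lhB = -1 then
        if lhA != -1 then close(lhA) end if
        if lhB != -1 then close(lhB) end if
        return 0
    end if

    lResult = rangeEqual(lhA, lhB, 0, lQuarter)
    if lResult then
        lResult = rangeEqual(lhA, lhB, lSize - lQuarter, lQuarter)
    end if
    close(lhA)
    close(lhB)
    return lResult
end function
```

The trade-off is the one described in the post: half of each file is never read, so a difference confined to the middle goes unnoticed.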
3. RE: Check if files equal
- Posted by Derek Parnell <Derek.Parnell at SYD.RABOBANK.COM> Jul 07, 2002
- 397 views
So I guess the situation is that you have an incoming file and you want to see if you already have that file? If this guess is right, then try this algorithm:

Calculate a checksum for the new file.
For each file you already have:
    if the new checksum is the same as an existing file's checksum then
        reject the new file as a duplicate.
    end if
end for
If no existing checksum matched the new one then
    add the new file to your storage, and keep its checksum to compare when you get other new files.
end if

This way, you only ever calculate a file's checksum once, and from then on, you only compare checksums, which is fairly fast.

The trick will be to create a checksum that is truly representative of the whole file. You'll probably need a 32-bit checksum for each 2^32 bits (536,870,912 bytes).

-------
Derek.

> -----Original Message-----
> From: 10963508 at europeonline.com [mailto:10963508 at europeonline.com]
> Sent: Monday, 8 July 2002 12:32
> To: EUforum
> Subject: Re: Check if files equal
>
> Files are not databases, they are .zip .avi and .mp3 files mainly (stuff
> coming down from satellite) - so they are compressed in some way.
> Speed is more important than accuracy.
>
> Tone Skoda
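The checksum bookkeeping described above might look something like the sketch below. The post does not name a particular checksum; an Adler-32 style sum is swapped in here purely as an assumption, and fileChecksum and addIfNew are hypothetical names. Two different files can share a checksum, so a match is only a strong hint of a duplicate.

```euphoria
-- Hypothetical helper: Adler-32 style checksum of a whole file.
-- Returns -1 if the file cannot be opened.
function fileChecksum(sequence pFile)
    integer lh, lc
    atom la, lb

    lh = open(pFile, "rb")
    if lh = -1 then
        return -1
    end if
    la = 1
    lb = 0
    lc = getc(lh)
    while lc != -1 do
        la = remainder(la + lc, 65521)
        lb = remainder(lb + la, 65521)
        lc = getc(lh)
    end while
    close(lh)
    -- Combine the two 16-bit halves into one 32-bit value.
    return lb * 65536 + la
end function

-- Hypothetical helper: pKnown is the list of checksums already stored.
-- Returns {1, updated list} if the file is new, or {0, unchanged list}
-- if its checksum is already known or the file cannot be read.
function addIfNew(sequence pFile, sequence pKnown)
    atom lSum

    lSum = fileChecksum(pFile)
    if lSum = -1 or find(lSum, pKnown) then
        return {0, pKnown}
    end if
    return {1, append(pKnown, lSum)}
end function
```

Each file is read once when it arrives; after that, only the stored numbers are compared, which is the point made in the post.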
4. RE: Check if files equal
- Posted by jordah ferguson <jorfergie03 at yahoo.com> Jul 08, 2002
- 421 views
Kat's test is very effective, but terribly slow. I think it would be enough to check the last 50 bytes of the file and the beginning, then ten random bytes in the middle. Be sure to use seek() and where() to get the correct file size.

jordah

Kat wrote:
> On 7 Jul 2002, at 1:51, 10963508 at europeonline.com wrote:
>
> > What is the fastest way of checking if two very large files (~500 MB) are equal?
> > I was thinking about this:
> > -name
> > -size
> > -date last modified
> > -pick about 10 random positions and check if bytes at those positions in both files match.
>
> I would not trust those tests at all.
>
> > Is there any better and faster way that I'm not aware of?
>
> Open file
> while not eof do
>     Read them in, one buffer size at a time,
>     compare,
>     if not equal { tell me it's not equal, abort}
> end while
>
> Kat
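Kat's buffer-at-a-time loop could be written along these lines. This is a minimal sketch, not anyone's posted code: the name fileEqualBuffered and the CHUNK size are assumptions, and get_bytes() is taken from get.e.

```euphoria
include get.e    -- get_bytes()

constant CHUNK = 32768   -- assumed buffer size, tune as needed

-- Hypothetical routine: returns 1 if both files hold the same bytes,
-- 0 if they differ or if either file cannot be opened.
function fileEqualBuffered(sequence pFileA, sequence pFileB)
    integer lhA, lhB, lSame
    sequence lbA, lbB

    lhA = open(pFileA, "rb")
    lhB = open(pFileB, "rb")
    if lhA = -1 or lhB = -1 then
        if lhA != -1 then close(lhA) end if
        if lhB != -1 then close(lhB) end if
        return 0
    end if

    lSame = 1
    while lSame do
        -- Read one buffer from each file and compare them whole.
        lbA = get_bytes(lhA, CHUNK)
        lbB = get_bytes(lhB, CHUNK)
        lSame = equal(lbA, lbB)
        -- A short (or empty) read means the first file has ended.
        if length(lbA) < CHUNK then
            exit
        end if
    end while
    close(lhA)
    close(lhB)
    return lSame
end function
```

Every byte is still read, so the result is exact; the buffer only reduces the number of calls compared with a getc() loop.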
5. RE: Check if files equal
- Posted by rforno at tutopia.com Jul 09, 2002
- 384 views
Igor:

Thanks! You saved me some effort. I had just been planning to develop exactly this program. There are some similar ones on the web, but they are not good enough or else they are not free, at least the ones I know of. Surely, being programmed in C or some other compiled language, they should be faster.

Does someone know who programmed it? There is no information about it in the program itself.

Regards.

----- Original Message -----
From: Igor Kachan <kinz at peterlink.ru>
Subject: Re: Check if files equal

Hi Tone,

----------
> From: 10963508 at europeonline.com
> To: EUforum <EUforum at topica.com>
> Subject: Check if files equal
> Date: 7 July 2002, 3:51
>
> What is the fastest way of checking if two very large files (~500 MB) are equal?
> I was thinking about this:
> -name
> -size
> -date last modified
> -pick about 10 random positions and check if bytes at those positions in both files match.
>
> Is there any better and faster way that I'm not aware of?
>
> Tone Škoda

There is a good program by RDS:

http://www.RapidEuphoria.com/dupfile.zip

Regards,
Igor Kachan
kinz at peterlink.ru
6. RE: Check if files equal
- Posted by rforno at tutopia.com Jul 12, 2002
- 385 views
Rob:

Actually, the program finds equal files in as many directories as you want, not only two. Since you programmed it, you should know better...

I found a problem with this good program: it has an elusive bug. When trying it, I discovered that I had four equal files somewhere, but the program reported two groups of two equal files. After scratching my head for a while, I traced the bug to this circumstance: assume you have several files with the same length, and among them there is a group of, say, 3 equal files and another group of, say, 4 equal files. When the first group found is processed, the lengths of its equal files are set to -1. Then, when the second group is processed, the 'while' can stop due to unequal length *before* some other member of the second group is reached.

Attached you'll find a correction to this bug.

Regards.

(Attachment: dupfile.ZIP)

----- Original Message -----
From: Robert Craig <rds at RapidEuphoria.com>
To: EUforum <EUforum at topica.com>
Sent: Wednesday, July 10, 2002 2:46 AM
Subject: Re: Check if files equal

> Igor Kachan writes:
> > There is a good program by RDS:
> >
> > http://www.RapidEuphoria.com/dupfile.zip
>
> Ricardo Forno writes:
> > Does someone know who programmed it? There is no information
> > about it in the program itself.
>
> Since you seem to like it, I'll step forward and take the credit.
>
> dupfile.exw finds all sets of identical files within a directory (and
> subdirectories), or between two directories and their subdirectories.
> Whenever I use it, it seems to run very fast.
> I don't think writing it in C will help much,
> and of course you can always use the E to C Translator on it.
>
> Regards,
> Rob Craig
> Rapid Deployment Software
> http://www.RapidEuphoria.com
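The situation described above, where several groups of equal files share one length, can be illustrated with a small sketch. This is NOT the dupfile.exw source or the attached correction; groupEqual is a hypothetical routine, and each entry is assumed to be a {size, data} pair sorted by size, with data standing in for a file's contents. The point shown is that matched entries are tracked in a separate flag list, so the stored sizes are never overwritten and the inner loop can safely stop at the first genuinely different size.

```euphoria
-- Hypothetical illustration: group identical entries among a list of
-- {size, data} pairs that is already sorted by size.
-- Returns a list of groups, each group being a list of entry indexes.
function groupEqual(sequence pEntries)
    sequence lDone, lGroups, lGroup

    lDone = repeat(0, length(pEntries))
    lGroups = {}
    for i = 1 to length(pEntries) do
        if not lDone[i] then
            lGroup = {i}
            for k = i + 1 to length(pEntries) do
                -- Sizes are untouched, so a different size really
                -- means the run of same-length entries has ended.
                if pEntries[k][1] != pEntries[i][1] then
                    exit
                end if
                -- Skip entries already placed in an earlier group
                -- instead of stopping at them.
                if not lDone[k] then
                    if equal(pEntries[k][2], pEntries[i][2]) then
                        lGroup = append(lGroup, k)
                        lDone[k] = 1
                    end if
                end if
            end for
            if length(lGroup) > 1 then
                lGroups = append(lGroups, lGroup)
            end if
        end if
    end for
    return lGroups
end function
```

For example, groupEqual({{3,"aaa"},{3,"bbb"},{3,"aaa"},{3,"bbb"},{3,"bbb"}}) returns {{1,3},{2,4,5}}: the second group is found whole even though a member of the first group sits in the middle of it.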