1. Check if files equal
- Posted by 10963508 at europeonline.com Jul 06, 2002
- 453 views
What is the fastest way of checking if two very large files (~500 MB) are equal?
I was thinking about this:
-name
-size
-date last modified
-pick about 10 random positions and check if bytes at those positions in both files match.

Is there any better and faster way that I'm not aware of?

Tone Škoda
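For reference, the sampling idea in the question could be sketched roughly as follows. This is Python, chosen only for illustration; the function name `sample_match` and its parameters are hypothetical. It is a heuristic: a `False` result proves the files differ, but a `True` result only means that no difference was found at the sampled offsets.

```python
import os
import random


def sample_match(path_a, path_b, samples=10, seed=None):
    """Heuristic check: same size, and same byte at `samples` random offsets."""
    size = os.path.getsize(path_a)
    if size != os.path.getsize(path_b):
        return False  # different sizes: certainly not equal
    if size == 0:
        return True  # two empty files are trivially equal
    rng = random.Random(seed)  # seed is only there to make runs repeatable
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for _ in range(samples):
            offset = rng.randrange(size)
            fa.seek(offset)
            fb.seek(offset)
            if fa.read(1) != fb.read(1):
                return False  # found a differing byte
    return True  # no difference seen at the sampled positions
```

With only 10 samples in a 500 MB file, a small localized change is very unlikely to be hit, so this can only serve as a quick pre-filter.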
2. Re: Check if files equal
- Posted by Kat <gertie at PELL.NET> Jul 06, 2002
- 409 views
On 7 Jul 2002, at 1:51, 10963508 at europeonline.com wrote:

>
> What is the fastest way of checking if two very large files (~500 MB) are
> equal?
> I was thinking about this:
> -name
> -size
> -date last modified
> -pick about 10 random positions and check if bytes at those positions in
> both files match.

I would not trust those tests at all.

> Is there any better and faster way that I'm not aware of?

Open file
while not eof do
  Read them in, one buffer size at a time,
  compare,
  if not equal { tell me it's not equal, abort}
end while

Kat
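The read-and-compare loop above might look something like this in Python (used here only for illustration). The name `files_equal` and the 1 MB default buffer are assumptions, not something from the thread.

```python
def files_equal(path_a, path_b, chunk_size=1 << 20):
    """Return True only if both files contain exactly the same bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                # Covers both a differing byte and one file ending early.
                return False
            if not chunk_a:
                # Both reads returned nothing: end of both files, no mismatch.
                return True
```

The loop stops at the first differing buffer, so unequal files are usually rejected long before the end is reached; only equal files cost a full read of both.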
3. Re: Check if files equal
- Posted by r.schr at t-online.de Jul 07, 2002
- 427 views
10963508 at europeonline.com wrote:

>
> What is the fastest way of checking if two very large files (~500 MB) are
> equal?
> I was thinking about this:
> -name
> -size
> -date last modified
> -pick about 10 random positions and check if bytes at those positions in
> both files match.
>
> Is there any better and faster way that I'm not aware of?
>

I think the fastest way to make sure the contents are exactly equal is to use the Windows command "FC /B filename_1 filename_2"; the /B means binary comparison. "FC /?" will give you all possible parameters.

Have a nice day,
Rolf
4. Re: Check if files equal
- Posted by Juergen Luethje <jluethje at gmx.de> Jul 07, 2002
- 419 views
Hi,

Kat wrote:

> On 7 Jul 2002, at 1:51, 10963508 at europeonline.com wrote:
                          ^^^^^^^^
It would be nice, to see a name here. (Just my opinion.)

>> What is the fastest way of checking if two very large files (~500 MB) are
>> equal?

I assume you mean equal content, not equal name, equal date, ..

>> I was thinking about this:
>> -name

Name doesn't matter concerning the content.

>> -size
>> -date last modified

Date doesn't matter concerning the content.

>> -pick about 10 random positions and check if bytes at those positions in
>> both files match.

> I would not trust those tests at all.

First, I would compare the size of the files; this is very fast. Whether this comparison can be trusted or not depends on its result! If both files don't have the same size, it's 100% sure that they are not equal. If they have the same size, further testing is needed.

The same logic goes for CRC tests and the comparison of random bytes.

>> Is there any better and faster way that I'm not aware of?

I think it would be best to first make some _fast_ tests that will find unequal files with some probability. (I don't know how fast CRC testing is.) Then, if these tests didn't prove that the files are unequal, more precise tests must follow. Of course, the most precise test is this:

> Open file
> while not eof do
> Read them in, one buffer size at a time,
> compare,
> if not equal { tell me it's not equal, abort}
> end while

> Kat

Best regards,
Juergen
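The "cheap test first, exact test only if needed" order described above could be sketched like this in Python (for illustration only). The function name `tiered_equal` is hypothetical; `os.path.getsize` and `filecmp.cmp` are standard-library calls.

```python
import filecmp
import os


def tiered_equal(path_a, path_b):
    """Run the cheap test first; fall back to the exact test only if needed."""
    # Stage 1: sizes. A mismatch proves the files differ; a match proves nothing.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Stage 2: exact comparison. shallow=False makes filecmp read the contents
    # instead of trusting the os.stat() signature (type, size, mtime).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Only the size test avoids reading the data. Any further stage placed between the two (CRC, sampled bytes) has to be cheaper than the full comparison to be worth running.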
5. Re: Check if files equal
- Posted by Kat <gertie at PELL.NET> Jul 07, 2002
- 429 views
On 7 Jul 2002, at 11:21, Juergen Luethje wrote:

>
> Hi, Kat wrote:
>
> > On 7 Jul 2002, at 1:51, 10963508 at europeonline.com wrote:
>                            ^^^^^^^^
> It would be nice, to see a name here. (Just my opinion.)

This isn't my problem, you know.

> The same logic goes for CRC tests and the comparison of random bytes.

To get a CRC, you need to perform math on all the bytes. How will you get that math done, if you don't read all the bytes?

Kat
6. Re: Check if files equal
- Posted by Juergen Luethje <jluethje at gmx.de> Jul 07, 2002
- 427 views
Kat wrote:

> On 7 Jul 2002, at 11:21, Juergen Luethje wrote:

>> Hi, Kat wrote:
>>
>> > On 7 Jul 2002, at 1:51, 10963508 at europeonline.com wrote:
>>                            ^^^^^^^^
>> It would be nice, to see a name here. (Just my opinion.)

> This isn't my problem, you know.

I know that, of course. Sorry if it looked as if I was meaning you! I wrote it there because I assume, that the one who starts a thread, reads all the posts in it.

When you want to discuss about a text, please don't snip the decisive part away! Here it is again, what I wrote in my previous post:

---------------------------------------------------------------------
If both files don't have the same size, it's 100% sure that they are not equal. If they have the same size, further testing is needed.
---------------------------------------------------------------------

>> The same logic goes for CRC tests and the comparison of random bytes.

> To get a CRC, you need to perform math on all the bytes.

I know.

> How will you get that math done, if you don't read all the bytes?

I didn't write that. In the above text between the two lines, just replace "size" with "CRC", and then you'll see what I mean.

> Kat

Regards,
Juergen
7. Re: Check if files equal
- Posted by Kat <gertie at PELL.NET> Jul 07, 2002
- 424 views
On 7 Jul 2002, at 21:08, Juergen Luethje wrote:

>
> Kat wrote:
>
> > On 7 Jul 2002, at 11:21, Juergen Luethje wrote:
>
> >> Hi, Kat wrote:
> >>
> >> > On 7 Jul 2002, at 1:51, 10963508 at europeonline.com wrote:
> >>                            ^^^^^^^^
> >> It would be nice, to see a name here. (Just my opinion.)
>
> > This isn't my problem, you know.
>
> I know that, of course. Sorry if it looked as if I was meaning you!
> I wrote it there because I assume, that the one who starts a thread,
> reads all the posts in it.
>
>
> When you want to discuss about a text, please don't snip the decisive
> part away!

I replied to what i wished to reply to.

> Here it is again, what I wrote in my previous post:
> ---------------------------------------------------------------------
> If both files don't have the same size, it's 100% sure that they are
> not equal. If they have the same size, further testing is needed.
> ---------------------------------------------------------------------
>
> >> The same logic goes for CRC tests and the comparison of random bytes.
>
> > To get a CRC, you need to perform math on all the bytes.
>
> I know.
>
> > How will you get that math done, if you don't read all the bytes?
>
> I didn't write that. In the above text between the two lines, just
> replace "size" with "CRC", and then you'll see what I mean.

Ok, i don't know who wrote what now, and i deleted all the previous emails on this thread so i can't check, and i don't feel like wasting any more time with it, but it appeared to me as though someone was saying a CRC check was an alternative to reading in the files. Obviously, if you read in the files, you can abort with a "not equal" error long before you perform the CRC calculations, which you'd do at the end of the files.

Kat
8. Re: Check if files equal
- Posted by 10963508 at europeonline.com Jul 07, 2002
- 421 views
Files are not databases, they are .zip, .avi and .mp3 files mainly (stuff coming down from satellite) - so they are compressed in some way.
Speed is more important than accuracy.

Tone Skoda

----- Original Message -----
From: "Derek Parnell" <Derek.Parnell at SYD.RABOBANK.COM>
To: "EUforum" <EUforum at topica.com>
Subject: RE: Check if files equal

>
> Hi all,
> in the original question put to the list by Tone, he mentioned 500MB+ files.
> I'm guessing that these are databases, given the size. If so, checking the
> bytes from the front of the files is probably a good idea because a lot of
> database systems keep pointers and stamps near the front of the database
> file. Thus, even if the file sizes haven't been changed, a short scan will
> probably find a changed stamp or pointer. If you get about 25% through the
> files and haven't found a mismatch, you might take a risk that they are the
> same - or you might like to check the last 25% too. Just a thought.
>
> ---------
> Derek.
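If speed really matters more than accuracy, the partial scan suggested in the quoted message (first 25%, optionally the last 25% too) could be sketched as below. This is Python, for illustration only; the name `head_tail_match`, the `fraction` parameter and the buffer size are assumptions. A difference in the unscanned middle of the file goes unnoticed, so `True` means "probably equal", not "equal".

```python
import os


def head_tail_match(path_a, path_b, fraction=0.25, chunk_size=1 << 20):
    """Compare sizes, then only the first and last `fraction` of each file."""
    size = os.path.getsize(path_a)
    if size != os.path.getsize(path_b):
        return False  # different sizes: certainly not equal
    span = int(size * fraction)  # number of bytes checked at each end
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for start in (0, max(size - span, 0)):  # head first, then tail
            fa.seek(start)
            fb.seek(start)
            remaining = span
            while remaining > 0:
                n = min(chunk_size, remaining)
                if fa.read(n) != fb.read(n):
                    return False  # mismatch inside the scanned region
                remaining -= n
    return True  # nothing differed in the scanned head and tail
```

With `fraction=0.25` this reads half of each file, so at best it halves the I/O of a full comparison.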
9. Re: Check if files equal
- Posted by kbochert at ix.netcom.com Jul 07, 2002
- 440 views
You wrote on 7/7/02 5:53:05 PM:

>
>Files are not databases, they are .zip .avi and .mp3 files mainly (stuff
>coming down from satellite) - so they are compressed in some way.
>Speed is more important than accuracy.
>
>Tone Skoda
>

Some thoughts:

1) Compare long words rather than words or bytes.
2) Reduce disk latency. If possible, read one file entirely into memory before starting. If not possible, fill most of memory with one file, then read comparatively small chunks of the other.
3) It may be useful to use non-blocking calls to the read routine so you can compare one buffer while reading the next. More importantly, this may prevent the disk from having to do a full rotation between reads.
4) Have the two files on different disks!
5) Have the two files on four disks!
6) You could use assembly for the comparison routines, and optimize for the processors' multiple execution units, but that is likely to be swamped by disk traffic.
7) Inside knowledge of the format might allow comparison of just a CRC.

Karl Bochert
10. Re: Check if files equal
- Posted by Igor Kachan <kinz at peterlink.ru> Jul 08, 2002
- 434 views
Hi Tone,

----------
> From: 10963508 at europeonline.com
> To: EUforum <EUforum at topica.com>
> Subject: Check if files equal
> Date: 7 July 2002, 3:51
>
> What is the fastest way of checking if two very large files (~500 MB) are
> equal?
> I was thinking about this:
> -name
> -size
> -date last modified
> -pick about 10 random positions and check if bytes at those positions in
> both files match.
>
> Is there any better and faster way that I'm not aware of?
>
> Tone Škoda
>

There is a good program by RDS:

http://www.RapidEuphoria.com/dupfile.zip

Regards,
Igor Kachan
kinz at peterlink.ru
11. Re: Check if files equal
- Posted by Robert Craig <rds at RapidEuphoria.com> Jul 09, 2002
- 419 views
Igor Kachan writes:
> There is a good program by RDS:
>
> http://www.RapidEuphoria.com/dupfile.zip
>

Ricardo Forno writes:
> Does someone know who programmed it? There is no information
> about it in the program itself.

Since you seem to like it, I'll step forward and take the credit.

dupfile.exw finds all sets of identical files within a directory (and subdirectories), or between two directories and their subdirectories. Whenever I use it, it seems to run very fast. I don't think writing it in C will help much, and of course you can always use the E to C Translator on it.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
12. Re: Check if files equal
- Posted by petelomax at blueyonder.co.uk Jul 12, 2002
- 464 views
On Mon, 8 Jul 2002 11:37:33 +0800, Derek Parnell <Derek.Parnell at SYD.RABOBANK.COM> wrote:
<snip>
> The trick will be to create a checksum that is truly representative of the
> whole file. You'll probably need a 32-bit checksum for each 2^32 bits
> (536,870,912 bytes).
Just my $0.02: I'd use md5.
There's a dos/windows version here: http://www.fourmilab.ch/md5/ and linux, DLL, & more here: http://userpages.umbc.edu/mabzug1/cs/md5/md5.html
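As a rough sketch of the md5 approach, here is how a digest could be computed in Python with the standard `hashlib` module (the tools linked above are separate programs; Python is used only for illustration, and the name `file_md5` is hypothetical). The file is fed to the hash in chunks so a 500 MB file never has to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 digest of a file as a hex string, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files are then treated as equal when `file_md5(a) == file_md5(b)`. Computing a digest still reads every byte, so for a one-off comparison of two local files it is no faster than comparing them directly; it pays off when a digest can be stored and reused, or when the two files sit on different machines.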
13. Re: Check if files equal
- Posted by Robert Craig <rds at RapidEuphoria.com> Jul 12, 2002
- 422 views
Ricardo Forno writes:
> Actually, the program finds equal files on as
> many directories you want, not only two.
> Since you programmed it, you should know better...

Yes, thanks. I forgot that I generalized it.

> Then, when processing the second group, the 'while'
> can stop due to unequal length *before* some other
> member of the second group is reached. Attached
> you'll find a correction to this bug.

Thanks. Your fix looks essentially correct. I'll study it / test it a bit more, then upload the corrected version.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com