Binary compare of very large files with progress update...

Xunil

Hi,

I'm looking for a way to binary compare two large 3TB disk image files in the terminal, preferably as quickly as possible and with some kind of feedback with regard to progress.

I have done some searching online on this, but everything I have tried has failed for one reason or another.

I tried using cmp -l in conjunction with pv. The comparison seemed to end prematurely when the disks involved went to sleep after about 7 hours, which was not encouraging. The terminal thought it was still executing the operation, however. I also couldn't get pv to give me anything useful other than the amount of time passed (which I could have got from a clock... lol) and a progress bar that didn't move. I'm quite new to Linux, so perhaps I am getting the pv syntax wrong... not sure. I used pv -p -e -t | before the cmp command. I'm going to go ahead and guess that's wrong.

I have tried wxHexEditor... this also crashed before finishing.

I really need to do this. Any help would be greatly appreciated.

Thank you in advance. :)
 


I have no experience with what you are trying to do, but I would think that to compare binary files the system has to load both files into memory. That would mean you would need about 6TB (2 x 3TB) of memory, so I would expect your system to run out of memory eventually if it has less than that. I could be wrong, but it doesn't sound illogical. Someone with programming experience may come along and have a better answer than me.
 
No, it isn't necessary to load the whole files into memory. Files can be read byte by byte or in larger chunks, and those bytes/chunks compared. Then the next chunk is loaded, and when a difference is found it is reported or recorded for later reporting. Programmer here... ;-) ...just a newb at Linux ;-). But thanks for taking the time to help. :)
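
The chunk-by-chunk idea described above can be sketched in the shell. This is only an illustration under stated assumptions: chunked_cmp is a hypothetical helper name, and it relies on GNU cmp (for -s, -i and -n) and GNU stat (for -c %s). cmp already reads its inputs in buffered blocks on its own, so a loop like this is for understanding the principle, not for speed.

```shell
#!/bin/sh
# chunked_cmp is a hypothetical helper, not an existing tool.
# Assumes GNU cmp (-s, -i, -n) and GNU stat (-c %s).
chunked_cmp() {
    a=$1
    b=$2
    chunk=${3:-1048576}   # bytes per chunk; 1 MiB is an arbitrary default

    size_a=$(stat -c %s "$a") || return 2
    size_b=$(stat -c %s "$b") || return 2
    if [ "$size_a" -ne "$size_b" ]; then
        echo "sizes differ: $size_a vs $size_b bytes"
        return 1
    fi

    offset=0
    while [ "$offset" -lt "$size_a" ]; do
        rc=0
        # -i skips $offset bytes in both files; -n limits the compare to one chunk
        cmp -s -i "$offset" -n "$chunk" "$a" "$b" || rc=$?
        if [ "$rc" -eq 1 ]; then
            echo "difference in the chunk starting at byte $offset"
            return 1
        elif [ "$rc" -ne 0 ]; then
            echo "cmp failed at byte $offset" >&2
            return 2
        fi
        offset=$((offset + chunk))
    done
    echo "identical"
}
```

Only one chunk's worth of each file is examined per step, so memory use stays small no matter how large the images are. Calling it as chunked_cmp img1 img2 uses the 1 MiB default; a third argument sets a different chunk size in bytes.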
 
Now that you explain it that way, it does make sense. I don't have any programming experience, so I was just trying to jump in and share some thoughts; it's the best way to help each other's chain of thought and come up with something creative and a new understanding. cmp is the way to compare binary files on Linux, but I have never heard of pv. What is pv? I did a search on comparing binary files and came across this on superuser, maybe that will help?
 
Ok. Thank you. :) I shall have a look at it. pv is a Linux terminal command (supposedly) for adding feedback (such as estimated time to completion, percent done, etc.) to tasks started in the terminal which don't have any such feedback. You can use it with pipes to combine it with other commands. Apparently I'm doing it wrong, though... I don't think it comes installed as standard with most Linux distros. It probably depends on your distro. Mine didn't have it, so I had to install it.
 
Yeah, I know what pipes and such are, but I had never heard of that command before. Looks like it's this command.
Code:
pv-1.6.6-7.el8.x86_64 : A tool for monitoring the progress of data through a pipeline
Repo        : epel
Matched from:
Filename    : /usr/bin/pv
Filename    : /usr/share/doc/pv
Filename    : /usr/share/licenses/pv
The flags you used are valid; they are listed in the help and man page.
Code:
pv --help
man pv
I think you probably did it like this?
Code:
pv -pet  | cmp -l img1 img2
 
I just ran your syntax. It initiates a binary compare (which I imagine will crash after around 7 hours), same as my syntax, but gives no feedback through pv at all. My syntax didn't work, but I was able to get a report on time elapsed and a static, unmoving progress bar...
 
Which one did you try? I removed one because it wasn't correct: it used command substitution, which wouldn't give the result you want from cmp. I found this ugly hack somewhere else: you could set up a cron job with root privileges that touches your hard disk to prevent it from going to sleep. So something like this, then replace sda with your hard disk(s), which you can find by running lsblk in the terminal.
Code:
*/5 * * * * /bin/touch /dev/sda >/dev/null 2>&1
 
I tried the one you removed. The one you have replaced it with is the exact syntax I used without success. What is a cronjob?
 
Yes, that one wouldn't work because it used command substitution and didn't give the result I thought it would, so a pipeline like you had should work.
Are the images/files you are trying to compare on the same disk?
 
Ok. I have managed to find the correct syntax to make pv work and give an ETA and other useful information. I do not know yet if the compare will crash, but at least I have feedback about how far along the job is. I hope this helps others. The correct syntax is:

Code:
pv firstfile | cmp -l secondfile
 
I don't see how that would make a difference, because it's just the same command without the options. See what happens. If your disk still goes to sleep, I would try setting up a cron job that touches your disk every 5 minutes.
1. Edit the root crontab: sudo crontab -e
2. Configure the job: */5 * * * * /bin/touch /dev/sda >/dev/null 2>&1
3. Save
Replace sda with the disk on your system, which you can find by using lsblk. I'm not sure which text editor will open on your system for editing the crontab. Whether or not the compare completes successfully, you can remove the cron job afterwards.
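
As a possible alternative to a cron job that has to be removed by hand, the same periodic touch can be tied to the lifetime of the compare itself. This is a rough sketch: run_with_keepalive is a hypothetical helper name, and whether touching a device node is enough to keep a particular drive awake is taken from the suggestion in this thread, not verified here.

```shell
#!/bin/sh
# run_with_keepalive is a hypothetical helper name.
# Usage: run_with_keepalive PATH SECONDS COMMAND [ARGS...]
run_with_keepalive() {
    target=$1
    interval=$2
    shift 2

    # Background loop: touch the target, wait, repeat.
    ( while :; do touch "$target" 2>/dev/null; sleep "$interval"; done ) &
    keepalive_pid=$!

    rc=0
    "$@" || rc=$?        # run the long job in the foreground

    # Stop the loop once the job ends (a pending sleep may linger until it expires).
    kill "$keepalive_pid" 2>/dev/null || true
    wait "$keepalive_pid" 2>/dev/null || true
    return "$rc"
}
```

It could be called as run_with_keepalive /dev/sda 300 cmp img1 img2, run as root so that the touch on the device node is permitted. The exit status is that of the wrapped command, so a failed or differing compare is still visible to the caller.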
 
Ok thanks. If the disk goes to sleep I will try that. :) Thanks for the suggestion. :) Syntax is different. The difference is that the first file needed to be on the other side of the pipe. pv now gives feedback. :)
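
Why the position of the first file matters: in pv -pet | cmp -l img1 img2, pv is given no file to read and cmp is given both files as operands, so cmp never looks at the pipe and pv has nothing to measure. When pv reads the first file itself, it knows the total size and can report a percentage and ETA, and cmp with a single file operand compares that file against standard input. A minimal sketch of this as a reusable function follows; compare_with_progress is a hypothetical name, and the fallback to plain cmp when pv is missing is an assumption added for illustration.

```shell
#!/bin/sh
# compare_with_progress is a hypothetical helper name.
compare_with_progress() {
    first=$1
    second=$2
    if command -v pv >/dev/null 2>&1; then
        # pv reads $first itself, so it knows the total size and can show
        # a percentage and ETA on stderr while passing the bytes to stdout.
        # cmp compares $second against "-", i.e. the bytes arriving from pv.
        pv "$first" | cmp "$second" -
    else
        # Assumption: fall back to a plain compare when pv is not installed.
        cmp "$first" "$second"
    fi
}
```

The sketch leaves out -l on purpose: plain cmp stops at the first difference and signals the result through its exit status (0 for identical, 1 for different), which is enough for a yes/no answer. With -l, every differing byte is listed, which can produce a very large amount of output if two 3TB images differ widely.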
 
Yes, I see that now. It's still early in the morning where I am. Still waking up here ;)
 
No worries. :) Just thought I'd give a brief follow up to confirm that the above syntax did indeed fully work and the task completed without any drives going to sleep this time. No errors reported as my files are the same. I also did some testing with deliberately differing files to confirm that an appropriate output message is indeed given if differences are found and the syntax passed those tests so, all good. I hope this is helpful to others. Thanks for your input, f33dm3bits. :)
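
The kind of test described above, with deliberately differing files, can be reproduced on a small scale. This is a sketch with made-up names: make_pair and list_differences are hypothetical helpers, and the file names same.img and changed.img are placeholders, not the files from this thread.

```shell
#!/bin/sh
# make_pair is a hypothetical helper: writes two 6-byte files into the
# directory given as $1 that differ only at byte 3 ('c' versus 'X').
make_pair() {
    printf 'abcdef' > "$1/same.img"
    printf 'abXdef' > "$1/changed.img"
}

# list_differences prints one line per differing byte: the byte offset
# (decimal, counting from 1) followed by the two byte values in octal.
# It prints nothing and exits with status 0 when the files are identical.
list_differences() {
    cmp -l "$1" "$2"
}
```

After dir=$(mktemp -d) and make_pair "$dir", running list_differences "$dir/same.img" "$dir/changed.img" should report a single difference at byte 3, with octal values 143 ('c') and 130 ('X'), and exit with status 1.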
 
Glad you were able to successfully compare your files and glad to have helped you out getting there! :)
 
