line comparison

Diputs

Once more, a real-life example of a command-line issue I ran into. This one is about line comparison.
I'll explain:

I'm comparing two text files with DIFF.
There are some differences, OK, but one of the lines displayed as different is a VERY LONG line. Let's say 400 characters.
I look at these two lines ... and yes, you guessed it: they appear to be the same.
Isn't that cute?

So, my question is: isn't there a tool that compares lines and tells me: hey, character 76 on the line is different!
Instead of DIFF telling me: there is a difference in the line. Somewhere. I promise.
 


You could use something like diffuse or meld, which are GUI programs for diffing files. They should let you see the differences in files more clearly.

Or in the terminal:
If you have Vim installed, you could use Vim's diff functionality.
E.g.
Bash:
vim -d /path/to/file1 /path/to/file2
I can't remember how granular Vim's diff is: whether it just highlights entire lines that are different, or only the differences within those lines.

I haven't done much diffing on Linux for a while, and I haven't got my laptop handy this weekend either.

But from what I recall, meld and diffuse are pretty good!
 
You could upload your files to GitHub and use their diff in a web browser; it shows every changed character in addition to changed lines or blocks of lines.

You can also do it offline with the git diff command, or in VSCode, which uses git behind the scenes.
 
In the following, the git command shows the single differing character between the two files, file1 and file2. The option "--word-diff-regex=." treats every single character as a "word", which is what makes the comparison character by character. Note that the first git command uses color, which is not reproduced here: the differing characters "d" and "h" are output in different colors. If the command is run in a terminal without color and without the option "--word-diff=color", the differing characters are still shown clearly, so it's not necessary to rely on color, as the second git command below shows. There the differing characters are marked with minus and plus signs, making it clear that file1 had the "d" and file2 had the "h".

Code:
[tom@min ~]$ cat file1
the quick brown fox jumps over the lazy dog

[tom@min ~]$ cat file2
the quick brown fox jumps over the lazy hog

[tom@min ~]$ git diff --word-diff=color --word-diff-regex=. file1 file2
diff --git a/file1 b/file2
index af9f93a..3034bdb 100644
--- a/file1
+++ b/file2
@@ -1 +1 @@
the quick brown fox jumps over the lazy dhog

[tom@min ~]$ git diff --word-diff-regex=. file1 file2
diff --git a/file1 b/file2
index af9f93a..3034bdb 100644
--- a/file1
+++ b/file2
@@ -1 +1 @@
the quick brown fox jumps over the lazy [-d-]{+h+}og
 

meld and diffuse aren't installed on "my" systems,
but vim is, and that works perfectly. It marks the differences in white (as opposed to gray), which seems clear enough. (It's not using ANSI colors, which is good, because I don't use those in PuTTY.)

But then I found that the same also works with VI:

vi -d file1 file2

and that's even better, because I'm a VI user.
 
We can't install anything that isn't available from the vendor itself;
I'm a corporate Linux user.
 
Hello @Diputs.
In the following are two methods to address the query in post #1:
"isn't there a tool for comparison of lines, and to tell me : hey, character 76 on the line is different !"
Since you mentioned in post #6 that git was not available to you, I believe the following achieves the result you are seeking using tools usually provided in default Linux installations, in this case from the coreutils and diffutils packages. Hopefully that gets around the issue of absent tools.

In this first approach, each of the two lines being compared is run through the command "fold -1", which places each character on a separate line, and the result is written to its own file in the system's /tmp directory (though it could be anywhere else that is available). The line numbers in these /tmp files are therefore equal to the character numbers in the original lines. The files are then compared with the diff command, which outputs the results, and finally the files are removed from /tmp:
Code:
[tom@min ~]$ cat file1
the quick brown fox jumps over the lazy dog

[tom@min ~]$ cat file3
the quick brown fox jumps over the hazy hog
[tom@min ~]$

[tom@min ~]$ cat file1 | fold -1 > /tmp/file1f; cat file3 | fold -1 > /tmp/file3f ; diff /tmp/file1f /tmp/file3f ;rm /tmp/file*f

36c36
< l
---
> h
41c41
< d
---
> h

The results show that the files differ at the 36th character, where the letter l has been replaced by the letter h, and at the 41st character, where the letter d has been replaced by the letter h. The results can be verified by inspecting the original files.
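As one way to do that inspection without counting by eye, cut from coreutils can print just the character positions diff reported. A small sketch; it re-creates the two example files in a scratch directory so it runs on its own, and, like the fold method, it assumes plain ASCII text (GNU cut -c currently counts bytes):

```shell
# Re-create the example files in a scratch directory so this runs on its own.
dir=$(mktemp -d)
printf 'the quick brown fox jumps over the lazy dog\n' > "$dir/file1"
printf 'the quick brown fox jumps over the hazy hog\n' > "$dir/file3"

# cut -c prints only the selected positions of every line: here 36 and 41.
cut -c 36,41 "$dir/file1"    # prints: ld
cut -c 36,41 "$dir/file3"    # prints: hh

rm -r "$dir"
```

The same trick works for the "character 76" case, e.g. cut -c 76 on the line in question.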

The second approach uses the command cmp, which outputs the differences by byte number. That is equivalent to the character number in the first method above, since each character here takes up a single byte (true for plain ASCII text, though not for multi-byte UTF-8 characters). The same files are used here as in the above example.

In the first run of cmp, the first difference between the files is identified, and the altered character (a letter in this case) is shown with its octal value, 154 for l, followed by its replacement, likewise shown in octal as 150 for h. The octal values can be checked in the ascii man page:
Code:
[tom@min ~]$ cmp -b file1 file3
file1 file3 differ: byte 36, line 1 is 154 l 150 h
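Alternatively, the shell's own printf can do the conversion in both directions. A small sketch: with a numeric format such as %o, an argument starting with a single quote is converted to the numeric value of the character that follows it (POSIX behavior), and a \ddd octal escape in the format string turns a value back into a character:

```shell
# Character -> octal: a leading ' makes printf use the character's code value.
printf '%o %o %o\n' "'l" "'h" "'d"    # prints: 154 150 144

# Octal -> character: \ddd escapes in the format string are expanded by printf.
printf '\154 \150 \144\n'             # prints: l h d
```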

By default, cmp stops after the first difference it detects, so if there are more differences, one needs to rerun the command, skipping the part of the files it has already examined. Therefore, in the following, the first 36 bytes are skipped in both files, so that the rest of the lines can be compared to find any further differences:
Code:
[tom@min ~]$ cmp -b -i 36:36 file1 file3
file1 file3 differ: byte 5, line 1 is 144 d 150 h

In this case cmp has correctly identified the second difference, but note that it starts counting from 1 at the skip point of 36, so it reports the next difference at byte 5, that is, five places further along from the 36th byte. Since 36 + 5 = 41, the second difference is actually at the 41st byte, which is the same result as in the first example above, bearing in mind that one byte is equivalent to one character in this case.
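The repeated skipping with -i can be avoided: cmp's -l option (long form --verbose) lists every differing byte in one pass, one line per difference, giving the byte number in decimal followed by the two differing byte values in octal. A minimal sketch of a wrapper that turns that into readable output; the name char_diff is purely illustrative, not an existing tool:

```shell
# Hypothetical helper: list every differing character between two files.
# cmp -l prints: <byte number> <octal byte in file 1> <octal byte in file 2>
char_diff() {
  cmp -l -- "$1" "$2" | while read -r pos a b; do
    # %b expands a \0ddd escape, turning the octal value back into the character
    printf 'character %s: %b -> %b\n' "$pos" "\\0$a" "\\0$b"
  done
}
```

With the example files, char_diff file1 file3 should list positions 36 (l -> h) and 41 (d -> h). As with plain cmp, the positions are byte offsets from the start of the file, and if one file is shorter, cmp also reports the EOF on stderr.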

The above examples can be scripted or put into functions to make them more economical to use, rather than typed out in the long form shown here for clarity of exposition ... hopefully :) YMMV
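As a sketch of that idea, the fold-and-diff method from the first approach could be wrapped in a shell function. The name fold_diff is hypothetical. It uses mktemp (coreutils) for the scratch files, so only its own files are removed afterwards rather than anything matching a /tmp/file*f glob, and fold -w 1, the POSIX spelling of fold -1:

```shell
# Hypothetical helper: fold both files to one character per line, then diff them,
# so the line numbers in diff's output are character positions (for ASCII text).
fold_diff() {
  tmp1=$(mktemp) || return 2
  tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
  fold -w 1 "$1" > "$tmp1"
  fold -w 1 "$2" > "$tmp2"
  diff "$tmp1" "$tmp2"
  status=$?              # keep diff's exit status: 0 same, 1 different, 2 trouble
  rm -f "$tmp1" "$tmp2"
  return "$status"
}
```

Usage is then just fold_diff file1 file3, which should print the same 36c36 / 41c41 output as the long command.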
 
From the above I gathered 6 solutions:

1. VI or VIM with -d
2. a solution involving "diffuse"
3. a solution involving "meld"
4. a solution involving "git"
5. one based on cat, fold and, most importantly, diff
6. a solution involving "cmp -b"

Solutions 2, 3 and 4 don't work on my machines, as these tools don't seem to be "native". I know that may be too strict a definition, but that is what it boils down to: some tools exist on nearly all machines, some don't. Extended tools are nice, but the first question is: can you do it with standard tools? The answer is yes.

So solutions 1, 5 and 6 seem to do the job just fine, but in very different ways.

My favorite is the one with CMP:

cmp -b file1 file3

I didn't know CMP could do this. It only reports the first differing character, but that will do in most cases.
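One caveat with cmp on real, multi-line files: the byte number it reports counts from the start of the file (the line number is reported separately), so for a long line deep inside a file it is not directly the column within that line. A minimal awk sketch that compares one given line of two files and prints every differing position in that line; the name line_cmp is hypothetical, and positions are bytes in awks such as mawk but characters in gawk under a UTF-8 locale:

```shell
# Hypothetical helper: compare line N of two files, character by character.
# Usage: line_cmp N file1 file2
line_cmp() {
  awk -v n="$1" '
    FNR == n { if (FILENAME == ARGV[1]) a = $0; else b = $0 }   # keep line n of each file
    END {
      len = (length(a) > length(b)) ? length(a) : length(b)
      for (i = 1; i <= len; i++) {
        ca = substr(a, i, 1); cb = substr(b, i, 1)
        if (ca != cb)
          printf "character %d: \"%s\" -> \"%s\"\n", i, ca, cb
      }
    }' "$2" "$3"
}
```

For the original problem, N would be the line number that diff reports for the long line; with the example files, line_cmp 1 file1 file3 should list positions 36 and 41.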

Next best is the solution with either VI or VIM. I didn't know this function even existed in those tools, but I guess that's why we are here.

vi -d file1 file2
vim -d file1 file2

It nicely splits the screen and shows the differences in color.

The last one works fine as well, but the command is pretty ... long, and in the end its output is basically the "normal" output of diff. Actually (but that may be for another thread), the behavior of DIFF is what led me to create this very thread. The "diff" tool is extremely nice, but there are specific scenarios where it gives some odd feedback.

cat file1 | fold -1 > /tmp/file1f; cat file3 | fold -1 > /tmp/file3f ; diff /tmp/file1f /tmp/file3f ;rm /tmp/file*f

Because the command is too long to remember (for me at least), it's only the 3rd best solution ... but it works all right.
 