Solved Help with regex and grep


CaffeineAddict

I'm working on a collection of commands to run against wordlists for optimization and have 3 questions regarding the command below.

The command (including vague description) is:

Bash:
# Get rid of passwords containing non-ascii or non-visible characters (except for the space)
grep --extended-regexp '^[[:print:]]*$' "input.txt" > "output.txt"

According to the description, it should remove all lines that contain a non-printable character.

But this doesn't seem correct. My interpretation of the command is that it removes only lines that begin with a non-printable character. If a line ends with a non-printable character, or has one in the middle, it will be preserved and slip through into "output.txt". Is that correct?

If that is correct, how would I modify this command so that any line containing a non-printable character is removed, regardless of whether that character is at the beginning, at the end, or in the middle of the line?

Lastly, I don't understand how grep works in this case. Reading the command from left to right, it appears that only matches are output into "output.txt". Since the matches are lines with a non-printable character, it follows that only such lines should be output. But the truth is the reverse: matches are NOT output to "output.txt" but discarded.

Why is that? Which grep docs or resources state that matches are discarded rather than preserved?
I'm expecting matches (lines with non-printable characters) to be put into "output.txt" and the rest discarded, rather than vice versa.
 


The following is a test for the grep command in post #1.

Create a file with printable ASCII characters and non-printable characters:

Using vim, first write a file with lines that contain only printable ASCII characters:
the
089867
)*%#llo23

Then write a line with non-printable characters, which in this case are terminal codes (each starts with the ESC control character) created by pressing:
ctrl+v and <Insert> ctrl+v and <Home> ctrl+v and <PageUp>

Then write a second line of the same key presses, but with a space between the terminal codes:

ctrl+v and <Insert> <space> ctrl+v and <Home> <space> ctrl+v and <PageUp>

Then write a third line that contains the same terminal codes but begins with a printable ASCII character, in this case the letter j:

j ctrl+v and <Insert> ctrl+v and <Home> ctrl+v and <PageUp>

The file contents look like the following using vim and cat:

Code:
$ vim file1
the
089867
)*%#llo23
^[[2~^[OH^[[5~
^[[2~ ^[OH ^[[5~
j^[[2~^[OH^[[5~


$ cat -A file1
the$
089867$
)*%#llo23$
^[[2~^[OH^[[5~$
^[[2~ ^[OH ^[[5~$
j^[[2~^[OH^[[5~$
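
For anyone who wants to reproduce this without typing the key presses in vim, here is a minimal sketch that writes the same six lines with printf. It assumes the terminal codes are exactly the byte sequences shown by cat -A above (other terminals may send different codes), and it uses cat -v rather than cat -A for portability, so the trailing $ markers are not shown.

```shell
# Recreate file1 without vim. \033 is the ESC byte, displayed as ^[ by vim and cat.
# First the three lines of printable ASCII only.
printf '%s\n' 'the' '089867' ')*%#llo23' > file1

# Then the three lines with terminal codes (sequences copied from the cat -A output).
printf '\033[2~\033OH\033[5~\n' >> file1
printf '\033[2~ \033OH \033[5~\n' >> file1
printf 'j\033[2~\033OH\033[5~\n' >> file1

# Display the file with control characters made visible.
cat -v file1
```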

Running the grep command gives the following result:
Code:
$ grep -E '^[[:print:]]*$' file1
the
089867
)*%#llo23

The output contains only the lines made up entirely of printable characters. Every line containing a terminal code is dropped, including the one that begins with the letter j, because the ESC character (shown as ^[) is not printable and the pattern has to match the whole line from ^ to $.
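
To tie this back to the questions in post #1, the sketch below puts a non-printable character at the start, middle and end of a line, then runs the command both normally and with -v. The file names sample.txt, kept.txt and rejected.txt are made up for illustration. Forcing LC_ALL=C is my own assumption and not part of the original command: it limits [[:print:]] to the ASCII range, whereas in a UTF-8 locale printable non-ASCII characters would likely pass as well.

```shell
# Hypothetical sample: two clean lines, then ESC at the start, middle and end of a line.
printf 'clean\n' > sample.txt
printf 'also clean 123\n' >> sample.txt
printf '\033start\n' >> sample.txt
printf 'mid\033dle\n' >> sample.txt
printf 'end\033\n' >> sample.txt

# grep prints the lines that MATCH the pattern. The pattern describes a line that
# consists only of printable characters from ^ to $, so the clean lines are the matches.
LC_ALL=C grep -E '^[[:print:]]*$' sample.txt > kept.txt

# -v inverts the selection and prints the lines that do NOT match,
# that is, the lines with at least one non-printable character anywhere.
LC_ALL=C grep -v -E '^[[:print:]]*$' sample.txt > rejected.txt

wc -l kept.txt rejected.txt
```

Here kept.txt should hold the two clean lines and rejected.txt the other three. So nothing is discarded because it matched: the "good" lines are the matches, and the position of the non-printable character in a line makes no difference.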
 