Seeking for command lines

satimis

Member
Credits
557
Hi all,

Please advise which command lines I can use to find files on the server database:

1)
Files that contain word1, word2, word3, etc., with the results written to /tmp/

2)
Files that contain a URL with word1, word2, word3, etc., with the results written to /tmp/

Thanks in advance.

Regards
 


JasKinasis

Well-Known Member
Credits
7,583
Depends on exactly what you mean.
Do you want it to search for files that contain ALL of those words?
E.g. Files containing word1 AND word2 AND word3
Or do you want to find files with ANY of those words?
e.g. files containing word1, OR word2, OR word3
 

satimis

Member
Credits
557
Hi,

Yes, files containing word1 AND word2 AND word3. Thanks.

Regards
 

satimis

Member
Credits
557
Hi,

Thanks for your link.

I have no problem finding files by name by running the following command in the terminal:
# find ./ -type f -name "abc.txt"

But I'm looking for a command line that finds files with the following content:
word-1 and word-2 and word-3 etc.

Regards
 

stan

Well-Known Member
Credits
7,349
This might work, at least for a single word:


You can try separating your multiple word pattern with a "pipe" character ( | ) as shown here:

 

Lord Boltar

Well-Known Member
Credits
5,619
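If you want to search inside documents for the words they contain, rather than match on the file name or extension, then you're probably better off using locate or grep. Note that locate only matches file names and paths in its database, while grep searches the contents of files.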

Code:
locate {part_of_word}
This assumes your locate-database is up to date but you can update this manually with:
Code:
sudo updatedb
You can use grep to list the files containing words in the given directory:

Code:
grep -Ril {words} directory
Here:
* -R recursively search files in sub-directories.
* -i ignore text case
* -l show file names instead of file contents portions. (note: -L shows file names that do not contain the word).
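
The braces in {words} and {part_of_word} above are placeholders rather than characters to type. As a minimal sketch, assuming a throwaway directory and made-up file names, a phrase containing a space needs quoting so that the shell hands it to grep as a single pattern:

```shell
# Build a small scratch tree (hypothetical files, for illustration only).
dir=$(mktemp -d)
mkdir -p "$dir/notes"
printf 'How to set up Dual Boot with GRUB\n' > "$dir/notes/grub.txt"
printf 'Nothing relevant here\n' > "$dir/notes/other.txt"

# Quote the phrase: unquoted, the shell splits it into two arguments and
# grep treats the second word as a file name.
matches=$(grep -Ril 'dual boot' "$dir")
printf '%s\n' "$matches"

rm -rf "$dir"
```

Only grub.txt is listed here, because -i makes "Dual Boot" match the lower-case pattern.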
 

satimis

Member
Credits
557
Hi,

Thanks for your advice.

I ran the following commands in the terminal:

$ sudo updatedb
$ locate {dual boot} /path/to/database/drive/
no output

$ locate /path/to/database/drive/ {dual boot}
no output

$ grep -Ril {dual boot} /path/to/database/drive/directory/
grep: boot}: No such file or directory

I also tried as root, with the same result.

Regards
 

satimis

Member
Credits
557
Thanks for your support

I ran the following commands in the terminal:

$ locate /path/to/database/drive/ {dual boot}
no output

$ grep -Ril {dual boot} /path/to/database/drive/directory/
grep: boot}: No such file or directory

$ grep -E 'dual|boot' <em>/path/to/database/drive/</em>
bash: syntax error near unexpected token `newline'

Regards
 

JasKinasis

Well-Known Member
Credits
7,583
OK, I explored a few options using grep and ag (silver searcher), but I couldn't find anything that would necessarily match files containing ALL words in the list.

Using grep with multiple search words/patterns, you'd end up with matches for lines in files that contained anywhere between one word and all three words. But the file itself might not contain ALL of the words. It might only have instances of one, or two words.

Another alternative would be to search for one word, pipe the results to grep to find the second word and then pipe a third time to find the third word and so on.
But again, that would only match single lines in the file that contain all three words, NOT files containing the three words.
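
For what it's worth, the piping idea might be made to work per file rather than per line by piping file names instead of matching lines: grep -l emits the names of files containing the first word, and xargs feeds those names to the next grep. This is only a sketch, assuming GNU grep and GNU xargs (for the -Z, -0 and -r options) and hypothetical scratch files:

```shell
dir=$(mktemp -d)
printf 'word1 here\nword2 there\nword3 elsewhere\n' > "$dir/all.txt"
printf 'word1 and word2 only\n' > "$dir/two.txt"
printf 'word3 only\n' > "$dir/one.txt"

# Each stage narrows the list of file names. NUL separators (-Z / -0)
# keep names containing spaces intact; -r skips grep if the list is empty.
all=$(grep -rlZ 'word1' "$dir" \
        | xargs -0 -r grep -lZ 'word2' \
        | xargs -0 -r grep -l 'word3')
printf '%s\n' "$all"

rm -rf "$dir"
```

Only all.txt survives all three stages, even though its three words sit on different lines.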

So, what I've come up with is a one-liner using find and awk, which attempts to find files containing ALL of three supplied search terms/words.
Bash:
find /path/to/search -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /word1/ { f1++ }; /word2/ { f2++ }; /word3/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \;
Note:
In the above find command, replace /path/to/search with the directory you want to search, and replace word1, word2 and word3 in the awk script with the three words you want to search for.

The command uses find to find all files under the specified directory.
Each file is then run through awk.
Awk sets up a counter for each of the words (f1..f3) and initialises them all to zero. Any time awk finds "word1" on a line of the file - the counter variable f1 is incremented.
If it finds "word2" - f2 is incremented.
If it finds "word3" - f3 is incremented.
If all three counters are set (i.e. all three counters are greater than zero) - the file contains all three words. So we print the filename for the current file and move on to the next file!

So that would work for three search terms.
To add a 4th word, you would edit the above command to the following:
Bash:
find /path/to/search -type f -exec awk 'FNR == 1 { f1=f2=f3=f4=0; }; /word1/ { f1++ }; /word2/ { f2++ }; /word3/ { f3++ }; /word4/ { f4++ }; f1 && f2 && f3 && f4 { print FILENAME; nextfile; }' {} \;
The above will allow for four search terms.

So what we've done is added an extra counter variable, an extra search for the new word and a counter increment action for when that word is found. And we add an extra logical AND (&&) to check that the new counter has been set at the end.
If you wanted to add more search terms - add more counters, searches/actions and checks at the end.

I've never done anything too complicated with awk-scripting, so it may be possible to do this in a slightly better way to deal with handling an arbitrary number of search terms. But I'm not in the frame of mind to work that out right now!

But what I've posted should list all files in the search path that contain ALL of several search terms.
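
On the point about an arbitrary number of search terms, one possible approach is to pass the terms to awk as a single variable and loop over them. This is a sketch rather than a tested replacement for the one-liner above: the variable names (terms, seen, hits) are made up for illustration, and it assumes the terms themselves contain no spaces.

```shell
dir=$(mktemp -d)
printf 'epson\n3490\nubuntu\ngimp\n' > "$dir/all4.txt"
printf 'epson\nubuntu\n' > "$dir/some.txt"

found=$(find "$dir" -type f -exec awk -v terms='epson 3490 ubuntu gimp' '
    BEGIN    { n = split(terms, t, " ") }        # t[1..n] holds the terms
    FNR == 1 { split("", seen); hits = 0 }       # reset for each new file
    {
        # Count each term once, the first time it appears in the file.
        for (i = 1; i <= n; i++)
            if (!(i in seen) && $0 ~ t[i]) { seen[i] = 1; hits++ }
        if (hits == n) { print FILENAME; nextfile }
    }
' {} \;)
printf '%s\n' "$found"

rm -rf "$dir"
```

Adding a fifth term then only means adding a word to the terms string. As with /word1/ in the original, each term is treated as a regular expression.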

I hope this helps!
 

satimis

Member
Credits
557
Hi JasKinasis,

Many thanks for your advice and for the time you spent helping me.

I ran the following tests:

1)
$ find /path/to/storage/hard-drive/ -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Epson/ { f1++ }; /3490/ { f2++ }; /Ubuntu/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/epson_3490.txt

I need to write the output to the file epson_3490.txt.
The three words Epson, 3490 and Ubuntu were taken from a file.

The command hangs in the terminal for a prolonged time without visible progress.

2)
$ find /path/to/storage/hard-drive/folder/ -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt

I need to write the output to the file scanning.txt.
The three words Scanning, GIMP and steps were taken from a file in a sub-folder.

This command also hangs in the terminal for a prolonged time without visible progress.

My PC spec:
8-core AMD CPU
RAM - 32G
Hard disk for OS - 1TB NVMe PCIe Gen 3 SSD
Hard drive for storage - WD 4TB

Please help. Thanks

Regards
 

JasKinasis

Well-Known Member
Credits
7,583
The reason it's hanging is probably due to the sheer number of files and subdirectories that find has to deal with. And each file it finds is run entirely through awk, to check for the three patterns.
So it's not that it's "not making progress". It is making progress - it's just slow!

If you want to free up the terminal to allow you to enter other commands, run the find command as a background task by adding an ampersand & to the end of the line.

So for your last example:
Bash:
find /path/to/storage/hard-drive/folder/ -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt &
Now the find command will run in the background and you'll be able to continue using the terminal whilst the job is still in progress.
After entering the command, you'll see output like this:
Code:
[1] 1540
Note: The number in square brackets is the job number, and the second number is the PID of the job you just started. The PID probably won't be 1540, that's just an arbitrary example.

You can check whether the job has finished using the jobs -l command. That will list any background jobs the current terminal is running with their PID number and the command.

If the background job is still running, the jobs -l command will output something like this:
Code:
[1]+    1540 Running                    find /path/to/storage/hard-drive/folder/ -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt &
When the background job has finished - the jobs -l command will output something like this:
Code:
[1]+ Done find /path/to/storage/hard-drive/folder/ -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt
Likewise, if the job is taking too long and you have to kill it, you can kill it using the kill command and the PID of the job.
So for my example, we'd use:
Bash:
kill 1540
Obviously, you'd need to replace 1540 with the PID of the job you want to kill. But if you kill the job before it completes, your output file may potentially be missing some results.
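
If you'd rather not copy the PID by hand, the shell keeps the PID of the most recent background job in $!. A small sketch of the same jobs/kill sequence, using a long sleep as a stand-in for the slow find command (the variable names are just for illustration):

```shell
# A long sleep stands in for the slow find command.
sleep 60 &
pid=$!            # $! holds the PID of the most recent background job
jobs -l           # lists the job with its job number and PID

kill "$pid"       # ask the job to terminate (sends SIGTERM)
status=0
wait "$pid" 2>/dev/null || status=$?   # 143 = 128 + 15 (SIGTERM)
echo "job $pid ended with status $status"
```

The wait at the end simply collects the exit status, so the script knows the job really has gone.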

To speed the command up, you could add additional filters to the find command. For example, if you know you're only looking for certain types of files, you could put some filters on the file-name.
So if you're only interested in .php files:
Bash:
find /path/to/storage/hard-drive/folder/ -name "*.php" -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt &
And if you're interested in looking at several types of files - e.g. .php, .html and .js files - you could amend your command to this:
Bash:
find /path/to/storage/hard-drive/folder/ \( -name "*.php" -o -name "*.html" -o -name "*.js" \) -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt &
Adding filters like that will drastically reduce the number of files that get run through awk and will speed up your job.

And if the file extensions are in mixed case, e.g. some are using .php as a file extension and others are using .PHP, then use -iname (case insensitive name) for your filename filters, instead of -name (case sensitive name).
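
Another possible speed-up, not used in the commands above, is to end -exec with + instead of \; so that find passes many file names to each awk invocation rather than starting one awk process per file. The FNR == 1 reset already in the script is what should make this safe, since FNR restarts at 1 for every file. A sketch with hypothetical scratch files:

```shell
dir=$(mktemp -d)
printf 'Scanning with GIMP\nfollow these steps\n' > "$dir/howto.txt"
printf 'Scanning only\n' > "$dir/partial.txt"
printf 'GIMP steps for Scanning\n' > "$dir/skip.php"

# "{} +" batches file names into as few awk runs as possible;
# -iname also accepts extensions such as .TXT.
batched=$(find "$dir" -iname '*.txt' -type f -exec awk '
    FNR == 1   { f1 = f2 = f3 = 0 }
    /Scanning/ { f1++ }
    /GIMP/     { f2++ }
    /steps/    { f3++ }
    f1 && f2 && f3 { print FILENAME; nextfile }
' {} +)
printf '%s\n' "$batched"

rm -rf "$dir"
```

On a large drive most of the time probably still goes into reading the files, so the gain depends on how many small files there are.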
 

satimis

Member
Credits
557
Hi JasKinasis,

I'll check your latest advice later. I have the following findings to report:

The commands finally finished. I don't know how long they took, as I just left the PC running unattended.

File size;
epson_3490.txt - 5.8kB
scanning.txt - 2.8kB

Files recorded;
On epson_3490.txt

vms_details_pc1a_old_20180722.txt
epson_scanner_20180116.txt
Remix_OS_for_PC_64_B2016011201_Alpha.iso
url_20080109.txt
scanning_n_scanner_20140902.txt
epson_scanner_20180116.txt
epson_perfection_3490_doc_20201227.txt
....
etc.

Files recorded;
On scanning.txt

Win10_1703_Chinese(Traditional)_x64.iso
W10.HOME.X64.en-US.Apr2016.iso
Remix_OS_for_PC_64_B2016011201_Alpha.iso
flintos_rpi_v0.3.img
and7rpi2016-08-25.img
iscan.man
....
etc.

None of the files listed in scanning.txt are relevant.

I don't know why .iso and .img files are being checked?

Regards
 

satimis

Member
Credits
557
Hi JasKinasis,

I ran the following tests in the terminal:

1)
# find /path/to/storage/hard-drive/ -name "*.txt" -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt &
[1] 57904

# jobs -l
[1]+ 57904 Running find find /path/to/storage/hard-drive/ -name "*.txt" -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt &

# jobs -l
[1]+ 57904 Done find find /path/to/storage/hard-drive/ -name "*.txt" -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /Scanning/ { f1++ }; /GIMP/ { f2++ }; /steps/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/scanning.txt

scanning.txt is an empty file.

2)
# find /path/to/storage/hard-drive/ -name "*.txt" -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /epson/ { f1++ }; /3490/ { f2++ }; /ubuntu/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/epson_3490.txt &
[1] 84072

# jobs -l
[1]+ 84072 Running find /path/to/storage/hard-drive/ -name "*.txt" -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /epson/ { f1++ }; /3490/ { f2++ }; /ubuntu/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/epson_3490.txt &

#
[1]+ Done find /path/to/storage/hard-drive/ -name "*.txt" -type f -exec awk 'FNR == 1 { f1=f2=f3=0; }; /epson/ { f1++ }; /3490/ { f2++ }; /ubuntu/ { f3++ }; f1 && f2 && f3 { print FILENAME; nextfile; }' {} \; > /tmp/epson_3490.txt

# cat /tmp/epson_3490.txt
~/PC1A_1TB_Daily_Working_Doc_20180926/epson_scanner_20180116.txt
~/Ubuntu_Desktop_7.10/scanner_20071230.txt
~/Reference_Misc/url_20080109.txt
~/Computer_and_Hardware_20121105_20200131/scanning_n_scanner_20140902.txt
......
etc.

It works here. Thanks

Would it be possible to find files with a pattern, example;
<FilesMatch '.(php|php5|suspected|py|phtml)$'>
Order allow,deny
Deny from all
</FilesMatch>


?

Please help me to learn the function of;
1)
{ print FILENAME; nextfile; }' {}


2)
f1 && f2 && f3 { print FILENAME; nextfile; }

Thanks

Regards
 

JasKinasis

Well-Known Member
Credits
7,583
scanning.txt is an empty file.
If the text file is empty - it means that no files were found with ALL three of the search terms in them.
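
An empty result can also come down to letter case, since awk patterns such as /Scanning/ are case-sensitive: a file that only contains "scanning" would not count. One possible workaround, sketched here with a hypothetical file, is to lower-case each line before matching and write the patterns in lower case:

```shell
dir=$(mktemp -d)
printf 'scanning in gimp\nSTEPS to follow\n' > "$dir/lower.txt"

# Case-sensitive patterns, as in the thread: this file does not match.
strict=$(find "$dir" -type f -exec awk '
    FNR == 1   { f1 = f2 = f3 = 0 }
    /Scanning/ { f1++ }
    /GIMP/     { f2++ }
    /steps/    { f3++ }
    f1 && f2 && f3 { print FILENAME; nextfile }
' {} \;)

# Lower-case every line first, then match lower-case patterns.
loose=$(find "$dir" -type f -exec awk '
    FNR == 1          { f1 = f2 = f3 = 0 }
                      { line = tolower($0) }
    line ~ /scanning/ { f1++ }
    line ~ /gimp/     { f2++ }
    line ~ /steps/    { f3++ }
    f1 && f2 && f3    { print FILENAME; nextfile }
' {} \;)
printf 'strict: [%s]\nloose:  [%s]\n' "$strict" "$loose"

rm -rf "$dir"
```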

It works here. Thanks
Great news!


Would it be possible to find files with a pattern, example;
<FilesMatch '.(php|php5|suspected|py|phtml)$'>
Order allow,deny
Deny from all
</FilesMatch>
Didn't I answer this in another thread?


Please help me to learn the function of;
1)
{ print FILENAME; nextfile; }' {}


2)
f1 && f2 && f3 { print FILENAME; nextfile; }
1) { print FILENAME; nextfile; }
That prints the FILENAME for the file that awk is reading and then moves on to the next file.

2) f1 && f2 && f3 { print FILENAME; nextfile; }
This means "If f1 AND f2 AND f3 are not zero, then print the file-name for the current file and move on to the next file".

f1, f2 and f3 are counter variables in the awk-script that is run by find. These are used to count the lines on which each search term is found.


If I rewrite the one-liner to take up several lines, perhaps it will make a little more sense to you:
Bash:
find /path/to/search -type f -name "*.txt" -exec awk '
FNR == 1 { f1=f2=f3=0; };
/epson/ { f1++ };
/3490/ { f2++ };
/ubuntu/ { f3++ };
f1 && f2 && f3 { print FILENAME; nextfile; }
' {} \; > /tmp/epson_3490.txt &
Note: This is just for illustrative purposes.
I wouldn't advise entering the one-liner over several lines like this.
This is just to show you step by step what is going on.

So the first line contains the bulk of the find command:
find /path/to/search -type f -name "*.txt" -exec awk '
So in the above, we have a find command that is searching for .txt files. For each .txt file that is found, the -exec will run awk. The single quote is the start of the awk script.

The second line is the first part of the awk script:
FNR == 1 { f1=f2=f3=0; };
FNR is the number of the current line within the file we're reading.
If the line number is 1 (i.e. we're reading the first line of a file) - Then we set up three counter variables and initialise them to zero.
So each time we start reading a new file, the counters are reset to zero.
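
To see FNR restart for each file, here is a tiny sketch with two hypothetical files. NR is awk's running line count across all input, while FNR counts lines within the current file:

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"

# Print the overall line number (NR) next to the per-file one (FNR).
out=$(awk '{ print NR, FNR }' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$out"
# prints:
# 1 1
# 2 2
# 3 1

rm -rf "$dir"
```

The third line belongs to the second file, so FNR is back at 1 there, which is exactly the moment the FNR == 1 rule fires and resets the counters.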

The next three lines are our search patterns and the actions to take when each pattern is found:
Code:
/epson/ { f1++ };
/3490/ { f2++ };
/ubuntu/ { f3++ };
So every time a line containing "epson" is read, the value of f1 is increased by 1.
Likewise, every time a line containing "3490" is read, the value of f2 is increased by 1.
And every time a line containing "ubuntu" is read, f3 is increased by 1.

The next line has already been explained:
f1 && f2 && f3 { print FILENAME; nextfile; }
If f1 AND f2 AND f3 are not zero, then we got at least one match in the file for each search term. So the file contains all three search terms. In which case, we print/output the filename and then move to the next file.

It's also worth noting that all of the actions - the line number check, the three searches and the check on f1, f2 and f3 are performed on each line in the file. So as soon as all three counter variables are greater than zero - awk will move to the next file. Which means that each matching file doesn't have to be read until the very end. As soon as it has been determined that the file matches all of the patterns, awk outputs the file-name and moves to the next file. After moving to the next file - when the first line is read, the counters will be re-initialised to zero.

Non-matching files will be read until the end, so it only really speeds up the reading of the matching files. But it is a small optimisation that will speed things up a tiny bit.

The final line:
' {} \; > /tmp/epson_3490.txt &
The single quote at the start of the line is the end of the awk script.
The {} \; is the remainder of find's -exec section.
The {} is a placeholder that find replaces with the name of each file it finds. So those filenames are passed to the awk command. They are the files that awk acts upon. The \; denotes the end of the find command's -exec section.
The rest of the line: > /tmp/epson_3490.txt & is the redirection to the output file and the final ampersand runs the job in the background.

Hopefully breaking the command down like that makes things a little easier to understand?!
 

JasKinasis

Well-Known Member
Credits
7,583
Would it be possible to find files with a pattern, example;
<FilesMatch '.(php|php5|suspected|py|phtml)$'>
Order allow,deny
Deny from all
</FilesMatch>
I knew I answered this already, it was in your other thread:
 

satimis

Member
Credits
557
If the text file is empty - it means that no files were found with ALL three of the search terms in them.

Great news!
-----
Hi JasKinasis,

Sorry for the late reply; I had other engagements last week.

I'm only a beginner with the Linux command line.

1)
-exec awk

Can I use a | pipe instead of -exec?

2)
FNR == 1 { f1=f2=f3=0; };

With 4 search patterns, would it be:
FNR == 1 { f1=f2=f3=f4=0; }; ?
What is the 0 for?
What are the { } for here?

3)
/epson/ { f1++ };
/3490/ { f2++ };
/ubuntu/ { f3++ };
What is the ++ for?

4)
' {} \; > /tmp/epson_3490.txt &
What is the {} for here?

5)
f1 && f2 && f3 { print FILENAME; nextfile; }
Does && mean "and"?
Why is the ; needed here?

6)
If I need to exclude .img, .ISO, etc. from the search, what do I need to add to the command?

Thanks

Regards
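
On questions 1 and 6, one possible approach is sketched below, untested against a large drive. It assumes GNU find and GNU xargs (for -iname, -print0, -0 and -r), and the file names are hypothetical. ! -iname '*.iso' tells find to skip names ending in .iso in any letter case, and -print0 | xargs -0 hands the remaining names to awk through a pipe instead of -exec:

```shell
dir=$(mktemp -d)
printf 'epson 3490 on ubuntu\n' > "$dir/notes.txt"
printf 'epson 3490 on ubuntu\n' > "$dir/disk.ISO"
printf 'epson 3490 on ubuntu\n' > "$dir/card.img"

# Exclude disk images by name, then pipe NUL-separated names to awk.
piped=$(find "$dir" -type f ! -iname '*.iso' ! -iname '*.img' -print0 \
    | xargs -0 -r awk '
        FNR == 1 { f1 = f2 = f3 = 0 }
        /epson/  { f1++ }
        /3490/   { f2++ }
        /ubuntu/ { f3++ }
        f1 && f2 && f3 { print FILENAME; nextfile }
    ')
printf '%s\n' "$piped"

rm -rf "$dir"
```

A plain find ... | awk would not work the same way, because awk would then read the list of file names as its input text rather than opening the files; xargs is what turns the piped names into arguments.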
 

satimis

Member
Credits
557
I knew I answered this already, it was in your other thread:
Hi JasKinasis,

I'll answer it on that thread later. Sorry, I overlooked it.

Edit
===
Could you please point me to documents for learning Linux commands and shell scripting?

A Google search turned up many suggestions, and I'm unsure which direction to take.

Regards
 