grepping files

Diputs

I was wondering how to perform the query below in Bash.
I don't need all the code written out exactly, just an idea of how to do it:

The question is: I'm looking for the file, or files, that contain several words.
I don't need the files that have a single line containing all of these words, like this:

grep word1 file.txt | grep word2 | grep word3

I need the name of the file or files that contain ALL of the words, but ANYWHERE INSIDE the file.
(So, not necessarily on the same line.)

I would think grep is needed, but I'm not sure how to use it for this.
 


You could try...

grep -Rnw '/path/to/somewhere/' -e 'word1' -e 'word2' -e 'word3'

You don't need the filename if you're looking at all the files.
 
You could probably write that in awk.
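
For what it's worth, here is a rough sketch of how the awk idea might look. The function name `all_words_awk` and the variable names are made up for illustration, and it assumes the words in the files are separated by whitespace (so `red,` would not count as `red`). It reads each file once and prints the file name only if every word was seen somewhere in it:

```shell
# all_words_awk 'WORD1 WORD2 ...' FILE...   (hypothetical helper)
# Prints each FILE that contains every listed word somewhere in it.
all_words_awk() {
    local words=$1
    shift
    awk -v words="$words" '
        # Print the current file name if every wanted word was seen in it.
        function report(   k) {
            for (k = 1; k <= n; k++)
                if (!(want[k] in seen)) return
            print current
        }
        BEGIN { n = split(words, want, " ") }
        # First line of a new file: report on the previous file, then reset.
        FNR == 1 {
            if (current != "") report()
            split("", seen)
            current = FILENAME
        }
        # Remember every whitespace-separated field on this line.
        { for (i = 1; i <= NF; i++) seen[$i] = 1 }
        END { if (current != "") report() }
    ' "$@"
}

# Example call: all_words_awk 'red green black' file3 file4 file6 file7
```

At least one file name has to be passed, otherwise awk waits for standard input.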
 
Here's a preliminary attempt to output the names of the files that have all the searched words, which is @Diputs' objective as I understand it:

Here are some files to work on and their contents. The number in the filename corresponds to the number of words in the file:
Code:
[tom@min ~]$ ls
file3  file4  file6  file7

[tom@min ~]$ cat file3
red
green
blue

[tom@min ~]$ cat file4
red
green
blue
yellow

[tom@min ~]$ cat file6
red
green
blue
yellow
black
white

[tom@min ~]$ cat file7
red
green
blue
yellow
black
white
brown

Running the command suggested by @dos2unix:

Code:
[tom@min ~]$ grep -Rnw * -e 'red' -e 'green' -e 'black'
file3:1:red
file3:2:green
file4:1:red
file4:2:green
file6:1:red
file6:2:green
file6:5:black
file7:1:red
file7:2:green
file7:5:black

Extracting the filenames that contain the search items:

Code:
[tom@min ~]$ grep -Rnw * -e 'red' -e 'green' -e 'black' | awk -F: '{print $1}'
file3
file3
file4
file4
file6
file6
file6
file7
file7
file7

Collapsing repeated filenames, with each name prefixed by the number of times it appeared:

Code:
[tom@min ~]$ grep -Rnw * -e 'red' -e 'green' -e 'black' | awk -F: '{print $1}' | uniq -c
      2 file3
      2 file4
      3 file6
      3 file7

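The output shows that 2 files (file6 and file7) had all 3 searched terms. Note that the count is the number of matching lines, so it only equals the number of words found because each word appears once per file here. There's more that could be done. How would this scale to a large number of files?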
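
One way to turn the counts above into a plain list of matching files might be to count distinct words per file instead of matching lines, and keep only the files whose count equals the number of words searched. This is only a sketch: it assumes GNU grep, search words that are all different from each other, and paths that contain no colon. The name `files_matching_all` is hypothetical:

```shell
# files_matching_all DIR WORD...   (hypothetical helper)
# Prints files under DIR that contain every WORD as a whole word.
files_matching_all() {
    local dir=$1
    shift
    # Feed the words to grep as fixed-string patterns, one per line.
    printf '%s\n' "$@" |
        # -o prints only the matched word, -H prefixes it with the file name.
        grep -rowHF -f - -- "$dir" |
        # Drop repeated "file:word" pairs so each word counts once per file.
        sort -u |
        # Count distinct words per file; keep files that have all of them.
        awk -F: -v need="$#" '
            { count[$1]++ }
            END { for (f in count) if (count[f] == need) print f }
        ' |
        sort
}

# Example call: files_matching_all . red green black
```

Because repeated matches are removed before counting, a file with `red` on three lines and nothing else is not mistaken for a full match.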
 

That's OK if you want to count but I'm only interested if all of the words appear in a file, or not. If not all words are present in a file, it's no good.
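
A yes/no check like that could be sketched with `grep -q`, which prints nothing and only sets the exit status. The function name `has_all_words` is made up for illustration, and `-w` (whole-word match) is assumed to be available, as it is in GNU and BSD grep:

```shell
# has_all_words FILE WORD...   (hypothetical helper)
# Succeeds only if FILE contains every WORD somewhere, as a whole word.
has_all_words() {
    local file=$1 word
    shift
    for word in "$@"; do
        # Stop at the first missing word; no need to check the rest.
        grep -qw -e "$word" -- "$file" || return 1
    done
    return 0
}

# Example loop over the files in the current directory:
#   for f in *; do
#       [ -f "$f" ] && has_all_words "$f" red green black && printf '%s\n' "$f"
#   done
```

This runs grep once per word per file, so it is simple rather than fast.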
 

I can see this working, but it doesn't scale well. Still, I like this solution.
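
On the scaling point, one option that might help is chaining `grep -l`, so each stage only opens the files that passed the previous one. A sketch for exactly three words, assuming GNU grep (`-Z` ends each file name with a NUL byte) and GNU xargs (`-r` skips the command when there is no input); `files_with_all_three` is a hypothetical name:

```shell
# files_with_all_three DIR WORD1 WORD2 WORD3   (hypothetical helper)
# Prints files under DIR that contain all three words as whole words.
files_with_all_three() {
    local dir=$1 w1=$2 w2=$3 w3=$4
    # Stage 1: every file under DIR that contains WORD1.
    grep -rlZw -e "$w1" -- "$dir" |
        # Stage 2: of those, the files that also contain WORD2.
        xargs -0 -r grep -lZw -e "$w2" -- |
        # Stage 3: of those, the files that also contain WORD3.
        xargs -0 -r grep -lw -e "$w3" --
}

# Example call: files_with_all_three . red green black
```

Putting the rarest word first keeps the later stages small, and `grep -l` stops reading a file at its first match.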
 

