Grepping (or otherwise extracting) a huge list of IPs in one run?

None-yet

Member
Credits
906
I have grepable output from Nmap. Each record is a single line, but the IP addresses vary in length (11 or 12 characters in my data). Each IP is usually listed twice, once on the line that lists the port and once on the line that gives the status, as in the example below. I need the fastest way to clean this up by stripping everything except the IP and removing the duplicates, so that I end up with a list of unique IPs and nothing else.

Here is an example:
Host: xx.x.xx.xx () Status: Up
Host: xxx.x.xx.xx () Ports: xx/filtered/tcp//ftp///
Host: xxx.x.xx.xx () Status: Up
Host: xxx.x.xx.xx () Ports: xx/filtered/tcp//ftp///
Host: xxx.xxx.xxx.xx () Ports: xx/closed/tcp//ftp///
Host: xxx.xxx.xxx.xxx () Status: Up
Host: xxx.xxx.xxx.xx () Ports: xx/filtered/tcp//ftp///
Host: xxx.xxx.xxx.xxx () Status: Up
Host: xxx.xxx.xxx.xxx () Ports: xx/filtered/tcp//ftp///

My question is: how can I do this? Does it need several passes to account for IPs of different lengths, or can it be done in one run, and if so, how?

Thanks and you (Americans) all have a great Thanksgiving!
 


khedger

Active Member
Credits
1,026
Hmmm......not sure exactly what context you're trying to do this in. There may be a way to grep/sed/awk something here, I don't know. I'd write a Perl or Python script and use a regex to pull the IP out of each line and then write a new file containing the list of IPs. I used to know the regex for matching an IP address, but it's been over ten years and I can't remember it. Anybody?
Anyway the script would look something like:
Code:
for each record in the file containing the nmap data:
    regex the IP address out into a variable
    check whether your array of IPs already contains it
    if not, append the new IP to the array

then:
    traverse the array of IPs and write each one on its own line of an output file
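For comparison, a minimal sketch of that same loop in shell rather than Perl or Python (a substitution, not khedger's code). It assumes the grepable Nmap layout from the question, where the address is the second whitespace-separated field of every `Host:` line, so no IP regex is needed at all. The function name `unique_hosts` is made up for illustration.

```shell
# unique_hosts FILE...
# Print each host address once, in the order it is first seen.
# Assumes every record looks like: "Host: <address> (<name>) ..."
unique_hosts() {
    # seen[] plays the role of the "array of IPs" in the pseudocode:
    # an address is printed only the first time it shows up.
    awk '$1 == "Host:" && !seen[$2]++ { print $2 }' "$@"
}
```

Something like `unique_hosts scan.gnmap > hosts.txt` would then write the list (both file names are placeholders). Because it makes a single pass and does not sort, it keeps the original scan order.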
 

JasKinasis

Well-Known Member
Credits
5,132
Off the top of my head - you’re looking for the digits 0-9, repeated one to three times, followed by a period, then another one to three digits, another period, another one to three digits, another period and finally another one to three digits.

Using a regex with grep that should look something like this:
Bash:
\egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /path/to/file | sort -u > /path/to/listofIPs
The initial backslash escapes any aliases that might be set for egrep - we don’t want additional options being used with egrep that might pollute our output.
The -o option tells grep to print only the part of each line that matches the regex, rather than the whole line.
Then there is the regex and the path to the file to grep through.
Egrep’s output is piped to sort, we’ve used the -u option to make it a unique sort - removing duplicates. And finally we redirect to an output file.
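One detail worth knowing about the last step: a plain `sort -u` orders the addresses as text, so `192.0.2.10` comes before `192.0.2.9`. If numeric order matters, a sketch along these lines sorts octet by octet instead. The function name `extract_ips` is hypothetical, and `grep -E` is used as the equivalent of `egrep`.

```shell
# extract_ips FILE
# Print the unique dotted-quad strings found in FILE,
# ordered numerically by first, second, third and fourth octet.
extract_ips() {
    grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$1" |
        sort -t. -k1,1n -k2,2n -k3,3n -k4,4n -u
}
```

Usage would look like `extract_ips /path/to/file > /path/to/listofIPs`. If the order is irrelevant, the simpler `sort -u` is fine.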

And if you have multiple files in a directory containing IPs, you could add grep's -R option (recursive) and then specify a directory instead of a single file. Add -h as well: when grep reads more than one file it prefixes every match with the file name, which would end up in the list and defeat the de-duplication.
Bash:
\egrep -Rho "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /path/to/directory | sort -u > /path/to/listofIPs
That will search every file in the specified directory (and any sub-directories) for IP addresses and will output a sorted, unique list of IPs.
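As a rough sketch of the multi-file case, here is the same pipeline wrapped in a function that takes any mix of files and directories. The name `extract_ips_from` is made up for illustration. The `-h` flag (no file-name prefix) is there because grep labels each match with its source file once more than one file is involved, and `file1:192.0.2.10` and `file2:192.0.2.10` would not count as duplicates.

```shell
# extract_ips_from FILE_OR_DIR...
# Collect unique dotted-quad strings from every file given,
# descending into directories (-R) and omitting file names (-h).
extract_ips_from() {
    grep -ERho '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$@" | sort -u
}
```

It can be called with a directory, `extract_ips_from /path/to/directory`, or with a wildcard, `extract_ips_from ./*.txt`. Writing the result somewhere outside the searched directory keeps the list from being scanned on a later run.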

EDIT:
I’m on my phone and not anywhere near a PC atm, so I haven’t tested it. But I’m fairly confident it’s correct!

Also, above is a quick and dirty regex, because it will accept values over 255 for the numbers.
I didn’t have time to try to work out a more robust/correct one that will only pick out valid IP addresses. However, a quick web search should yield a more accurate regex to use.
 

None-yet

Member
Credits
906
For multiple files in the same dir then is it possible to grep them all with a wildcard some way?
 

None-yet

Member
Credits
906
Thanks
 

None-yet

Member
Credits
906
I have been unable to get this to work so far. Here is what I tried, using a wildcard to cover 8 files. It never went through, though.
Code:
[email protected]:/media/sf_Storage_1/Master# \egrep -Ro "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /*.txt | sort -u > /list-of-Ips.txt
[email protected]:/media/sf_Storage_1/Master#
 

None-yet

Member
Credits
906
Tried this also. Inside a folder with the 8 files I was hoping to pull from.

Code:
[email protected]:/media/sf_Storage_1/Master# \egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /*txt | sort -u > IP's.txt
>
 

JasKinasis

Well-Known Member
Credits
5,132
Hmmmm, strange..... I was pretty sure that regex should work.
I'll fire up my laptop later and will try that regex myself to see if it works.

In the meantime - can you confirm your file paths are correct?

According to your snippets, the text files are in the root of the file-system. i.e. /
So the files you're searching are at /*.txt?
Is that correct?
Or did you mean to put ./*.txt - as in "all .txt files in the current working directory"?

The text files being in root looks suspect to me.
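A quick way to check this before running grep is to look at what the wildcard actually expands to. The sketch below is only an illustration (the function name `txt_files_in` is hypothetical): it lists the `.txt` files a given directory would contribute and prints nothing when there are none.

```shell
# txt_files_in DIR
# Print the .txt files that "DIR/*.txt" expands to, one per line.
# Prints nothing if the pattern matches no existing file.
txt_files_in() {
    for f in "$1"/*.txt; do
        # An unmatched pattern is left as literal text, so test for existence.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

Comparing `txt_files_in /` with `txt_files_in .` from inside the Master folder shows whether `/*.txt` or `./*.txt` is the pattern that finds the 8 files. Separately, the second attempt ends at a `>` prompt, which is what the shell prints while it waits for more input: the apostrophe in `IP's.txt` opens a quoted string that is never closed. Quoting the name, as in `> "IP's.txt"`, or dropping the apostrophe avoids that.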


Otherwise, perhaps it's where I'm escaping the periods with backslashes in the regex?!
Normally a period is a special character in a regex - meaning "match any character". So by escaping it with a back-slash - it should be interpreted as a literal period character instead. Which is how we want the period characters to be interpreted.
So maybe we don't need to escape the periods??!..........
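A small experiment can settle the escaping question. The sketch below compares the two variants on made-up input; `count_matches` is a hypothetical helper. Inside double quotes, as in the original command, the shell passes `\.` through to grep unchanged, so the escape does reach the regex.

```shell
# Two patterns that differ only in whether the dots are escaped.
escaped='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
unescaped='[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}'

# count_matches PATTERN TEXT
# Print how many lines of TEXT contain a match for PATTERN.
count_matches() {
    printf '%s\n' "$2" | grep -Ec -- "$1"
}

count_matches "$escaped"   '192.0.2.10'    # a real address: counted
count_matches "$escaped"   '2021-11-25 1'  # date-like text: not counted
count_matches "$unescaped" '2021-11-25 1'  # bare dots accept "-" and " ": counted
```

So the unescaped form still finds real addresses, but it also picks up other digit runs, which is why keeping the backslashes is the safer choice.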

Either way - I'll fire up my laptop later and will give it a try. But those are just a few thoughts off the top of my head!
 

JasKinasis

Well-Known Member
Credits
5,132
OK, I've tried my original regex on my laptop and it works for me.... So perhaps your file-paths were incorrect or something?! IDK.

This regex is a little better and should only let valid IPV4 addresses through:
Bash:
\egrep -o "((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" /path/to/file | sort -u > ~/uniqueIPV4List.txt
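One caveat with `-o` and a range-checked pattern: nothing ties the match to the start of a number, so a string such as `256.1.1.1` still produces the partial match `56.1.1.1`. If that matters, an alternative is to grab whole dotted tokens loosely and validate them afterwards. The sketch below does the validation in awk; the function name `valid_ipv4_only` is hypothetical, and this is one possible approach rather than the only one.

```shell
# valid_ipv4_only
# Read candidate strings on stdin, one per line, and print only those
# made of exactly four decimal fields, each 1-3 digits long and at most 255.
valid_ipv4_only() {
    awk -F. '
        NF == 4 {
            ok = 1
            for (i = 1; i <= 4; i++)
                if ($i !~ /^[0-9]+$/ || length($i) > 3 || $i + 0 > 255)
                    ok = 0
            if (ok) print
        }'
}
```

It could be combined with a loose, greedy pattern, for example `grep -Eo '[0-9]+(\.[0-9]+){3}' /path/to/file | valid_ipv4_only | sort -u`, so that `256.1.1.1` is captured whole and then rejected.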
And if you need to extract any IPV6 addresses - it's a lot more complex:
Bash:
\egrep -o "(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))" /path/to/file | sort -u > ~/uniqueIPV6List.txt
I found the above regex in a web-search and plugged it into the \egrep command. It's horrible to read, but it works nicely!
 
