Plain text is a way of life in the Linux world, whether it takes the form of
log files or dumps of error messages. A Linux administrator,
therefore, needs to be familiar with tools that make analysis of
these files easier. Luckily, Linux has a large number of command
line utilities to help you do this job.
GNU awk is a funny-sounding name for a program, but it's one
that will serve you well as you maintain your Linux system. Instead
of having to look at everything in a log file, for example, awk
will help you pick just the data you need out of it. To get
started, let's look at a few simple examples:
First, let's start by getting 6 numbers for this week's lottery
ticket:
awk 'BEGIN { srand(); for (i = 1; i <= 6; i++) print int(50 * rand()) }'
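The one-liner above can print the same number twice, which is no good on a real ticket. Here is a minimal sketch that keeps drawing until it has six different numbers. The range of 1 to 49 is an assumption for illustration, so adjust it to your own lottery; `picks`, `seen` and `count` are just names chosen for this example.

```shell
# Draw six distinct numbers between 1 and 49 (the range is an assumption).
picks=$(awk 'BEGIN {
    srand()                          # seed from the clock so each run differs
    while (count < 6) {
        n = int(49 * rand()) + 1     # whole number from 1 to 49
        if (!(n in seen)) {          # skip numbers we have already drawn
            seen[n] = 1
            print n
            count++
        }
    }
}')
echo "$picks"
```

The `seen` array is what makes the difference: awk arrays are indexed by value, so checking `n in seen` is a quick way to reject repeats.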
OK, that's enough fun. Now, let's look at some examples
that are closer to what we actually want to use awk for. For
example, let's get the total kilobytes used by text files in a
directory:
ls -l *.txt | awk '{ x += $5 } END { print "total KB: " int((x + 1023)/1024) }'
We can also get the total bytes used by the user 'mike' in a
given directory:
ls -l | awk '$3 == "mike" { sum += $5 } END { print sum }'
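If you want the totals for every user at once instead of just 'mike', an awk array indexed by user name will do it. The sketch below runs against a made-up listing held in a shell variable (the file names, the user 'anna' and the sizes are invented for illustration) so that the result is predictable.

```shell
# Invented sample in 'ls -l' layout: owner is column 3, size in bytes is column 5.
listing='-rw-r--r-- 1 mike users 1200 Jan  5 10:00 notes.txt
-rw-r--r-- 1 anna users 300 Jan  5 10:01 todo.txt
-rw-r--r-- 1 mike users 800 Jan  6 09:30 report.txt'

# NF >= 9 skips short lines such as the "total" line that ls -l prints first.
per_user=$(printf '%s\n' "$listing" |
    awk 'NF >= 9 { bytes[$3] += $5 } END { for (u in bytes) print u, bytes[u] }' |
    LC_ALL=C sort)
echo "$per_user"
```

On a real directory you would replace the printf with a plain `ls -l`. The final sort is there because awk does not promise any particular order when looping over an array.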
You can even use awk to keep a simple spreadsheet. Awk is
perfectly capable of adding up a column of numbers. Let's say
you're having a yard sale and using your PDA to keep track of
what you sell and for how much, saving the data in a simple
text file. It might look something like this:
Item                 Time   Amount
====================================
Sinatra_Record       11:30  00.50
blacklight_poster    11:45  00.75
lava_lamp            11:50  05.50
guitar               11:55  15.00
blacklight_poster    12:00  00.75
beer_mug             12:05  01.50
beer_mug             12:05  01.50
beer_mug             12:05  01.50
end_tables           12:15  30.00
bicentennial_plate   12:20  01.50
stuffed_squirrel     12:30  03.25
To get the total of what you've sold, awk can easily add up the
third column:
awk '/:/ { sum += $3 }; END { print sum }' yardsale.txt
You'll notice the underscore between words in the description.
It's there because awk splits each line into columns wherever it
finds whitespace. A two-word description would be seen as two
columns, and the amount would no longer be in the third one.
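If you would rather keep the spaces in your descriptions, one option is to separate the columns with tabs and tell awk about it with the -F option. This is only a sketch: the tab-separated sample below is invented for illustration, and it assumes your PDA can save the file that way.

```shell
# Invented tab-separated sample: descriptions may now contain spaces.
sales=$(printf 'lava lamp\t11:50\t05.50\nend tables\t12:15\t30.00\nstuffed squirrel\t12:30\t03.25\n')

# -F'\t' makes awk split columns on tabs only, so $3 is still the amount.
total=$(printf '%s\n' "$sales" | awk -F'\t' '/:/ { sum += $3 } END { print sum }')
echo "$total"
```

With a real file you would run the same awk command on it directly, for example with a tab-separated copy of yardsale.txt as the last argument.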
You can also see what you've specifically earned on beer
mugs:
awk '/beer/ { sum += $3 }; END { print sum }' yardsale.txt
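Rather than running one command per item, you can let awk keep a separate total for every item in a single pass. The following is a minimal sketch using a shortened copy of the yard sale data held in a shell variable; the `sum` array and the two-decimal output format are choices made for this example.

```shell
# Shortened sample in the same layout as yardsale.txt.
sales='Item Time Amount
======================================================
beer_mug 12:05 01.50
guitar 11:55 15.00
beer_mug 12:05 01.50
lava_lamp 11:50 05.50'

# /:/ matches only the data lines (they contain a time); sum[] is indexed by item name.
per_item=$(printf '%s\n' "$sales" |
    awk '/:/ { sum[$1] += $3 } END { for (item in sum) printf "%s %.2f\n", item, sum[item] }' |
    LC_ALL=C sort)
echo "$per_item"
```

Using printf with %.2f keeps the amounts looking like money, so the two beer mugs show up as 3.00 rather than 3.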
If you use awk on the Apache log file, you can pull out the
timestamp of each hit on your website. The following counts how
many hits arrived in each second, which shows you the frequency
of visits:
awk '{print $4}' access | uniq -c
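Per-second counts get noisy on a busy site, so you may prefer hits per hour. The sketch below assumes the log is in Apache's common log format, where the fourth field looks like [10/Oct/2000:13:55:36, and it runs on three invented log lines so the result is predictable.

```shell
# Invented sample lines in common log format (addresses are documentation ranges).
log='192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326
192.0.2.7 - - [10/Oct/2000:13:58:02 -0700] "GET /a.html HTTP/1.0" 200 512
192.0.2.1 - - [10/Oct/2000:14:01:10 -0700] "GET /b.html HTTP/1.0" 404 209'

# substr($4, 2, 14) drops the leading "[" and keeps "10/Oct/2000:13" (date plus hour).
per_hour=$(printf '%s\n' "$log" |
    awk '{ hits[substr($4, 2, 14)]++ } END { for (h in hits) print h, hits[h] }' |
    LC_ALL=C sort)
echo "$per_hour"
```

To try it on your own log, feed the awk command the log file instead of the sample variable. Changing the 14 to 11 would give you hits per day.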
The following will create a list of worm-infected hosts that are
trying (in vain) to infect you:
egrep -i "(root.exe|cmd.exe|_vti_bin)" access | awk '{print $1}' | sort -n | uniq
It wouldn't be too difficult to modify this so that its output
adds these infected machines to our firewall. Including something
like this in a script might work. It prints one iptables command
per host, which you can review before running:
egrep -i "(root.exe|cmd.exe|_vti_bin)" access | awk '{print $1}' | sort -u | awk '{print "/sbin/iptables -I INPUT -p tcp --syn -s", $1, "-j DROP"}'
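You may not want to block a host over a single odd request. Here is a rough sketch that does the matching and counting inside awk and only prints a rule for hosts that probed at least `min` times. The threshold of 2 and the sample log lines are assumptions made up for this example, and nothing here actually runs iptables; it only prints the commands.

```shell
# Invented sample log: one host probes twice, one probes once, one is a normal visitor.
log='192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /scripts/root.exe?/c+dir HTTP/1.0" 404 210
192.0.2.10 - - [10/Oct/2000:13:55:40 -0700] "GET /c/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 215
198.51.100.4 - - [10/Oct/2000:13:56:01 -0700] "GET /_vti_bin/shtml.exe HTTP/1.0" 404 212
203.0.113.9 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 1043'

# Count worm-style requests per address, then print a rule for repeat offenders only.
rules=$(printf '%s\n' "$log" | awk -v min=2 '
    tolower($0) ~ /root\.exe|cmd\.exe|_vti_bin/ { probes[$1]++ }
    END {
        for (ip in probes)
            if (probes[ip] >= min)
                print "/sbin/iptables -I INPUT -p tcp --syn -s", ip, "-j DROP"
    }')
echo "$rules"
```

Once you are happy with what it prints, the output could be piped to sh as root, but it is worth reading the list first so you don't lock out someone you know.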
You can use awk for purposes of violence as well, namely,
killing processes. Try the following example out. First, fire up an
application. I'll use 'xcalc' here. Then you can use this awk
one-liner to kill it, without having to run 'ps' yourself and then
look up the PID by hand. The [x] in the pattern keeps awk from
matching its own entry in the process list.
ps uax | awk '/[x]calc/ {print $2}' | xargs kill
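Matching on the whole line can catch more than you meant, such as an editor that has a file with 'xcalc' in its name open. A more careful sketch is to compare only the command column. It assumes the usual Linux 'ps uax' layout, where the PID is column 2 and the command starts in column 11, and it runs on an invented process list and only prints the PID so you can check it before killing anything.

```shell
# Invented 'ps uax' style sample; only the first process is really xcalc.
ps_sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mike 4211 0.0 0.1 9876 2100 pts/0 S 12:00 0:00 xcalc
mike 4250 0.0 0.0 6100 900 pts/1 S+ 12:01 0:00 grep xcalc
mike 4302 0.0 0.2 15000 4000 pts/0 S 12:02 0:00 vim xcalc-notes.txt'

# Print the PID only when the command name itself is exactly "xcalc".
pids=$(printf '%s\n' "$ps_sample" | awk '$11 == "xcalc" { print $2 }')
echo "$pids"
```

On a live system you would feed it the output of 'ps uax' and, once the list looks right, add '| xargs kill' on the end. If the program was started with its full path, column 11 will hold that path instead, so the comparison would need adjusting.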