Advanced Linux Course

Text Processing Tools

Plain text is a way of life in the Linux world, whether in the form of log files or dumps of error messages. A Linux administrator therefore needs to be familiar with tools that make these files easier to analyze. Luckily, Linux has a large number of command-line utilities to help you do this job.

GNU Awk or Gawk

GNU awk is a funny-sounding name for a program, but it's one that will serve you well as you maintain your Linux system. Instead of having to read everything in a log file, for example, you can use awk to pick out just the data you need. To get started, let's look at a few simple examples:

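First, let's start by getting 6 numbers for this week's lottery ticket. The call to srand() seeds the random number generator from the current time; without it, gawk prints the same six numbers on every run:

awk 'BEGIN { srand(); for (i = 1; i <= 6; i++) print int(50 * rand()) }'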
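A simple loop over rand() can print the same number twice, which is not much use on a lottery ticket. Here is a minimal sketch that keeps drawing until it has six distinct numbers. The range 1 to 49 and the names `seen`, `count` and `numbers` are assumptions made for this example, not something the lottery or awk prescribes.

```shell
# Draw 6 distinct numbers between 1 and 49 (range chosen for illustration).
numbers=$(awk 'BEGIN {
    srand()                          # seed from the current time
    while (count < 6) {
        n = int(49 * rand()) + 1     # rand() is in [0,1), so n is in 1..49
        if (!(n in seen)) {          # skip numbers that were already drawn
            seen[n] = 1
            count++
            print n
        }
    }
}')
echo "$numbers"
```

The `seen` array works as a set: a number is printed only the first time it turns up.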

OK, I think that's enough fun. Now let's look at some examples that are more along the lines of what we want to use awk for. For example, let's get the total kilobytes used by text files in a directory, rounded up to the next whole kilobyte:

ls -l *.txt | awk '{ x += $5 } END { print "total KB: " int((x + 1023)/1024) }'

We can also get the total bytes used by the user 'mike' in a given directory:

ls -l | awk '$3 == "mike" { sum += $5 } END { print sum }'
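Both of these one-liners rely on the column layout of `ls -l`: the owner is field 3 and the size in bytes is field 5. As a rough sketch of how that extends to every user at once, the example below sums sizes per owner with an awk array. The listing is invented sample data so the result is reproducible, and the names `listing`, `bytes` and `per_user` are made up for the example.

```shell
# Sample "ls -l" output (invented for illustration).
# Field 3 is the owner, field 5 is the size in bytes.
listing='total 24
-rw-r--r-- 1 mike users 2048 Jan  5 10:00 notes.txt
-rw-r--r-- 1 anna users 1024 Jan  5 10:05 todo.txt
-rw-r--r-- 1 mike users  512 Jan  6 09:30 draft.txt'

per_user=$(printf '%s\n' "$listing" | awk '
    NF >= 9 { bytes[$3] += $5 }      # skip the short "total" line, sum per owner
    END     { for (u in bytes) print u, bytes[u] }
' | sort)
echo "$per_user"
```

With this sample data the output is `anna 1024` followed by `mike 2560`. On a real directory you would replace the `printf` with `ls -l`.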

You can even use awk to keep a simple spreadsheet, since it is perfectly capable of adding up a column of numbers. Let's say you've had a yard sale [1] and used your PDA to keep track of what you sold and for how much, saving the data in a simple text file. It might look something like this:

Item                   Time            Amount
======================================================
Sinatra_Record         11:30           00.50
blacklight_poster      11:45           00.75
lava_lamp              11:50           05.50
guitar                 11:55           15.00
blacklight_poster      12:00           00.75
beer_mug               12:05           01.50
beer_mug               12:05           01.50
beer_mug               12:05           01.50
end_tables             12:15           30.00
bicentennial_plate     12:20           01.50
stuffed_squirrel       12:30           03.25

To get the total of what you've sold, awk can easily add up the third column:

awk '/:/ { sum += $3 }; END { print sum }' yardsale.txt

Note

You'll notice the underscore between words in the description. It's there because awk splits each line into columns wherever it finds whitespace. A two-word description would be read as two columns, and the amount would no longer be in the third one.
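If you would rather keep the spaces in your descriptions, one possible workaround is to count columns from the right instead of the left. This is only a sketch, assuming the amount is always the last thing on the line; `$NF` is awk's way of saying "the last field", and the sample rows and the variable names `sales` and `total` are made up for the example.

```shell
# Descriptions with spaces: the amount is no longer always field 3,
# but it is still the last field on every line.
sales='lava lamp              11:50           05.50
stuffed squirrel       12:30           03.25
guitar                 11:55           15.00'

total=$(printf '%s\n' "$sales" | awk '/:/ { sum += $NF } END { printf "%.2f\n", sum }')
echo "$total"
```

This prints 23.75 for the three sample rows.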

You can also see what you've specifically earned on beer mugs:

awk '/beer/ { sum += $3 }; END { print sum }' yardsale.txt
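Rather than running one command per item, you could let awk keep a separate running total for each description. The sketch below does that with an array indexed by the first column. It uses a shortened copy of the yard sale data so the result is easy to check by hand, and the names `sum`, `item` and `by_item` are made up for the example.

```shell
# A few rows in the same layout as the yard sale file above.
sales='Item                   Time            Amount
======================================================
blacklight_poster      11:45           00.75
blacklight_poster      12:00           00.75
beer_mug               12:05           01.50
beer_mug               12:05           01.50
beer_mug               12:05           01.50'

by_item=$(printf '%s\n' "$sales" | awk '
    /:/ { sum[$1] += $3 }            # only data rows contain a time with a colon
    END { for (item in sum) printf "%s %.2f\n", item, sum[item] }
' | sort)
echo "$by_item"
```

For these rows the result is `beer_mug 4.50` and `blacklight_poster 1.50`. The `sort` is there because awk does not promise any particular order when looping over an array.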

Using awk for some administration tasks

If you use awk on the Apache log file, you can pull out just the time of each hit on your website. The following prints the fourth field, which holds the date and time of the request, and counts how many hits share each timestamp. This shows you the frequency of visits.

awk '{print $4}' access | uniq -c
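Counting per second gives a very long list on a busy site. As a rough sketch, you could group the hits by hour instead by cutting the timestamp short. This assumes the log uses the usual Apache common log format, where field 4 looks like `[10/Oct/2008:13:55:36`. The log lines, the addresses and the names `log`, `count` and `hits` are invented for the example.

```shell
# Three invented log lines in Apache common log format.
log='192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
198.51.100.7 - - [10/Oct/2008:13:58:02 -0700] "GET /about.html HTTP/1.0" 200 1180
192.0.2.10 - - [10/Oct/2008:14:01:10 -0700] "GET /index.html HTTP/1.0" 200 2326'

hits=$(printf '%s\n' "$log" | awk '
    {
        hour = substr($4, 2, 14)     # drop the "[" and keep dd/Mon/yyyy:hh
        count[hour]++
    }
    END { for (h in count) print h, count[h] }
' | sort)
echo "$hits"
```

With the sample lines this prints `10/Oct/2008:13 2` and `10/Oct/2008:14 1`.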

The following will create a list of worm-infected hosts that are trying (in vain) to infect you. The dots are escaped so that they match a literal dot rather than any character:

grep -Ei "(root\.exe|cmd\.exe|_vti_bin)" access | awk '{print $1}' | sort -u

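It wouldn't be too difficult to modify this so that its output adds these infected machines to our firewall. Including something like the following in a script might work. Note that the address has to be extracted before duplicates are removed, otherwise a host that appears on several log lines is listed several times. The command only prints the iptables commands; pipe the output to sh to actually run them:

grep -Ei "(root\.exe|cmd\.exe|_vti_bin)" access | awk '{print $1}' | sort -u | awk '{print "/sbin/iptables -I INPUT -p tcp --syn -s", $1, "-j DROP"}'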
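The same job can also be done by awk alone, which makes it easy to try out on sample data before going anywhere near a real firewall. The sketch below only prints the commands and never runs them. The log lines and addresses are invented, and `seen` and `rules` are names made up for the example.

```shell
# Invented log lines: two hosts probing for worm targets, one normal visitor.
log='192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET /scripts/root.exe?/c+dir HTTP/1.0" 404 210
198.51.100.7 - - [10/Oct/2008:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326
192.0.2.10 - - [10/Oct/2008:13:56:40 -0700] "GET /c/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 212
203.0.113.5 - - [10/Oct/2008:13:57:12 -0700] "GET /_vti_bin/owssvr.dll HTTP/1.0" 404 209'

# Print one iptables command per offending address (dry run only).
rules=$(printf '%s\n' "$log" | awk '
    tolower($0) ~ /root\.exe|cmd\.exe|_vti_bin/ && !seen[$1]++ {
        print "/sbin/iptables -I INPUT -p tcp --syn -s", $1, "-j DROP"
    }
')
echo "$rules"
```

The `!seen[$1]++` test is true only the first time an address shows up, so 192.0.2.10 gets a single rule even though it appears twice. It is worth reading the printed commands before feeding them to a shell.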

Killing me softly

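You can use awk for purposes of violence as well, namely, killing processes. Try the following example out. First, fire up an application. I'll use 'xcalc' here. Then you can use this awk one-liner to kill it, without having to run 'ps' and look up the PID yourself. The pattern is written as [x]calc so that it matches the xcalc process but not the awk command itself, whose own entry in the process list contains the text '[x]calc' rather than 'xcalc'.

ps uax | awk '/[x]calc/ {print $2}' | xargs kill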
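A pipeline that filters `ps` output by name with a plain grep also matches the grep process itself, so its PID ends up on the kill list too. The sketch below shows the effect on a canned process listing, so nothing actually gets killed, and compares it with matching the command column exactly. It assumes the usual `ps aux` layout, where the PID is field 2 and the command is field 11. The listing and the names `ps_output`, `naive` and `exact` are invented for the example.

```shell
# Canned "ps aux" style output (invented), including the grep process itself.
ps_output='USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mike      4242  0.0  0.1  10240  2048 pts/0    S    12:00   0:00 xcalc
mike      4250  0.0  0.0   6100   900 pts/0    S+   12:01   0:00 grep xcalc'

# Plain text match: picks up xcalc and the grep that is searching for it.
naive=$(printf '%s\n' "$ps_output" | grep xcalc | awk '{print $2}')

# Exact match on the command column: picks up xcalc only.
exact=$(printf '%s\n' "$ps_output" | awk '$11 == "xcalc" {print $2}')

echo "naive: $naive"
echo "exact: $exact"
```

Here the plain match returns 4242 and 4250, while the exact match returns only 4242. The exact match would miss a program started with its full path, such as /usr/bin/xcalc, so which approach fits depends on how the program was launched.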

Notes

[1]

The yard sale (or garage sale) is a typical US phenomenon. Families rid themselves of things they don't want by putting them out on tables in front of their house, confirming the proverb: One man's trash is another man's treasure. If you've seen the movie 'Toy Story 2', you'll remember the scene where Woody is stolen while rescuing Wheezy the penguin (who bears a striking resemblance to Tux) from a yard sale.



Copyright Linux Online Inc.
Compilation ©1994-2008 Linux Online, Inc.
All rights reserved.