Linux Online: Short Lessons
Baking Pies with Perl
Michael J. Jordan, Linux Online Staff
April 26, 2007
I recently upgraded my laptop from the Edgy version of Ubuntu to the
most recent version, called Feisty. As I was installing some new
applications, I noticed that the updates were going extremely
slowly. This was because Feisty had just been released that morning
and the servers that I was connecting to were getting
hammered. Earlier that day I had seen a tip from someone who had used a
mirror in Sweden to speed up his Feisty download. Feisty reads the
locations of its download servers from a file called sources.list,
located in /etc/apt/, so to use the speedier mirror I needed to change
that file. Because it pointed to servers in the US, all of the URLs in
it used the host us.archive.ubuntu.com. All I had to do was change us.
to se. (the two-letter designation for Sweden) and run apt-get update
so that apt would start using the new mirror. Ordinarily, you might
just open the file in your favorite text editor and change the lines
one by one, or maybe use the find-and-replace utility, if the editor
has one. Well, there were fifteen instances of 'us.' in sources.list
and there was a faster way to change them all to 'se.': a trick I call
'perl pie'.
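This particular pie has nothing to do with the baked kind. The 'pie'
in my example actually stands for the command line options '-pi -e' that I
use with perl to substitute text: -p runs the code on every line of the
file and prints the result, -i writes the changes back to the file itself,
and -e supplies the code on the command line. If you've got a file that has
some word or words that you'd like to change, then: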
perl -pi -e 's/before/after/g' your.file
This will save you a lot of time. I use it several times a day in
my own work.
Let's say you've written a press release for a conference that you're
planning. Let's say you're the head of the East Oshkosh Mycological
Society and you're bringing in experts on saffron milk caps from all
over the world. You're ready to send it out.
The East Oshkosh Mycological Society announces its third
annual Mycological Conference. This year's subject will be the
saffron milk cap, one of the world's most popular mushrooms. The
conference will be held from October 25 to October 29. Registration
begins on October 1 and ends on October 10. Those wishing to register
after October 10 can do so but there will be a $20 surcharge. A
pre-conference dinner and cocktail hour will be held on October 24 at
7:00 PM. Only those signed up during the October 1-10 registration
period can attend the dinner.
And let's say you went on to describe a few of the presentations, with the
dates, all in October. You finish the announcement and you're about to
email it to your fellow fungi lovers and you realize that the
conference is in November, not October. Ugh, you exclaim. What was I
thinking! Not to worry. Perl pie to the rescue.
perl -pi -e 's/October/November/g' conference.txt
Some things to watch out for
One of the most practical uses of this is to change source code. I
assume developers of every kind use this perl one-liner (or some
variation) a lot. However, when you're dealing with source code,
you're also using characters other than letters, and in these situations
you'll sometimes need to be careful. An example of source code that
almost everybody can identify with is HTML, since most people have
attempted, at least once, to create a web page. Let's say you've
written an article for your site and, after you've finished, you
realize that you want every occurrence of the word 'aardvark' to
appear in bold. So, what do you do? Do you fire up
the HTML editor, highlight every instance of 'aardvark' and click on
the big B on the menu bar? You can, but it would be easier to do this:
perl -pi -e 's/aardvark/<strong>aardvark<\/strong>/g' webpage.html
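You may have noticed that while we want the </strong> tag to appear in our document, we have
used <\/strong> in our perl one-liner. That is because perl uses the forward slash (/) for
its own purposes. Here, it's separating the two halves of our substitution. The backward slash (\) tells
perl to treat the forward slash after it like a normal character. If you don't do this,
you'll get an error. Single quotes and apostrophes (') need care as well, but a backslash is
not enough for them: the whole expression sits inside single quotes on the command line, so the
shell would treat an apostrophe as the end of the expression. One way around this is to write
the apostrophe as \x27.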
Your own mini wiki
Wikipedia is one of the most popular sites on the web. The concept behind it is that
everybody can edit the entries, so it's a project that the whole world can collaborate on.
There is a lot of talk about the accuracy of its content, but little is said about how
people create web pages on Wikipedia (or any wiki-type site, for that matter) without
writing any HTML. This is one of the factors in its success, I think.
For example, as in my previous example, if I want a word to appear in bold type on a web
page, I have to find some way to get this: <strong>bold</strong> into my
document. When you use Wikipedia, the instructions tell you to use certain characters
before and after the word you want to appear in bold type. We can also use our
'perl pie' one-liner to do this. With this, we can use any text editor and quickly get some
simple formatting into a web page. Since we're not trying to compete with Dreamweaver,
let's define three simple styles that we want: bold face, italics and red text.
First, this will get you bold text:
perl -pi -e 's/([*])(.*?)([*])/<strong>\2<\/strong>/g' your.file
What we've used here is called a regular expression: a compact pattern that
describes the text to look for. You'll see two asterisks, each inside brackets
that are inside parentheses ([*]). The asterisks are our key characters, and the
brackets make perl treat them as literal asterisks. Between them, (.*?) matches
as little text as possible, and \2 in the replacement stands for whatever that
second pair of parentheses matched. What this perl pie example does is look for
any word or words surrounded by asterisks, like *aardvark*, and replace the
asterisks with <strong></strong> tags. So, using the same method,
this line:
s/([_])(.*?)([_])/<em>\2<\/em>/g
will create italics. And this line:
s/([%])(.*?)([%])/<span style="background-color:#ff9999;">\2<\/span>/g
will give us red highlighted text. Again, not Dreamweaver, but if you're posting to a blog
and you like writing your drafts in a text editor, you could use this system. If you want
more markup, just replace the asterisks, underscores and percent signs with other
characters and change the tags. Characters other than letters or numbers work best, though
you might also use two letters together that don't occur much in ordinary text - like YYwordYY
for a word to appear in, say, yellow highlight.
Accumulating all of your one-liners into one script will give you your own portable wiki
engine:
#!/usr/bin/perl
## mini wiki engine
while (<>) {
    ## bold
    s/([*])(.*?)([*])/<strong>\2<\/strong>/g;
    ## italics
    s/([_])(.*?)([_])/<em>\2<\/em>/g;
    ## red highlight
    s/([%])(.*?)([%])/<span style="background-color:#ff9999;">\2<\/span>/g;
    ## send to standard output
    print "$_";
}
Save the script as miniwiki.pl, make it executable with chmod +x miniwiki.pl, and run it,
directing the output to a file:
./miniwiki.pl blog.txt > blog.html
For the cautious
I'll mention this last, because calling this routine 'pie' is a
great mnemonic device and the following modification sort of ruins
that. All of the previous '-pi -e' examples overwrite the files you're
working on. If you make a mistake, you've lost your original file,
forever [cue ominous music]. Of course, there is a way around this.
Just add .BKP to the -i option of the perl one-liner, like so:
perl -p -i.BKP -e 's/before/after/g' your.file
This will copy 'your.file' to a file named 'your.file.BKP' (for backup) and save you
from any pain if you've made a mistake.
Of course, my article doesn't look as nice with a title like 'perl -p -i.BKP -e', but
it might be a good thing to think of the BKP as the ice cream on your pie - a sort
of perl a la mode.
A quick fix
Though not a substitute for a word processor, web development suite or a wiki, this
simple perl one-liner can be a quick fix for simple mistakes or a fast way to alter
a file for whatever reason you might have. Since most Linux distributions install
Perl by default, it's a tool you'll already have at your disposal.
Michael J. Jordan is the managing editor of Linux Online. He can be reached at Michael.Jordan**AT**linux.org