Linux Online: Short Lessons

Baking Pies with Perl

Michael J. Jordan, Linux Online Staff

April 26, 2007

I recently upgraded my laptop from Ubuntu's Edgy release to the most recent one, called Feisty. As I was installing some new applications, I noticed that the downloads were going extremely slowly. Feisty had been released just that morning, and the servers I was connecting to were getting hammered. Earlier that day I had seen a tip from someone who had used a mirror in Sweden to speed up his Feisty download. Ubuntu's package tools read the addresses of their download servers from a file called sources.list, located in /etc/apt/, so to use the speedier mirror I needed to change that file. Because it pointed to servers in the US, all of the URLs in it started with us.archive.ubuntu.com. All I had to do was change us. to se. (the two-letter designation for Sweden) and run apt-get update so that the package lists would be fetched again from the new mirror.

Ordinarily, you might open the file in your favorite text editor and change the lines one by one, or use the editor's find-and-replace function if it has one. But there were fifteen instances of 'us.' in sources.list, and there was a faster way to change them all to 'se.': a trick I call 'perl pie'.

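This particular pie has nothing to do with the baked kind. The 'pie' in my example actually stands for the command line options '-pi -e' that I use with perl to substitute text. The -p option runs your code on every line of the file and prints the result, -i writes those results back into the file itself, and -e says that the code follows right there on the command line. If you've got a file that has some word or words that you'd like to change, you can change every occurrence like this: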

perl -pi -e 's/before/after/g' your.file

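Here s/before/after/ replaces 'before' with 'after', and the trailing g makes the change for every occurrence on a line rather than only the first. This will save you a lot of time. In my own work, I use it several times a day.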

Let's say you've written a press release for a conference that you're planning. You're the head of the East Oshkosh Mycological Society and you're bringing in experts on saffron milk caps from all over the world. You're ready to send it out.

The East Oshkosh Mycological Society announces its third annual Mycological Conference. This year's subject will be the saffron milk cap, one of the world's most popular mushrooms. The conference will be held from October 25 to October 29. Registration begins on October 1 and ends on October 10. Those wishing to register after October 10 can do so but there will be a $20 surcharge. A pre-conference dinner and cocktail hour will be held on October 24 at 7:00 PM. Only those signed up during the October 1-10 registration period can attend the dinner.

And let's say you went on to describe a few of the presentations, with the dates, all in October. You finish the announcement and you're about to email it to your fellow fungi lovers and you realize that the conference is in November, not October. Ugh, you exclaim. What was I thinking! Not to worry. Perl pie to the rescue.

perl -pi -e 's/October/November/g' conference.txt
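If you'd like to see what a substitution will do before committing to it, one approach is to leave out the i. The sketch below assumes a made-up file, conference.sample.txt, holding a single sentence from the announcement.

```shell
# Invented sample text standing in for conference.txt.
printf '%s\n' 'Registration begins on October 1 and ends on October 10.' > conference.sample.txt

# Count the lines that mention the word before changing anything.
grep -c 'October' conference.sample.txt

# Without -i, perl prints the result to the terminal and leaves the file alone.
perl -p -e 's/October/November/g' conference.sample.txt
```

Once the preview looks right, putting the i back makes the change in the file itself.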

Some things to watch out for

One of the most practical uses of this is to change source code. I assume developers of every kind use this perl one-liner (or some variation) a lot. However, when you're dealing with source code, you're also using characters other than letters. In these situations, you'll sometimes need to be careful. An example with source code that most everybody can identify with is HTML, since most people have attempted, at least once, to create a web page. Let's say you've written an article for your site and, after finishing, you realize that you want every occurrence of the word 'aardvark' to appear in bold. So, what do you do? Do you fire up the HTML editor and highlight every instance of 'aardvark' and click on the big B on the menu bar? You can, but it would be easier to do this:

perl -pi -e 's/aardvark/<strong>aardvark<\/strong>/g' webpage.html

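You may have noticed that while we want the tag </strong> to appear in our document, we have used <\/strong> in our perl one-liner. That is because perl uses the forward slash (/) for its own purposes. Here, it's separating the text to find from its replacement. The backslash (\) tells perl to treat the forward slash after it like a normal character. If you don't do this, you'll get an error. Apostrophes (') need different treatment: the whole one-liner is wrapped in single quotes for the shell, so an apostrophe inside it would end the command early, and a backslash won't help. Writing \x27, the apostrophe's character code, in its place is one way around that.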

Your own mini wiki

Wikipedia is one of the most popular sites on the web. The concept behind it is that everybody can edit the entries, so it's a project that the whole world can collaborate on. There is a lot of talk about the accuracy of its content, but little is said about how people create web pages on Wikipedia (or any wiki type site, for that matter) without writing any HTML. This is one of the factors in its success, I think.

As in my previous example, if I want a word to appear in bold type on a web page, I have to find some way to get this: <strong>bold</strong> into my document. When you use Wikipedia, the instructions tell you to use certain characters before and after the word you want to appear in bold type. We can use our 'perl pie' one-liner to do the same thing. With this, we can use any text editor and quickly get some simple formatting into a web page. Since we're not trying to compete with Dreamweaver, let's define three simple styles that we want: bold face, italics and red highlighting.

First, this will get you bold text:

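perl -pi -e 's/([*])(.*?)([*])/<strong>$2<\/strong>/g' your.file

What we've used here is called a regular expression: a pattern that describes the text to look for. You'll see each asterisk inside square brackets that are inside parentheses, ([*]). The asterisks are our key characters; the brackets are there because a bare asterisk has a special meaning in a regular expression. Between them, (.*?) matches as little text as possible up to the next asterisk, and $2 in the replacement puts that text back, since it sits in the second pair of parentheses. (You'll also see this written as \2, which works, though perl prefers $2.) What this perl pie example does is look for any word or words surrounded by asterisks, like *aardvark*, and replace the asterisks with <strong></strong> tags. So, using the same method, this substitution:

s/([_])(.*?)([_])/<em>$2<\/em>/g

will create italics. And this one:

s/([%])(.*?)([%])/<span style="background-color:#ff9999;">$2<\/span>/g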

will give us red highlighted text. Again, not Dreamweaver, but if you're posting to a blog and you like writing your drafts in a text editor, you could use this system. If you want more markup, just replace the asterisks, underscores and percent signs with other characters and change the tags. Characters other than letters or numbers work best, though you might also use two letters together that don't repeat much - like YYwordYY for a word to appear in, say, yellow highlight.

Accumulating all of your one-liners into one script will give you your own portable wiki engine:

#!/usr/bin/perl

## mini wiki engine

use strict;
use warnings;

# read each line of the files named on the command line (or standard input)
while (<>) {
    ## bold
    s/([*])(.*?)([*])/<strong>$2<\/strong>/g;
    ## italics
    s/([_])(.*?)([_])/<em>$2<\/em>/g;
    ## red highlight
    s/([%])(.*?)([%])/<span style="background-color:#ff9999;">$2<\/span>/g;

    # send to standard output
    print;
}

Save the script as miniwiki.pl, make it executable with chmod +x miniwiki.pl, then run it, directing the output to a file:

./miniwiki.pl blog.txt > blog.html

For the cautious

I'll mention this last, because calling this routine 'pie' is a great mnemonic device and the following modification sort of ruins that. All of the previous '-pi -e' examples overwrite the files you're working on. If you make a mistake, you've lost your original file, forever [cue ominous music]. Of course, there is a way around this. Just add .BKP right after the -i in the perl one-liner, like so:

perl -p -i.BKP -e 's/before/after/g' your.file

This will copy 'your.file' to a file named 'your.file.BKP' (for backup) and save you from any pain if you've made a mistake.
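One way to put the backup to work, sketched on a made-up file named your.file.sample, is to compare it with the edited file and, if the change turns out to be wrong, copy it back.

```shell
# Invented sample file.
printf '%s\n' 'before and before again' > your.file.sample

# Edit in place, keeping the original as your.file.sample.BKP.
perl -p -i.BKP -e 's/before/after/g' your.file.sample

# Show what changed. diff exits non-zero when the files differ,
# so "|| true" keeps a strict shell from treating that as a failure.
diff your.file.sample.BKP your.file.sample || true

# Changed your mind? Copy the backup over the edited file.
cp your.file.sample.BKP your.file.sample
```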

Of course, my article doesn't look as nice with a title like 'perl -p -i.BKP -e', but it might be a good thing to think of the BKP as the ice cream on your pie - a sort of perl a la mode.

A quick fix

Though not a substitute for a word processor, web development suite or a wiki, this simple perl one-liner can be a quick fix for simple mistakes or a fast way to alter a file for whatever reason you might have. Since most Linux distributions install Perl by default, it's a tool you'll already have at your disposal.

Michael J. Jordan is the managing editor of Linux Online. He can be reached at Michael.Jordan**AT**linux.org




Copyright Linux Online Inc.
Compilation ©1994-2008 Linux Online, Inc.
All rights reserved.