Back up early, back up often
The second most important part of being a system administrator (some
might argue the most important) is backing up your system.
Getting a computer up and running in the first place precedes data
storage, but it's the storing of data that makes computers so
important. Practically every business activity in the world
today is run by computers or at least relies on computers to be more
efficient. Let's put it this way: If you have a business and the data
you stored suddenly disappears, you are out of business.
Despite the importance of data storage, backing up data on a Linux box
is actually one of the easiest and least labor-intensive parts of
being a system administrator. After the initial decision as to what
data should be backed up, when and to what medium, it can all be
automated.
Tarred and feathered
The whole kit and caboodle
The most common means of backing up a system is to a tape
cartridge. If you happen to have one of these handy, you would issue
this command to back up your whole system:
This, of course, depends on one important thing: that your tapes
have enough space on them to back up the whole system. If not, you can
do it this way:
With the -M option, tar will ask you to put in a new tape when the first
one is full.
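As a rough sketch of what the two tar invocations in this section can look like, the script below only prints them rather than running them. It assumes GNU tar and a tape drive at /dev/st0; that device name is an assumption, so check which device your drive actually uses. The function name tape_backup_command is made up for the illustration.

```shell
#!/bin/sh
# Assumption: the tape drive is /dev/st0. Override with TAPE=/dev/yourdevice.
TAPE="${TAPE:-/dev/st0}"

# Print the tar command for a whole-system backup.
# Pass "multi" to get the multi-volume (-M) variant.
tape_backup_command() {
    if [ "$1" = "multi" ]; then
        # -M makes tar prompt for the next tape when the current one is full
        echo "tar -cvM -f $TAPE /"
    else
        # -c create, -v verbose, -f archive (here, the tape device)
        echo "tar -cv -f $TAPE /"
    fi
}

tape_backup_command
tape_backup_command multi
```

Run as root, these commands would also descend into /proc and any mounted network shares. GNU tar's --one-file-system option is one way to keep the backup to a single filesystem.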
Other methods
Backing up to another machine
Another method of assuring backups is to put your data onto another
machine. I have found that the most efficient way of doing this is to set up a
cron job [1] to do everything, ideally at night while you sleep. For
those who sleep during the day, that's OK too. You can set a cron job for any time.
First, make a tarball in much the same way as described earlier, but
this time create the file mystuff.tar.gz, putting into it all the files in the directory /home/acme-inc/:
tar -czpf mystuff.tar.gz /home/acme-inc/
Now you have a tarball of the entire directory /home/acme-inc/. Next, we need
to get that tarball to another machine on your network.
The first way is kind of a fudge way of doing it. I don't mean to imply that
it's not a good way. It's perfectly valid, assuming you have a webserver
installed on the machine whose files you want to back up. If you were
hosting virtual websites with Apache, for example, this solution would be
ideal and fairly easy to set up. You could make regular backups of these sites
and make sure those backups sit safely on another machine in your network.
You need to have the program wget installed on the target
machine (i.e., the machine you're going to store the backups
on). wget is a GNU utility that retrieves files via the HTTP and
FTP protocols. We'll use it because retrieving your tarball is easy
when you know exactly where you put it. Plus, wget is
non-interactive. You just have to program your cron job with
wget + URL to the machine
and let it do the work.
Now that you've got the tarball made, your best bet is to copy it to the
webserver's document root on that machine. Usually this is something like
/var/www/. Your mileage may vary. [2]
You'd just issue the command to copy or move the file there.
cp mystuff.tar.gz /var/www/
Now the file is in /var/www/. If that machine is called 'hal9000' on your
local network, your other machine would issue this command to retrieve the
file:
wget http://hal9000/mystuff.tar.gz
We could write up a script to do the first part. It would look something like
this:
#!/bin/sh
# script to create backup file and
# copy it to webserver directory
# create tarball (full path, because cron does not start in this directory)
tar -czpf /where/it/is/mystuff.tar.gz /home/acme-inc/
# move/copy tarball
# change cp to mv if you're partial to moving it
# feel free to change /where/it/is to the real place
cp /where/it/is/mystuff.tar.gz /var/www/
# end script
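Call it backup_script, for example, and make it executable with chmod +x backup_script.
You can now place this in your crontab to get it executed when you wish.
Use crontab -e to edit the file. An entry like this one runs the script every day at 4:00 a.m.
(a * in the first field would run it every minute of that hour):
0 4 * * * /where/it/is/backup_script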
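One weakness of copying straight into the web directory is that the other machine might fetch the tarball while it is still being written. Here is a minimal sketch of a more defensive variant, assuming a tar that supports -C (GNU and BSD tar both do). It builds the archive under a temporary name and renames it only once it is complete. The function name make_backup and the temporary file name are hypothetical.

```shell
#!/bin/sh
# make_backup SRC_DIR WEB_DIR
# Archive SRC_DIR and publish the result as WEB_DIR/mystuff.tar.gz.
make_backup() {
    src=$1
    web=$2
    # hidden temporary name in the same directory, tagged with our process ID
    tmp="$web/.mystuff.tar.gz.$$"
    # -C changes into the parent directory first, so the archive holds relative paths
    if ! tar -czpf "$tmp" -C "$(dirname "$src")" "$(basename "$src")"; then
        rm -f "$tmp"
        return 1
    fi
    # a rename within one directory is atomic, so wget never sees a half-written file
    mv "$tmp" "$web/mystuff.tar.gz"
}

# Example call (commented out so that sourcing this file has no side effects):
# make_backup /home/acme-inc /var/www
```

Because the archive stores relative paths, unpacking it on the other machine recreates acme-inc/ under whatever directory you happen to be in, rather than under /home.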
Some caveats
The previous example is not very secure. It is just meant to introduce the
general idea. With this scheme, it's quite possible that
your tarball ends up on a public server. If your webserver isn't available
to the public, as is the case with many home networks (make sure you're blocking
access to port 80), then no problem. But if you have sensitive information
in the file, then you don't want to have it in a place that's accessible to
the public. In that case, we'll go to Plan B.
Plan B
Plan B is going to be a lot more secure, but a bit more complicated. Such is
life. It's easier to leave your front door open than to put on a lock but it's
safer to use locks. First, to carry out this plan, you need to have a couple
of other things installed on your machines. You'll need OpenSSH server and
client. These come with the scp (secure copy)
program. You'll also need the package expect. Expect is a
program that talks to interactive programs, standing in for a human
user. In our previous example, wget was non-interactive. It just got
the file. scp is different. It expects a password, so to give it what it's
expecting we use expect. I expect you get my meaning ...
You'd create a tarball in just the same way as before, but this time we won't
copy it to the webserver. We'll just leave it where it is. scp will copy
the file to wherever we tell it. To carry out backup Plan B, we'll need to modify
our script and create another one.
#!/bin/sh
# script to create backup file and
# scp it to another machine in our network
# work in the directory that holds automate_scp
# (change /where/it/is to the real place)
cd /where/it/is || exit 1
# create tarball
tar -czpf mystuff.tar.gz /home/acme-inc/
# call the expect script
./automate_scp mike@merlot:/home/mike PASSWORD
# end script
What I've done is use the script called automate_scp to move the file to my
home directory on another machine in my network called 'merlot'. [3]
My real password would have gone where you see PASSWORD.
This is a potential security risk, so make sure you chmod this script
0700 (that is, -rwx------) so other people logged into the system can't read it.
Now we'll have to create the script automate_scp.
#!/usr/bin/expect -f
# the password is the second argument passed to this script
set password [lindex $argv 1]
# use scp to copy our file to the destination given as the first argument
spawn /usr/bin/scp mystuff.tar.gz [lindex $argv 0]
# be on the lookout for the password prompt
expect "password:"
# here comes the password
send "$password\r"
# that's all folks!
expect eof
Selecting which data to backup
It's probably best to be selective about what data you back up. In most cases,
you don't need to back up everything. For example, when I do backups of my
home directory, I leave out the cache files left by Mozilla and other web
browsers. To me that's just extra baggage that unnecessarily bloats the
tarball. The best way I've found to make backups is to create a file listing
exactly what you're interested in getting back when and if there's a problem
with your hard drive. First, create a file from a directory listing:
ls -a1 > backmeup
The -a1 options will do two important things. The -a will get the
"dot" files, those that begin with (.), and the -1 will make sure
that the list turns out as a nice single column. It's easier to edit
this way. Now, just go into the file and decide what you want and
don't want to back up. Be sure to delete the entries . and .. (the
current and parent directories), or tar will archive far more than you
intended. If you're backing up, let's say, /home/steve/ then you will need
to edit this file should you create any new files or directories in
/home/steve/. If you create anything in already existing
subdirectories, you don't need to worry. To make your tarball, just
type the following.
tar -zcvpf my_backup.tar.gz `cat backmeup`
The backquotes run the program 'cat' and place the contents of the file
backmeup on tar's command line, so tar creates the
tarball based on its contents.
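The `cat` trick breaks down when a file name contains a space, because the shell splits the list on whitespace. A sketch of an alternative, assuming a tar that supports -T (GNU and BSD tar both do), which reads one name per line from the list file. Everything happens in a scratch directory made by mktemp, and the sample file names are invented for the demonstration.

```shell
#!/bin/sh
# Build a small scratch "home directory" to practice on.
work=$(mktemp -d)
mkdir -p "$work/home/docs" "$work/home/.cache"
echo "report"   > "$work/home/docs/report.txt"
echo "junk"     > "$work/home/.cache/blob"
echo "settings" > "$work/home/.profile"
echo "notes"    > "$work/home/my notes.txt"

cd "$work/home" || exit 1
# -A is like -a but leaves out . and .., so there is nothing dangerous to delete by hand
ls -A1 > "$work/backmeup"
# drop what we do not want in the backup (here, the cache directory)
grep -v '^\.cache$' "$work/backmeup" > "$work/backmeup.new"
mv "$work/backmeup.new" "$work/backmeup"
# -T reads one name per line, so "my notes.txt" stays a single name
tar -czpf "$work/my_backup.tar.gz" -T "$work/backmeup"
# list what ended up in the tarball
tar -tzf "$work/my_backup.tar.gz"
```

In real use you would edit the list by hand, as described above, instead of filtering it with grep.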
Incremental Backups
You can also make periodic backups using a system of checking for changed or
new files. These are known as incremental backups. Basically, you use 'find' to look for files that have changed within a certain period of time
that you determine is right for your needs. You'd start with a "master" file
of sorts which would be your first tarball.
tar -cvzf acme-complete.tar.gz /home/acme/ > /home/admin/YYMMDD.complete.log
Here we've created a tarball of the entire directory /home/acme. Using
the -v option with tar gives us verbose output to
redirect to a log file. That's optional but it may come in handy,
especially if you log your incremental backups as well. You can later
run the utility 'diff' on the log files and get an idea of your
productivity. Then again, you may not want to do that (if your 'diffs'
don't turn out to be too 'different'). Now for the incremental backups:
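find /home/acme -type f -mtime -1 -print | tar -cvzf acme-incr-YYMMDD.tar.gz -T - > YYMMDD.bkup.log
Using the options -type f -mtime -1 -print, we'll look for
files that have been modified or created within the last (1)
day. These are then piped to 'tar', which stores them in a new, smaller
tarball that sits alongside our original. (The -c option always starts a
fresh archive, and a gzipped tarball can't be appended to, so reusing the
master's name here would overwrite it.) If you did backups every week
instead of every day, you would have to change -1 to -7.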
You could do the first "master" tarball by hand, but it would probably be a
good idea to automate the incremental backups. Here's a sample script that
would do it:
#!/bin/sh
# get the date in YYYY-MM-DD format
this_day=`date +%Y-%m-%d`
# make the day's incremental backup and create log file for each day
# (-type f lists only files, so tar does not pull in whole directories)
find /home/acme -type f -mtime -1 -print | tar -cvzf acme-$this_day.tar.gz -T - > $this_day.backup.log
# end script
You would then copy this tarball to your favorite location for storing backups.
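If you'd like to watch the incremental step pick out only the recent files before trusting it with real data, here is a small self-contained sketch. It works in a scratch directory made by mktemp; the file names and the acme-incr- prefix are invented for the demonstration, and touch -t is used to pretend one file is years old.

```shell
#!/bin/sh
# Scratch layout: $src holds the data, $dest holds the archives and logs.
work=$(mktemp -d)
src="$work/acme"
dest="$work/backups"
mkdir -p "$src" "$dest"

echo "old" > "$src/old.txt"
# backdate old.txt to 1 January 2000 so it falls outside the one-day window
touch -t 200001010000 "$src/old.txt"
echo "new" > "$src/new.txt"

this_day=$(date +%Y-%m-%d)
cd "$src" || exit 1
# -type f: files only; -mtime -1: modified less than one day ago
find . -type f -mtime -1 -print |
    tar -czvf "$dest/acme-incr-$this_day.tar.gz" -T - > "$dest/$this_day.backup.log"

# the log names only the files that went into today's tarball
cat "$dest/$this_day.backup.log"
```

Only new.txt should show up in the log. To restore everything, you would unpack the master tarball first and then each incremental tarball in date order.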
Backing up to CDs
The use of CD-RW (read/write) drives, also known as CD "burners", has
exploded within the last few years. This is also a suitable and
easy-to-use medium for backups. First we'll use the utility
mkisofs to make an "image" of
what we want to put on the CD.
mkisofs -o acme_bkup.iso -R -v /home/acme/
Let's explain some of these options. -o (output) should be put before
the image file you're going to create. The -R option (Rock Ridge) will
ensure that the files keep their names as they were
originally. This is very important for Linux. You don't want all your
files to end up looking like MS-DOS names! Finally, -v is for verbose,
just in case you want to know what's going on. Now you can put it on
CD. For this, we'll use 'cdrecord'.
cdrecord -v speed=2 dev=2,0 acme_bkup.iso
This example assumes a couple of things. One is that the device address of
your CD-RW device is 2,0. You should run cdrecord -scanbus
to find out what device address is valid for your system. The other is that
you're going to write at double speed. This, of course, may vary.
These are command line options, so there should be no problem in combining
these into a script to automate the process. Leave a CD in the drive when you
go to bed and voilà! A freshly roasted CD in the morning. It really can't
compare with the smell of freshly brewed coffee, but it will give you more
peace of mind.
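Here's one way such a script might look, as a sketch only. It chains the two commands from this section and, by default, just prints what it would do, so you can check the settings before wasting a disc. The variable names, the run helper and the /tmp location for the image are my own assumptions; the device address and speed are the example values from above and will likely differ on your machine.

```shell
#!/bin/sh
# Settings (all assumptions; adjust to your system).
SRC="${SRC:-/home/acme/}"            # directory to back up
ISO="${ISO:-/tmp/acme_bkup.iso}"     # where to build the image
DEV="${DEV:-2,0}"                    # device address reported by cdrecord -scanbus
SPEED="${SPEED:-2}"                  # write speed
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands, 0 = really run them

# Run a command, or just show it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Build the image, then burn it only if the image was built successfully.
burn_backup() {
    run mkisofs -o "$ISO" -R -v "$SRC" &&
    run cdrecord -v speed="$SPEED" dev="$DEV" "$ISO"
}

burn_backup
```

Once the printed commands look right, setting DRY_RUN=0 in the cron entry makes the script do the real work overnight.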