
Backing Up Systems


Back up early, back up often

The second most important part of being a system administrator (some might argue the most important) is backing up your system. Getting a computer up and running comes first, of course, but it's the storing of data that makes computers so important. Practically every business activity in the world today is run by computers, or at least relies on computers to be more efficient. Let's put it this way: if you have a business and the data you stored suddenly disappears, you are out of business.

Despite the importance of data storage, backing up data on a Linux box is actually one of the easiest and least labor-intensive parts of being a system administrator. After the initial decisions about what data to back up, when, and to what medium, it can all be automated.


Tarred and feathered

Backing up to another machine
One way of ensuring you have backups is to put your data onto another machine. The most efficient way of doing this is to set up a cron job (cron is a daemon used for scheduling jobs to be carried out automatically on a regular basis; consult man cron for more information) to do everything, ideally at night while you sleep. For those who sleep during the day, that's OK too; you can set a cron job for any time.

First, you would make a tarball. The following creates the file mystuff.tar.gz, putting into it all the files in the directory /home/acme-inc/:
tar -czpf mystuff.tar.gz /home/acme-inc/

Now you have a tarball of the entire directory /home/acme-inc/. Next, we need to get that tarball onto another machine on your network.

The first way is kind of a fudge. I don't mean to imply that it's not a good way; it's perfectly valid, assuming you have a webserver installed on the machine whose files you want to back up. If you were hosting virtual websites with Apache, for example, this solution would be ideal and fairly easy to set up: you could make regular backups of these sites and be sure those backups sit safely on another machine in your network.

You need to have the program wget installed on the target machine (i.e. the machine you're going to store the backups on). wget is a GNU utility that retrieves files via the HTTP and FTP protocols. We'll use it because retrieving your tarball is fairly easy when you know exactly where you put it. Plus, wget is non-interactive. You just have to program your cron job with

wget plus the URL of the tarball on that machine

and let it do the work.

Now that you've got the tarball made, your best bet is to copy it to the machine's root webserver directory. Usually this is something like /var/www/, but your mileage may vary. You'd just issue the command to copy or move the file there:
cp mystuff.tar.gz /var/www/

Now the file is in /var/www/. If that machine is called 'hal9000' on your local network, your other machine would issue this command to retrieve the file:
wget http://hal9000/mystuff.tar.gz
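Remember that this command belongs in a cron job on the retrieving machine as well (crontab entries are covered below), scheduled a little after the tarball gets built. Adding the -N flag is a good idea, so that wget overwrites its old copy when the remote file is newer instead of piling up duplicates:
wget -N http://hal9000/mystuff.tar.gz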

We could write up a script to do the first part. It would look something like this:
#!/bin/sh

# script to create backup file and
# copy it to webserver directory

# create tarball

tar -czpf mystuff.tar.gz /home/acme-inc/

# move/copy tarball
# change cp to mv if you're partial to moving it
# feel free to change /where/it/is to the real place

cp /where/it/is/mystuff.tar.gz /var/www/

# end script

Call it backup_script, for example, and make it executable with
chmod u+x backup_script

You can now place this in your crontab to get it executed when you wish. Use
crontab -e

to edit the file.
0 4 * * * /where/it/is/backup_script
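The five fields before the command are minute, hour, day of month, month, and day of week, so the entry above runs the script at 4:00 every morning. A couple of other hypothetical examples, in case your schedule differs:
0 4 * * 0 /where/it/is/backup_script
runs it at 4:00 AM on Sundays only, and
30 23 * * * /where/it/is/backup_script
runs it at 11:30 every night.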

Some caveats

The previous example is not very secure; it's just meant to introduce the general idea. If you use this scheme, it's quite possible that your tarball ends up sitting on a public server. If your webserver isn't available to the public, as is the case with many home networks (make sure you're blocking outside access to port 80; see the rule below), then there's no problem. But if you have sensitive information in the file, then you don't want to have it in a place that's accessible to the public.
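If you want to be certain the outside world can't reach port 80, a packet-filter rule along these lines would do it (a sketch, assuming iptables is available and eth0 is the internet-facing interface):
iptables -A INPUT -i eth0 -p tcp --dport 80 -j DROP

And when the file really is sensitive, it's time for plan B.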


Plan B

Plan B is going to be a lot more secure, but a bit more complicated. Such is life: it's easier to leave your front door open than to put on a lock, but it's safer to use locks. First, to carry out this plan, you need to have a couple of other things installed on your machines. You'll need the OpenSSH server and client, which come with the scp (secure copy) program. You'll also need the package expect. Expect is a program that will talk back to interactive programs, substituting for human intervention. In our previous example, wget was non-interactive; it just got the file. scp is different. It expects a password, so to give it what it's expecting we use expect. I expect you get my meaning ...

You'd create a tarball in just the same way as before, but this time we won't copy it to the webserver. We'll just leave it where it is, and scp will send it wherever we tell it to. To carry out backup Plan B, we'll need to modify our script and create another one.
#!/bin/sh

# script to create backup file and
# scp it to another machine in our network

# create tarball

tar -czpf mystuff.tar.gz /home/acme-inc/

# call the expect script

./automate_scp mike@merlot:/home/mike PASSWORD

# end script

What I've done is use a script called automate_scp to move the file to my home directory on another machine in my network called 'merlot' (the use of names here requires a valid /etc/hosts file with entries in it for the machines in the network). My real password would have gone where you see PASSWORD. This is a potential security risk, so make sure you chmod this script 0700 (that is, -rwx------) so other people logged into the system can't read it.
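Assuming the Plan B wrapper kept the name backup_script, that would be:
chmod 0700 backup_script

Now we'll have to create the script automate_scp.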
#!/usr/bin/expect -f
# grab the password from the second argument
set password [lindex $argv 1]
# use scp to copy our file to the destination given as the first argument
spawn /usr/bin/scp mystuff.tar.gz [lindex $argv 0]
# be on the lookout for the password prompt
expect "password:"
# here comes the password
send "$password\r"
# that's all folks!
expect eof
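As before, the wrapper script goes into your crontab so the whole thing happens while you sleep, for example with the same schedule as above:
0 4 * * * /where/it/is/backup_script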


Selecting which data to back up

It's probably best to be selective about what data you back up. In most cases, you don't need to back up everything. For example, when I back up my home directory, I leave out the cache files left by Mozilla and other web browsers; to me, that's just extra baggage that unnecessarily bloats the tarball. The best way I've found of making backups is to create a file listing exactly what you're interested in getting back if and when there's a problem with your hard drive. First, create a file from a directory listing:
ls -a1 > backmeup

The -a1 options will do two important things: the -a will get the "dot" files (those that begin with a period), and the -1 will make sure the list comes out as a nice single column, which is easier to edit. Now, just go into the file and decide what you want and don't want to back up. If you're backing up, let's say, /home/steve/, then you will need to edit this file whenever you create new files or directories directly in /home/steve/. If you create anything in already existing subdirectories, you don't need to worry; tar picks those up automatically. To make your tarball, just type the following.
tar -zcvpf my_backup.tar.gz `cat backmeup`
The backticks make the shell run 'cat backmeup' and paste its output onto tar's command line, so the tarball is built from exactly the files you listed.
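For illustration, an edited backmeup for /home/steve might end up looking something like this (the entries are purely hypothetical):
.bashrc
.mozilla
Documents
projects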

Incremental Backups

You can also make periodic backups using a system of checking for changed or new files. These are known as incremental backups. Basically, you use 'find' to look for files that have changed within a certain period of time that you determine is right for your needs. You'd start with a "master" file of sorts, which would be your first tarball.
tar -cvzf acme-complete.tar.gz /home/acme/ > /home/admin/YYMMDD.complete.log

Here we've created a tarball of the entire directory /home/acme. Using the -v option with tar gives us verbose output, which we redirect to a log file. That's optional, but it may come in handy, especially if you log your incremental backups as well. You can later run the utility 'diff' on the log files and get an idea of your productivity. Then again, you may not want to do that (if your 'diffs' don't turn out to be too 'different'). Now for the incremental backups:
find /home/acme -mtime -1 -print | tar -cvzf acme-incremental.tar.gz -T - > YYMMDD.bkup.log

Using the options -mtime -1 -print, we look for files that have been modified or created within the last (1) day. The file names are then piped to 'tar', whose -T - option tells it to read the list of files to archive from standard input, and those files go into a new, much smaller tarball. If you did backups every week instead of every day, you would change -1 to -7.
You could do the first "master" tarball by hand, but it would probably be a good idea to automate the incremental backups. Here's a sample script that would do it:
#!/bin/sh
# get the date in YYYY-MM-DD format
this_day=`date +%Y-%m-%d`
# make the incremental backup and create a log file for each day
find /home/acme -mtime -1 -print | tar -cvzf acme-incremental-$this_day.tar.gz -T - > $this_day.backup.log
# end script
Then you would copy this to your favorite location for storing backups.
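If the machine receiving the backups is the same 'merlot' from Plan B, one more line at the end of the script would ship each day's archive across (a sketch; you'd be prompted for the password unless you wrap it in expect as before):
scp acme-incremental-$this_day.tar.gz mike@merlot:/home/mike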


Backing up to CDs

The use of CD-Read/Write devices, also known as CD "burners", has exploded within the last few years. They are also a suitable and easy-to-use medium for backups. First we'll use the utility mkisofs to make an "image" of what we want to put on the CD.
mkisofs -o acme_bkup.iso -R -v /home/acme/

Let's explain some of these options. -o (output) should be put before the name of the image file you're going to create. The -R option (Rock Ridge) will ensure that the image preserves its file names as they were originally. This is very important for Linux; you don't want all your files to end up looking like MS-DOS names! Finally, -v is for verbose, just in case you want to know what's going on. Now you can put it on CD. For this, we'll use 'cdrecord'.
cdrecord -v speed=2 dev=2,0 acme_bkup.iso

This example assumes a couple of things. One is that the device address of your CD-RW device is 2,0. You should run
cdrecord -scanbus

to find out what device address is valid for your system. The other is that you're going to write at double speed. This, of course, may vary.
These are all command line programs, so there should be no problem in combining them into a script to automate the process. Leave a CD in the drive when you go to bed and voilà! A freshly roasted CD in the morning. It really can't compare with the smell of freshly brewed coffee, but it will give you more peace of mind.
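Here's a minimal sketch of such a script, reusing the (assumed) device address and write speed from above:
#!/bin/sh
# build an ISO image of /home/acme/, preserving Unix file names
mkisofs -o acme_bkup.iso -R -v /home/acme/
# burn it; run cdrecord -scanbus to find your own dev= address
cdrecord -v speed=2 dev=2,0 acme_bkup.iso
# end script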