Basic Backup Script in BASH

ehansen

Introduction

Everyone should be backing up their data. This doesn't just apply to sysadmins, but also to people at home who never even think of it. Just like everything else in I.T., hard drives were built to fail. If you do not make regular backups, you are at the mercy of your drive the day it no longer spins up, meaning all that data is gone.

I try to make as many of my tasks as easy as possible simply by using Bash scripting. It's about as portable as it gets, meaning 99.9% of the time you aren't going to need to install anything to run the script... and even if Bash isn't there, most of the code I write can be adapted to other shells as well (the arrays used below are the main Bash-only feature).

I'm going to copy/paste parts of my script, and give details about each block of code. At the end, I'll also provide a link to the full script.

What Will This (Not) Do?

This script will do the following:
Back up the specified directories
Create a directory based on year, month & day
Create a separate backup for each given device
E-mail you the results (this assumes a working sSMTP setup, which is out of the scope of this particular guide)

Pretty much everything else is not going to be done. This is a quick and dirty backup solution, and isn't intended to be the absolute answer. I wrote this originally for my own needs and my needs only.

Let's Begin


Code:
DAY=`date +%d`
MONTH=`date +%m`
YEAR=`date +%Y`


Here we get the date information needed for naming our backups. Pretty straightforward, if I do say so myself.
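One thing to watch with three separate date calls is a run that happens to straddle midnight. A minimal sketch, assuming you'd rather ask for the date once (the read trick is my own suggestion, not part of the original script):

```bash
# Ask date once, then split the answer into the three variables
read -r YEAR MONTH DAY <<< "$(date '+%Y %m %d')"

echo "Backing up for $YEAR/$MONTH/$DAY"
```

The variable names are the same as above, so the rest of the script would not need to change.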


Code:
# Backup directory to use (2011/08/31 for 08.31.2011)
BKDIR="/backups/$YEAR/$MONTH/$DAY"
 
if [ ! -d "$BKDIR" ]; then
        mkdir -p "$BKDIR"
fi


We make the directory where the backup files will be stored (-p ensures any missing directories will be made). The script must have write permissions to the BKDIR (in this case /backups/), or else it will fail.
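If you want the script to tell you about a permissions problem up front, something like the following could work. This is only a sketch: ensure_dir is a hypothetical helper name, and the demo runs against a throwaway temp directory rather than /backups.

```bash
# Hypothetical helper: create a directory and confirm we can write to it
ensure_dir() {
        mkdir -p "$1" && [ -w "$1" ]
}

# Demo against a throwaway location instead of /backups
DEMO_ROOT=$(mktemp -d)
if ensure_dir "$DEMO_ROOT/2012/01/26"; then
        echo "ready: $DEMO_ROOT/2012/01/26"
else
        echo "cannot write to $DEMO_ROOT/2012/01/26" >&2
fi
rm -rf "$DEMO_ROOT"
```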

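Code:
BKLOG="/backups/$YEAR/$MONTH/$DAY.log"

This is the log file for the run (it will make more sense later). It sits right next to the day's backup directory, e.g. /backups/2012/01/26.log. I like things being consistent.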

Code:
ARRPOS=0

I'll be honest, this one is hard to explain out of context, but it should make more sense when you actually see its use: it's the position in the BACKUP array that goes with the drive currently being backed up, starting at 0. It's like trying to explain to someone new to computers how a keyboard makes a letter appear... you just tell them "just press the key to see its power" if you want to keep them interested.

Code:
DRIVE=('sda' 'sdb')

This backup solution is device-based, and my server has two devices (one main, another with misc. data). The end result will basically be $BKDIR/${DRIVE[$ARRPOS]}.tar.gz (e.g. /backups/2012/01/26/sda.tar.gz).

Code:
BACKUP=('/' '/pub')

What to back up on each device, in the same order as DRIVE: '/' goes with sda and '/pub' goes with sdb (for me, this is backing up everything).
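To see how the two arrays line up, here is a small sketch that only prints the pairing and backs up nothing. show_pairs is a hypothetical name, and the date in BKDIR is just the example date used above.

```bash
DRIVE=('sda' 'sdb')
BACKUP=('/' '/pub')
BKDIR="/backups/2012/01/26"   # example date from the text

# Walk both arrays by index so drive N is paired with path N
show_pairs() {
        local i
        for i in "${!DRIVE[@]}"; do
                echo "$BKDIR/${DRIVE[$i]}.tar.gz <- ${BACKUP[$i]}"
        done
}

show_pairs
```

If the arrays ever get out of step (say, a third drive with no matching path), the last line would show an empty source, which is an easy way to spot the mistake before running a real backup.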

Code:
SDAEX=('/media' '/tmp' '/dev' '/proc' '/sys' '/mnt' '/pub' '/var/cache' '/backups')

These are the directories to skip when backing up sda. A lot of them aren't needed in a backup, and we also do not want to back up our backups by default.

Code:
touch "$BKLOG"

Create an empty file for the log file, making sure it can be made.


Code:
echo "To: [email protected]" > "$BKLOG"
echo "From: Backups <backups@secrets>" >> "$BKLOG"
echo -e "Subject: Generated backup report for `hostname` on $YEAR.$MONTH.$DAY\n" >> "$BKLOG"
echo -e ">> Backup for: $YEAR.$MONTH.$DAY started @ `date +%H:%M:%S`\n" >> "$BKLOG"


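The purpose of $BKLOG is to log the status of the backup process. When it is done, we will be e-mailing the report out (see the "To:" field), which is why the file starts with mail headers; the blank line after "Subject:" separates them from the body. You can change this however you want; this is just how I did it for myself.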
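The same header could also be written in one go with a here-document, which makes the blank line between headers and body easier to see. A rough sketch: write_header is a hypothetical helper, the addresses are placeholders, and uname -n stands in for hostname here.

```bash
# Hypothetical helper: write the mail headers plus the first report line
write_header() {
        local log=$1 to=$2 from=$3
        cat > "$log" <<EOF
To: $to
From: $from
Subject: Generated backup report for $(uname -n) on $(date +%Y.%m.%d)

>> Backup started @ $(date +%H:%M:%S)
EOF
}

# Placeholder addresses; substitute your own
DEMO_LOG=$(mktemp)
write_header "$DEMO_LOG" "you@example.com" "Backups <backups@example.com>"
cat "$DEMO_LOG"
rm -f "$DEMO_LOG"
```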


Code:
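# Checks to see if day = 1, and if so, backs up the last month's backups
if [ "$DAY" == "01" ]; then
        # 10# forces base 10 so "08" and "09" are not read as octal
        OLD=$((10#$MONTH - 1))
        OLDYEAR=$YEAR
        # January wraps around to December of the previous year
        if [ "$OLD" -eq 0 ]; then
                OLD=12
                OLDYEAR=$((YEAR - 1))
        fi
        # Zero-pad so it matches the directory name (e.g. 01)
        OLD=$(printf '%02d' "$OLD")

        echo "- New month detected.  Backing up previous month's ($OLD) backups." >> "$BKLOG"
        echo "  + Backup file: /backups/$OLDYEAR/$OLD.tar.gz" >> "$BKLOG"

        # Got stats, delete folder (only if tar succeeded)
        if SD=$( { time tar -cpPzf "/backups/$OLDYEAR/$OLD.tar.gz" "/backups/$OLDYEAR/$OLD/"; } 2>&1 ); then
                rm -rf "/backups/$OLDYEAR/$OLD"
        fi

        SD=`echo -n "$SD" | grep '^real'`
        # Split "0m0.000s" on the "m" so double-digit minutes also parse
        MIN=`echo -n "$SD" | awk '{split($2, t, "m"); printf "%s", t[1]}'`
        SEC=`echo -n "$SD" | awk '{split($2, t, "m"); printf "%s", t[2]}'`
        echo -e "- done [ ${MIN}m $SEC ].\n" >> "$BKLOG"
fi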


As the comment states, this is a monthly backup. On the first of the month it archives the previous month's backups before starting the one for the current day. This, combined with other routines put into the system, keeps backups for a lengthy amount of time. This is also why we excluded /backups from our routine... WAY too many backups of backups if you ask me.
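Working out "the previous month" has a few edge cases: the result has to be zero-padded to match the directory name, "08" and "09" must not be read as octal, and January has to roll back to December of the year before. A minimal sketch of that arithmetic as a standalone function (prev_month is a hypothetical name, not something from the original script):

```bash
# Hypothetical helper: print "YEAR MONTH" for the month before the given one
prev_month() {
        local year=$1
        local month=$((10#$2 - 1))   # 10# avoids "08"/"09" being read as octal
        if [ "$month" -eq 0 ]; then  # January rolls back to December
                month=12
                year=$((year - 1))
        fi
        printf '%d %02d\n' "$year" "$month"
}

prev_month 2012 02   # prints "2012 01"
prev_month 2012 12   # prints "2012 11"
prev_month 2012 01   # prints "2011 12"
```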

One thing I want to talk about, since it's the meat of the actual backup routine, is the tar call, which boils down to:

Code:
tar -cpPzf /backups/$YEAR/$OLD.tar.gz /backups/$YEAR/$OLD/

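This is basically telling tar to create (-c) a gzipped (-z) backup file (-f) named /backups/$YEAR/$OLD.tar.gz containing the data found in the /backups/$YEAR/$OLD/ directory, preserving permissions (-p) and using absolute file names (-P), which basically means not stripping "/" from the beginning of the file name. The -P switch is used because, without it, tar prints a "Removing leading `/' from member names" message, which makes the output ugly. Keep in mind that the archive then stores absolute paths, so extracting it with -P as well writes the files straight back to their original locations.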
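If you want to try those flags without touching real data, a sketch like this works on a throwaway directory (this assumes GNU tar; the WORK and LISTING names are only for the demo):

```bash
# Build a throwaway directory so nothing real is touched
WORK=$(mktemp -d)
mkdir -p "$WORK/src"
echo "hello" > "$WORK/src/hello.txt"

# Same flags as the script: create, preserve permissions, absolute names, gzip, file
tar -cpPzf "$WORK/demo.tar.gz" "$WORK/src"

# List (-t) the archive; -P shows the stored names with their leading "/"
LISTING=$(tar -tPzf "$WORK/demo.tar.gz")
echo "$LISTING"

rm -rf "$WORK"
```

Listing an archive with -t before relying on it is a cheap sanity check that the backup actually contains what you expect.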

Continuing on...


Code:
# Cycle through each drive and back up each
for d in "${DRIVE[@]}"; do
        echo "- Backing up drive $d" >> "$BKLOG"

        # By default, at least don't backup lost+found directories
        EX="--exclude=lost+found"

        # If we are backing up drive 1 (/dev/sda), there's more to exclude
        if [ "$d" == "sda" ]; then
                for e in "${SDAEX[@]}"; do
                        EX="`echo -n $EX` --exclude=$e"
                done
        fi

        # Do the magic work and display some cool info
        # ($EX is left unquoted on purpose so each --exclude becomes its own argument)
        SD=$( { time tar -cpPzf "$BKDIR/$d.tar.gz" $EX "${BACKUP[$ARRPOS]}"; } 2>&1 )
        SD=`echo -n "$SD" | grep '^real'`
        # Split "0m0.000s" on the "m" so double-digit minutes also parse
        MIN=`echo -n "$SD" | awk '{split($2, t, "m"); printf "%s", t[1]}'`
        SEC=`echo -n "$SD" | awk '{split($2, t, "m"); printf "%s", t[2]}'`
        SD=$(ls -liha "$BKDIR/$d.tar.gz")
        SIZE=`echo -n $SD | awk '{printf "%s", $6}'`

        let ARRPOS++
done


This is the code that does the backing up of current data. This is also where we need ARRPOS: it picks the entry in BACKUP that goes with the current drive, and is bumped at the end of each pass. The rest is pretty much self-explanatory, to be honest. The biggest change here, besides the array wrapped in a for block, is that we get the file size of the created backup file. So, let's rewind a little bit and look at the for block...

Another block of code I didn't discuss earlier (since it's in a couple of spots) is this:


Code:
        SD=$( { time tar -cpPzf "$BKDIR/$d.tar.gz" $EX "${BACKUP[$ARRPOS]}"; } 2>&1 )
        SD=`echo -n "$SD" | grep '^real'`
        MIN=`echo -n "$SD" | awk '{split($2, t, "m"); printf "%s", t[1]}'`
        SEC=`echo -n "$SD" | awk '{split($2, t, "m"); printf "%s", t[2]}'`


If you run the time command, you'll get an output like this:


Code:
[ehansen@sfu ~]$ time
 
real    0m0.000s
user    0m0.000s
sys    0m0.000s


What the block of code does is measure how long it takes to create the backup file (the tar command), and then we only keep the "real" time. The reason is that on a multitasking system like Linux, a process may be paused for a moment to let another user's program (or a system action like signal handling) run, and it also spends time waiting on the disk. The "user" and "sys" lines only count CPU time, while "real" is the actual wall-clock time it took for the command to execute. We use my best friend, Mr. awk, to parse the value out of that line.
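If you'd rather not reach for awk, the same split can be done with plain Bash parameter expansion. A small sketch, where parse_real is a hypothetical helper and `true` stands in for the tar command:

```bash
# Hypothetical helper: split a "real 0m0.000s" line into minutes and seconds
parse_real() {
        local value=${1#real}             # drop the "real" label
        value=${value//[[:space:]]/}      # drop the tab/spaces after it
        echo "${value%%m*} ${value#*m}"   # text before and after the "m"
}

# Time a harmless command the same way the script times tar
SD=$( { time true; } 2>&1 | grep '^real' )
parse_real "$SD"

parse_real "real    12m3.456s"   # prints "12 3.456s"
```

Splitting on the "m" rather than on fixed character positions means a backup that takes ten minutes or more is still reported correctly.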

Code:
for d in "${DRIVE[@]}"; do

d will be whatever value is currently at ${DRIVE[$ARRPOS]}. For example, the first time around, d will be sda, and the second time it will be sdb.


Code:
        # By default, at least don't backup lost+found directories
        EX="--exclude=lost+found"

        # If we are backing up drive 1 (/dev/sda), there's more to exclude
        if [ "$d" == "sda" ]; then
                for e in "${SDAEX[@]}"; do
                        EX="`echo -n $EX` --exclude=$e"
                done
        fi


ext3 & ext4 file systems create this wonderful directory called lost+found. Personally, I'm not a fan of it, because every time I try to restore corrupted data from it, I just get my bottom handed to me, but besides this, it's a folder that really is pointless to include in a routine backup. If we're working on the first drive (where /boot, /home, /var, etc. are stored), we also exclude directories that mean nothing once the system is shut down. The line EX="`echo -n $EX` --exclude=$e" is basically the same as, for example, PHP or Perl, where you can do EX .= " --exclude=$e". Bash can append directly too, with EX="$EX --exclude=$e" or EX+=" --exclude=$e", so the echo isn't strictly needed.
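Another way to collect the exclusions, if you don't mind leaning on Bash a bit more, is an array instead of a string. This is only a sketch of that alternative, not what the script above does; it prints the options rather than calling tar.

```bash
SDAEX=('/media' '/tmp' '/dev' '/proc' '/sys' '/mnt' '/pub' '/var/cache' '/backups')

# Start with the default exclusion, one array element per option
EX=("--exclude=lost+found")

# += appends to an array (and to a plain string) without any echo tricks
for e in "${SDAEX[@]}"; do
        EX+=("--exclude=$e")
done

# Each element expands as its own word; this is what tar would receive
printf '%s\n' "${EX[@]}"
```

With an array you would pass "${EX[@]}" to tar, which keeps each option intact even if a path contains spaces.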

Code:
# Mail this report out...ssmtp for GMail accounts, otherwise change for appropriate MTA
/usr/sbin/ssmtp -t < "$BKLOG"


This is the last of the back up script, just giving out some generic details, and then using ssmtp to send out the report. Nothing to really discuss here.

There you have it... a (surprisingly) simple backup solution for a server. Is it robust? Not really. A better implementation would be to do incremental backups daily with a full backup only once a week, and to also consider things like locked database files, but that's your homework. I encourage people to not steal other people's code, but to build upon it and also learn from it. What I provide will not be a solution for everyone, maybe not for anyone, but a lot of the techniques found here can be implemented in other languages too (such as PHP... which I converted this script to for fun).

As promised, here is the script, in full (with some poor comments in the header) that you can wget with: http://eric-hansen.info/old/scripts/script/backup.sh
 


Code:
  M=`echo -n $MONTH | awk '{printf substr($1,2)}'`
        let OLD=M-1
Will the value of M be the same for February and December?
 
Hello, great script! But the link is not working. Best,
RM
 
Hi,

Great job but please keep entire backup.bash file
 
It would be good if the link for the script were working. If someone has got it, please let me know. Thanks.
 
Since I cannot post links in my first post, I'll tell you how to get the script:
Go to the WaybackMachine on "archiveXorg/web" (replace the X with a dot).
Paste the Link from the first post into there.

Then you should see a snapshot from February 2, 2012.
Click on it, you're very welcome. :)
 
