With aggregated storage, you can join multiple drives on different systems into one volume so that files are mirrored, striped or distributed across them.
For Internet companies that store a lot of data, this is the basis for how it is done. The data needs to be highly available, which is achieved by replicating it to multiple servers and letting users access it from different servers to prevent bottlenecks. If the data is on only one server and many users try to access it, that server will slow down. Distributing the data makes it more readily available to multiple users from multiple locations. If a server fails, the data is still available from the other servers.
Adding Storage
The simplest way to give a server more storage is to add a disk. Since we are using VirtualBox, we need to add a virtual disk.
Open the settings for the Server1 machine, make sure it is not running, and select ‘Storage’ (once it is done, we will do the same for Server2). Keep in mind that we are not changing the ‘Master’ server.
Select ‘Controller: SATA’, then click the ‘Adds Hard Disk’ icon. Select ‘Create’ and name the disk ‘as1’ for Server1 and ‘as2’ for Server2. Then select the new disk in each server’s Storage settings.
Now, when we start the servers, the disk is ‘installed’ on each one. We only have to format it and mount it.
In a terminal, you can run ‘lsblk’ to see the disk address. In my case, for Server1, it is ‘sdd’ and on Server2, it is ‘sdb’.
Partition the drive and format it as ‘XFS’. An MSDOS partition table is sufficient; you do not need GPT. Once you have formatted the drive, it will have a UUID to use for mounting.
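The article does not show a command for the partitioning and formatting step, so here is a minimal sketch of one way to do it, assuming the new disk is ‘/dev/sdd’ and that ‘parted’ and ‘mkfs.xfs’ are installed. The function name ‘format_plan’ is my own. It only prints the commands so you can check the device name first, since running them erases the disk.

```shell
# Print (do not run) the commands that partition and format a disk.
# $1 is the whole-disk device, for example /dev/sdd (an assumption; check lsblk).
format_plan() {
    disk="$1"
    # MSDOS partition table with one partition spanning the disk
    echo "sudo parted -s $disk mklabel msdos mkpart primary xfs 1MiB 100%"
    # Format the first partition as XFS (partition naming assumes an sdX-style device)
    echo "sudo mkfs.xfs ${disk}1"
}

format_plan /dev/sdd
```

Once you are sure the device is the empty new disk, you can copy the two printed commands and run them.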
We need to create a folder in which to mount the new drive, so use the command:
Code:
sudo mkdir /as
This will create a folder ‘as’ at the root of the system drive to mount our aggregate storage device. Do this on both servers.
You need to get the UUID of the new drive you just formatted. To do this, use the command:
Code:
lsblk -o UUID /dev/sdd
Change the drive address as you need on each server.
Edit ‘/etc/fstab’ as root and add the following line at the end, replacing the UUID with the one you got from the previous command. You should be able to copy and paste it; open a second terminal window if needed:
Code:
UUID=c97b2ef1-42d3-4968-9bb1-584cb3c23c38 /as xfs defaults 0 0
Save the file and then run the following command to mount all drives in the ‘fstab’ file:
Code:
sudo mount -a
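As a sketch of how the UUID lookup and the ‘fstab’ entry fit together, the hypothetical helper below builds the line from a UUID. It assumes the file system is on the first partition (‘/dev/sdd1’). The tools ‘blkid’ and ‘findmnt’ are standard, but the helper name ‘fstab_line’ is mine.

```shell
# Build the /etc/fstab entry for the aggregate storage mount from a UUID.
fstab_line() {
    printf 'UUID=%s /as xfs defaults 0 0\n' "$1"
}

# Example with the UUID used in this article:
fstab_line c97b2ef1-42d3-4968-9bb1-584cb3c23c38

# On a real server you could feed in the UUID directly (assumes /dev/sdd1):
#   fstab_line "$(sudo blkid -s UUID -o value /dev/sdd1)"
# and, after 'sudo mount -a', confirm the mount with:
#   findmnt /as
```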
If you are adding a physical disk to a server, just make sure the system sees it, format it, and add it to ‘/etc/fstab’ so it mounts automatically.
Installing GlusterFS on CentOS
We will use GlusterFS to perform the aggregation, but there are other programs that you can use depending on your preference.
We need to install the GlusterFS server on both systems, plus the client with FUSE on Server1. FUSE (Filesystem in Userspace) lets the client mount the volume from user space. The client talks to the servers directly, which spreads file operations across them so as not to burden one server.
On Server1, you need to have the EPEL repository set up (for CentOS). If you do not, perform:
Code:
sudo yum install epel-release -y
NOTE: At the time of this writing, CentOS 7 has reached end of life. A few add-ons, like GlusterFS, no longer support that version of CentOS, so for the installation I am using CentOS 9 Stream.
For the installation in CentOS, we need to add the repository for Gluster:
Code:
sudo dnf -y install centos-release-gluster9
This sets up a new repository. The repository file it creates is not correct, so we need to make a change:
Code:
sudo sed -i -e "s/enabled=1/enabled=0/g" /etc/yum.repos.d/CentOS-Gluster-9.repo
Then we need to run the following command on Server1:
Code:
sudo dnf --enablerepo=centos-gluster9 -y install glusterfs-server glusterfs glusterfs-fuse
For Server2, use the command:
Code:
sudo dnf --enablerepo=centos-gluster9 -y install glusterfs-server
We need to configure the Firewall to allow communication to Gluster. On Server1 and Server2, perform the following:
Code:
sudo firewall-cmd --permanent --add-service=glusterfs
sudo firewall-cmd --reload
Now that everything is ready, we can start the service with:
Code:
sudo systemctl enable glusterd.service
sudo systemctl start glusterd.service
GlusterFS should now be running on CentOS.
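If you want to confirm that before moving on, a small check like the following may help. It is only a sketch: the function name ‘glusterd_status’ is hypothetical, and it relies on the standard ‘systemctl is-active’ and ‘systemctl is-enabled’ queries.

```shell
# Report whether glusterd is running and whether it is set to start at boot.
glusterd_status() {
    active=$(systemctl is-active glusterd 2>/dev/null) || true
    enabled=$(systemctl is-enabled glusterd 2>/dev/null) || true
    echo "glusterd: ${active:-unknown}, ${enabled:-unknown}"
}

glusterd_status
```

On a server where the steps above worked, I would expect both values to be reported as ‘active’ and ‘enabled’. Run it on both servers.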
Installing GlusterFS on Ubuntu
For Ubuntu, we need to set up a PPA so we have access to the repository. The command is:
Code:
sudo add-apt-repository ppa:gluster/glusterfs-9
If you get an error that the ‘add-apt-repository’ does not exist, then you need to install it with:
Code:
sudo apt install software-properties-common -y
After you add the PPA, the system should update the package lists from each repository.
On Server1, perform:
Code:
sudo apt install glusterfs-server glusterfs-client -y
On Server2, perform:
Code:
sudo apt install glusterfs-server -y
Next, we need to set up iptables for connection from the other server:
Code:
sudo iptables -I INPUT -p all -s <ip address of other server> -j ACCEPT
NOTE: If you have more than two servers, run the ‘iptables’ command once for each of the other servers (every server except the local one).
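For a larger group of servers, the rule list can be generated instead of typed by hand. The sketch below prints one rule per peer and skips the local address. The function name ‘peer_rules’ and the IP addresses are made up for illustration.

```shell
# Print one iptables rule per peer, skipping the local server's own address.
# $1 is the local IP; the remaining arguments are the IPs of all servers.
peer_rules() {
    local_ip="$1"
    shift
    for ip in "$@"; do
        [ "$ip" = "$local_ip" ] && continue
        echo "sudo iptables -I INPUT -p all -s $ip -j ACCEPT"
    done
}

# Hypothetical addresses for a three-server setup, as seen from 192.168.1.11:
peer_rules 192.168.1.11 192.168.1.11 192.168.1.12 192.168.1.13
```

The function only prints the commands, so you can review them before running them on each server.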
Now that everything is ready, we just need to start the service and enable it to start when the system boots:
Code:
sudo systemctl start glusterd
sudo systemctl enable glusterd
Set up a Distributed Volume
A distributed volume is one that is split between the servers. Some files are on one server and other files are on another and so on.
Here, we will set up a distributed file system between the two Gluster Servers.
Make sure the servers can see each other when pinging by FQDN hostnames.
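If your test network has no DNS, one common way to make the FQDNs resolve is to add entries to ‘/etc/hosts’ on both servers. The sketch below only prints such entries. The helper name ‘hosts_line’ and the IP addresses are assumptions, so substitute your own.

```shell
# Print an /etc/hosts entry: IP address, FQDN, then the short host name.
hosts_line() {
    printf '%s %s %s\n' "$1" "$2" "${2%%.*}"
}

# Hypothetical addresses; use the real IPs of your servers.
hosts_line 192.168.1.11 server1.centos.linux.org
hosts_line 192.168.1.12 server2.centos.linux.org

# After adding the lines to /etc/hosts on both servers, check name resolution:
#   getent hosts server2.centos.linux.org
#   ping -c 1 server2.centos.linux.org
```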
On Server1, you need to run:
Code:
sudo gluster peer probe server2.centos.linux.org
Change the FQDN to what you need it to be for your systems. You should get a ‘success’ message. If not, then you have an issue with the servers seeing each other on the network. Again, make sure you can ping with the FQDN. Also, the Gluster service should be active on both servers.
On either system, you can run the following to show the other server as a connected peer:
Code:
sudo gluster peer status
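If you want to check the result in a script, you can count the peers that report as connected. This is a sketch that assumes the usual ‘State: Peer in Cluster (Connected)’ line in the output. The sample text and its UUID are invented for illustration.

```shell
# Count peers reported as connected in 'gluster peer status' output (read from stdin).
connected_peers() {
    grep -c 'State: Peer in Cluster (Connected)' || true
}

# Illustrative output only; the UUID is made up.
sample='Number of Peers: 1

Hostname: server2.centos.linux.org
Uuid: 00000000-0000-0000-0000-000000000000
State: Peer in Cluster (Connected)'

printf '%s\n' "$sample" | connected_peers

# On a real server:
#   sudo gluster peer status | connected_peers
```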
A distributed volume needs a place to store its data. So, on each server, perform the following command:
Code:
sudo mkdir /as/dist
The new folder is on both systems, inside the ‘/as’ folder where our new drive is mounted. The folder name and the mount location (/as) can be different on each system; I kept them the same for ease of creating the volume and mounting it.
We need to create a distributed volume and we need to give it a name, so we will call it ‘distvol’. On Server1, perform the following to create the volume:
Code:
sudo gluster volume create distvol transport tcp server1.centos.linux.org:/as/dist server2.centos.linux.org:/as/dist
Now that it exists, we can start the volume with the command:
Code:
sudo gluster volume start distvol
The result of the command should be ‘success’.
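You can also confirm the volume type and state with ‘gluster volume info’. The helper below is a sketch that pulls one field out of that output. The name ‘volume_field’ is mine, and the sample text is a shortened illustration, not a capture from a real system.

```shell
# Extract one field (for example Type or Status) from 'gluster volume info' output on stdin.
volume_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Illustrative, shortened output for a distributed volume.
sample='Volume Name: distvol
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp'

printf '%s\n' "$sample" | volume_field Type
printf '%s\n' "$sample" | volume_field Status

# On a real server:
#   sudo gluster volume info distvol | volume_field Status
```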
Keep in mind that there are now three disks: two physical and one virtual. The two physical disks are on Server1 and Server2, mounted at ‘/as’. The virtual one is ‘distvol’, which joins the two physical disks.
On Server1, you can mount the virtual volume with the command:
Code:
sudo mount -t glusterfs server1.centos.linux.org:/distvol /mnt
Looking at ‘/mnt’, it is empty since we have placed nothing on the volume yet.
So, on Server1, perform the following to place two files on ‘distvol’:
Code:
cd /mnt
sudo touch 1
sudo touch 2
If you perform an ‘ls’ in ‘/mnt’ you should see both files. Now try the following command on both servers:
Code:
ls /as/dist
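In my case, Server1 holds file ‘1’ and Server2 holds file ‘2’. Gluster places each file on one server based on a hash of the file name, so the split on your systems may differ, but each file should appear on only one of the two.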
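To illustrate the idea of placing files by name, here is a toy model. It is not Gluster’s real algorithm: it simply takes a checksum of the file name and uses the remainder to pick one of the storage folders. The function name ‘brick_for’ is hypothetical.

```shell
# Toy model only: pick a storage folder index from a checksum of the file name.
# Gluster's real hash function and range assignment are different.
brick_for() {
    name="$1"
    bricks="$2"
    sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    echo $((sum % bricks))
}

for f in 1 2 report.txt; do
    echo "$f -> brick $(brick_for "$f" 2)"
done
```

The same name always maps to the same index, which is why a client can find a file again without asking a central server where it is.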
Set up a Replicated Volume
A replicated volume is one that is mirrored. Here, both disks will have the same data.
The first thing we need is a folder to use as our volume. On both servers, perform:
Code:
sudo mkdir /as/replic
If you performed the last example, you will need to unmount the volume with:
Code:
sudo umount /mnt
You can stop the ‘distvol’ with the command:
Code:
sudo gluster volume stop distvol
So now we will make a replicated volume in the folder ‘/as/replic’ and name the volume ‘replicvol’.
The command on Server1 will be:
Code:
sudo gluster volume create replicvol replica 2 server1.centos.linux.org:/as/replic server2.centos.linux.org:/as/replic
The volume should exist and now all you need to do is start it:
Code:
sudo gluster volume start replicvol
You can mount the volume again so you can see what is on it:
Code:
sudo mount -t glusterfs server1.centos.linux.org:/replicvol /mnt
If you go to ‘/mnt’ on Server1 and create a file as in the last example, you can look in ‘/as/replic’ on both servers and see that the file exists on both.
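To go one step further than listing the folder, you can compare checksums of the two copies. This is a sketch under assumptions: the helper name ‘file_sum’ is mine, the file name ‘test’ is hypothetical, and the commented usage presumes Server1 can reach Server2 over SSH, which this article does not set up.

```shell
# Print only the checksum of a file, so results from two servers can be compared.
file_sum() {
    sha256sum "$1" | cut -d' ' -f1
}

# Hypothetical usage, assuming a file named 'test' was created in /mnt
# and that Server1 can reach Server2 over SSH:
#   local_sum=$(file_sum /as/replic/test)
#   remote_sum=$(ssh server2.centos.linux.org sha256sum /as/replic/test | cut -d' ' -f1)
#   [ "$local_sum" = "$remote_sum" ] && echo "replicas match"
```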
Conclusion
This should give you a basic understanding of setting up storage that is shared between two or more systems.
Give this a try to see how well it works. This makes for a great way to make data redundant on a network so it is accessible even when a system fails.