Table of Contents
- Prepare Servers
- Install GlusterFS
- Partition Data Disks
- Configure Gluster Volume
- Test Replication
This tutorial is designed to be a quick introduction to GlusterFS. GlusterFS is an open source network file system that can be helpful in situations where you have multiple servers that need access to the same data. It can be configured in multiple ways. We will focus on setting it up to replicate data across three servers.
We are going to choose Ubuntu 16.04 LTS as the OS - this lets us skip some extra configuration steps. For example, we don't have to open ports in a firewall or adjust a SELinux security context. We can address those in a CentOS specific tutorial, or you can consult GlusterFS resources available elsewhere on the web.
To follow along, please provision three servers, each with one drive for the OS and a second for use by the Gluster bricks. You don't have to put your Gluster storage on separate disks, but it is a recommended configuration. Each of the servers will be connected to two LANs and will use DHCP to acquire IP addresses on both the public and private LANs. Your IP addresses will differ from these, but here is an example of how it could look.
Host    Public IP        Gluster Host   Private IP
node1   188.8.131.52     gluster1       10.14.154.12
node2   184.108.40.206   gluster2       10.14.154.13
node3   220.127.116.11   gluster3       10.14.154.11
Once your servers have been provisioned, make a note of the public and private IP address that were assigned. It would be a good idea to open up an SSH session with each server.
We can set the hostname for each server by running:
hostname node1
substituting in the correct name on each node. To make the hostname change survive reboots, you can edit /etc/hostname on each node as well.
For GlusterFS to work with names, instead of having to specify IP addresses, we can populate
/etc/hosts on each node so that it looks similar to this:
127.0.0.1 localhost
127.0.1.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

188.8.131.52 node1
184.108.40.206 node2
220.127.116.11 node3
10.14.154.12 gluster1
10.14.154.13 gluster2
10.14.154.11 gluster3
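Rather than typing the six entries by hand on every node, you can generate them once and append the output to /etc/hosts on each server. A small sketch using the example addresses from the table above (substitute your own):

```shell
#!/bin/sh
# Emit the extra /etc/hosts lines for all six names; append the output to
# /etc/hosts on each node (e.g. `sh gen-hosts.sh >> /etc/hosts`).
# The addresses are this tutorial's example values -- substitute your own.
set -- node1:188.8.131.52 node2:184.108.40.206 node3:220.127.116.11 \
       gluster1:10.14.154.12 gluster2:10.14.154.13 gluster3:10.14.154.11
for pair in "$@"; do
    name=${pair%%:*}   # text before the first colon
    ip=${pair#*:}      # text after the first colon
    printf '%s %s\n' "$ip" "$name"
done
```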
You can test connectivity between your three nodes by running a
ping nodeX and
ping glusterX from each one.
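Checking all six names by hand gets tedious. A loop like the following (assuming the example hostnames above) flags anything unreachable from the node you run it on:

```shell
#!/bin/sh
# Ping every public and private name once from this node. Any UNREACHABLE
# line means /etc/hosts or the LAN configuration needs another look.
hosts="node1 node2 node3 gluster1 gluster2 gluster3"
failed=0
for h in $hosts; do
    if ping -c 1 -W 1 "$h" > /dev/null 2>&1; then
        echo "$h reachable"
    else
        echo "$h UNREACHABLE"
        failed=$((failed + 1))
    fi
done
echo "$failed unreachable host(s)"
```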
With those preliminary items out of the way, we can proceed to install GlusterFS.
Let's add a package to enable HTTPS for transfers by running:
apt-get install apt-transport-https
GlusterFS for Ubuntu is available using a Personal Package Archive (PPA). We'll need to specify the version we want. Based on the info at Launchpad.net: Gluster, it looks like glusterfs-4.1 is the current release. Depending on when you are going through this tutorial, there may be a newer version available.
To add the repository:
add-apt-repository ppa:gluster/glusterfs-4.1
Now update the available packages:
apt-get update
Then install the server package:
apt-get install glusterfs-server
Partition Data Disks
Since we are going to use
parted to configure a partition on each of our second disks, install it:
apt-get install parted
To partition the disk that we'll use for GlusterFS, we need to run through a series of steps on each server. We will run
parted specifying the block device (
/dev/vdb) that we want to work with. We'll use
mklabel gpt to specify a GUID Partition Table (GPT) instead of MBR. We'll then create a logical partition that uses the entire disk with
mkpart brick xfs 0% 100%. Next we format the new
/dev/vdb1 partition using XFS. Finally, we create a directory and mount the filesystem into that directory. Note: The path intentionally has one extra level (/mydata) below the mount point (/gluster/data); the extra subdirectory is what we'll use as the Gluster brick.
Therefore, on each server you'll want to run:
parted /dev/vdb mklabel gpt mkpart brick xfs 0% 100%
mkfs.xfs /dev/vdb1
mkdir -p /gluster/data/mydata
mount.xfs /dev/vdb1 /gluster/data
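Those steps can also be collected into one reviewable script. This sketch only prints the commands by default (DRY_RUN=1), because mklabel and mkfs.xfs destroy any existing data on the disk; review the output, confirm DISK points at the empty data disk, and then re-run with DRY_RUN=0:

```shell
#!/bin/sh
# Disk preparation for one server. Prints commands by default; set DRY_RUN=0
# to execute them for real, after double-checking DISK is the empty data disk.
DISK=${DISK:-/dev/vdb}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"        # dry run: show the command only
    else
        "$@"               # real run: execute it
    fi
}

run parted "$DISK" mklabel gpt mkpart brick xfs 0% 100%
run mkfs.xfs "${DISK}1"
run mkdir -p /gluster/data/mydata
run mount.xfs "${DISK}1" /gluster/data
```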
At this point, each server should have an entry similar to this showing in the
df -h output:
/dev/vdb1 20G 33M 20G 1% /gluster/data
Since we want this mount available each time the servers boot, we need to add an entry to
/etc/fstab. We can get the UUID of the new partitions by running
blkid /dev/vdb1 on each server. The output will look something like this:
/dev/vdb1: UUID="7c1fe608-a7fd-4966-9490-80430598a2ba" TYPE="xfs" PARTLABEL="brick1" PARTUUID="f36ce678-3125-49f1-8874-b579dc283a06"
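To avoid retyping the UUID, you can extract it from the blkid output and build the fstab line in one go. A sketch shown against the sample output above; on a real server, replace the hard-coded line with line=$(blkid /dev/vdb1):

```shell
#!/bin/sh
# Derive the fstab entry from blkid output so the UUID is never retyped by
# hand. The hard-coded string below is the sample output from this tutorial;
# on a real server use: line=$(blkid /dev/vdb1)
line='/dev/vdb1: UUID="7c1fe608-a7fd-4966-9490-80430598a2ba" TYPE="xfs" PARTLABEL="brick1" PARTUUID="f36ce678-3125-49f1-8874-b579dc283a06"'
uuid=$(printf '%s\n' "$line" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
fstab_line="UUID=$uuid /gluster/data xfs defaults 0 0"
echo "$fstab_line"
```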
Please Note: The UUID will be different on each server, so you should NOT just copy and paste the same entry into /etc/fstab on every server. Edit /etc/fstab on each server and add a line similar to the last line of this example:
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
UUID=3795ff81-d755-4ac1-8bea-2e35c811d1e6 / ext4 errors=remount-ro 0 1
/dev/fd0 /media/floppy0 auto rw,user,noauto,exec,utf8 0 0
UUID=b614d777-d51b-4598-8ed0-9b00031e034a none swap sw 0 0
UUID=7c1fe608-a7fd-4966-9490-80430598a2ba /gluster/data xfs defaults 0 0
As a summary, the new entry consists of the UUID returned from blkid /dev/vdb1, the desired mount point (/gluster/data), the filesystem type (xfs), the default options (defaults), a dump value of 0, and a pass setting of 0, which tells the OS not to run fsck on boot.
Once you've completed those steps on each server, we can move on and begin working with Gluster itself.
Configure Gluster Volume
We have a series of commands to run to include all three servers and provision our first volume.
On node1 only:
Bring the second server into our cluster:
gluster peer probe gluster2
Bring the third server into our cluster:
gluster peer probe gluster3
Create a new Gluster volume named
mydata that replicates data between all three members:
gluster volume create mydata replica 3 gluster1:/gluster/data/mydata gluster2:/gluster/data/mydata gluster3:/gluster/data/mydata
Start the Gluster volume:
gluster volume start mydata
Now that we have the volume created, we need to mount it on each server. We'll mount it at a new mountpoint, /mnt/shared, rather than on top of the brick directory itself.
On all three nodes run:
mkdir -p /mnt/shared
mount.glusterfs localhost:/mydata /mnt/shared
We can verify the status of our peers with
gluster peer status:
Number of Peers: 2 Hostname: node2 Uuid: 69518256-84e7-42b4-bb89-7aed8d539fab State: Peer in Cluster (Connected) Hostname: node3 Uuid: 755bb822-9aa0-40d2-b924-caaa8f94f46c State: Peer in Cluster (Connected)
If you run this command on the other nodes, you'll get similar output: it lists the peer nodes, excluding the node you run the command from.
Verify the status of our new volume with
gluster volume status:
Status of volume: mydata Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick node1:/gluster/data/mydata 49152 0 Y 1953 Brick node2:/gluster/data/mydata 49152 0 Y 1813 Brick node3:/gluster/data/mydata 49152 0 Y 1756 Self-heal Daemon on localhost N/A N/A Y 1976 Self-heal Daemon on node2 N/A N/A Y 4522 Self-heal Daemon on node3 N/A N/A Y 2384 Task Status of Volume mydata ------------------------------------------------------------------------------ There are no active volume tasks
If we want our volume to be available when the server boots, we should add entries to /etc/fstab. Note: We want entries for both mount points. Our applications should read and write data through the new mountpoint that uses the GlusterFS volume, not through the local XFS brick directly.
On all three nodes:
Create the directory for the GlusterFS mount (if it does not already exist):
mkdir -p /mnt/shared
Add a new entry to /etc/fstab:
localhost:/mydata /mnt/shared glusterfs defaults,_netdev 0 0
Notice the differences from the first entry we added to
/etc/fstab. This time we start with the name of our Gluster volume (
mydata) on the localhost. We tell it to make that mount accessible as
/mnt/shared, specify a filesystem type of
glusterfs, use the default options with the addition of
_netdev (so that it waits for the network to be up before trying to mount), and the same values of 0 for the dump and pass columns.
Create or copy some files into /mnt/shared on any of the three nodes and they will show up on the other two as well. As a quick example, on each server you could run:
touch /mnt/shared/file-`hostname`.txt
You should end up with three files:
-rw-r--r-- 1 root root 0 Sep 4 21:58 file-node1.txt -rw-r--r-- 1 root root 0 Sep 4 22:01 file-node2.txt -rw-r--r-- 1 root root 0 Sep 4 22:01 file-node3.txt
and they all show up in
/mnt/shared/ on each of the three servers.
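That check can be scripted as well. The sketch below defaults SHARED to a throwaway temp directory so it is safe to try anywhere; on the cluster, run it from each node with SHARED=/mnt/shared and the same listing should appear on all three:

```shell
#!/bin/sh
# Replication smoke test: drop a uniquely named file into the shared
# directory and list what is there. Run from each node in turn; with
# SHARED=/mnt/shared the same listing should appear on all three nodes.
SHARED=${SHARED:-$(mktemp -d)}
node=$(hostname 2>/dev/null || echo unknown-host)
touch "$SHARED/file-$node.txt"
ls -l "$SHARED"
```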
To demonstrate the replication failure recovery process, we can stop the
glusterd service on one of the nodes.
On node2 run:
service glusterd stop
On node1 run:
gluster peer status
It should now show that
node2 has been disconnected:
Number of Peers: 2 Hostname: node2 Uuid: 69518256-84e7-42b4-bb89-7aed8d539fab State: Peer in Cluster (Disconnected) Hostname: node3 Uuid: 755bb822-9aa0-40d2-b924-caaa8f94f46c State: Peer in Cluster (Connected)
On node1 run:
gluster volume status
We can see that
node2 has dropped out of our volume:
Status of volume: mydata Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick node1:/gluster/data/mydata 49152 0 Y 1953 Brick node3:/gluster/data/mydata 49152 0 Y 1756 Self-heal Daemon on localhost N/A N/A Y 1976 Self-heal Daemon on node3 N/A N/A Y 2384 Task Status of Volume mydata ------------------------------------------------------------------------------ There are no active volume tasks
On node1, copy or create some new files in /mnt/shared/:
touch /mnt/shared/file2-`hostname`.txt
touch /mnt/shared/file3-`hostname`.txt
On node3, verify the files are there:
ls -l /mnt/shared/
-rw-r--r-- 1 root root 0 Sep 4 21:58 file-node1.txt -rw-r--r-- 1 root root 0 Sep 4 22:01 file-node2.txt -rw-r--r-- 1 root root 0 Sep 4 22:01 file-node3.txt -rw-r--r-- 1 root root 0 Sep 4 22:10 file2-node1.txt -rw-r--r-- 1 root root 0 Sep 4 22:10 file3-node1.txt
Bring node2 back online. On node2 run:
service glusterd start
Check to see that everything is in sync:
gluster peer status gluster volume status ls /mnt/shared/
You should see that the files created while node2 was offline have been replicated and are now available.
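Beyond eyeballing the file listing, Gluster can report healing state itself: gluster volume heal mydata info lists, per brick, the entries still waiting to be healed, and you want "Number of entries: 0" on every brick. A guarded sketch:

```shell
#!/bin/sh
# Show per-brick heal status for the mydata volume; 'Number of entries: 0'
# on every brick means replication has fully caught up. The guard lets the
# script exit politely on a machine without the gluster CLI installed.
if command -v gluster > /dev/null 2>&1; then
    out=$(gluster volume heal mydata info)
else
    out="gluster CLI not found; run this on one of the cluster nodes"
fi
echo "$out"
```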
Gluster keeps several log files available in
/var/log/glusterfs/ that may be helpful if something isn't working as expected and you aren't sure what is going on.
In this tutorial we configured GlusterFS for replication of data on three Ubuntu nodes. You can now begin exploring and experimenting with how GlusterFS works. I recommend taking a look at the excellent GlusterFS Documentation. I'm happy to review any feedback or questions you may have come up with while following the tutorial. Please comment below or start a new discussion in the Community section of this site.