Get Started with GlusterFS on Ubuntu

Introduction

This tutorial is designed to be a quick introduction to GlusterFS. GlusterFS is an open source network file system that can be helpful in situations where you have multiple servers that need access to the same data. It can be configured in multiple ways. We will focus on setting it up to replicate data across three servers.

We are going to use Ubuntu 16.04 LTS as the OS, which lets us skip some extra configuration steps. For example, we don't have to open ports in a firewall or adjust an SELinux security context. We may address those in a CentOS-specific tutorial, or you can consult GlusterFS resources available elsewhere on the web.

Prepare Servers

To follow along, please provision three servers, each with one drive for the OS, and a second for use by the Gluster bricks. You don't have to have your Gluster storage on separate disks, but it is a recommended configuration. Each of the servers will be connected to two LANs and use DHCP to acquire IP addresses for both the public and private LANs. Your IP addresses will differ from these, but here is an example of how it could look.

Host    Public IP         Gluster Host   Private IP

node1   192.96.159.217    gluster1       10.14.154.12
node2   158.222.102.206   gluster2       10.14.154.13
node3   208.94.38.62      gluster3       10.14.154.11

GlusterFS three node DCD layout

Once your servers have been provisioned, make a note of the public and private IP addresses that were assigned. It would be a good idea to open an SSH session with each server.

We can set the hostname on each server by running hostname node1, substituting in the correct name on each node.

To make the hostname change survive reboots, you can edit /etc/hostname.
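For example, on node1 you could make the change persist in one step with the systemd tool available on Ubuntu 16.04, which updates /etc/hostname for you:

hostnamectl set-hostname node1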

For GlusterFS to work with names, instead of having to specify IP addresses, we can populate /etc/hosts on each node so that it looks similar to this:

127.0.0.1       localhost
127.0.1.1       ubuntu

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.96.159.217  node1
158.222.102.206 node2
208.94.38.62    node3

10.14.154.12    gluster1
10.14.154.13    gluster2
10.14.154.11    gluster3
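If you'd rather not edit the file by hand on every node, here is one way to append the entries in a single step (a sketch; substitute the addresses that were actually assigned to your servers):

cat >> /etc/hosts <<'EOF'
192.96.159.217  node1
158.222.102.206 node2
208.94.38.62    node3
10.14.154.12    gluster1
10.14.154.13    gluster2
10.14.154.11    gluster3
EOF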

You can test connectivity between your three nodes by running ping nodeX and ping glusterX from each one.
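For example, from node1 you could check that node2 is reachable over both the public and private networks:

ping -c 3 node2

ping -c 3 gluster2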

With those preliminary items out of the way, we can proceed to install GlusterFS.

Install GlusterFS

Let's add a package to enable HTTPS for transfers by running:

apt-get install apt-transport-https

GlusterFS for Ubuntu is available from a Personal Package Archive (PPA). We'll need to specify the version we want. Based on the info at Launchpad.net: Gluster, glusterfs-4.1 is the current release at the time of writing. Depending on when you are going through this tutorial, there may be a newer version available.

To add the repository:

add-apt-repository ppa:gluster/glusterfs-4.1

Now update the available packages:

apt-get update

Install GlusterFS:

apt-get install glusterfs-server
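To confirm the installation (an optional check, not part of the original steps), you can print the installed version and make sure the management daemon is running:

gluster --version

service glusterd status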

Partition Data Disks

Since we are going to use parted to configure a partition on each of our second disks, install it:

apt-get install parted

To partition the disk that we'll use for GlusterFS, we need to run through a series of steps on each server. We will run parted, specifying the block device (/dev/vdb) that we want to work with. We'll use mklabel gpt to create a GUID Partition Table (GPT) instead of an MBR. We'll then create a single partition spanning the entire disk with mkpart brick xfs 0% 100%, and type quit to leave parted's interactive prompt. Next we format the new /dev/vdb1 partition using XFS. Finally, we create a directory and mount the filesystem into that directory. Note: The path intentionally has one extra level (/mydata) below the mount point (/gluster/data). Keeping the brick in a subdirectory of the mount point means that if the disk ever fails to mount, the brick path simply won't exist, rather than data silently landing on the root filesystem.

Therefore, on each server you'll want to run:

parted /dev/vdb

mklabel gpt

mkpart brick xfs 0% 100%

quit

mkfs.xfs /dev/vdb1

mkdir -p /gluster/data/mydata

mount.xfs /dev/vdb1 /gluster/data

At this point, each server should have an entry similar to this showing in the df -h output:

/dev/vdb1           20G   33M   20G   1% /gluster/data

Since we want this mount available each time the servers boot, we need to add an entry to /etc/fstab. We can get the UUID of the new partitions by running blkid /dev/vdb1 on each server. The output will look something like this:

/dev/vdb1: UUID="7c1fe608-a7fd-4966-9490-80430598a2ba" TYPE="xfs" PARTLABEL="brick1" PARTUUID="f36ce678-3125-49f1-8874-b579dc283a06"

Please Note: The UUID will be different on each server, so you should NOT just copy and paste the same entry into /etc/fstab on each server.

Edit /etc/fstab on each server and add a line similar to the last line of this example:

# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
UUID=3795ff81-d755-4ac1-8bea-2e35c811d1e6 /               ext4    errors=remount-ro 0       1
/dev/fd0        /media/floppy0  auto    rw,user,noauto,exec,utf8 0       0
UUID=b614d777-d51b-4598-8ed0-9b00031e034a none swap sw 0 0
UUID=7c1fe608-a7fd-4966-9490-80430598a2ba /gluster/data xfs defaults 0 0

To summarize, the new entry consists of the UUID returned from blkid /dev/vdb1, the desired mount point (/gluster/data), the filesystem type (xfs), the default options (defaults), a dump value of 0, and a pass setting of 0, which tells the OS not to run fsck on boot.
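If you prefer not to edit /etc/fstab by hand, here is a sketch of appending the entry with the UUID pulled straight from blkid (it assumes the same mount point and options described above):

echo "UUID=$(blkid -s UUID -o value /dev/vdb1) /gluster/data xfs defaults 0 0" >> /etc/fstab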

Once you've completed those steps on each server, we can move on and begin working with Gluster itself.

Configure Gluster Volume

We have a series of commands to run to join all three servers into a trusted pool and provision our first volume.

On node1 only:

Bring the second server into our cluster:

gluster peer probe gluster2

Bring the third server into our cluster:

gluster peer probe gluster3

Create a new Gluster volume named mydata that replicates data between all three members:

gluster volume create mydata replica 3 gluster1:/gluster/data/mydata gluster2:/gluster/data/mydata gluster3:/gluster/data/mydata

Start the Gluster volume:

gluster volume start mydata
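If you'd like to review the volume's configuration before moving on (an optional check), you can run:

gluster volume info mydata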

Now that we have the volume created, we need to mount it on each server. We'll mount the Gluster client at a separate mount point rather than on top of the brick directory itself.

On all three nodes, create a mount point and mount the volume:

mkdir /mnt/shared

mount.glusterfs localhost:/mydata /mnt/shared
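To confirm the Gluster mount is in place (an optional check), you can verify that the volume shows up in the df output:

df -h /mnt/shared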

We can verify the status of our peers with gluster peer status:

Number of Peers: 2

Hostname: node2
Uuid: 69518256-84e7-42b4-bb89-7aed8d539fab
State: Peer in Cluster (Connected)

Hostname: node3
Uuid: 755bb822-9aa0-40d2-b924-caaa8f94f46c
State: Peer in Cluster (Connected)

If you run this command on the other nodes, you'll get similar output; it lists the peer nodes, excluding the node you are running the command from.

Verify the status of our new volume with gluster volume status:

Status of volume: mydata
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/gluster/data/mydata            49152     0          Y       1953
Brick node2:/gluster/data/mydata            49152     0          Y       1813
Brick node3:/gluster/data/mydata            49152     0          Y       1756
Self-heal Daemon on localhost               N/A       N/A        Y       1976
Self-heal Daemon on node2                   N/A       N/A        Y       4522
Self-heal Daemon on node3                   N/A       N/A        Y       2384

Task Status of Volume mydata
------------------------------------------------------------------------------
There are no active volume tasks

If we want our volume to be available when the server boots, we should add another entry to /etc/fstab. Note: We want entries for both mount points: the XFS filesystem that holds the brick (/gluster/data) and the GlusterFS volume mounted at /mnt/shared. Our applications should read and write data through the GlusterFS mount, not directly in the local brick directory on the XFS filesystem.

On all three nodes, add a new /etc/fstab entry:

localhost:/mydata /mnt/shared glusterfs defaults,_netdev 0 0

Notice the differences from the first entry we added to /etc/fstab. This time we start with the name of our Gluster volume (mydata) on the localhost. We tell it to make that mount accessible as /mnt/shared, specify a filesystem type of glusterfs, use the default options with the addition of _netdev (so that it waits for the network to be up before trying to mount), and the same values of 0 for the dump and pass columns.
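To confirm the new entry works without waiting for a reboot (an optional check), you can unmount the volume and remount it using only the information in /etc/fstab:

umount /mnt/shared

mount /mnt/shared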

Test Replication

Create or copy some files into /mnt/shared on any of the three nodes and they will show up on the other two as well. As a quick example, on each server you could run:

touch /mnt/shared/file-`hostname`.txt

You should end up with three files:

-rw-r--r-- 1 root root     0 Sep  4 21:58 file-node1.txt
-rw-r--r-- 1 root root     0 Sep  4 22:01 file-node2.txt
-rw-r--r-- 1 root root     0 Sep  4 22:01 file-node3.txt

and they all show up in /mnt/shared/ on each of the three servers.

To demonstrate how replication recovers from a failure, we can stop the glusterd service on one of the nodes.

On node2 run:

service glusterd stop

On node1 run:

gluster peer status

It should now show that node2 has been disconnected:

Number of Peers: 2

Hostname: node2
Uuid: 69518256-84e7-42b4-bb89-7aed8d539fab
State: Peer in Cluster (Disconnected)

Hostname: node3
Uuid: 755bb822-9aa0-40d2-b924-caaa8f94f46c
State: Peer in Cluster (Connected)

On node1 run:

gluster volume status

We can see that node2 has dropped out of our volume:

Status of volume: mydata
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/gluster/data/mydata            49152     0          Y       1953
Brick node3:/gluster/data/mydata            49152     0          Y       1756
Self-heal Daemon on localhost               N/A       N/A        Y       1976
Self-heal Daemon on node3                   N/A       N/A        Y       2384

Task Status of Volume mydata
------------------------------------------------------------------------------
There are no active volume tasks

On node1, copy or create some new files in /mnt/shared/:

touch /mnt/shared/file2-`hostname`.txt
touch /mnt/shared/file3-`hostname`.txt

On node3, verify the files are there by running ls -l /mnt/shared/:

-rw-r--r-- 1 root root     0 Sep  4 21:58 file-node1.txt
-rw-r--r-- 1 root root     0 Sep  4 22:01 file-node2.txt
-rw-r--r-- 1 root root     0 Sep  4 22:01 file-node3.txt
-rw-r--r-- 1 root root     0 Sep  4 22:10 file2-node1.txt
-rw-r--r-- 1 root root     0 Sep  4 22:10 file3-node1.txt

Then on node2 restart glusterd:

service glusterd start

Check to see that everything is in sync:

gluster peer status

gluster volume status

ls /mnt/shared/

You should see that the files created while node2 was offline have been replicated and are now available.
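If any of the files don't appear right away, you can check the self-heal status from node1 (an optional check, not part of the original steps):

gluster volume heal mydata info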

Gluster keeps several log files available in /var/log/glusterfs/ that may be helpful if something isn't working as expected and you aren't sure what is going on.
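For example, to follow the management daemon's log while you retry a failing command:

tail -f /var/log/glusterfs/glusterd.log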

Summary

In this tutorial we configured GlusterFS for replication of data on three Ubuntu nodes. You can now begin exploring and experimenting with how GlusterFS works. I recommend taking a look at the excellent GlusterFS Documentation. I'm happy to review any feedback or questions you may have come up with while following the tutorial. Please comment below or start a new discussion in the Community section of this site.

 