Full MERN Stack App: 0 to deployment on Kubernetes — part 7

In the seventh part, I will talk about how to connect persistent storage to your pods and deploy stateful apps.

Kavindu Chamiran
6 min read · Sep 21, 2019
Image credits: DigitalOcean

Welcome back to the seventh part of the series. Today we will talk in detail about setting up a Google Compute Engine (GCE) persistent disk and connecting it to our Kubernetes cluster so that pods can use it to deploy stateful apps.

If you haven’t read my sixth part yet, please follow the link below.

Why do we need persistent storage?

Containers running in a pod are stateless by default. That means whatever data you create while a container runs is lost when the container or pod is restarted. This is not a problem if your app is stateless, that is, if a newly started container can work on its own without needing any data from its predecessor.

But for stateful apps that save files to local storage and need those files back whenever the app restarts in order to continue working, this is a problem. An easy example is a database. Imagine your pod restarted and all your customer data was lost because the database was not stored on persistent storage!

The solution is to get persistent storage from a provider, attach it to your cluster, and tell the pods to use this storage whenever they need to save files they may need again in the future. Since our cluster runs on Google Kubernetes Engine, our storage solution will be GCE persistent disks. If the cluster were running on AWS, we could use an EBS volume instead.

Creating a disk on GCE

We can easily create a disk on Google Compute Engine using gcloud. Note that these are the same kind of disks that are attached to Compute Engine instances. You can specify any size, but 5GB is enough for now. Make sure the zone parameter is the same as your cluster's zone.

gcloud compute disks create cloudl-disk --size=5GB --zone=northamerica-northeast1-a

Once the command completes, you can see the disk on the Compute Engine Disks dashboard.
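You can also verify the disk from the command line (not shown in the original article):

gcloud compute disks list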

Compute Engine Disks dashboard

The other 100GB disk is the disk attached to our cluster. It is shared among the nodes, and whatever data is saved on it is destroyed when the pods are stopped.

Attaching the disk to a pod

We will attach this newly created persistent disk to our MongoDB pod so the database is saved on this storage. We will later restart the pod and verify that our data is still in the MongoDB database. We need to modify our MongoDB pod definition for this.

mongodb-pod.yml
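The pod definition itself was embedded as a gist in the original article. A minimal sketch of what it might contain, assuming the official mongo image and a volume named mongodb-data (both assumptions, not from the original):

apiVersion: v1
kind: Pod
metadata:
  name: mongodb
spec:
  containers:
    - name: mongodb
      image: mongo               # assumed image; the original may pin a version
      ports:
        - containerPort: 27017   # MongoDB's default port
      volumeMounts:
        - name: mongodb-data     # must match the volume name below
          mountPath: /data/db    # MongoDB's default data directory
  volumes:
    - name: mongodb-data
      gcePersistentDisk:
        pdName: cloudl-disk      # the GCE disk we created with gcloud
        fsType: ext4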

The volumes block names our storage so it can be identified by the containers in the pod. Since we use a GCE persistent disk, we add the gcePersistentDisk block. If it were an Amazon EBS volume, we would use the awsElasticBlockStore block instead. The “pdName” parameter specifies the name of the disk we created using the command above.

Now we mount this disk as a path inside our pod. This path is specified under the “volumeMounts” property. MongoDB saves its data at /data/db by default, so we mount our storage at the same path; that way we don't have to explicitly configure MongoDB to use the mount directory. The mount's name property gets the name we assigned to the volume above.

Verifying that persistent storage works

Now, delete the currently running MongoDB pod and create a new pod using the updated definition.

kubectl delete pod mongodb
kubectl create -f mongodb-pod.yml

Let’s start a shell inside our MongoDB pod and create some data in the database.

kubectl exec -it mongodb -- /bin/bash

This will launch a new bash terminal inside the pod. We usually use this terminal to debug our apps inside the pod. Here we can use the MongoDB terminal client to add some dummy documents into the database. Inside the terminal, run

# mongo
> use cloudl
> db.dummy.insert({"name": "dummy_data"})
> exit
# exit

This will insert a new document into the “dummy” collection in the “cloudl” database with a single field “name” and “dummy_data” as its value.

Now delete this MongoDB pod, create another pod using the same definition, and start a shell inside the newly created pod.
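The commands are the same as before:

kubectl delete pod mongodb
kubectl create -f mongodb-pod.yml
kubectl exec -it mongodb -- /bin/bash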

# mongo
> use cloudl
> db.dummy.find()

This will query all the documents from the “dummy” collection.

Terminal inside MongoDB pod

We can see that the dummy document we inserted earlier is still there, even after deleting the pod, thus our persistent storage works!

PersistentVolumes and PersistentVolumeClaims

In the official Kubernetes docs, it is advised to use a separate disk in a production environment. Remember the other disk shown in the Compute Engine Disks dashboard? I said that disk is used by the nodes in the cluster. Rather than creating a separate disk, we can also use this disk space to create a volume. This is done using PersistentVolume and PersistentVolumeClaim definitions.

PersistentVolume

A PersistentVolume definition is used to allocate an amount of storage from a node's disk to be used as persistent storage. This storage is not used by any pod unless we allow it, and whatever data is inside this storage is kept even when the pods are destroyed.

It’s a resource in the cluster which is independent of any individual pod that uses the PV. — TutorialsPoint

Here is a sample PersistentVolume definition that tells Kubernetes we are using 5GB of the node's storage as persistent storage.

cloudl-pv.yml
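The PersistentVolume gist is not reproduced here. A minimal sketch matching the description below, with the name cloudl-pv assumed:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: cloudl-pv          # assumed name, not from the original
spec:
  capacity:
    storage: 5Gi           # the 5GB taken from the node's disk
  accessModes:
    - ReadWriteOnce        # read-write by a single node
  hostPath:
    path: "/mnt/data"      # backing directory on the node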

This will allocate 5GB from the node's disk and expose the whole storage at “/mnt/data”. The accessModes parameter specifies that this volume can be mounted as read-write by a single node. Before creating this definition, delete the MongoDB pod and the secondary GCE disk we created.

kubectl create -f cloudl-pv.yml

PersistentVolumeClaim

The PersistentVolume definition allocates 5GB of space from the node's disk to be used for persistent storage. To allow a pod to utilize this storage, we first need to claim a portion of it and bind that claimed storage to the pod. This is done using a PersistentVolumeClaim definition.

cloudl-pv-claim.yml
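Again, the original gist is not included; a minimal sketch, assuming the claim is named cloudl-pv-claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloudl-pv-claim    # referenced by claimName in the pod definition
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi         # claim 1GB of the 5GB PersistentVolume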

This definition will claim a 1GB portion of our 5GB persistent storage. We bind this definition to a pod definition so the pod can use it to request storage resources. When Kubernetes receives a storage request from a pod, it first looks through all the available PersistentVolumes, and if a suitable persistent volume exists, it allocates the space from that volume and lets the pod use it.

Because of this approach, developers do not need to be aware of the underlying storage architecture. They can simply request storage using a claim, and Kubernetes will take care of provisioning those requests. It is the duty of the cluster administrators to set aside some storage to be used as persistent storage.

Pod definition with PVC

Now that we are no longer using a GCE disk, we replace the gcePersistentDisk property with persistentVolumeClaim. The “claimName” parameter specifies the persistent volume claim we are going to use to request storage.

mongodb-pod.yml
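A sketch of the updated pod definition, under the same assumptions as before, with the gcePersistentDisk block swapped for persistentVolumeClaim:

apiVersion: v1
kind: Pod
metadata:
  name: mongodb
spec:
  containers:
    - name: mongodb
      image: mongo                     # assumed image
      ports:
        - containerPort: 27017
      volumeMounts:
        - name: mongodb-data
          mountPath: /data/db          # MongoDB's default data directory
  volumes:
    - name: mongodb-data
      persistentVolumeClaim:
        claimName: cloudl-pv-claim     # the claim defined above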

The rest is the same as when we used the GCE disk. Let's deploy everything and verify that it works.

kubectl create -f cloudl-pv-claim.yml

If your cluster is running on GKE, you do not need to create PersistentVolumes explicitly. GKE will automatically create PersistentVolumes for your PersistentVolumeClaims.
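Then recreate the MongoDB pod with the updated definition. The original article does not show this command again, but it mirrors the earlier one:

kubectl create -f mongodb-pod.yml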

Terminal inside MongoDB pod

We can see that our dummy data is still there in the database even after destroying the pod. Kubernetes attaches the new MongoDB pod to the same provisioned 1GB of storage that was used by the first MongoDB pod.

Conclusion

In the next article, which is the eighth and final part of the series, I am going to talk about setting up a domain name and SSL certificates for our web app. I hope this article was interesting and that you will read the eighth part too.

PS: Claps are appreciated!
