Kubernetes and Databases: Deploying a Database on K8s with StatefulSets
Kubernetes has become a major part of modern development, and databases are an essential part of most applications. In this post, we will look at how we can deploy a database on K8s, and the approaches we can use to do it.
Databases (DBs) are systems for storing and managing data on a computer system. A DB engine can create, read, update, and delete data, and a DB is controlled by a Database Management System (DBMS). In most databases, data is modeled in rows and columns; these are called relational databases. There are also alternatives for storing semi-structured data in NoSQL databases; typical NoSQL DB types are document and graph databases.
Since we are going to deploy a database on K8s, we first have to understand what a StatefulSet is. A StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of those Pods.
Like a Deployment, a StatefulSet manages Pods that are based on an identical container specification. Pods maintained by a StatefulSet have a unique, persistent identity and a stable hostname, regardless of which node they are scheduled on. If we want storage to persist, we can create a PersistentVolume and use the StatefulSet as part of the solution. Although individual Pods in a StatefulSet are prone to failure, the persistent Pod identifiers make it easier to match existing volumes to the new Pods that replace any that have failed.
StatefulSets are valuable for applications that require one or more of the following:
- Stable, unique network identifiers
- Stable, persistent storage
- Ordered, graceful deployment and scaling
- Ordered, automated rolling updates
When deploying a database on K8s, we need to use StatefulSets, but they come with some limitations:
- Storage for a given Pod must either be provisioned by a PersistentVolume provisioner based on the requested storage class, or pre-provisioned by an admin.
- Deleting or scaling down a StatefulSet will not delete the volumes attached to it. This ensures the safety of the data.
- StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods.
- A StatefulSet doesn't provide any guarantee that all Pods are deleted when the StatefulSet is deleted, unlike a Deployment, which deletes all of its associated Pods when it is removed. You have to scale the Pod replicas down to 0 prior to deleting the StatefulSet.
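To make this concrete, here is a minimal StatefulSet sketch together with the Headless Service it requires. This is an illustration only; the names (`db`, `db-headless`), replica count, and storage size are assumptions, not part of the MySQL example later in this post.

```yaml
# Illustrative sketch: a StatefulSet with its required Headless Service.
# Names and sizes here are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None            # Headless Service: gives each Pod a stable DNS name
  selector:
    app: db
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless   # The Headless Service responsible for Pod identity
  replicas: 3                # Pods get ordered, stable names: db-0, db-1, db-2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: mysql:5.6
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:      # One PVC per Pod; the PVC survives Pod replacement
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```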
Databases on K8s
We can deploy a database to K8s as a stateful application. Usually, when we deploy Pods they have their own storage, but that storage is ephemeral - if the container dies, its storage is gone with it.
So, we have a K8s object to tackle that scenario: when we want our data to persist, we attach the Pod to a respective Persistent Volume Claim. This way, if our container dies, the data will still be in the cluster, and the new Pod will be able to access it accordingly.
Pod --> PVC --> PV
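In manifest form, the chain above looks like this. This is a sketch only; the names `app-pod` and `data-claim` are hypothetical, and the claim is assumed to bind to a matching PV already present in the cluster.

```yaml
# Illustrative sketch of the Pod -> PVC -> PV chain; names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: mysql:5.6
    volumeMounts:
    - name: data
      mountPath: /var/lib/mysql  # The container writes here...
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim      # ...backed by this PVC, which binds
                                 # to a matching PV in the cluster
```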
Operators to Deploy Databases to K8s
We can deploy databases with the different operators developed by the database vendors. For example:
- MySQL, using the K8s operator developed by Oracle
- PostgreSQL, using the operator by Crunchy Data
- MongoDB, which provides its own operator to deploy MongoDB Enterprise to a K8s cluster
Is it Feasible to Deploy a DB on K8s?
Why not? That depends entirely on your IT operational model. There are many companies working with containerized technologies, and instead of utilizing ready-managed services, we can also deploy our own databases in a scalable way. As usual, always review your IT operational model before making a decision. But here are the alternatives:
Fully Managed Databases
Fully managed databases are those that you don't have to provision or manage yourself; this is done by cloud providers like AWS, Google, Azure, etc. Managed databases include Aurora and DynamoDB (from AWS), Google Cloud Spanner and Cloud SQL, and so on. These databases are a low-ops choice: the cloud providers handle many of the maintenance tasks, such as backups, scaling, etc. You just have to create a database to build your fancy app on top of, and let the cloud provider handle the rest for you - which is what I like most :))
Deploying by Yourself on VM, or On-premise Machines
With this option, you can deploy the database to any VM (EC2 or Compute Engine) or on-premise machine, and you will have full control. You will be able to deploy any version of the database, and you can set your own security and backup plans. On the other hand, this means that you will manage, scale, and provision the database on your own. You will also have to have an administrator in place to administer your database. This adds cost to your infrastructure, but has the advantage of flexibility.
Run It on K8s (Rock'n'Roll)
Deploying the database on K8s is closer to the full-ops option, but you get some benefits from the automation K8s provides to keep the database application up and running. It is important to remember that Pods are ephemeral, so the possibility that the database restarts or fails over is greater. You will also be responsible for database-specific administrative tasks such as backups, scaling, etc.
Some important points to consider when choosing to deploy a database on K8s are:
- There are custom resources and operators available to manage the database on K8s
- Databases that have caching layers and more transient storage are better fits for K8s
- You have to understand the replication modes available in the database. Asynchronous replication leaves room for data loss, because transactions might be committed to the primary database but not yet to the secondary ones.
As a simple decision tree for deploying databases on Kubernetes: first, we need to understand whether the database has K8s-friendly features, as MySQL or PostgreSQL do, and then find or plan for K8s operators that package the database with additional features. The second question is how much operational workload is acceptable, given what we have seen is needed to deploy a database on K8s: do we have a team of site reliability engineers, or would we find it more feasible to deploy the database on a managed K8s service?
Sample MySQL deployment on K8s
In this section of the post, I will provide the manifests to deploy a stateful MySQL database on a K8s cluster. I assume you already have a cluster up and running.
Step 1: Deploying the MySQL Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  ports:
  - port: 3306
  selector:
    app: mysql
  clusterIP: None
```
First we deploy the service for the MySQL database on port 3306, selecting all Pods with the label `app: mysql`. Setting `clusterIP: None` makes this a Headless Service. Next, create the resource:

```shell
$ kubectl create -f mysql_service.yml
```
Step 2: MySQL Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - image: mysql:5.6
        name: mysql
        env:
        # Use secret in real usage
        - name: MYSQL_ROOT_PASSWORD
          value: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
```
This deployment creates a Pod from the `mysql:5.6` image, with the root password provided through an environment variable (use a Secret in real usage) and port 3306 exposed. We also attach the Persistent Volume Claim `mysql-pv-claim`. To create the deployment resource, run the following:

```shell
$ kubectl create -f mysql_deployment.yml
```
Step 3: Creating Persistent Volume
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 20Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
```
This manifest creates a Persistent Volume that we will attach to the Pod to ensure data safety across restarts. The Persistent Volume provides 20Gi of storage with the `ReadWriteOnce` access mode, and uses the host path `/mnt/data`, where all our data will reside. To create the resource, run the following command:

```shell
$ kubectl create -f persistance_volume.yml
```
Step 4: Creating Persistent Volume Claim
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```
This creates the Persistent Volume Claim, which claims 20Gi from the Persistent Volume we created above, with the same `ReadWriteOnce` access mode. To create the resource, run the following command:

```shell
$ kubectl create -f pvClaim.yml
```
Step 5: Test the MySQL Database
```shell
$ kubectl run -it --rm --image=mysql:5.6 --restart=Never mysql-client -- mysql -h mysql -ppassword
```
This command creates a new Pod in the cluster running a MySQL client and connects it to the server through the Service. If it connects, you can run your queries in the console:
```
Waiting for pod default/mysql-client-274442439-zyp6i to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.

mysql>
```