Whenever we talk about data-intensive applications, we also have to consider how we are going to manage the data of those applications. When it comes to stateful applications specifically, people are still a bit dubious about running them on Kubernetes because of data management concerns.
Looked at in a bit more detail, data management involves a lot more than backup and restore. Examples include application-consistent backups of a database, database migration, disaster recovery, and point-in-time recovery. In this post, we are going to look into an open-source project named Kanister that helps us back up and restore applications deployed on Kubernetes.
Kanister is an open-source project by Kasten that enables us to manage (back up and restore) application data on Kubernetes. You can deploy Kanister into your Kubernetes cluster as a Helm application and start exploring. Kanister relies heavily on Kubernetes custom resources to define and perform its actions.
The main custom resources (CRs) that get installed with Kanister are Profile, Blueprint and ActionSet. Consider a typical scenario. We want to back up our application using a utility provided by the application or database itself (for example, esdump for Elasticsearch or mysqldump for MySQL). After taking the backup, we want to upload it to external persistent storage so that it can be restored later. In the next section, we will see how the Kanister CRs help us achieve this.
In this scenario, the object storage details are kept in the Profile CR, which holds everything the platform later needs to upload the backup to that object storage. Similarly, the steps for backing up and restoring the database are kept in the Blueprint CR. These steps include how to take a backup of the running database and, once the backup is complete, how to restore it. With these steps (actions, in Kanister terms) defined in a Blueprint, we run them by creating ActionSet CRs.
Now consider the architecture diagram below in light of what we have just discussed. When we create an ActionSet, the referenced Blueprint is looked up and the logic in the relevant action of that Blueprint is executed. Finally, the ActionSet's status is updated to reflect whether the action completed or failed.
Now that we have a clear understanding of all the CRs involved, let’s go ahead and try to create this entire workflow, taking MySQL as an example.
Kanister is packaged and distributed as a Helm chart, and we can use the commands below to install it:
~ » helm repo add kanister https://charts.kanister.io/
"kanister" has been added to your repositories
~ » kubectl create ns kanister
namespace/kanister created
~ » helm install myrelease --namespace kanister kanister/kanister-operator --set image.tag=0.32.0
NAME: myrelease
...
...
Thank you for trying Kanister.
# get all the pods from Kanister namespace to make sure it is installed successfully
~ » kubectl get pods -n kanister
NAME                                           READY   STATUS    RESTARTS   AGE
myrelease-kanister-operator-7d5cdb987c-9grdh   1/1     Running   0          4m43s
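If you want to script this check (for example in CI), a small helper can do the same inspection. The bash function below is a sketch of my own, not part of Kanister. It assumes the default kubectl get pods column layout (NAME READY STATUS RESTARTS AGE). KUBECTL is a hypothetical override hook so the function can be tried against a stub instead of a real cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Kanister): succeed only if every pod in a
# namespace is Running with all of its containers ready.
# KUBECTL is an override hook for testing; it defaults to the real kubectl.
KUBECTL=${KUBECTL:-kubectl}

pods_ready() {
  local ns="$1" out
  out="$("$KUBECTL" get pods -n "$ns" --no-headers)" || return 1
  # An empty namespace is treated as "not ready".
  [ -n "$out" ] || return 1
  # Columns: NAME READY STATUS RESTARTS AGE; READY looks like "1/1".
  printf '%s\n' "$out" | awk '
    { split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) bad = 1 }
    END { exit (bad ? 1 : 0) }'
}
```

For example, pods_ready kanister && echo "Kanister operator is up" prints the message only once the operator pod is Running and ready.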
Once Kanister is installed, listing the CustomResourceDefinitions in the cluster shows the CRs we discussed above:
~ » kubectl get customresourcedefinitions.apiextensions.k8s.io
NAME                        CREATED AT
actionsets.cr.kanister.io   2020-07-28T09:16:39Z
blueprints.cr.kanister.io   2020-07-28T09:16:39Z
profiles.cr.kanister.io     2020-07-28T09:16:39Z
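The same check can be automated. The sketch below is a hypothetical bash helper that succeeds only when all three Kanister CRDs are registered. It assumes kubectl's -o name output form (resource/name), and KUBECTL is again just an override hook for trying it with a stub.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that the three Kanister CRDs are registered.
# KUBECTL is an override hook for testing; it defaults to the real kubectl.
KUBECTL=${KUBECTL:-kubectl}

kanister_crds_present() {
  local raw names crd missing=0
  # -o name prints "customresourcedefinition.apiextensions.k8s.io/<crd-name>".
  raw="$("$KUBECTL" get customresourcedefinitions.apiextensions.k8s.io -o name)" || return 1
  # Keep only the part after the last "/".
  names="$(printf '%s\n' "$raw" | sed 's#.*/##')"
  for crd in actionsets.cr.kanister.io blueprints.cr.kanister.io profiles.cr.kanister.io; do
    if ! printf '%s\n' "$names" | grep -qxF "$crd"; then
      echo "missing CRD: $crd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```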
Now that Kanister is installed, let's install the MySQL database that we are going to back up and restore using Kanister.
To install MySQL from the stable Helm chart, run the commands below (this assumes the stable chart repository has already been added to Helm):
~ » kubectl create namespace mysql-test
namespace/mysql-test created
~ » helm install mysql-release stable/mysql --namespace mysql-test \
--set mysqlRootPassword='asd#45@mysqlEXAMPLE' \
--set persistence.size=10Gi
NAME: mysql-release
...
...
mysql -h ${MYSQL_HOST} -P${MYSQL_PORT} -u root -p${MYSQL_ROOT_PASSWORD}
# list the pods in the mysql-test namespace to make sure MySQL has been deployed successfully
~ » kubectl get pod -n mysql-test
NAME                             READY   STATUS    RESTARTS   AGE
mysql-release-866dc87447-bdq96   1/1     Running   0          66s
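We can read the root password back from the Secret the chart created instead of copying it around. This is the same Secret, and the same mysql-root-password key, that the Blueprint reads later on. The function below is a sketch: mysql_root_password is a hypothetical name, and the KUBECTL override exists only so the function can be exercised without a cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the MySQL root password stored by the chart.
# The Secret is named after the release and keeps the password, base64
# encoded, under the key "mysql-root-password".
KUBECTL=${KUBECTL:-kubectl}

mysql_root_password() {
  local release="$1" ns="$2" encoded
  encoded="$("$KUBECTL" get secret "$release" -n "$ns" \
    -o jsonpath='{.data.mysql-root-password}')" || return 1
  printf '%s' "$encoded" | base64 --decode
}
```

For example, MYSQL_ROOT_PASSWORD="$(mysql_root_password mysql-release mysql-test)" fills the variable used in the mysql command above.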
Once MySQL is installed, it's time to define the steps (actions) to take to back up and restore this database. Before that, let's create the Profile resource that holds the details about the object storage. I am using AWS S3 here, but you can use any of the supported object storage providers (S3, Azure, or GCS).
To create Kanister custom resources (for example, a Profile or an ActionSet), Kanister provides a command-line utility, kanctl. Another utility, kando, is used from within a Blueprint to interact with your object storage provider. Both utilities (kanctl and kando) can be installed from here.
Run the following command to create a Profile resource:
~ » kanctl create profile s3compliant --access-key $ACCESS_KEY \
--secret-key $SECRET_KEY \
--bucket $BUCKET --region ap-south-1 \
--namespace mysql-test
secret 's3-secret-4ratwf' created
profile 's3-profile-gzbmn' created
Looking a bit closer at this, the only thing we specify apart from the object storage location is the namespace where the application we are going to back up is deployed. In our case, that is mysql-test.
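The kanctl create profile command above takes its credentials from the ACCESS_KEY, SECRET_KEY and BUCKET variables. It can be worth failing early if one of them is empty, rather than finding out later that the Profile was created with a blank value. Here is a minimal bash sketch; the require_env name is my own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if any of the named variables is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run require_env ACCESS_KEY SECRET_KEY BUCKET before the kanctl command, or chain the two with &&.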
With the Profile created, it's time to create the Blueprint. The Blueprint holds the backup and restore actions and the commands to execute for each of them.
Taking MySQL as the example, say we want to back up the database using the mysqldump utility. We can instruct the Blueprint to run the mysqldump command against the MySQL instance running in the mysql-test namespace. Running a command in a pod to take a backup is a fairly generic pattern, and it is useful for many other databases and scenarios as well. To support it, Kanister provides a set of functions (called Kanister functions). The one that fits this case is the KubeTask function, which spins up a new pod and runs the commands specified in the Blueprint inside that pod. For more details about the KubeTask function, refer to the Kanister docs. Kanister provides many other useful functions, and the full list can be found in the Kanister documentation.
So the backup action runs mysqldump, which needs the host where MySQL is running. Since all these steps live in the Blueprint, we need a way to tell it where a specific database is running. Luckily, Kanister supports Go templates in Blueprints. We can use them to read the object that is passed to the ActionSet custom resource.
Put simply, we need the connection details (the service name, in the case of Kubernetes) to run the mysqldump command with the KubeTask function. When creating an ActionSet, we also specify an object (we will see this in a bit). That object is passed to the Blueprint, where it can be read using Go templates. Take a look at the snippet below:
mysqldump --column-statistics=0 -u root --password=${root_password} -h {{ .Deployment.Name }} --single-transaction --all-databases
Suppose we use this command in a KubeTask function for the backup action of the Blueprint and pass the deployment name (say, dep-name) when creating the backup ActionSet. The template then renders to -h dep-name, and mysqldump runs against the MySQL deployment (service name dep-name) in the mysql-test namespace.
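To make the substitution concrete, here is a rough stand-in. Kanister itself renders Blueprints with Go templates. For the two placeholders used here, though, a plain sed substitution produces the same text, so we can see what a line looks like once the deployment from the ActionSet is filled in. This only illustrates the rendered result and is not how Kanister does it; render_placeholders is a hypothetical name.

```bash
#!/usr/bin/env bash
# Illustration only: fill in {{ .Deployment.Name }} and
# {{ .Deployment.Namespace }} the way the rendered Blueprint would read.
# Kanister uses Go templates for this; sed is just a stand-in.
render_placeholders() {
  local name="$1" ns="$2"
  # In basic regular expressions "{" and "}" are literal; "." must be escaped.
  sed -e "s/{{ \.Deployment\.Name }}/$name/g" \
      -e "s/{{ \.Deployment\.Namespace }}/$ns/g"
}
```

For example, piping the mysqldump line above through render_placeholders mysql-release mysql-test yields the same command with -h mysql-release.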
Now that we have some understanding of Blueprints, let's look at the Blueprint the Kanister team provides, which can be found in the Kanister GitHub repo here. We will walk through the backup action, and from that you should be able to work out the details of the restore action.
Take a look at the spec of the Blueprint below. The backup action produces an artifact named mysqlCloudDump and has a phase named dumpToObjectStore, which uses the KubeTask Kanister function. Since we need the MySQL password to run mysqldump, we read it through an object reference. The helm install command we used to install MySQL created this secret (its name is the same as the MySQL deployment's), which holds the MySQL root password under the key mysql-root-password.
After that, the args describe the pod that the KubeTask function creates (its image and namespace) and list the commands to run in that pod for this phase. One question this spec should raise is where the deployment and the profile (.Deployment and .Profile) come from. We will see that when we create the backup ActionSet.
backup:
  type: Deployment
  outputArtifacts:
    mysqlCloudDump:
      keyValue:
        s3path: "{{ .Phases.dumpToObjectStore.Output.s3path }}"
  phases:
  - func: KubeTask
    name: dumpToObjectStore
    objects:
      mysqlSecret:
        kind: Secret
        name: '{{ .Deployment.Name }}'
        namespace: '{{ .Deployment.Namespace }}'
    args:
      image: kanisterio/mysql-sidecar:0.31.0
      namespace: "{{ .Deployment.Namespace }}"
      command:
      - bash
      - -o
      - errexit
      - -o
      - pipefail
      - -c
      - |
        s3_path="/mysql-backups/{{ .Deployment.Namespace }}/{{ .Deployment.Name }}/{{ toDate "2006-01-02T15:04:05.999999999Z07:00" .Time | date "2006-01-02T15-04-05" }}/dump.sql.gz"
        root_password="{{ index .Phases.dumpToObjectStore.Secrets.mysqlSecret.Data "mysql-root-password" | toString }}"
        mysqldump --column-statistics=0 -u root --password=${root_password} -h {{ .Deployment.Name }} --single-transaction --all-databases | gzip - | kando location push --profile '{{ toJson .Profile }}' --path ${s3_path} -
        kando output s3path ${s3_path}
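Two details of this phase are easy to miss. First, the bash -o errexit -o pipefail options matter. Without pipefail, the exit status of mysqldump | gzip | kando is that of the last command, so a failing mysqldump could go unnoticed. Second, s3_path stores each dump under namespace/deployment/timestamp. The sketch below reproduces both locally. It assumes a plain directory stands in for the bucket and an arbitrary command stands in for mysqldump, and kando is not involved. backup_path and dump_to_store are hypothetical names.

```bash
#!/usr/bin/env bash
# Local sketch of the dumpToObjectStore phase. Assumptions: a directory plays
# the bucket, and the dump command is passed in, so no MySQL/kando is needed.

# Same layout as s3_path in the Blueprint:
#   /mysql-backups/<namespace>/<deployment>/<timestamp>/dump.sql.gz
# The timestamp defaults to the current UTC time in the Blueprint's format.
backup_path() {
  local ns="$1" name="$2"
  local ts="${3:-$(date -u +%Y-%m-%dT%H-%M-%S)}"
  printf '/mysql-backups/%s/%s/%s/dump.sql.gz\n' "$ns" "$name" "$ts"
}

# Stream the output of a dump command, gzip'd, to <store><path>.
# pipefail (in a subshell, so the caller's options are untouched) makes a
# failing dump command fail the whole pipeline instead of reporting gzip's 0.
dump_to_store() {
  local store="$1" path="$2"
  shift 2
  mkdir -p "$(dirname "$store$path")" || return 1
  ( set -o pipefail; "$@" | gzip -c > "$store$path" )
}
```

Dropping set -o pipefail from dump_to_store makes dump_to_store /tmp/bucket "$(backup_path mysql-test mysql-release)" false succeed, which is exactly the silent failure the Blueprint's bash options guard against.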
To create this Blueprint, run the command below. Note that we create the Blueprint in the namespace where the Kanister controller is deployed:
~ » kubectl create -f https://raw.githubusercontent.com/kanisterio/kanister/master/examples/stable/mysql/mysql-blueprint.yaml -n kanister
blueprint.cr.kanister.io/mysql-blueprint created
Now that we have created the Blueprint, which describes how to back up and restore the database, we can create the ActionSet resource to actually run its backup and restore phases. Before that, let's quickly insert some dummy records into the MySQL database so that we can later verify that the records have been restored.
# exec into the mysql pod and insert some records into a table
~ » kubectl exec -it -n mysql-test mysql-release-866dc87447-bdq96 -- bash
root@mysql-release-866dc87447-bdq96:/# mysql -u root --password=asd#45@mysqlEXAMPLE
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 590
Server version: 5.7.30 MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database test;
Query OK, 1 row affected (0.01 sec)
mysql> use test;
Database changed
mysql> create table employees (name varchar(100), age int);
Query OK, 0 rows affected (0.29 sec)
mysql> insert into employees values ("Robert", "31");
Query OK, 1 row affected (0.01 sec)
mysql> insert into employees values ("John", "28");
Query OK, 1 row affected (0.01 sec)
mysql> select * from employees;
+--------+------+
| name   | age  |
+--------+------+
| Robert |   31 |
| John   |   28 |
+--------+------+
2 rows in set (0.00 sec)
To create the backup ActionSet, run the command below:
~ » kanctl create actionset --action backup --namespace kanister --blueprint mysql-blueprint --deployment mysql-test/mysql-release --profile mysql-test/s3-profile-gzbmn
actionset backup-skngt created
Now things should make sense: as you can see, we provide the deployment and the profile here. Similarly, you can pass any object you want using the --objects flag, for example a StatefulSet, or even ConfigMaps, Secrets and any other Kubernetes object. The format is --objects group/version/resource/namespace1/name1,group/version/resource/namespace2/name2.
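The --objects value becomes hard to read once several objects are involved. The hypothetical bash helper below assembles it from "group/version/resource namespace name" triples in the format described above. The StatefulSet named mysql in the example is made up. How kanctl expects core-group resources (which have no group) to be written is not covered here, so check kanctl create actionset --help for that.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a value for kanctl's --objects flag from
# triples of "group/version/resource namespace name", comma-separated.
objects_flag() {
  if [ "$#" -eq 0 ] || [ $(( $# % 3 )) -ne 0 ]; then
    echo "usage: objects_flag GVR NAMESPACE NAME [GVR NAMESPACE NAME ...]" >&2
    return 1
  fi
  local out=""
  while [ "$#" -gt 0 ]; do
    # ${out:+$out,} adds a comma only when out is already non-empty.
    out="${out:+$out,}$1/$2/$3"
    shift 3
  done
  printf '%s\n' "$out"
}
```

For example, objects_flag apps/v1/statefulsets mysql-test mysql prints apps/v1/statefulsets/mysql-test/mysql.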
Once you have created the ActionSet, you can check its status by describing it to make sure it succeeded. To debug further, you can also check the logs of the controller pod deployed in the kanister namespace.
~ » kubectl describe actionset -n kanister backup-skngt
Name: backup-skngt
...
Events:
Type    Reason           Age    From                 Message
----    ------           ----   ----                 -------
Normal  Started Action   6m40s  Kanister Controller  Executing action backup
Normal  Started Phase    6m40s  Kanister Controller  Executing phase dumpToObjectStore
Normal  Ended Phase      6m26s  Kanister Controller  Completed phase dumpToObjectStore
Normal  Update Complete  6m26s  Kanister Controller  Updated ActionSet 'backup-skngt' Status->complete
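In a script, describing the ActionSet repeatedly is clumsy, so here is a polling sketch instead. It assumes the ActionSet exposes its state at .status.state with values such as complete and failed, which is what the describe output summarises as Status->complete. The function name, timeout and interval are my own choices, and KUBECTL is an override hook for testing with a stub.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an ActionSet until it is "complete" (success),
# "failed" (error) or the timeout runs out. Assumes the state is exposed at
# .status.state. KUBECTL is an override hook for testing.
KUBECTL=${KUBECTL:-kubectl}

wait_for_actionset() {
  local name="$1" ns="$2" timeout="${3:-300}" interval="${4:-5}"
  local waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    state="$("$KUBECTL" get actionset "$name" -n "$ns" \
      -o jsonpath='{.status.state}' 2>/dev/null)"
    case "$state" in
      complete) return 0 ;;
      failed) echo "actionset $name failed" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $name (last state: ${state:-unknown})" >&2
  return 1
}
```

For example, wait_for_actionset backup-skngt kanister 600 blocks for up to ten minutes and returns non-zero if the backup fails or never completes.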
Once the backup ActionSet is complete, let's delete the data from the database to simulate a disaster. We will then run the restore action to restore the data from the backup we have already taken.
# exec into the mysql pod and run the following in the mysql shell
mysql> drop table employees;
Query OK, 0 rows affected (0.26 sec)
mysql> select * from employees;
ERROR 1146 (42S02): Table 'test.employees' doesn't exist
Now that we have deleted the data from the MySQL database, let's restore the backup we took earlier by creating a restore ActionSet:
~ » kanctl --namespace kanister create actionset --action restore --from backup-skngt
actionset restore-backup-skngt-5q2k2 created
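The restore ActionSet runs the Blueprint's restore action; the exact commands are in that action. Assuming the restore roughly reverses the backup stream (fetch the object, decompress it, pipe the SQL into mysql), a local stand-in for that stream looks like this. A directory stands in for the bucket, and the command that consumes the SQL is passed in. restore_from_store is a hypothetical name.

```bash
#!/usr/bin/env bash
# Local stand-in for a restore stream: read <store><path>, decompress it and
# pipe the SQL into a consumer command (mysql, in a real restore).
# pipefail makes a corrupt or missing archive fail the pipeline even if the
# consumer itself exits 0.
restore_from_store() {
  local store="$1" path="$2"
  shift 2
  ( set -o pipefail; gunzip -c "$store$path" | "$@" )
}
```

Against a real database, the consumer would be something like mysql -u root --password="$MYSQL_ROOT_PASSWORD" -h mysql-release; with cat it simply prints the SQL.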
To confirm that the restore ActionSet has completed successfully, we can describe it just as we described the backup ActionSet. Once it is complete, log in to the MySQL pod again and run the command below to make sure the data has been restored:
mysql> select * from employees;
+--------+------+
| name   | age  |
+--------+------+
| Robert |   31 |
| John   |   28 |
+--------+------+
2 rows in set (0.00 sec)
# As you can see the data has been restored successfully
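Eyeballing the table works for two rows. For larger tables, it is more reliable to capture the same query before the disaster and after the restore and diff the two. Here is a sketch; same_snapshot and the before.txt/after.txt file names are hypothetical. The capture command in the comment reuses the pod name from this article and assumes MYSQL_ROOT_PASSWORD is set.

```bash
#!/usr/bin/env bash
# Compare a query result captured before the simulated disaster with the
# same query after the restore; any difference (or a missing file) fails.
# Capture each snapshot with something like:
#   kubectl exec -n mysql-test mysql-release-866dc87447-bdq96 -- \
#     mysql -u root --password="$MYSQL_ROOT_PASSWORD" -N \
#     -e 'select * from test.employees order by name' > before.txt
same_snapshot() {
  local before="$1" after="$2"
  if [ ! -f "$before" ] || [ ! -f "$after" ]; then
    echo "snapshot missing" >&2
    return 1
  fi
  # diff exits 0 only when the files are identical.
  diff -u "$before" "$after"
}
```

The order by clause keeps the row order stable, so a difference reported by diff reflects the data rather than the order in which MySQL happened to return rows.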
As you can see, we were able to back up and restore the MySQL database using the Blueprint provided by the Kanister team. The team provides Blueprints for several databases, which can be found in the Kanister repo here, but you are free to write your own Blueprints as well.
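To summarise, these are the steps we took to back up and restore our MySQL database:
1. Installed Kanister using its Helm chart.
2. Installed MySQL using the stable Helm chart.
3. Created a Profile with the object storage details using kanctl.
4. Created the MySQL Blueprint in the kanister namespace.
5. Created a backup ActionSet, which pushed a dump of the database to object storage.
6. Dropped the table and created a restore ActionSet from the backup ActionSet.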
If you run into any issues while following this article, feel free to connect with me on Twitter, or reach out to the Kanister team on Slack.