This blog post explores the K10 data management platform by Kasten. This exercise was performed for a customer for whom I had to propose a Disaster Recovery (DR) strategy for Kubernetes clusters. The K10 platform performs backup and restore of Kubernetes applications and their volumes. This is especially useful in disaster recovery and application migration scenarios, as we will explore in the post below.
The objective of this post is to share the learnings from exploring backup and disaster recovery with the K10 platform and how you can implement it. One of the customer's requirements was a cross-region active-passive setup of Kubernetes clusters, with data volumes replicated from the source to the destination at a regular cadence. This is the first part of a series evaluating platforms that provide backup and DR.
Some of the key evaluation criteria were:
Let’s explore the K10 platform and how it can help implement and simplify disaster recovery for your Kubernetes clusters.
The K10 platform is completely Kubernetes native and is installed on the Kubernetes cluster that needs to be backed up. The platform installs a set of CRDs and controllers via a Helm chart. It lives in its own namespace on the cluster and autodiscovers the applications installed within the cluster.
Applications An Application represents the collection of resources such as ConfigMaps, Secrets, Services, and application workloads within a namespace, which are autodiscovered by K10. A policy can be associated with the application, or with a subset of resources within the application, for backup/restore.
Policies Policies define the actions that need to be taken when the policy is executed. The action could be to perform a snapshot or to import a previously exported backup. The frequency at which the action should be performed, the retention period, and the selection of resources to back up are all defined in the policy. The policy also refers to a Location profile.
Profiles A Location profile points to an object store which is used to store the backup metadata as well as the volume data.
K10 Disaster recovery K10 Disaster Recovery performs a backup of the K10 namespace and its metadata, along with the restore points for all the applications, to enable recovery of the K10 platform itself in case of a disaster.
K10 Dashboard The K10 dashboard is a minimalistic UI application which allows administration and creation of all the above-mentioned objects. It allows administrators to perform ad hoc backups and change retention periods of these backups, among many other things, without needing access to the kubectl CLI.
Note: K10 allows creation of snapshots, backups, or both while defining the Policy.
A snapshot in a Kubernetes cluster is typically discussed in the context of the VolumeSnapshot and VolumeSnapshotContent resources, which represent a snapshot of a storage volume. Since the snapshot objects are present in the Kubernetes cluster itself, they fall in the same failure plane: a failure of the cluster also causes loss of the snapshots themselves. Hence, while creating a policy, K10 allows these snapshots to be exported to an object store (Location profile); these are referred to as backups.
Let us look at some of the use-cases that were used to evaluate the K10 platform.
Install the snapshot CRDs if they are not installed by default on the source and target clusters.
# Install the CSI snapshotter
SNAPSHOTTER_VERSION=v2.1.1
# Apply VolumeSnapshot CRDs
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
Install the CSI snapshot controller (this controller watches over the lifecycle of the VolumeSnapshot resources)
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAPSHOTTER_VERSION}/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
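Optionally, verify that the snapshot API resources were registered before proceeding (a quick sanity check, not part of the official install steps):
# The three VolumeSnapshot CRDs should show up in the API resource list
kubectl api-resources | grep volumesnapshot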
Follow the two steps below if the setup is on a kind cluster.
git clone https://github.com/kubernetes-csi/csi-driver-host-path.git
cd csi-driver-host-path
./deploy/kubernetes-1.18/deploy.sh
Configure the CSI hostpath StorageClass to be the default StorageClass if you use a kind cluster, as shown below.
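A minimal way to do this, assuming the hostpath driver created a StorageClass named csi-hostpath-sc and that kind's built-in default class is named standard, is to flip the default-class annotation on both:
# Remove the default annotation from kind's built-in StorageClass (assumed to be "standard")
kubectl patch storageclass standard -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
# Mark the CSI hostpath StorageClass (assumed to be "csi-hostpath-sc") as the default
kubectl patch storageclass csi-hostpath-sc -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'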
cat << eof | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  annotations:
    k10.kasten.io/is-snapshot-class: "true"
    snapshot.storage.kubernetes.io/is-default-class: "true"
  name: csi-hostpath-snapclass
driver: hostpath.csi.k8s.io
deletionPolicy: Delete
eof
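You can quickly confirm that the class exists and carries the K10 and default-class annotations:
# Inspect the VolumeSnapshotClass created above
kubectl get volumesnapshotclass csi-hostpath-snapclass -o yaml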
Install MongoDB using the Bitnami Helm chart in a dedicated namespace:
helm repo add bitnami https://charts.bitnami.com/bitnami
kubectl create ns mongo
helm install mongo --namespace mongo bitnami/mongodb
Create some data in the Mongo database
export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace mongo mongo-mongodb -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)
kubectl run --namespace mongo mongo-mongodb-client --rm --tty -i --restart='Never' --image docker.io/bitnami/mongodb:4.2.8-debian-10-r7 --command -- mongo admin --host mongo-mongodb --authenticationDatabase admin -u root -p $MONGODB_ROOT_PASSWORD
db.demodb.insert({ name: "Jane Doe", twitter_id: "jane_doe", })
db.demodb.insert({ name: "John Doe", twitter_id: "john_doe", })
Follow the instructions to set up K10 from their documentation.
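For reference, a minimal installation at the time of writing looks roughly like the following (check the Kasten documentation for the current chart values and prerequisites):
# Add the Kasten Helm repository and install K10 into its own namespace
helm repo add kasten https://charts.kasten.io/
helm repo update
kubectl create namespace kasten-io
helm install k10 kasten/k10 --namespace=kasten-io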
The object store is required by the Location profile. This object store holds the backups from the cluster.
A GCS bucket is used in this demo.
Create a bucket named k10-backups in GCS.
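If you have the gcloud SDK installed, the bucket can be created from the command line (assuming the asia-south1 region used later in the profile):
# Create the GCS bucket that will hold the K10 backups
gsutil mb -l asia-south1 gs://k10-backups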
A Location profile specifies the object store where the backup metadata is stored. In this example we create a GCS storage bucket and use it in the Location profile.
The service account should have permissions to create a storage bucket
Fill in the `project-id` and `service-account.json` as relevant.
kubectl create secret generic k10-gcs-secret \
--namespace kasten-io \
--from-literal=project-id=<gcp-projectid> \
--from-file=service-account.json=<./sa-key.json>
cat << eof | kubectl apply -f -
apiVersion: config.kio.kasten.io/v1alpha1
kind: Profile
metadata:
  name: backup-default
  namespace: kasten-io
spec:
  locationSpec:
    credential:
      secret:
        apiVersion: v1
        kind: secret
        name: k10-gcs-secret
        namespace: kasten-io
      secretType: GcpServiceAccountKey
    objectStore:
      name: k10-backups
      objectStoreType: GCS
      pathType: Directory
      region: asia-south1
    type: ObjectStore
  type: Location
eof
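K10 validates the Location profile after it is created; a quick way to check its status (the exact status fields may vary by K10 version):
# The profile should report a successful validation
kubectl get profiles.config.kio.kasten.io -n kasten-io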
This backup policy backs up the application and the volume snapshots and exports them to the GCS bucket at the 15th minute of every hour. This Policy assumes that the MongoDB installation is in a namespace called mongo.
cat <<EOF | kubectl apply -f -
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: backup-policy
  namespace: kasten-io
spec:
  comment: "Backup policy"
  frequency: '@hourly'
  subFrequency:
    minutes:
    - 15
  paused: false
  retention:
    hourly: 24
    daily: 7
    weekly: 4
    monthly: 12
    yearly: 7
  actions:
  - action: backup
  - action: export
    exportParameters:
      frequency: '@hourly'
      profile:
        name: backup-default
        namespace: kasten-io
      exportData:
        enabled: true
  selector:
    matchExpressions:
    - key: k10.kasten.io/appNamespace
      operator: In
      values:
      - mongo
EOF
Get the receive string from the source cluster by describing the policy. The receiveString is a token present on the export action of the policy on the source cluster and is used when we need to migrate applications across clusters; it is referenced while creating the restore policy on the target cluster.
EXPORT_STRING=$(kubectl get policy backup-policy -n kasten-io -o jsonpath='{.spec.actions[1].exportParameters.receiveString}')
Use the EXPORT_STRING from the backup policy on the source while applying the restore policy on the target cluster.
cat << eof | kubectl apply -f -
kind: Policy
apiVersion: config.kio.kasten.io/v1alpha1
metadata:
  name: restore-default
  namespace: kasten-io
spec:
  comment: Restore policy for MongoDB
  frequency: "@hourly"
  subFrequency:
    minutes:
    - 15
    hours:
    - 0
    weekdays:
    - 0
    days:
    - 1
    months:
    - 1
  selector: {}
  actions:
  - action: import
    importParameters:
      receiveString: bIzAPpoanmEU0S57nj9FqtUkRn8TD0ig+TKu4Gg0KaE7acJYzjyDRti0e+nbkKsGfFjezKuNGWik9SNd1g6xyGY0+AYfLO+bYbay8eWagcya56Fh53Acb1moutKRBLJlQJEXpAoOkeJJsuvRtK3Sw0mnMsHTxQIVp1/rBhjUisGH1YpeUQKJyTvL7jWIOEtupek9PYKhqyEf3goMMHjXqtjxHy24Sj/i7jNKpoSNJI5YspGNdGaVY4YStbqUj8WyNYGfKqqXc8E/WHTxu1ty7TLd8+OEeuvNyQ2NDyU7CXVyQnjzonU3ti75lNbQ8Mp5y1w5apYKk3MNn8Uk2GTcGfNH9/lSZAgX4sZmld/rqr7nhFycy/fVuH141DDp3mw874DseI9W3+2kHjI/l9y0tWcW+rdfoWIOEFMSNvofYQ
      profile:
        name: backup-default
        namespace: kasten-io
  - action: restore
    restoreParameters: {}
eof
This is all that is needed to set up migration from a source cluster to a target cluster for a particular application on the K10 platform. The migration from the source to the target cluster works across regions as well.
Application-consistent snapshots require quiescing the application. Quiescing, in the context of cloud-native applications such as Cassandra, means flushing the in-memory data to disk before taking the snapshot. K10 delegates this to Kanister in order to take application-consistent snapshots. Kanister is an open source framework (built by Kasten) which allows defining blueprints for taking application-consistent snapshots. The Kanister deployment consists of a Kanister controller and three CRDs, namely Blueprint, ActionSet, and Profile.
Covering the features of Kanister would require a separate blog post entirely, but we will quickly look at how Kanister Blueprints work.
A Kanister Blueprint is a collection of actions which are performed on the target application. Each action defines phases which are executed in order. A phase defines a Kanister function to execute.
ActionSets are required when we use Kanister alone, without the K10 platform. Creating an ActionSet instructs the Kanister controller to execute a Blueprint against the target application, as sketched below.
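As a rough illustration (not needed for the K10-driven flow in this post), an ActionSet invoking the backupPrehook action of the Blueprint defined later might look like this; the namespace and Deployment name are assumptions based on the Bitnami install above:
cat << eof | kubectl apply -f -
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  name: mongo-backup-prehook
  namespace: kasten-io
spec:
  actions:
  - name: backupPrehook          # action defined in the Blueprint
    blueprint: mongodb-blueprint # Blueprint to execute
    object:                      # workload the action runs against
      kind: Deployment
      name: mongo-mongodb
      namespace: mongo
eof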
In order to take application-consistent snapshots, create a Kanister profile and apply the following Blueprint for the mongo installation on the source cluster.
cat <<eof | kubectl apply -f -
apiVersion: config.kio.kasten.io/v1alpha1
kind: Profile
metadata:
  name: kanister-profile
  namespace: kasten-io
spec:
  type: Kanister
  kanister:
    credential:
      secretType: GcpServiceAccountKey
      secret:
        apiVersion: v1
        kind: Secret
        name: k10-gcs-secret
        namespace: kasten-io
    location:
      type: ObjectStore
      objectStore:
        name: k10-backups
        objectStoreType: GCS
        region: asia-south1
eof
# quote the heredoc delimiter so the local shell does not expand $ variables in the script below
cat << 'eof' | kubectl apply -f -
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: mongodb-blueprint
actions:
  backupPrehook:
    type: Deployment
    phases:
    - func: KubeExec
      name: lockMongo
      objects:
        mongodb:
          kind: Secret
          name: mongo-mongodb
          namespace: ''      # namespace of the mongo release
      args:
        namespace: ""        # e.g. a Kanister template such as "{{ .Deployment.Namespace }}"
        pod: ""              # e.g. "{{ index .Deployment.Pods 0 }}"
        container: mongo-mongodb
        command:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          # resolve the root password (typically rendered from the secret referenced above)
          export MONGODB_ROOT_PASSWORD=""
          mongo --authenticationDatabase admin -u root -p "${MONGODB_ROOT_PASSWORD}" --eval="db.fsyncLock()"
  backupPosthook:
    type: Deployment
    phases:
    - func: KubeExec
      name: unlockMongo
      objects:
        mongodb:
          kind: Secret
          name: mongo-mongodb
          namespace: ''      # namespace of the mongo release
      args:
        namespace: ""        # e.g. "{{ .Deployment.Namespace }}"
        pod: ""              # e.g. "{{ index .Deployment.Pods 0 }}"
        container: mongo-mongodb
        command:
        - bash
        - -o
        - errexit
        - -o
        - pipefail
        - -c
        - |
          # resolve the root password the same way as in the prehook
          export MONGODB_ROOT_PASSWORD=''
          mongo --authenticationDatabase admin -u root -p "${MONGODB_ROOT_PASSWORD}" --eval="db.fsyncUnlock()"
eof
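Per the K10 documentation, the workload is then associated with the Blueprint via an annotation so that K10 invokes the hooks during backup; the Deployment name here assumes the Bitnami release installed earlier:
# Tell K10 to use the Blueprint for this workload
kubectl annotate deployment mongo-mongodb -n mongo kanister.kasten.io/blueprint='mongodb-blueprint'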
The interesting part about Kanister is that it allows creating custom Blueprints for an application. It provides a set of built-in Kanister functions to execute as part of a phase, and most of them should be sufficient, but it is quite easy to write a new function if they do not satisfy specific requirements. If the K10 platform is not in use, ActionSets become much more relevant: an ActionSet resource defines the actions from the Blueprint to run against the application.
The K10 platform does deploy a Prometheus server during installation, but in most cases we would already have an elaborate monitoring setup on the Kubernetes cluster by the time the K10 platform is deployed. The platform does expose metrics for each service deployed. The catalog service exports a metric catalog_actions_count, which is a gauge.
This metric can be used to set up an alert, either in Alertmanager or in Grafana, to check whether a backup/export has been failing for a certain period of time. e.g.:
catalog_actions_count{liveness="live",status="complete",type="backup"} 2
catalog_actions_count{liveness="live",status="complete",type="export"} 2
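As a rough sketch (the status and type label values below match the samples above, but verify them against your own K10 deployment), a Prometheus alerting rule could fire when no backup has completed within the expected window for the hourly policy:
groups:
- name: k10-backup-alerts
  rules:
  - alert: K10BackupNotCompleting
    # the gauge of completed backup actions has not moved in the last 2 hours
    expr: changes(catalog_actions_count{status="complete", type="backup"}[2h]) == 0
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "No K10 backup has completed in the last 2 hours"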
The K10 platform comes with its own disaster recovery. This requires creating a Location profile and enabling K10 DR, which creates a default policy that backs up the K10 platform itself. In case of a disaster, the restore requires a passphrase that is generated when K10 DR is enabled.
All the functionality of creating Policies and Location profiles, specifying the schedule of the backups, and configuring the retention periods of these backups/imports is also available via the UI dashboard.
The K10 platform provides a very flexible approach to backup and recovery of Kubernetes applications as well as their data. The platform is extensible via Kanister Blueprints, and the dashboard is a nice addition that lets you manage all of these features right from the UI without requiring kubectl access.
I hope you enjoyed the blog post and gained insights into implementing disaster recovery on your Kubernetes clusters. If you have any queries or follow-up questions, please feel free to start a conversation on Twitter.
Looking to implement Kubernetes backup and disaster recovery? Learn more about our capabilities and why startups & enterprises consider us one of the best Kubernetes consulting services companies.
Images are used from the K10 platform documentation.
References: Kasten Documentation, Kanister, Prometheus, Kasten K10 support.