Do you run Oracle Database in containers? Do you run it on Kubernetes? Are you worried about data management, i.e., backup and recovery? If so, read on. In this post, we explore the backup and recovery mechanism of an Oracle database deployed on Kubernetes using the Kasten K10 data management platform. The goal of a backup and recovery strategy is to protect the database against data loss and to reconstruct the database after a loss occurs. Oracle Database offers multiple ways to create and manage backups and restores; here we will talk about application-consistent backup and restore using third-party snapshot technology such as AWS EBS snapshots.
In an application-consistent backup, the application is notified that a backup is being taken, and its transactions/writes are quiesced for a short duration. Quiescing a cloud-native application such as Oracle means flushing the in-memory data to disk and then taking the snapshot, which is what yields a consistent snapshot.
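For context, here is a minimal sketch of the classic way to quiesce Oracle around a storage snapshot, hot backup mode; as discussed later in this post, Oracle's third-party snapshot optimization (12c and above) lets you skip this step:

```shell
# Classic hot backup mode around a storage snapshot (illustrative sketch only)
$ echo 'ALTER DATABASE BEGIN BACKUP;' | sqlplus -s / as sysdba
# ...trigger the storage snapshot here...
$ echo 'ALTER DATABASE END BACKUP;' | sqlplus -s / as sysdba
```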
In the case of the Oracle database, a consistent backup is achieved by recording the recovery time, archiving the current redo logs immediately before and after the snapshot, and, on restore, recovering the database up to that recorded time. These are exactly the steps our blueprint automates below.
Our solution is based on support notes available on Oracle Support (you will need a login to access them; see the references).
Note: Block corruption (particularly lost writes) cannot be fixed by snapshot-based recovery; it requires RMAN or Data Guard for full protection.
Kasten K10 Ecosystem (Source: Kasten Documentation)
The Kasten K10 data management platform, purpose-built for Kubernetes, provides enterprise operations teams an easy-to-use, scalable, and secure system for backup/restore, disaster recovery, and mobility of Kubernetes applications. Kasten K10 comes with an excellent dashboard to manage the applications in the cluster, and you can create a policy to take a snapshot and restore from it. K10 has multiple components; let's understand a few of them that we will need to achieve our goal.
You can read more about Kasten K10 here.
We will create an Oracle database deployment on the Kubernetes cluster with persistent storage and then perform the backup and restore using Kasten.
To install the Oracle database using a deployment file for the available versions (12c, 18c, and 19c), refer to the oracle12c, oracle18ee, and oracle19ee directories of the GitHub repo, respectively. If your Oracle database is not containerized yet, follow this GitHub repo by Oracle to run the database in a Docker container.
Make sure the Kasten K10 application is installed in your cluster; if not, install it by following the Kasten documentation.
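For a quick start, a minimal install via the official Helm chart looks like the sketch below; see the Kasten documentation for prerequisites and the full set of options:

```shell
# Add the Kasten Helm repository and install K10 into its own namespace
$ helm repo add kasten https://charts.kasten.io/
$ helm repo update
$ kubectl create namespace kasten-io
$ helm install k10 kasten/k10 --namespace=kasten-io
```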
Once you can access the K10 dashboard, create a K10 location profile.
[![K10 location profile](https://github.com/infracloudio/Kasten-K10-Oracle-Blueprint/raw/master/Images/location_profile.png)](https://github.com/infracloudio/Kasten-K10-Oracle-Blueprint/blob/master/Images/location_profile.png)
Now create a K10 policy for your application.
The platform allows domain experts to capture application-specific data management tasks in blueprints, which can be easily shared and extended. The framework takes care of the tedious details of execution on Kubernetes and presents a homogeneous operational experience across applications at scale. Further, it gives you a natural mechanism to extend the K10 platform by adding your own code to modify any desired step performed for data lifecycle management.
Create the Blueprint in the same namespace as the Kasten controller. This blueprint has three actions that are performed during backup and restore:

backupPrehook
: Executed before the actual snapshot backup. It creates a restore.sql script, containing the recovery time and the SQL statement needed to recover the database, in the host persistence location, and then archives the current logs.

backupPosthook
: Executed after the snapshot backup is done; it archives the current logs.

restore
: Executed after restoring from the K10 dashboard, once the pod is in the ready state; it runs the restore.sql script to make sure that recovery completes properly.

$ kubectl create -f blueprint.yaml -n kasten-io
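To make the mechanics concrete, here is a rough sketch of the idea behind the prehook/posthook actions; this is illustrative only, the actual commands live in blueprint.yaml in the repo, and the file path shown here is an assumption:

```shell
# 1. Record the snapshot time so the restore action can recover up to it
#    (placeholder path; the actual blueprint writes a restore.sql script).
$ date -u +'%Y-%m-%d %H:%M:%S' > /opt/oracle/oradata/.snapshot_time
# 2. Force a log switch so all committed changes land in the archived logs.
$ echo 'ALTER SYSTEM ARCHIVE LOG CURRENT;' | sqlplus -s / as sysdba
```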
Annotate the Oracle database deployment with the newly created blueprint.
$ kubectl -n {NAMESPACE} annotate deployment.apps/{DEPLOYMENT_NAME} kanister.kasten.io/blueprint={BLUEPRINT_NAME}
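You can confirm that the annotation landed on the deployment; note that the dots in the annotation key are escaped in the JSONPath expression:

```shell
# Print the blueprint annotation on the deployment
$ kubectl -n {NAMESPACE} get deployment.apps/{DEPLOYMENT_NAME} \
    -o jsonpath='{.metadata.annotations.kanister\.kasten\.io/blueprint}'
```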
Once the Oracle database is running, you can populate it with some data. Let's create a schema and generate some load. To generate data in the Oracle database and simulate load, we are using Swingbench. Swingbench ships with a utility called oewizard which loads data. You can use the Swingbench GUI from outside the pod to run the load, but that may increase latency. Our suggestion is to copy Swingbench into the Oracle container/pod and use the CLI options as shown below.
$ cd /opt/oracle/oradata/swingbench/bin/
$ ./oewizard
You can copy Swingbench into your running pod and run it from there for faster operation.
# Download and copy the SwingBench zip file to the pod's persistent volume.
$ kubectl -n {NAMESPACE} cp swingbenchlatest.zip {POD_NAME}:/opt/oracle/oradata/
$ kubectl -n {NAMESPACE} exec -it pod/{POD_NAME} -- bash
# Set Java path
$ export PATH=$ORACLE_HOME/jdk/bin:$PATH
# Unzip the downloaded file
$ cd /opt/oracle/oradata/
$ unzip swingbenchlatest.zip
$ cd swingbench/bin/
# Create a schema using oewizard
$ ./oewizard -scale 1 -dbap {DB_PASSWD} -u {username} -p {password} -cl -cs //{ORACLE_DB_CONNECTION_STRING} -df '/opt/oracle/oradata/ORCL/{PDB_NAME}/{TABLESPACE_NAME}.dbf' -create
$ ./oewizard -scale 1 -dbap "Kube#2020" -u soe -p soe -cl -cs //localhost:1521/ORCL -df '/opt/oracle/oradata/ORCL/ORCLPDB1/soe.dbf' -create
[![K10 oewizard](https://github.com/ajayinfracloud/Kasten-K10-Oracle-Blueprint/raw/master/Images/oewizard_output.png)](https://github.com/infracloudio/Kasten-K10-Oracle-Blueprint/blob/master/Images/oewizard_output.png)
Now start the load from the pod itself.
```shell
# concurrent number of users
$ USERS=30
# for five minutes
$ RUNTIME="00:05:00"
$ ./charbench -c ../configs/SOE_Server_Side_V2.xml -cs //localhost:1521/ORCL -dt thin -u soe -p soe -uc $USERS -v users,tps,tpm,dml,errs,resp -mr -min 0 -max 100 -ld 1000 -rt $RUNTIME
# Note: you can change the values as per your requirement.
```
You can also use the Swingbench GUI app to generate load from outside the pod. Do note that there will be more latency, so it will take a bit more time. You can refer to the Swingbench guide to understand how it works.
$ cd /opt/oracle/oradata/swingbench/bin/
$ ./swingbench
You can now take a backup of the Oracle data using the K10 policy for this application. Go to the K10 dashboard and run the policy. Please note down the timestamp while running the policy, so you can verify that the DB is restored to the same point. This will start the backup process, and you can check the logs of the currently running pod to see that the pre-backup and post-backup actions are performed.
# To check the deployment pod logs
$ kubectl -n {NAMESPACE} logs {POD} -f
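If you prefer the CLI over the dashboard, a policy run can also be triggered by creating a RunAction resource. Below is a sketch based on K10's actions API; verify the fields against your K10 version's documentation:

```shell
# Trigger an on-demand run of an existing K10 policy
$ cat <<EOF | kubectl create -f -
apiVersion: actions.kio.kasten.io/v1alpha1
kind: RunAction
metadata:
  generateName: run-backup-
spec:
  subject:
    kind: Policy
    name: {POLICY_NAME}
    namespace: kasten-io
EOF
```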
Confirm the backup action policy ran completely in the K10 dashboard.
Under the hood, the backup relies on the cloud platform's volume snapshot technology: it triggers a snapshot of the volume along with all the configuration required to run the workload. As this is native to Kubernetes, it will back up objects like the ConfigMap, Deployment, PersistentVolume, etc.
In our example, we used AWS, where the EBS snapshot is taken immediately (point in time) and then written asynchronously to S3 (in our case). The snapshot itself is instantaneous no matter how big your volume is, but the asynchronous write takes a variable amount of time depending on how much changed between the current snapshot and the previous one. You can restore a full copy from any snapshot without worrying about whether you still have copies of previous snapshots.
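If you want to watch that asynchronous copy happen, the AWS CLI can report per-snapshot progress; this assumes the CLI is configured for the account and region backing your cluster:

```shell
# List your EBS snapshots along with their copy progress
$ aws ec2 describe-snapshots --owner-ids self \
    --query 'Snapshots[*].[SnapshotId,Progress,StartTime]' --output table
```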
Let's say that one fine day, someone drops the ORDERS table or corrupts it, which is not an uncommon scenario. You can simulate data loss by dropping a table or modifying its content.
DROP TABLE SOE.ORDERS;
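For example, you could run the destructive statement through the pod with a one-liner like the following; the connection details mirror the earlier Swingbench setup:

```shell
# Simulate data loss by dropping the ORDERS table (simulation only!)
$ kubectl -n {NAMESPACE} exec -it {POD_NAME} -- bash -c \
    "echo 'DROP TABLE SOE.ORDERS;' | sqlplus -s soe/soe@//localhost:1521/ORCL"
```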
To restore the missing data, use the snapshot that you created earlier. Say someone accidentally deleted some data and you want to restore it to the last backup point. Kasten gives you options to restore a subset of the application; you can also restore a copy into a different Kubernetes namespace, or, if the snapshot is stored in an object store (location profile) in the cloud, you can take it a step further and restore into a different cluster altogether.
Go to the K10 dashboard, select application restore point from which you want to restore, and confirm restore. This will start the restore process and create a new pod with the restore point snapshot.
Confirm that the restore completed successfully.
# Check If Pod is in ready state.
$ kubectl -n {NAMESPACE} get pods
# Once the Pod is ready, connect to the DB and verify the last record of the ORDERS table; it should match the time noted when the backup was taken.
$ kubectl -n {NAMESPACE} exec -it <POD> -- bash
sqlplus soe/soe@ORCLPDB
select SYS_EXTRACT_UTC(max(order_date)) from soe.orders;
exit;
If you run into any issues with the above commands, you can check the logs of the controller using:
$ kubectl --namespace kasten-io logs <KANISTER_SERVICE_POD_NAME>
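If you are not sure of the Kanister service pod name, a simple filter usually finds it; exact pod names and labels may differ across K10 versions:

```shell
# Locate the Kanister service pod in the kasten-io namespace
$ kubectl -n kasten-io get pods | grep kanister
```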
While we were running the synthetic load through Swingbench, we also took a series of AWR snapshots to analyze the performance penalty on the system. What we see is that with Oracle's third-party snapshot storage optimization (i.e., when you don't need to put the database in backup mode), the impact is nearly five times lower compared to hot backup mode. Our recommendation is to test this with your setup, as mileage will vary for each application and workload.
$ kubectl -n {NAMESPACE} exec -it {POD_NAME} -- bash
# To generate an AWR report
$ sqlplus / as sysdba
@?/rdbms/admin/awrrpt.sql
# Provide the details like report type, begin and end snapshot IDs, and file name; it will create an AWR report. Copy this file to your local machine and analyze the report for performance impact.
| # | Database Version | CPU / SGA | Users | Avg TPS | Total Tx / 5 min | Snapshot Size | Snapshot Time | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | 12.2.0.1 | 4C / 5G | 30 | 1000 | 300031 | 40G | 00:02:33 | |
| 2 | 12.2.0.1 | 4C / 5G | 200 | 1075 | 322516 | 40G | 00:04:30 | |
Below are the database performance charts showing the number of commits, database time, and redo generated during the test. As you can see in the chart, we ran a test at 7:52; database time shows a spike, but there is not much impact on the number of transactions processed by the database, and end-user response time was intact. The spike is mainly due to the increased time taken by the log file parallel write event for the short duration when the snapshot was taken.
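Beyond the AWR report, you can also inspect this wait event directly from SQL*Plus. Note that v$system_event is cumulative since instance startup, so compare the values from before and after the snapshot:

```shell
# Query the 'log file parallel write' wait event (the $ is escaped for the heredoc)
$ sqlplus -s / as sysdba <<EOF
SELECT event, total_waits, ROUND(time_waited_micro/1e6, 2) AS seconds_waited
FROM v\$system_event
WHERE event = 'log file parallel write';
EOF
```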
To uninstall/delete the oracle deployment:
$ kubectl delete ns {NAMESPACE}
Remove the Oracle blueprint from the Kasten namespace.
$ kubectl delete blueprints oracle-blueprint -n kasten-io
As we move from monolithic enterprise architectures to cloud-native platforms, a popular database like Oracle is often the bottleneck or showstopper. You might already have an application modernization roadmap and want to go fully cloud native, but it can be hard for your organization to rewrite an application that uses Oracle due to complex customizations or a heavy database-centric design, or you might be using applications that require the Oracle database as a prerequisite. In many cases, you can containerize the Oracle database and orchestrate it using Kubernetes. This improves availability, reduces the operational cost of managing the databases, and works the same way as the other cloud-native systems in your enterprise. Once you decide to move, the next important step is data management, i.e., backup and recovery, disaster recovery, and application mobility, where the Kasten K10 platform can help. In this post, we have suggested a solution showing how K10's extensible framework can provide a data management layer for the Oracle database.
If you need more information, please feel free to contact us directly or through Twitter.
The screencast below summarizes what we have showcased above.
This solution was jointly written by Infranauts Ajay Kangare and Anjul Sahu.