This blog post is written by Infranaut – Bhavin Gandhi for Yugabyte.
HashiCorp’s Terraform is an infrastructure as code (IaC) tool as well as a framework. It is used to write infrastructure configuration declaratively in HashiCorp Configuration Language (HCL) or JSON. It has become a preferred tool for provisioning infrastructure on different cloud platforms, mainly due to the wide range of supported providers and the extensible nature of the tool itself: one can write provider plugins and modules to extend Terraform.
In this blog post, we will look at the Terraform modules for provisioning YugabyteDB on different cloud providers. We will focus mostly on the terraform-aws-yugabyte module, which is specific to Amazon Web Services (AWS), while pointing out minor differences in the other modules. We will cover topics including provisioning a cluster, managing nodes, understanding different configuration options, and more.
Let’s create a 3-node cluster using the terraform-aws-yugabyte module. Before starting, make sure you have installed Terraform on your machine. Follow these instructions for Terraform installation. We are using Terraform v0.13.3 in this post.
Create a new Terraform configuration file deploy-yugabytedb.tf with the following contents:
module "yugabyte-db-cluster" {
  source = "github.com/yugabyte/terraform-aws-yugabyte"
  region_name = "us-east-1"
  vpc_id = "<vpc id>"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  subnet_ids = ["<subnet id from 1a>", "<subnet id from 1b>", "<subnet id from 1c>"]
  ssh_keypair = "yb-cluster-key"
  ssh_private_key = "/opt/keys/yb-cluster-key.pem"
  cluster_name = "yb-cluster"
  replication_factor = "3"
  num_instances = "3"
}
output "outputs" {
  value = module.yugabyte-db-cluster
}
The values we set under the module block are the configuration variables for our terraform-aws-yugabyte module.
source is the location where the module is hosted. This can be a link to the Git repository or a local path to the clone.
region_name is the AWS region where the cluster will be created.
vpc_id is the ID of a pre-existing VPC in the region_name.
availability_zones is a list of availability zones from the region_name above. The cluster nodes are distributed across these zones.
subnet_ids is a list of subnets corresponding to the availability zones. The first subnet should belong to the first zone from the availability_zones list, and so on.
ssh_keypair is the name of an AWS EC2 key pair that is already present in the AWS account. It is used to connect to the nodes.
ssh_private_key is the local path to the .pem file of the above key pair.
allowed_sources specifies a list of allowed CIDR source blocks in the security group. We will be talking more about this shortly.
cluster_name is used for naming different resources in AWS.
replication_factor controls the replication of the data in the YugabyteDB cluster. Read more about replication factor (RF) here.
num_instances specifies the number of instances to create for this cluster. This number must be greater than or equal to the replication_factor.
Take a look at the variables.tf file for a complete list of supported configuration variables.
The output block of deploy-yugabytedb.tf specifies which values to print once the Terraform stack is created. In our case, we are printing all the output values from yugabyte-db-cluster, which we have defined using the module block.
To set the required AWS configuration and credentials, export the following environment variables with appropriate values.
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJal/…/bPxRfiCYEXAMPLEKEY"
Take a look at the environment variables section from the AWS Provider docs for more information.
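Before running Terraform, it can help to confirm that both variables are actually set in the current shell. The following is only a minimal sketch: check_aws_env is a hypothetical helper name of our own, not part of Terraform or the AWS tooling.

```shell
# Hypothetical helper (not part of Terraform or the AWS CLI):
# fail early if the credentials exported above are missing.
check_aws_env() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "AWS_ACCESS_KEY_ID is not set" >&2
    return 1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_SECRET_ACCESS_KEY is not set" >&2
    return 1
  fi
  echo "AWS credentials found in the environment"
}
```

One way to use it is to chain it in front of a Terraform command, for example check_aws_env && terraform plan.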
NOTE: The method to pass access credentials to Terraform is provider specific. These details are given in the documentation for the respective modules – terraform-gcp-yugabyte and terraform-azure-yugabyte (and in this post for AWS).
With all the housekeeping done, let’s create our cluster now.
To initialize the Terraform directory where we have created the deploy-yugabytedb.tf file, run the following command.
$ terraform init
Initializing modules...
Downloading github.com/yugabyte/terraform-aws-yugabyte for yugabyte-db-cluster...
- yugabyte-db-cluster in .terraform/modules/yugabyte-db-cluster
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/null...
- Finding hashicorp/aws versions matching "~> 3.0"...
- Installing hashicorp/null v2.1.2...
- Installed hashicorp/null v2.1.2 (signed by HashiCorp)
- Installing hashicorp/aws v3.7.0...
- Installed hashicorp/aws v3.7.0 (signed by HashiCorp)
Terraform has been successfully initialized!
This will download the terraform-aws-yugabyte module as well as other dependencies like plugins.
To see the list of resources we will be creating as part of this deployment, run the following command.
$ terraform plan
Refreshing Terraform state in-memory prior to plan...
module.yugabyte-db-cluster.data.aws_ami.yugabyte_ami: Refreshing state...
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# module.yugabyte-db-cluster.aws_instance.yugabyte_nodes[0] will be created
+ resource "aws_instance" "yugabyte_nodes" {
+ ami = "ami-0affd"
+ arn = (known after apply)
+ associate_public_ip_address = true
+ availability_zone = "us-east-1a"
+ instance_type = "c4.xlarge"
+ root_block_device {
+ volume_size = 50
+ volume_type = "gp2"
}
}
# module.yugabyte-db-cluster.aws_security_group.yugabyte will be created
+ resource "aws_security_group" "yugabyte" {
+ arn = (known after apply)
+ description = "Managed by Terraform"
+ egress = (known after apply)
+ id = (known after apply)
+ ingress = [
+ {
+ cidr_blocks = [
+ "0.0.0.0/0",
]
+ description = ""
+ from_port = 22
+ protocol = "tcp"
},
]
+ vpc_id = "vpc-dab123"
}
# module.yugabyte-db-cluster.null_resource.create_yugabyte_universe will be created
+ resource "null_resource" "create_yugabyte_universe" {
+ id = (known after apply)
+ triggers = (known after apply)
}
Plan: 6 to add, 0 to change, 0 to destroy.
The above output is truncated for brevity. If everything looks good, we can run the following command to create the actual resources.
$ terraform apply
# …
Plan: 6 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
Outputs:
outputs = {
"JDBC" = "postgresql://yugabyte@35.135.36.37:5433"
"YCQL" = "ycqlsh 35.135.36.37 9042"
"YEDIS" = "redis-cli -h 35.135.36.37 -p 6379"
"YSQL" = "ysqlsh -U yugabyte -h 35.135.36.37 -p 5433"
"master-ui" = "http://35.135.36.37:7000"
"private_ips" = [
"10.0.0.14",
"10.0.0.30",
"10.0.0.45",
]
"public_ips" = [
"35.135.36.37",
"35.135.36.38",
"35.135.36.39",
]
"security_group" = "sg-00e12"
"ssh_key" = "/opt/keys/yb-cluster-key.pem"
"ssh_user" = "centos"
"tserver-ui" = "http://35.135.36.37:9000"
}
To check the status of the cluster, we can visit one of the masters’ web UI at http://35.135.36.37:7000.
Make sure you persist the Terraform state. It is a record of the real-world state and metadata of our resources, and it is used when we apply the changes made to our infrastructure code. With the default configuration, it’s saved as a file named terraform.tfstate in the same directory as the configuration file.
To check the connectivity to the YCQL and YSQL APIs, we can run the following command-line clients:
$ docker run --rm -it yugabytedb/yugabyte-client ysqlsh -h 35.135.36.37
ysqlsh (11.2-YB-2.2.0.0-b0)
Type "help" for help.
yugabyte=# \conninfo
You are connected to database "yugabyte" as user "yugabyte" on host "35.135.36.37" at port "5433".
$ docker run --rm -it yugabytedb/yugabyte-client ycqlsh 35.135.36.37
Connected to local cluster at 35.135.36.37:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> SHOW HOST;
Connected to local cluster at 35.135.36.37:9042.
Now that we have the cluster up and running, we will run a client workload on it. We will use the YugabyteDB workload generator container image to do so. This application connects to the cluster like a real-world application would. It tries to insert, retrieve, and update data.
To run the application, execute the following command:
$ docker run --rm -it yugabytedb/yb-sample-apps \
--workload SqlInserts \
--nodes 35.135.36.37:5433 \
--num_threads_write 1 \
--num_threads_read 4
…
105109 [Thread-1] INFO com.yugabyte.sample.common.metrics.MetricsTracker - Read: 9.60 ops/sec (409.58 ms/op), 718 total ops | Write: 2.40 ops/sec (409.58 ms/op), 185 total ops | Uptime: 100043 ms |
After running for a few minutes, we can see the new tablets and increased read and write operation metrics. To view the list of tablet servers, visit http://35.135.36.37:7000/tablet-servers.
To ensure high availability (HA) of the data, it is recommended to have a replication factor (RF) of at least 3. For a replication factor of 3, the cluster needs to have at least 3 nodes, with YB-Master and YB-TServer running on all of them. If the supplied replication_factor is greater than num_instances, the stack creation will fail.
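The relationship between the two variables can be checked before running terraform apply. The snippet below is a minimal sketch in shell; check_cluster_size is a hypothetical helper of our own and is not provided by the module.

```shell
# Hypothetical pre-flight check (not part of the module):
# usage: check_cluster_size <replication_factor> <num_instances>
check_cluster_size() {
  local rf="$1" nodes="$2"
  if [ "$nodes" -lt "$rf" ]; then
    echo "num_instances ($nodes) must be >= replication_factor ($rf)" >&2
    return 1
  fi
  echo "ok: $nodes nodes can host RF $rf"
}
```

With the values from our configuration, check_cluster_size 3 3 succeeds, while check_cluster_size 3 2 returns a non-zero status.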
To make the cluster tolerant to availability zone (AZ) failures, it’s better to use a number of availability zones which is greater than or equal to the RF. In our example, we are using 3 AZs (us-east-1a, us-east-1b, and us-east-1c).
The cluster provisioning script starts as many YB-Masters as the given RF. It also tries to make sure that they are spread across the given AZs.
For example, if we have an RF of 3, 3 AZs (a, b, c), and 6 nodes, then the script will start a total of 6 nodes, and 3 of them will have YB-Master running on them. The 3 YB-Masters will be running on one node from each of the AZs: a, b, and c. All 6 nodes will have the YB-TServer process running on them.
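To make the placement idea concrete, here is a small illustrative sketch. It is not the module's actual provisioning script; the function name pick_masters, the node names, and the name:az input format are all made up for this example. It picks one node per AZ in each pass until RF masters are chosen.

```shell
# Illustrative only: NOT the module's real placement logic.
# usage: pick_masters <rf> <name:az> [<name:az> ...]
pick_masters() {
  local rf="$1" picked=" " count=0 seen_az progress node az
  shift
  while [ "$count" -lt "$rf" ]; do
    seen_az=" "
    progress=0
    for node in "$@"; do
      az="${node#*:}"
      # Skip nodes already chosen and AZs already used in this pass.
      case "$picked" in *" $node "*) continue ;; esac
      case "$seen_az" in *" $az "*) continue ;; esac
      picked="$picked$node "
      seen_az="$seen_az$az "
      count=$((count + 1))
      progress=1
      [ "$count" -ge "$rf" ] && break
    done
    # Stop if there are fewer nodes than the requested RF.
    [ "$progress" -eq 1 ] || break
  done
  # Intentionally unquoted: print one chosen node per line.
  printf '%s\n' $picked
}
```

Calling pick_masters 3 n1:a n2:a n3:b n4:b n5:c n6:c selects n1:a, n3:b, and n5:c, one master per AZ. With only two AZs, a second pass picks the remaining master from an AZ that already has one.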
You can read the documentation sections Replication factor, High availability, and Fault tolerance for more information.
NOTE: The terraform-gcp-yugabyte module automatically selects all the availability zones from the given region. Basically, we don’t have to pass the AZ or subnet list.
The YB-TServer process / service from the cluster is responsible for the user IO. It stores all the tablets (the user data). Client communication happens with this service.
You might need to scale this service horizontally in case of more storage or throughput requirements. In our case, this translates to increasing the number of nodes we have in the cluster.
To increase the number of nodes, modify the deployment configuration in deploy-yugabytedb.tf with a new value of num_instances.
module "yugabyte-db-cluster" {
  # …
  num_instances = "5"
}
Applying these changes will add two new nodes to the cluster while keeping other nodes untouched.
$ terraform plan
Terraform will perform the following actions:
# module.yugabyte-db-cluster.aws_instance.yugabyte_nodes[3] will be created
…
# module.yugabyte-db-cluster.aws_instance.yugabyte_nodes[4] will be created
…
# module.yugabyte-db-cluster.null_resource.create_yugabyte_universe must be replaced
-/+ resource "null_resource" "create_yugabyte_universe" {
~ id = "1450910725944170537" -> (known after apply)
~ triggers = {
- "cluster_instance_ids" = "i-0a32,i-08b5,i-0a25"
} -> (known after apply) # forces replacement
}
Plan: 3 to add, 0 to change, 1 to destroy.
$ terraform apply
…
Apply complete! Resources: 3 added, 0 changed, 1 destroyed.
Outputs:
…
The newly added YB-TServers should be visible in the YB-Master UI at http://35.135.36.37:7000/tablet-servers. These newly added TServers will automatically pick up tablets from the existing TServers to balance the distribution of tablets.
The scale down operation is a special operation as directly terminating nodes might result in data loss.
For example, if we terminate one node from a cluster with RF 3 and 6 nodes, YB-Master will mark the node as dead after a configurable amount of time. It will use the 2 other replicas to recover the tablets and replicate them to some other node. If we terminate RF or more nodes at the same time, it might result in the complete, non-recoverable loss of a tablet.
Let’s scale down the newly added nodes from the cluster. Those are the ones with private IPs 10.0.0.10 and 10.0.0.27. Make sure none of the nodes you are scaling down has the YB-Master process running on it. This can be checked by going to the YB-Master UI at http://35.135.36.37:7000/.
NOTE: You can run terraform output to get the output of the Terraform stack at any time.
First, let’s move all data out of the nodes that we are going to remove from the cluster. SSH into the first master node and run the following commands:
$ ssh -i /opt/keys/yb-cluster-key.pem centos@35.135.36.37
[centos@ip-10-0-0-14 ~]$ export YB_MASTERS="10.0.0.14:7100,10.0.0.30:7100,10.0.0.45:7100"
[centos@ip-10-0-0-14 ~]$ /home/centos/yugabyte-db/master/bin/yb-admin \
-master_addresses $YB_MASTERS \
change_blacklist ADD 10.0.0.10:9100 10.0.0.27:9100
The export command stores the private IPs of all the master nodes in YB_MASTERS. The yb-admin command then adds the two nodes we want to remove to the blacklist.
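The master address list is just the private IPs of the masters, each followed by port 7100 and joined with commas. As a small sketch, it could be built from a list of IPs like this; build_master_addresses is a hypothetical helper name, and the IPs are the ones from our example output.

```shell
# Hypothetical helper: join master private IPs into the
# comma-separated host:port list passed to -master_addresses.
build_master_addresses() {
  local port=7100 out="" ip
  for ip in "$@"; do
    out="${out:+$out,}$ip:$port"
  done
  printf '%s\n' "$out"
}
```

For example, export YB_MASTERS="$(build_master_addresses 10.0.0.14 10.0.0.30 10.0.0.45)" produces the same value we exported by hand.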
To verify that the nodes are added correctly, run the following command:
[centos@ip-10-0-0-14 ~]$ /home/centos/yugabyte-db/master/bin/yb-admin \
-master_addresses $YB_MASTERS \
get_universe_config
{
"version": 7,
…
"serverBlacklist": {
"hosts": [
{
"host": "10.0.0.10",
"port": 9100
},
{
"host": "10.0.0.27",
"port": 9100
}
],
"initialReplicaLoad": 13
},
}
Once the nodes are added to the blacklist, YB-Master will start moving the data from them to other nodes. To check the progress of the data move, run the get_load_move_completion command. Run the command periodically to see the progress.
[centos@ip-10-0-0-14 ~]$ /home/centos/yugabyte-db/master/bin/yb-admin \
-master_addresses $YB_MASTERS \
get_load_move_completion
Percent complete = 53.8466 : 6 remaining out of 13
[centos@ip-10-0-0-14 ~]$ /home/centos/yugabyte-db/master/bin/yb-admin \
-master_addresses $YB_MASTERS \
get_load_move_completion
Percent complete = 100 : 0 remaining out of 13
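Instead of re-running the command by hand, the check can be wrapped in a small polling loop. This is a sketch under assumptions: it relies on the output format shown above, the function names are our own, the 10 second interval is arbitrary, and YB_ADMIN defaults to the yb-admin path used in this post.

```shell
# Path to yb-admin as used in this post; override if yours differs.
YB_ADMIN="${YB_ADMIN:-/home/centos/yugabyte-db/master/bin/yb-admin}"

# Returns 0 when a get_load_move_completion output line reports 100 percent.
# Assumes the "Percent complete = N : ..." format shown above.
load_move_complete() {
  local pct
  pct=$(printf '%s\n' "$1" |
    sed -n 's/^Percent complete = \([0-9.]*\) :.*/\1/p' | head -n 1)
  [ -n "$pct" ] && [ "${pct%%.*}" -ge 100 ]
}

# Polls yb-admin until the data move is complete.
# Expects YB_MASTERS to be exported as shown earlier.
wait_for_load_move() {
  local out
  while true; do
    out=$("$YB_ADMIN" -master_addresses "$YB_MASTERS" get_load_move_completion) || return 1
    echo "$out"
    load_move_complete "$out" && return 0
    sleep 10
  done
}
```

Running wait_for_load_move on the master node prints each progress line and returns once the move reaches 100 percent.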
Now the YB-Master UI should show that the two nodes have 0 tablets. The next step is to modify our Terraform configuration. We will set num_instances back to 3 and apply the changes.
$ terraform plan
Terraform will perform the following actions:
# module.yugabyte-db-cluster.aws_instance.yugabyte_nodes[3] will be destroyed
- resource "aws_instance" "yugabyte_nodes" {
- id = "i-0a95" -> null
- private_dns = "ip-10-0-0-10.ec2.internal" -> null
- private_ip = "10.0.0.10" -> null
- public_ip = "35.135.36.40" -> null
- …
}
# module.yugabyte-db-cluster.aws_instance.yugabyte_nodes[4] will be destroyed
- resource "aws_instance" "yugabyte_nodes" {
- id = "i-0dab" -> null
- private_dns = "ip-10-0-0-27.ec2.internal" -> null
- private_ip = "10.0.0.27" -> null
- public_ip = "35.135.36.41" -> null
- …
}
# module.yugabyte-db-cluster.null_resource.create_yugabyte_universe must be replaced
-/+ resource "null_resource" "create_yugabyte_universe" {
~ id = "5810245081236049803" -> (known after apply)
~ triggers = { # forces replacement
~ "cluster_instance_ids" = "i-0a32,i-08b5,i-0a25,i-0a95,i-0dab" -> "i-0a32,i-08b5,i-0a25"
}
}
Plan: 1 to add, 0 to change, 3 to destroy.
$ terraform apply
…
Apply complete! Resources: 1 added, 0 changed, 3 destroyed.
Remove the nodes from the blacklist; this will avoid issues if a node with the same IP address is added later to the cluster.
$ ssh -i /opt/keys/yb-cluster-key.pem centos@35.135.36.37
[centos@ip-10-0-0-14 ~]$ export YB_MASTERS="10.0.0.14:7100,10.0.0.30:7100,10.0.0.45:7100"
[centos@ip-10-0-0-14 ~]$ /home/centos/yugabyte-db/master/bin/yb-admin \
-master_addresses $YB_MASTERS \
change_blacklist REMOVE 10.0.0.10:9100 10.0.0.27:9100
Once the nodes are terminated, YB-Master will show those as dead. As of now there is no way to remove those entries without restarting the leader YB-Master node (this will change in the future).
$ ssh -i /opt/keys/yb-cluster-key.pem centos@35.135.36.39
[centos@ip-10-0-0-45 ~]$ kill -TERM $(pgrep yb-master); sleep 10s; ./start_master.sh
yb-master process is not running - restarting..
Take a look at the Perform data move section from the ‘Change cluster configuration’ page for more information.
The default value of the configuration variable allowed_sources is ["0.0.0.0/0"]. This allows anyone to access the nodes from our cluster. The default configuration is only for usability purposes and it’s recommended to change it.
The value can be set to a list of CIDR blocks and security group IDs. This can be an office IP address, the CIDR block of the VPC, or some specific security group. That way, the cluster nodes will be accessible only from those sources. The allowed ports are YB-Master UI (7000), YB-TServer UI (9000), YEDIS (6379), YCQL (9042), YSQL (5433), and SSH (22). For example, setting the value to ["1.2.3.4/32", "5.6.7.8/32"] will allow connections from these IP addresses only.
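The /32 suffix in the example denotes a single IPv4 address. As a small sketch, a list of individual IPs can be turned into a value suitable for allowed_sources like this; to_allowed_sources is a hypothetical helper of our own, not something the module ships.

```shell
# Hypothetical helper: format single IPv4 addresses as a list
# of /32 CIDR blocks, ready to paste into allowed_sources.
to_allowed_sources() {
  local out="" ip
  for ip in "$@"; do
    out="${out:+$out, }\"$ip/32\""
  done
  printf '[%s]\n' "$out"
}
```

Calling to_allowed_sources 1.2.3.4 5.6.7.8 prints the list used in the example above.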
Refer to the Security groups for your VPC document from AWS for more information.
NOTE: The security group restriction is only available in the AWS module as of now. Similar configuration will be added to the modules for GCP and Azure soon.
That’s it! You now have a YugabyteDB cluster running on AWS that was provisioned using Terraform. To recap, in this blog post, we showed you how you can use Terraform modules for provisioning YugabyteDB on different cloud providers. We walked you through provisioning a cluster, managing nodes, and understanding different configuration options. For the official documentation, usage description, and code, check out the AWS/Terraform page on GitHub.