Enterprises are increasingly turning to Kubernetes for container orchestration, drawn by its promise of scalability, agility, and portability. While cloud-based Kubernetes solutions have gained traction, on-premises deployments remain a compelling option for organizations seeking greater control, security, and cost-efficiency. In this blog post, we’ll explore how to set up a Kubernetes cluster on bare metal using Rancher Kubernetes Engine (RKE2). Plus, we will configure MetalLB as a load balancer and integrate Rook-Ceph as a storage orchestrator for persistent storage.
Let’s begin with understanding Rancher Kubernetes Engine 2 (RKE2).
RKE2, affectionately dubbed RKE Government, stands as Rancher’s pioneering Kubernetes distribution, embodying the next evolution in container orchestration solutions. Engineered with a steadfast commitment to fortifying security and ensuring compliance, RKE2 emerges as a fully conformant Kubernetes distribution tailored to the stringent demands of the U.S. Federal Government sector.
Here’s how RKE2 raises the bar over other bare-metal setups:
Strategic default configurations: RKE2 arrives equipped with meticulously crafted defaults and configuration options, meticulously curated to align with the CIS Kubernetes Benchmark v1.6 or v1.23 standards. These out-of-the-box settings minimize the need for operators to intervene manually, streamlining the path to compliance without compromising security.
FIPS 140-2 compliance: Recognizing the paramount importance of cryptographic standards, RKE2 empowers organizations to achieve FIPS 140-2 compliance effortlessly. By seamlessly integrating FIPS 140-2 validated cryptographic modules, RKE2 ensures data integrity and confidentiality in line with government regulations.
Continuous vulnerability monitoring: In an era defined by ever-evolving threat landscapes, RKE2 remains vigilant. Leveraging advanced scanning mechanisms, RKE2 conducts regular assessments of its components for Common Vulnerabilities and Exposures (CVEs) using Trivy in its robust build pipeline. This proactive approach enables timely detection and remediation of potential security vulnerabilities, fortifying the resilience of Kubernetes clusters.
In essence, RKE2 redefines the paradigm of Kubernetes distributions, combining cutting-edge technology with security and compliance. With RKE2 at the helm, organizations operating within the U.S. Federal Government sector can confidently navigate the complexities of containerized environments, assured of a solution that not only meets but exceeds the most stringent regulatory requirements.
RKE2, the latest iteration of Rancher Kubernetes Engine (RKE), serves as a testament to the evolution of Kubernetes distributions, combining the strengths of its predecessors, RKE1, and K3s, while introducing novel approaches to deployment and management.
Inheriting simplicity and usability from K3s: From K3s, RKE2 inherits a user-friendly experience marked by simplicity and ease of operation. The deployment model, renowned for its lightweight footprint and streamlined setup process, ensures that even novice users can deploy Kubernetes clusters effortlessly.
Maintaining alignment with upstream Kubernetes: RKE2 goes beyond mere simplicity by maintaining close alignment with upstream Kubernetes, a trait inherited from RKE1. While K3s diverge from upstream Kubernetes in certain areas to optimize for edge deployments, RKE1 and RKE2 stay closely aligned, ensuring compatibility and conformity with the broader Kubernetes ecosystem.
Shift away from Docker dependency: One of the most notable departures from RKE1 lies in RKE2’s approach to containerization. While RKE1 relied on Docker for deploying and managing control plane components, along with the Kubernetes container runtime, RKE2 took a different route. It leverages containers and launches control plane components as static pods managed by the kubelet. This shift not only reduces dependencies but also enhances stability and reliability.
As we now understand why RKE 2 is a better option than RKE and K3s, we will install RKE 2. Then, we can add MetalLB and set up Rook-Ceph. However, when following the official documentation for these installations, you may expect issues related to storage and memory. Basically, a lack of proper storage and memory can cause the master to become unavailable. The solution is to select the correct configuration for the master.
For the RKE2 cluster, select 4 GB memory along with a minimum of 50 GB storage. Sometimes, you can see your master is up with 2 GB, and the next moment, after restarting or configuring another master, you might find that the master is not up. You may have no answers as to why the master is not up and running, so it is always advisable to use correct memory and storage to avoid such issues. You can test the RKE2 setup on a cloud VM, but make sure to open the API port on master in the security group/firewall, depending on the cloud. You do not need to think of a port if you are testing it on VirtualBox.
For MetalLB, the first thing we need to keep in mind is that it doesn’t support most of the cloud providers like AWS, Azure, and GCP because these cloud providers do not support the networking protocols that MetalLB requires. Even if they offer dedicated servers it does not work well with BGP (Border Gateway Protocol) or L2 mode. If you are going with VirtualBox, you can use the CIDR range provided for nodes, or you can provide a range of IP addresses from Host-only-networks so that you can hit allocated IP to services in your browser to test.
If you are configuring the Rook-Ceph cluster on VirtualBox, you must configure extra raw disk for nodes. Then, you can use these blocks as specified in the Rook-Ceph installation step.You can follow the Rook-Ceph section if you are trying to achieve this on bare metal. You can configure Rook-Ceph in a cloud environment by using cluster configurations file and changing the storage class in manifest depending on the cloud provider.
With tailored solutions in hand to navigate common challenges in RKE 2. MetalLB, and Rook-Ceph installation, let’s move to the next step, ensuring a smooth and efficient deployment process.
We will now move with the installation of RKE2 on Ubuntu 20.04 Operating System. For our setup, we have used 1 master and 2 workers. You can use more than 1 master node but it should be in odd numbers. An odd number is needed to maintain a quorum. You can refer to the official document for more details to achieve high availability.
RKE2 provides an installation script for a convenient way to install it as a service.
Run installer script
curl -sfL https://get.rke2.io | sh -
Enable rke2-server.service
systemctl enable rke2-server.service
Start the service
systemctl start rke2-server.service
Get the logs
journalctl -u rke2-server -f
After all the steps are successfully executed, you can see your application running. Your services will restart automatically after node reboots.
A kubeconfig file is written to /etc/rancher/rke2/rke2.yaml
.
A token that will be used later to configure the worker with a master can be found at /var/lib/rancher/rke2/server/node-token
.
You can run the command export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
to configure your kube-config file.
Remember to install kubectl depending on your Linux distribution. We are using Ubuntu systems so to install kubectl, you can run snap install kubectl –classic
command.
Once your master server is ready, run kubectl get nodes
command:
root@master4:/home/master4# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master4 Ready control-plane,etcd,master 12d v1.26.12+rke2r1
worker6 Ready <none> 12d v1.26.12+rke2r1
Note: You cannot run kubectl commands until master is ready. It can take some time to become ready after node restarts.
Run the below command to save the above credentials in the kube-config file.
mkdir ~/.kube
touch ~/.kube/config
cp /etc/rancher/rke2/rke2.yaml ~/.kube/config
Run Worker Installation Command
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
Enable rke2-agent.service
systemctl enable rke2-agent.service
Configure rke2-agent service
mkdir -p /etc/rancher/rke2/
vim /etc/rancher/rke2/config.yaml
Content for config.yaml:
server: https://<server>:9345
token: <token from server node>
Server: private IP of master server, you can get this with ifconfig command.
Token: can be found by cat /var/lib/rancher/rke2/server/node-token
Note: rke2-server listens on port 9345 for new nodes to register. Kubernetes API runs on 6443 normally.
Start rke2-agent service
systemctl start rke2-agent.service
Follow the logs after the above command to check if it’s running.
journalctl -u rke2-agent -f
Note: Each machine must have a unique hostname. If your machines do not have unique hostnames, set the node-name parameter in the config.yaml file and provide a value with a valid and unique hostname for each node.
If you need to create multiple worker nodes, instead of doing these steps manually, just run the bash script mentioned below by replacing the token and server IP.
#!/bin/bash
sudo apt-get update
sudo su root
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
systemctl enable rke2-agent.service
mkdir -p /etc/rancher/rke2/
touch /etc/rancher/rke2/config.yaml
yaml_file="/etc/rancher/rke2/config.yaml"
server_value="https://10.0.2.250:9345" # Replace IP with your master node IP
token_value="K103d03052bf787164d1kce1b158708b5eb644b0a804a859c7f7553ea7f9a50993c::server:09f67c5eb986a4f6349h9ag9827j975b" # Replace with your token
# Check if the file exists
if [ -f "$yaml_file" ]; then
# Append values to the YAML file
echo "server: $server_value" >> "$yaml_file"
echo "token: $token_value" >> "$yaml_file"
echo "Values added to $yaml_file"
else
echo "Error: File not found at $yaml_file"
fi
systemctl start rke2-agent.service
In traditional cloud environments, load balancers are often provided as managed services. However, when deploying Kubernetes on bare metal, there is no native load balancing solution out of the box. This can pose challenges for applications that need to be exposed to external users and load distribution.
MetalLB fills the gap by offering a straightforward, software-based load balancing solution for bare metal Kubernetes clusters. It allows you to expose services externally by allocating external IP addresses and distributing traffic across the nodes in the cluster. According to official documentation, MetalLB cannot create IP addresses out of thin air, so you do have to give it pools of IP addresses it can use. It will take care of assigning and unassigning individual addresses as services come and go, but it will only ever hand out IPs that are part of its configured pools.
MetalLB supports two primary modes: Layer 2 mode and BGP mode. In Layer 2 mode, MetalLB assigns IP addresses from a predefined pool to services. BGP mode, on the other hand, allows MetalLB to dynamically announce the service IPs using the Border Gateway Protocol, making it suitable for more complex networking setups.
We are going to configure MetalLB with a manifest file. You can choose the Helm or Kustomize option from official documents to install MetalLB.
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.2/config/manifests/metallb-native.yaml
Note: We are using the 0.14.2 version, which might change in the future. To get the latest release, you can follow the official install MetalLB for the manifest section or refer to their GitHub Repo.
This will deploy MetalLB to your cluster under the metallb-system namespace. The components in the manifest are:
metallb-system/controller
deployment. This is the cluster-wide controller that handles IP address assignments.metallb-system/speaker daemonset
. This component speaks to the protocol(s) of your choice to make the services reachable.This does not include configuration files, so components will start but remain idle.
NAME READY STATUS RESTARTS AGE
controller-586bfc6b59-6gdrh 1/1 Running 9 (61m ago) 11d
speaker-cf9zv 1/1 Running 24 (61m ago) 12d
speaker-qnhct 1/1 Running 8 (2m45s ago) 12d
To assign IP to services, MetalLB must provide with an IPAddressPool that will be used to assign IP to services. Save this file with the name ip.yaml and run kubectl apply -f ip.yaml
.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: first-pool
namespace: metallb-system
spec:
addresses:
- 192.168.56.20-192.168.56.30
Save this file with the name ip.yaml and run kubectl apply -f ip.yaml
and you have your IPAddressPool ready in the cluster.
root@master4:/home/master4# kubectl get IPAddressPool -A
NAMESPACE NAME AUTO ASSIGN AVOID BUGGY IPS ADDRESSES
metallb-system first-pool true false ["192.168.56.20-192.168.56.30"]
Here, we are using a range of IPs which is being used to SSH these servers as well. You can also use the range of IP that has been assigned to your nodes or CIDR range. To learn more, you can follow the documentation for L2 configuration.
Layer 2 mode is the simplest to configure: in many cases, you don’t need any protocol-specific configuration, only IP addresses.
Layer 2 mode does not require the IPs to be bound to the network interfaces of your worker nodes. It works by responding to ARP requests on your local network directly, to give the machine’s MAC address to clients.
To advertise the IP coming from an IPAddressPool
, an L2Advertisement
instance must be associated with the IPAddressPool
. To learn more, follow the documentation for L2 configuration.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: example
namespace: metallb-system
spec:
ipAddressPools:
- first-pool
Save the file with the name L2.yaml and run kubectl apply -f L2.yaml
.
root@master4:/home/master4# kubectl get L2Advertisement -A
NAMESPACE NAME IPADDRESSPOOLS IPADDRESSPOOL SELECTORS INTERFACES
metallb-system example ["first-pool"]
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx-container
image: nginx:latest
---
apiVersion: v1
kind: Service
metadata:
name: nginx-service
spec:
selector:
app: nginx
ports:
- protocol: TCP
port: 80
targetPort: 80
type: LoadBalancer
Save this as nginx-deploy.yaml and run kubectl apply -f nginx-deploy.yaml
.
Run kubectl get svc
to get External IP:
root@master4:/home/master4# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-service LoadBalancer 10.43.102.163 192.168.56.21 80:32310/TCP 4m24s
Now run curl -v {external-IP} like curl -v 192.168.56.21
or copy and paste the IP to your browser, and you can see the Nginx welcome page.
Storage is a critical component in any Kubernetes environment, and Rook-Ceph emerges as a game-changer in this domain. Rook is an open source cloud native storage orchestrator, providing the platform, framework, and support for Ceph storage to integrate natively with cloud native environments.
Ceph is a distributed storage system that provides file, block, and object storage and is deployed in large-scale production clusters.
Rook automates deployment and management of Ceph to provide self-managing, self-scaling, and self-healing storage services. The Rook operator does this by building on Kubernetes resources to deploy, configure, provision, scale, upgrade, and monitor Ceph.
To configure the Ceph storage cluster, at least one of these local storage types is required: You should have one extra raw partition to be attached. Filesystems attached during creation won’t work for Ceph Cluster.
Ceph OSDs (Object Storage Daemon) have a dependency on LVM in the following scenarios:
LVM is not required for OSDs in these scenarios:
First, we will clone the repo and install the Rook Operator:
git clone --single-branch --branch master https://github.com/rook/rook.git
cd rook/deploy/examples
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
You can also install Rook Operator with different approaches by following the documentation of Quickstart - Rook Ceph. Verify that the Rook Operator is up and running by command kubectl -n rook-ceph get pod
.
There are several types of Ceph clusters depending on your environment like bare-metal, cloud, test, and production level. Different manifests are provided for different types of installation. We are going with 1 node in this setup. If you want to learn more, you can find various examples in example configurations document.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: my-cluster
namespace: rook-ceph # namespace:cluster
spec:
dataDirHostPath: /var/lib/rook
cephVersion:
image: quay.io/ceph/ceph:v18
allowUnsupported: true
mon:
count: 1
allowMultiplePerNode: true
mgr:
count: 1
allowMultiplePerNode: true
dashboard:
enabled: true
crashCollector:
disable: true
storage:
useAllNodes: true
useAllDevices: true
#deviceFilter:
monitoring:
enabled: false
healthCheck:
daemonHealth:
mon:
interval: 45s
timeout: 600s
priorityClassNames:
all: system-node-critical
mgr: system-cluster-critical
disruptionManagement:
managePodBudgets: true
cephConfig:
global:
osd_pool_default_size: "1"
mon_warn_on_pool_no_redundancy: "false"
bdev_flock_retry: "20"
bluefs_buffered_io: "false"
mon_data_avail_warn: "10"
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
name: builtin-mgr
namespace: rook-ceph # namespace:cluster
spec:
name: .mgr
replicated:
size: 1
requireSafeReplicaSize: false
Save the file with the name cluster-test.yml and run kubectl apply -f cluster-test.yaml
.
Check all your pods are running by kubectl get pods -n rook-ceph
. Make sure your OSD and ceph-mon are up.
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-ld862 2/2 Running 24 (15m ago) 7d23h
csi-cephfsplugin-nk2m4 2/2 Running 17 (12m ago) 7d23h
csi-cephfsplugin-provisioner-744b68955d-8xj69 5/5 Running 45 (15m ago) 7d23h
csi-cephfsplugin-provisioner-744b68955d-vqj7k 5/5 Running 11 (12m ago) 18h
csi-rbdplugin-7x2b2 2/2 Running 24 (15m ago) 7d23h
csi-rbdplugin-7xkjj 2/2 Running 16 (12m ago) 7d23h
csi-rbdplugin-provisioner-54c58c7d4c-2zkzz 5/5 Running 45 (15m ago) 7d23h
csi-rbdplugin-provisioner-54c58c7d4c-nsvq7 5/5 Running 12 (12m ago) 18h
rook-ceph-exporter-master4-65bd589b7-82xdr 1/1 Running 16 (15m ago) 6d17h
rook-ceph-exporter-worker6-86c5477549-lbs9w 1/1 Running 2 (12m ago) 18h
rook-ceph-mgr-a-765cc75d8c-rq9cw 1/1 Running 50 (11m ago) 7d13h
rook-ceph-mon-a-5c55b57784-pnxvh 1/1 Running 2 (12m ago) 18h
rook-ceph-operator-7bc5dfd788-bll7p 1/1 Running 11 (15m ago) 11d
rook-ceph-osd-0-6b76cbdc46-95wtr 1/1 Running 3 (33m ago) 6d17h
rook-ceph-osd-1-94f7d466b-xwktl 1/1 Running 1 (27m ago) 18h
rook-ceph-osd-prepare-master4-6hbh7 0/1 Completed 0 11m
rook-ceph-osd-prepare-worker6-xzq5h 0/1 Completed 0 11m
rook-ceph-tools-6d6d694fb9-l65f2 1/1 Running 8 (15m ago) 7d13h
Depending on your cluster type, we will configure storageclass. We are going with 1 node, so we will use storageclass-test.yaml. Run below command to configure storage kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/csi/rbd/storageclass-test.yaml
. Now, check storage is created with the command kubectl get storageclass
The Rook toolbox is a container with common tools used for Rook debugging and testing. You can use it to run as Ceph CLI for all the queries and information related to Ceph cluster. Run “kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/toolbox.yaml” . To learn more, read the Toolbox documentation.
Once your deployment is ready, run below command to connect with the Toolbox:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
Now run ceph status to check the status of ceph cluster, you should get output similar to this.
bash-4.4$ ceph status
cluster:
id: 96f59e8f-06ec-4806-ac7a-38d32d6a7e4e
health: HEALTH_OK
services:
mon: 1 daemons, quorum a (age 31m)
mgr: a(active, since 31m)
osd: 2 osds: 2 up (since 31m), 2 in (since 6d)
data:
pools: 1 pools, 32 pgs
objects: 16 objects, 3.2 MiB
usage: 65 MiB used, 55 GiB / 55 GiB avail
pgs: 32 active+clean
Now, run ceph osd status, and output should look similar to this.
bash-4.4$ ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 master4 32.3M 24.9G 0 0 0 0 exists,up
1 worker6 32.9M 29.9G 0 0 0 0 exists,up
Note: The term OSD here represents the number of raw volumes you have attached in each node. In my case, you can see 2 OSD because two raw volumes are attached i.e one for master and one for worker.
Now, our cluster is ready along with storageclass and ceph cluster.
Once everything is up and running, let’s create a pod, pvc, and attach pvc with that pod.
Create a file with the name pod.yaml and save the below manifest. Run kubectl apply -f pod.yaml
.
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
spec:
containers:
- name: nginx-container
image: nginx:latest
volumeMounts:
- name: nginx-pvc-mount
mountPath: /usr/share/nginx/html
volumes:
- name: nginx-pvc-mount
persistentVolumeClaim:
claimName: nginx-pvc # Replace with your PVC name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nginx-pvc # Replace with your PVC name
spec:
accessModes:
- ReadWriteOnce
storageClassName: rook-ceph-block # Replace with your storage class name
resources:
requests:
storage: 1Gi # Replace with your desired storage size
Now, you can see your pod running and run kubectl get pvc and kubectl get pv
command to see pv getting bound to pvc.
In this blog post, we learned how to set up an on-prem cluster using RKE, MetalLB for load balancing, and Rook-Ceph for storage orchestration. This architecture provides a robust foundation for deploying and scaling containerized applications in a bare metal environment. We also discussed common challenges faced while setting up the RKE2 cluster.
I hope this article was informative to you. I would like to hear your thoughts on this post. If you wish to share your opinion about this article, let’s connect and start a conversation on LinkedIn.