Machine Management in Action in OpenShift 4
When you deploy an OpenShift 4.1 cluster on AWS with the UPI (User-Provisioned Infrastructure) installation, you can use the provided AWS CloudFormation templates to create the required infrastructure and then deploy the OCP cluster on top of it.
By default, the provided CloudFormation templates deploy 3 masters but only one worker (in IPI installations, 2 workers are deployed instead, using the Machine API Operator).
Also by default, an OpenShift cluster runs two routers on its worker nodes to provide proper HA for the OCP routes. The two routers are not scheduled on the same worker node, so at least two workers are needed to host them. Since the CloudFormation templates deploy only one worker, only one router runs and the other stays in Pending state, as we will see below.
The solution to this problem is to use the Machine API Operator to deploy additional workers and bring the cluster up to 2 (or 3) of them. With enough workers, each OCP router runs on its own worker node, which gives the HA required for production environments.
Overview
First of all, we need an OCP 4 cluster deployed on AWS using the UPI installation. The result of this installation will look something like this:
[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-143-54.eu-west-1.compute.internal Ready master 23h v1.13.4+cb455d664
ip-10-0-150-130.eu-west-1.compute.internal Ready master 23h v1.13.4+cb455d664
ip-10-0-158-99.eu-west-1.compute.internal Ready worker 22h v1.13.4+cb455d664
ip-10-0-165-234.eu-west-1.compute.internal Ready master 23h v1.13.4+cb455d664
As we see, only one worker node is deployed in our cluster.
[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes | grep worker
ip-10-0-158-99.eu-west-1.compute.internal Ready worker 23h v1.13.4+cb455d664
For this reason, only one of the two OCP routers in our cluster is running properly in the openshift-ingress namespace:
[root@clientvm 0 ~/new-ocp4-cf]# oc get pod -n openshift-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-76f869f9dc-s48rw 1/1 Running 0 23h 10.131.0.6 ip-10-0-158-99.eu-west-1.compute.internal <none> <none>
router-default-76f869f9dc-xh7w4 0/1 Pending 0 98s <none> <none> <none>
As we can see, the second router stays in “Pending” state because there is no second worker node to host it.
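To pick out the unscheduled router without scanning the table by eye, the output can be filtered on the STATUS column. This is a minimal sketch: the helper name `pending_pods` is hypothetical, and it assumes the usual column layout where STATUS is the third field.

```shell
# pending_pods: read `oc get pod --no-headers` output on stdin and print
# the names of the pods whose STATUS column (3rd field) is "Pending".
# The function name is illustrative; it is not part of the oc CLI.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Example usage against a live cluster (not run here):
#   oc get pod -n openshift-ingress --no-headers | pending_pods
#   oc describe pod <pending-pod-name> -n openshift-ingress
# The Events section of `oc describe` should show why scheduling failed.
```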
Machine Management - Deploy additional worker nodes
So, to fix this and have two routers fully available, we can deploy a new worker node using a MachineSet and the Machine API Operator.
First of all, we can check that there are no MachineSets or Machines present/available:
[root@clientvm 0 ~/new-ocp4-cf]# oc get machinesets -n openshift-machine-api
No resources found.
[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
No resources found.
NOTE: the only worker node present in our cluster was deployed with AWS CloudFormation:
[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))? | select(.State.Name=="running") | .InstanceId'
i-008f1d37b01200ffb
[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes | grep worker
ip-10-0-158-99.eu-west-1.compute.internal Ready worker 23h v1.13.4+cb455d664
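The same lookup can also be done without jq, using the filtering built into the aws-cli. This is only a sketch: the helper name `running_instances_by_tag` is hypothetical, and it assumes we know which tag key carries the worker name (the jq filter above matches against every tag value instead).

```shell
# running_instances_by_tag KEY PATTERN
# Print the IDs of running EC2 instances whose tag KEY matches PATTERN.
# EC2 filter wildcards such as * are allowed in PATTERN.
# The function name is illustrative; the aws-cli options are standard ones.
running_instances_by_tag() {
  aws ec2 describe-instances \
    --filters "Name=tag:$1,Values=$2" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Example usage (assumes the workers carry a Name tag like this; not run here):
#   running_instances_by_tag Name '*rcarrata-cf*worker*'
```

Filtering on the server side also avoids downloading the description of every instance in the region.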
We can create a MachineSet to deploy a new worker node in eu-west-1a (AZ1 of the AWS Ireland region):
[root@clientvm 0 ~/new-ocp4-cf]# cat machineset_worker.yml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
  name: rcarrata-cf-7lk9g-worker-2-eu-west-1a
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: rcarrata-cf-7lk9g-worker-eu-west-1a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: rcarrata-cf-7lk9g-worker-eu-west-1a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/worker: ""
      providerSpec:
        value:
          ami:
            id: ami-00d18cbff03587a41
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
          - ebs:
              iops: 0
              volumeSize: 120
              volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: clustersecurity-WorkerInstanceProfile-HGYXROC3XT5E
          instanceType: m4.large
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: eu-west-1a
            region: eu-west-1
          securityGroups:
          - filters:
            - name: "tag:aws:cloudformation:logical-id"
              values:
              - WorkerSecurityGroup
          subnet:
            filters:
            - name: tag:Name
              values:
              - rcarrata-xbyo-m2d2l-private-eu-west-1a
          tags:
          - name: kubernetes.io/cluster/rcarrata-cf-7lk9g
            value: owned
          userDataSecret:
            name: worker-user-data
status:
  replicas: 0
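Most of the values in this manifest are specific to one cluster. The sketch below shows one way to look up the infrastructure ID that prefixes the cluster labels, and to build a name in the `<infrastructure-id>-<role>-<zone>` pattern used by the label values above. It assumes the cluster reports the ID in the `infrastructure` resource under `status.infrastructureName`; both helper names are hypothetical.

```shell
# infra_id: print the cluster's infrastructure ID (e.g. rcarrata-cf-7lk9g).
# Assumption: the Infrastructure resource named "cluster" exposes it in
# .status.infrastructureName.
infra_id() {
  oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'
}

# machineset_name INFRA_ID ROLE ZONE
# Build a name following the <infrastructure-id>-<role>-<zone> pattern.
machineset_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Example usage (not run here):
#   machineset_name "$(infra_id)" worker eu-west-1a
```

The AMI, IAM instance profile, security group, and subnet values still have to be taken from the resources that the CloudFormation stacks created.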
Apply the MachineSet manifest that defines the new worker node:
[root@clientvm 0 ~/new-ocp4-cf]# oc apply -f machineset_worker.yml
machineset.machine.openshift.io/rcarrata-cf-7lk9g-worker-2-eu-west-1a created
Then check the MachineSet and the Machine that were created:
[root@clientvm 0 ~/new-ocp4-cf]# oc get machineset
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a   1         1                             4s
[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
NAME INSTANCE STATE TYPE REGION ZONE AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a-r4sqk i-0a07c73b9d8b438bc pending m4.large eu-west-1 eu-west-1a 7s
Use the aws-cli tool to check the new instance that will become our new worker node:
[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[]
| select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))?
| select(.State.Name=="pending") | .InstanceId'
i-0a07c73b9d8b438bc
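Instead of re-running the checks by hand while the instance boots, we could poll the MachineSet until it reports the expected number of ready replicas. The following is a rough sketch, not a definitive implementation: the function name, the retry count, and the sleep interval are all assumptions.

```shell
# wait_for_machineset NAME DESIRED [TRIES] [SLEEP_SECONDS]
# Poll the MachineSet until .status.readyReplicas equals DESIRED.
# Returns 0 on success, 1 if TRIES polls pass without reaching it.
# Defaults (60 tries, 10 seconds apart) are arbitrary illustrative values.
wait_for_machineset() {
  name=$1 desired=$2 tries=${3:-60} pause=${4:-10}
  while [ "$tries" -gt 0 ]; do
    ready=$(oc get machineset "$name" -n openshift-machine-api \
      -o jsonpath='{.status.readyReplicas}')
    # The output is empty while no replica is ready yet, so default to 0.
    if [ "${ready:-0}" -eq "$desired" ]; then
      echo "${name}: ${ready}/${desired} ready"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  echo "${name}: timed out waiting for ${desired} ready replicas" >&2
  return 1
}

# Example usage (not run here):
#   wait_for_machineset rcarrata-cf-7lk9g-worker-2-eu-west-1a 1
```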
Once the new instance is in “Running” state, the MachineSet shows the same count in its DESIRED, READY, and AVAILABLE columns (1 worker in this case):
[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[]
| select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))?
| select(.State.Name=="running") | .InstanceId'
i-008f1d37b01200ffb
i-0a07c73b9d8b438bc
[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
NAME INSTANCE STATE TYPE REGION ZONE AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a-r4sqk i-0a07c73b9d8b438bc running m4.large eu-west-1 eu-west-1a 42s
[root@clientvm 0 ~/new-ocp4-cf]# oc get machineset
NAME DESIRED CURRENT READY AVAILABLE AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a 1 1 1 1 6m3s
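If more workers are wanted later, the MachineSet does not need to be edited and re-applied; its replica count can be changed with `oc scale`. A minimal sketch, where the wrapper name `scale_machineset` is hypothetical:

```shell
# scale_machineset NAME REPLICAS
# Set the desired replica count of a MachineSet in openshift-machine-api.
# The Machine API Operator then creates or deletes Machines to match.
scale_machineset() {
  oc scale machineset "$1" --replicas="$2" -n openshift-machine-api
}

# Example usage (not run here): grow the new MachineSet to two workers,
# which together with the CloudFormation worker would give three in total.
#   scale_machineset rcarrata-cf-7lk9g-worker-2-eu-west-1a 2
```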
Conclusion - Check OpenShift Cluster and Routers status
So now the new worker node is up and running in our OpenShift cluster:
[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-137-15.eu-west-1.compute.internal Ready worker 2m26s v1.13.4+cb455d664
ip-10-0-143-54.eu-west-1.compute.internal Ready master 23h v1.13.4+cb455d664
ip-10-0-150-130.eu-west-1.compute.internal Ready master 23h v1.13.4+cb455d664
ip-10-0-158-99.eu-west-1.compute.internal Ready worker 23h v1.13.4+cb455d664
ip-10-0-165-234.eu-west-1.compute.internal Ready master 23h v1.13.4+cb455d664
Because we now have an additional worker node in our cluster, the second router can run on this new worker:
[root@clientvm 0 ~/new-ocp4-cf]# oc get pod -n openshift-ingress
NAME READY STATUS RESTARTS AGE
router-default-76f869f9dc-4dqhj 1/1 Running 0 18m
router-default-76f869f9dc-s48rw 1/1 Running 0 23h
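To confirm that the two routers really landed on different workers, the wide output can be checked for distinct node names. This is a small sketch under the assumption that NODE is the seventh column of `oc get pod -o wide`, as in the output shown earlier; the helper name `routers_spread` is hypothetical.

```shell
# routers_spread: read `oc get pod -o wide --no-headers` output on stdin.
# Succeeds (exit 0) when every Running pod sits on a different node;
# fails when two Running pods share a node or when none is Running.
# Assumes STATUS is the 3rd field and NODE the 7th.
routers_spread() {
  awk '$3 == "Running" { pods++; if (!seen[$7]++) nodes++ }
       END { exit !(pods > 0 && pods == nodes) }'
}

# Example usage (not run here):
#   oc get pod -n openshift-ingress -o wide --no-headers | routers_spread \
#     && echo "routers are running on separate nodes"
```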
NOTE: Opinions expressed in this blog are my own and do not necessarily reflect those of the company I work for.