
When you deploy an OpenShift 4.1 cluster on AWS with the UPI (User-Provisioned Infrastructure) installation, you can use the provided AWS CloudFormation templates to create the required infrastructure and then deploy the OCP cluster on top of it.

By default, the provided CloudFormation templates deploy 3 masters but only one worker (IPI installations, by contrast, deploy 3 workers by default through the Machine API Operator).

Also by default, the OpenShift cluster deploys two routers on the worker nodes to provide proper HA for the OCP routes. The two routers are not scheduled on the same worker node, so at least two worker nodes are needed to host them. With the single worker created by the CloudFormation templates, only one router runs and the other stays in Pending state, as we will see below.

The solution is to use the Machine API Operator to deploy additional workers and bring the total to 2 (or 3). With enough workers, each OCP router runs on its own worker node, which provides the HA required for production environments.

Overview

First of all, we need an OCP4 cluster deployed on AWS using the UPI installation. The result of this installation looks something like this:

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-143-54.eu-west-1.compute.internal    Ready    master   23h   v1.13.4+cb455d664
ip-10-0-150-130.eu-west-1.compute.internal   Ready    master   23h   v1.13.4+cb455d664
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   22h   v1.13.4+cb455d664
ip-10-0-165-234.eu-west-1.compute.internal   Ready    master   23h   v1.13.4+cb455d664

As we see, only one worker node is deployed in our cluster.

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes | grep worker
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   23h   v1.13.4+cb455d664

For this reason, only one of the two OCP routers in our cluster is running properly in the openshift-ingress namespace:

[root@clientvm 0 ~/new-ocp4-cf]# oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE                                        NOMINATED NODE   READINESS GATES
router-default-76f869f9dc-s48rw   1/1     Running   0          23h   10.131.0.6   ip-10-0-158-99.eu-west-1.compute.internal   <none>           <none>
router-default-76f869f9dc-xh7w4   0/1     Pending   0          98s   <none>       <none>                                      <none>

As we can observe, the second router is not running because there isn’t a second worker node to host it, and it is waiting in “Pending” state.
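
To spot pods stuck like this without reading the whole table, the listing can be filtered by its STATUS column. The helper below is a minimal sketch: `pods_in_state` is a hypothetical name, not an `oc` feature, and it only parses the plain-text output of `oc get pod`.

```shell
# pods_in_state STATE
# Reads `oc get pod` output on stdin and prints the names of the pods
# whose STATUS column equals STATE (for example Pending or Running).
pods_in_state() {
  awk -v want="$1" '
    # Header line: remember which column is called STATUS.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") { col = i; break } next }
    # Data lines: print the pod name when the status matches.
    col && $col == want { print $1 }
  '
}
```

For example, `oc get pod -n openshift-ingress | pods_in_state Pending` should print the name of the router that has not been scheduled yet, and `oc describe pod` on that pod shows the scheduling events that explain why.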

Machine Management - Deploy additional worker nodes

So, to fix this and have two routers fully available, we can deploy a new worker node using a MachineSet and the Machine API Operator.

First of all, we can check that there are no MachineSets or Machines present/available:

[root@clientvm 0 ~/new-ocp4-cf]# oc get machinesets -n openshift-machine-api
No resources found.

[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
No resources found.

NOTE: the only worker node present in our cluster was deployed with AWS CloudFormation:

[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))? | select(.State.Name=="running") | .InstanceId'
i-008f1d37b01200ffb

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes | grep worker
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   23h   v1.13.4+cb455d664

We can create a MachineSet to deploy a new worker node in eu-west-1a (AZ1 of the AWS Ireland region):

[root@clientvm 0 ~/new-ocp4-cf]# cat machineset_worker.yml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
  name: rcarrata-cf-7lk9g-worker-2-eu-west-1a
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: rcarrata-cf-7lk9g-worker-eu-west-1a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: rcarrata-cf-7lk9g-worker-eu-west-1a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/worker: ""
      providerSpec:
        value:
          ami:
            id: ami-00d18cbff03587a41
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
            - ebs:
                iops: 0
                volumeSize: 120
                volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: clustersecurity-WorkerInstanceProfile-HGYXROC3XT5E
          instanceType: m4.large
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: eu-west-1a
            region: eu-west-1
          securityGroups:
            - filters:
                - name: "tag:aws:cloudformation:logical-id"
                  values:
                    - WorkerSecurityGroup
          subnet:
            filters:
              - name: tag:Name
                values:
                  - rcarrata-xbyo-m2d2l-private-eu-west-1a
          tags:
            - name: kubernetes.io/cluster/rcarrata-cf-7lk9g
              value: owned
          userDataSecret:
            name: worker-user-data
status:
  replicas: 0
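
The same manifest can serve as a starting point for workers in other availability zones, since the zone appears in the MachineSet name, its labels, the placement and the subnet filter. This is a minimal sketch, assuming the private subnets of the other zones follow the same naming pattern (worth checking in your VPC before applying anything); `retarget_az` is a hypothetical helper name.

```shell
# retarget_az OLD_ZONE NEW_ZONE
# Reads a MachineSet manifest on stdin and writes a copy to stdout in which
# every occurrence of OLD_ZONE is replaced by NEW_ZONE.
# The region (e.g. eu-west-1) is left untouched because it does not contain
# the full zone name (e.g. eu-west-1a).
retarget_az() {
  sed "s/$1/$2/g"
}
```

A possible use is `retarget_az eu-west-1a eu-west-1b < machineset_worker.yml > machineset_worker_1b.yml`, followed by a review of the generated file before running `oc apply -f` on it.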

Apply the MachineSet YAML that defines the new worker node:

[root@clientvm 0 ~/new-ocp4-cf]# oc apply -f machineset_worker.yml
machineset.machine.openshift.io/rcarrata-cf-7lk9g-worker-2-eu-west-1a created

And check the machinesets and the machines created:

[root@clientvm 0 ~/new-ocp4-cf]# oc get machineset
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a   1         1                             4s
[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
NAME                                          INSTANCE              STATE     TYPE       REGION      ZONE         AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a-r4sqk   i-0a07c73b9d8b438bc   pending   m4.large   eu-west-1   eu-west-1a   7s

Use the aws-cli tool to check the new instance that will become our worker node:

[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[]
| select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))?
| select(.State.Name=="pending") | .InstanceId'
i-0a07c73b9d8b438bc

Once the instance is in “Running” state, the MachineSet shows the same number of Desired, Ready and Available replicas (1 worker in this case):

[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[]
| select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))?
| select(.State.Name=="running") | .InstanceId'
i-008f1d37b01200ffb
i-0a07c73b9d8b438bc
[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
NAME                                          INSTANCE              STATE     TYPE       REGION      ZONE         AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a-r4sqk   i-0a07c73b9d8b438bc   running   m4.large   eu-west-1   eu-west-1a   42s

[root@clientvm 0 ~/new-ocp4-cf]# oc get machineset
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a   1         1         1       1           6m3s
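
Instead of re-running `oc get machineset` by hand, the same comparison can be polled. The function below is a sketch and not part of the original procedure: `wait_for_machineset`, its timeout and its 10-second interval are hypothetical, and it assumes the MachineSet exposes `.spec.replicas` and `.status.readyReplicas`, which is where the DESIRED and READY columns appear to come from.

```shell
# wait_for_machineset NAME [TIMEOUT_SECONDS]
# Polls the MachineSet every 10 seconds until its ready replicas match the
# desired replicas. Returns 0 on success and 1 when the timeout is reached.
wait_for_machineset() {
  name=$1
  timeout=${2:-600}
  waited=0
  while :; do
    desired=$(oc get machineset "$name" -n openshift-machine-api \
      -o jsonpath='{.spec.replicas}')
    ready=$(oc get machineset "$name" -n openshift-machine-api \
      -o jsonpath='{.status.readyReplicas}')
    # readyReplicas is empty until the first node joins, so treat it as 0.
    if [ -n "$desired" ] && [ "${ready:-0}" = "$desired" ]; then
      echo "machineset $name: $ready/$desired replicas ready"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "machineset $name: timed out after ${waited}s" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

Called as `wait_for_machineset rcarrata-cf-7lk9g-worker-2-eu-west-1a`, it returns once the counts match. The same MachineSet can later be grown with `oc scale machineset <name> --replicas=2 -n openshift-machine-api`.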

Conclusion - Check OpenShift Cluster and Routers status

So now the new worker node is up and running in our OpenShift cluster:

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-137-15.eu-west-1.compute.internal    Ready    worker   2m26s   v1.13.4+cb455d664
ip-10-0-143-54.eu-west-1.compute.internal    Ready    master   23h     v1.13.4+cb455d664
ip-10-0-150-130.eu-west-1.compute.internal   Ready    master   23h     v1.13.4+cb455d664
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   23h     v1.13.4+cb455d664
ip-10-0-165-234.eu-west-1.compute.internal   Ready    master   23h     v1.13.4+cb455d664

Because we now have an additional worker node in our cluster, the second router can run on this new worker:

[root@clientvm 0 ~/new-ocp4-cf]# oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-76f869f9dc-4dqhj   1/1     Running   0          18m
router-default-76f869f9dc-s48rw   1/1     Running   0          23h
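
The listing above does not show where each router landed. Here is a small sketch to confirm that the routers sit on different workers, assuming the `-o wide` output format used earlier in this post (`nodes_hosting_pods` is a hypothetical helper name):

```shell
# nodes_hosting_pods
# Reads `oc get pod -o wide` output on stdin and prints the distinct node
# names that host at least one pod. Unscheduled pods (<none>) are skipped.
nodes_hosting_pods() {
  awk '
    # Header line: take the first NODE column (NOMINATED NODE comes later).
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "NODE") { col = i; break } next }
    col && $col != "<none>" { print $col }
  ' | sort -u
}
```

Running `oc get pod -n openshift-ingress -o wide | nodes_hosting_pods` should now list two node names, one per router.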

NOTE: Opinions expressed in this blog are my own and do not necessarily reflect those of the company I work for.