
Roberto Carratalá

Linux Geek. Devops & Kubernetes enthusiast. Architect @ Red Hat.


When you deploy an OpenShift 4.1 cluster using the UPI (User-Provisioned Infrastructure) AWS installation, the deployment can be performed with the provided AWS CloudFormation templates, which create the required infrastructure so that the OCP cluster can then be deployed on top.

By default, the provided CloudFormation templates deploy 3 masters but only one worker (in IPI installations, 2 workers are deployed instead, managed by the Machine API Operator).

On the other hand, also by default, when the OpenShift cluster is deployed, two routers are deployed onto the worker nodes of the cluster (to give the OCP routes proper HA). To prevent these two OCP routers from running on the same worker node, at least two worker nodes are needed to host them; but since only one worker is deployed by the CloudFormation templates, only one router is running (the other stays in the Pending state, as we will see below).
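The scheduling constraint behind this behaviour is, roughly, a pod anti-affinity rule on the router Deployment. The following is a simplified sketch of such a rule, not the exact router manifest, and the label selector shown is illustrative:

```yaml
# Simplified sketch of a required pod anti-affinity rule: two pods matching
# the selector may never be scheduled onto the same node, because the
# topologyKey is the node hostname. The label used here is illustrative.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: router-default
```

Because the rule is "required" (not "preferred"), the scheduler would rather leave the second replica in Pending than co-locate it with the first one.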

The solution to this problem is to use the Machine API Operator to deploy and scale the workers in our cluster to 2 (or 3). With the proper number of workers, the OCP routers will run perfectly (one on each worker node) and will have the HA required for production environments.

Overview

First of all, we need our OCP4 cluster deployed on AWS using the UPI installation. The result of this installation will look something like this:

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-143-54.eu-west-1.compute.internal    Ready    master   23h   v1.13.4+cb455d664
ip-10-0-150-130.eu-west-1.compute.internal   Ready    master   23h   v1.13.4+cb455d664
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   22h   v1.13.4+cb455d664
ip-10-0-165-234.eu-west-1.compute.internal   Ready    master   23h   v1.13.4+cb455d664

As we see, only one worker node is deployed in our cluster.

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes | grep worker
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   23h   v1.13.4+cb455d664

For this reason, only one of the two OCP routers in our cluster is running properly in the openshift-ingress namespace:

[root@clientvm 0 ~/new-ocp4-cf]# oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE                                        NOMINATED NODE   READINESS GATES
router-default-76f869f9dc-s48rw   1/1     Running   0          23h   10.131.0.6   ip-10-0-158-99.eu-west-1.compute.internal   <none>           <none>
router-default-76f869f9dc-xh7w4   0/1     Pending   0          98s   <none>       <none>                                      <none>

As we can observe, the second router is not running, because there isn't a second worker node to host it, so it is stuck in the "Pending" state.

Machine Management - Deploy additional worker nodes

So, to fix this and have two routers fully available, we can deploy a new worker node using a MachineSet and the Machine API Operator.

First of all, we can check that no MachineSets or Machines are present/available:

[root@clientvm 0 ~/new-ocp4-cf]# oc get machinesets -n openshift-machine-api
No resources found.

[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
No resources found.

NOTE: the only worker node present in our cluster was deployed with AWS Cloudformation:

[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))? | select(.State.Name=="running") | .InstanceId'
i-008f1d37b01200ffb

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes | grep worker
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   23h   v1.13.4+cb455d664
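As a side note, the jq filter used above can be tried locally against a minimal hand-written payload to see how it selects instances. Everything in this sample (instance IDs, tag values) is made up for illustration:

```shell
# Build a tiny, made-up describe-instances payload with one running and one
# stopped worker instance.
cat > sample.json <<'EOF'
{"Reservations":[{"Instances":[
  {"InstanceId":"i-aaa","State":{"Name":"running"},
   "Tags":[{"Key":"Name","Value":"rcarrata-cf-worker-1"}]},
  {"InstanceId":"i-bbb","State":{"Name":"stopped"},
   "Tags":[{"Key":"Name","Value":"rcarrata-cf-worker-2"}]}
]}]}
EOF

# Same filter as above: keep instances whose tag value matches the worker
# pattern (the trailing `?` suppresses errors for instances without Tags),
# then keep only the running ones and print their IDs.
jq -r '.Reservations[].Instances[]
  | select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))?
  | select(.State.Name=="running")
  | .InstanceId' sample.json
# prints: i-aaa
```

The stopped instance i-bbb matches the tag pattern but is filtered out by the state check, which is exactly how we distinguish the pending and running workers later on.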

We can create a MachineSet to deploy a new worker node in eu-west-1a (AZ1 of the AWS Ireland region):

[root@clientvm 0 ~/new-ocp4-cf]# cat machineset_worker.yml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
  name: rcarrata-cf-7lk9g-worker-2-eu-west-1a
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: rcarrata-cf-7lk9g-worker-eu-west-1a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: rcarrata-cf-7lk9g
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: rcarrata-cf-7lk9g-worker-eu-west-1a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/worker: ""
      providerSpec:
        value:
          ami:
            id: ami-00d18cbff03587a41
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
            - ebs:
                iops: 0
                volumeSize: 120
                volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: clustersecurity-WorkerInstanceProfile-HGYXROC3XT5E
          instanceType: m4.large
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: eu-west-1a
            region: eu-west-1
          securityGroups:
            - filters:
                - name: "tag:aws:cloudformation:logical-id"
                  values:
                    - WorkerSecurityGroup
          subnet:
            filters:
              - name: tag:Name
                values:
                  - rcarrata-xbyo-m2d2l-private-eu-west-1a
          tags:
            - name: kubernetes.io/cluster/rcarrata-cf-7lk9g
              value: owned
          userDataSecret:
            name: worker-user-data
status:
  replicas: 0

Apply the MachineSet worker YAML with the definition of the worker node:

[root@clientvm 0 ~/new-ocp4-cf]# oc apply -f machineset_worker.yml
machineset.machine.openshift.io/rcarrata-cf-7lk9g-worker-2-eu-west-1a created

And check the MachineSets and the Machines created:

[root@clientvm 0 ~/new-ocp4-cf]# oc get machineset
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a   1         1                             4s
[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
NAME                                          INSTANCE              STATE     TYPE       REGION      ZONE         AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a-r4sqk   i-0a07c73b9d8b438bc   pending   m4.large   eu-west-1   eu-west-1a   7s

Check with the aws-cli tool the brand-new instance that will become our new worker node:

[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[] \
| select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))? \
| select(.State.Name=="pending") | .InstanceId'
i-0a07c73b9d8b438bc

Once the worker node is in the "Running" state, the MachineSet reflects that the Desired, Ready and Available states all have matching resources (1 worker in this case):

[root@clientvm 0 ~/new-ocp4-cf]# aws ec2 describe-instances | jq -r '.Reservations[].Instances[] | \
select(.Tags[].Value|test(".*rcarrata-cf.*worker.*"))? \
| select(.State.Name=="running") | .InstanceId'
i-008f1d37b01200ffb
i-0a07c73b9d8b438bc
[root@clientvm 0 ~/new-ocp4-cf]# oc get machine
NAME                                          INSTANCE              STATE     TYPE       REGION      ZONE         AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a-r4sqk   i-0a07c73b9d8b438bc   running   m4.large   eu-west-1   eu-west-1a   42s

[root@clientvm 0 ~/new-ocp4-cf]# oc get machineset
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
rcarrata-cf-7lk9g-worker-2-eu-west-1a   1         1         1       1           6m3s
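If more capacity is needed later (for example a third worker for extra HA), the same MachineSet can be scaled without writing any new YAML. A sketch, using the MachineSet name created above:

```shell
# Scale the MachineSet to 2 replicas; the Machine API Operator will create
# the extra AWS instance and join it to the cluster as a worker node.
oc scale machineset rcarrata-cf-7lk9g-worker-2-eu-west-1a \
  -n openshift-machine-api --replicas=2
```

Scaling back down works the same way; the operator terminates the surplus instance and removes the corresponding Machine and Node objects.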

Conclusion - Check Openshift Cluster and Routers status

So now, the new worker node is up and running in our OpenShift cluster:

[root@clientvm 0 ~/new-ocp4-cf]# oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-137-15.eu-west-1.compute.internal    Ready    worker   2m26s   v1.13.4+cb455d664
ip-10-0-143-54.eu-west-1.compute.internal    Ready    master   23h     v1.13.4+cb455d664
ip-10-0-150-130.eu-west-1.compute.internal   Ready    master   23h     v1.13.4+cb455d664
ip-10-0-158-99.eu-west-1.compute.internal    Ready    worker   23h     v1.13.4+cb455d664
ip-10-0-165-234.eu-west-1.compute.internal   Ready    master   23h     v1.13.4+cb455d664

Because we have an additional worker node in our cluster, the second router can run on this brand-new worker:

[root@clientvm 0 ~/new-ocp4-cf]# oc get pod -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-76f869f9dc-4dqhj   1/1     Running   0          18m
router-default-76f869f9dc-s48rw   1/1     Running   0          23h
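To double-check that the two replicas really landed on different workers, the pods can be listed again with -o wide and the NODE column compared:

```shell
# The NODE column should now show a different worker node for each router pod.
oc get pod -n openshift-ingress -o wide
```

With one router on each worker, the ingress tier finally has the HA that the default two-replica deployment was designed for.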