How can you run several route shards on the same OpenShift nodes in an on-premise installation (UPI or IPI)? What are the steps to configure it?

Let’s dig in!

Overview & Example Scenario

This example applies to all on-premise UPI deployments of OpenShift 4 (Baremetal, VMware UPI, RHV UPI, OSP UPI, etc.) and also to RHV IPI, VMware IPI, and Baremetal IPI (with some differences that will be analyzed in future blog posts).

In this example, an OpenShift 4.5 UPI on top of OpenStack is used (same as the previous blog post).

Why can’t I run two Route Shards on the same node?

As we saw in the previous blog post, the Deep Dive of OpenShift 4 Routers in On-Premise Deployments, the routers in a non-cloud installation are a bit different. Kubernetes Services of type LoadBalancer are not available there, so the OpenShift Routers are deployed with a ClusterIP Service and with the endpointPublishingStrategy of type HostNetwork (the router pods run with hostNetwork: true).

HostNetwork allows the OpenShift Router pods to directly see the network interfaces of the host machine (the OpenShift Node) where the pod is started, and it exposes the router ports (HTTP, HTTPS, and the metrics port) on that node for load balancing by an external physical or virtual LB.

But with the default behaviour of the IngressController and the OpenShift Routers, you CAN’T run a second route shard on the same worker/infra node, because its ports would overlap with those already in use on the OpenShift Node (remember that the OCP Router listens on the Node interface, not on the Pod Network).

For this reason, each route shard needs its own separate nodes. And because we always want HA for our OpenShift Routers, that means 4 worker or infra nodes for 2 route shards (2 nodes for the default router and 2 more nodes for the shard).
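To make the collision concrete, here is a minimal sketch in Python. It is only a toy model, not how the scheduler or the Ingress Operator is implemented: the function names are made up for this example, and the port set is taken from the router Services shown in this post (80, 443, and 1936 for metrics).

```python
# Toy model: each router that uses hostNetwork claims these ports on the node itself.
ROUTER_HOST_PORTS = {80, 443, 1936}  # HTTP, HTTPS, metrics


def conflicting_ports(routers_on_node):
    """Return the host ports claimed by more than one router on a node.

    routers_on_node maps a router name to the set of host ports it binds.
    """
    seen = set()
    conflicts = set()
    for ports in routers_on_node.values():
        conflicts |= seen & ports
        seen |= ports
    return conflicts


def nodes_needed(shards, replicas_per_shard):
    """With HostNetwork, every router replica needs a node of its own."""
    return shards * replicas_per_shard


same_node = {"router-default": ROUTER_HOST_PORTS, "router-custom": ROUTER_HOST_PORTS}
print(sorted(conflicting_ports(same_node)))  # [80, 443, 1936]
print(nodes_needed(shards=2, replicas_per_shard=2))  # 4
```

Two routers on one node fight over all three ports, and two shards with two replicas each add up to the 4 nodes mentioned above.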

Back in OpenShift 3.11, it was possible to run several shards on the same node by changing the ports where the shard routers listened, which avoided collisions with the default routers:

ROUTER_SERVICE_HTTP_PORT
ROUTER_SERVICE_HTTPS_PORT

But in OpenShift 4, the OpenShift Routers are controlled and managed by the Ingress Operator through IngressController resources.

  • Before OpenShift 4.4, there was no way to use the Ingress Operator to deploy more than one router per node.

  • From OpenShift 4.4 onwards, you can run more than one router per node by using the NodePortService endpoint publishing strategy of the Ingress Operator.

Endpoint Publishing Comparison - NodePort vs HostNetwork

  • NodePortService

The NodePortService strategy publishes an ingress controller using a Kubernetes NodePort Service. With this strategy, the administrator is responsible for configuring any external DNS or load balancer.

  • HostNetwork

The HostNetwork strategy uses host networking to publish the ingress controller directly on the node host where the ingress controller is deployed.

NOTE: These diagrams are extracted from the original repo of cluster-ingress-operator on GitHub.

Creating New Route Shards on the Same OpenShift 4 Node

  • Define a new IngressController resource in the openshift-ingress-operator namespace that will be our new route shard:
$ cat ingress-custom.yml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: custom
  namespace: openshift-ingress-operator
spec:
  replicas: 1
  endpointPublishingStrategy:
    type: NodePortService
  domain: apps2.sharedocp4upi45.lab.upshift.rdu2.redhat.com

The most important part of this IngressController definition is the endpointPublishingStrategy, which needs to be defined as type: NodePortService.

In the Ingress Operator, the EndpointPublishingStrategyType defines how the ingress controller endpoints are published. In this specific case, NodePortService publishes the ingress controller using a Kubernetes NodePort Service, instead of the on-premise default endpointPublishingStrategy type: HostNetwork (which publishes the ingress controller directly on the host ports of the node where it is deployed).
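If you plan to create more than one shard, it can be handy to generate the definition instead of copying it by hand. The following is a minimal sketch in Python, assuming the same fields as the YAML above; the helper name ingress_controller_manifest is hypothetical and only used for illustration.

```python
import json


def ingress_controller_manifest(name, domain, replicas=1):
    """Build an IngressController that is published through a NodePort Service."""
    return {
        "apiVersion": "operator.openshift.io/v1",
        "kind": "IngressController",
        "metadata": {"name": name, "namespace": "openshift-ingress-operator"},
        "spec": {
            "replicas": replicas,
            "domain": domain,
            "endpointPublishingStrategy": {"type": "NodePortService"},
        },
    }


manifest = ingress_controller_manifest(
    "custom", "apps2.sharedocp4upi45.lab.upshift.rdu2.redhat.com"
)
# oc accepts JSON as well as YAML, so this output can be saved to a file
# and passed to `oc create -f`.
print(json.dumps(manifest, indent=2))
```

Each shard needs its own name and its own domain, so those are the two required parameters.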

  • Create the IngressController resource in the openshift-ingress-operator namespace:
$ oc create -f ingress-custom.yml
ingresscontroller.operator.openshift.io/custom created
  • Check that the pod of the custom OpenShift Router is being created:
$ oc get pod -n openshift-ingress
NAME                              READY   STATUS              RESTARTS   AGE
router-custom-568c9c455-p47q2     0/1     ContainerCreating   0          23s
router-default-754bf5f974-rpzq5   1/1     Running             0          4m53s
router-default-754bf5f974-wtrnb   1/1     Running             2          4m53s
  • Check the difference between the custom router and the default router:
$ oc get pod -n openshift-ingress router-custom-568c9c455-p47q2 -o json | jq -r '.spec.hostNetwork'
null
$ oc get pod -n openshift-ingress router-default-754bf5f974-rpzq5 -o json | jq -r '.spec.hostNetwork'
true

As we can see, the main difference is that the custom router is not using hostNetwork, while the default router is. This is because we configured the endpointPublishingStrategy of the custom router as type: NodePortService.
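Note that jq prints null for the custom router because the hostNetwork field is simply absent from the pod spec, not because it is set to false. A minimal sketch in Python of the same check, using trimmed-down stand-ins for the pod JSON (the real `oc get pod -o json` output has many more fields, and uses_host_network is a name made up for this example):

```python
import json


def uses_host_network(pod):
    """True only when the pod spec explicitly sets hostNetwork: true."""
    return pod.get("spec", {}).get("hostNetwork") is True


# Trimmed-down stand-ins for `oc get pod ... -o json` output (illustrative only).
custom_pod = json.loads('{"metadata": {"name": "router-custom"}, "spec": {}}')
default_pod = json.loads(
    '{"metadata": {"name": "router-default"}, "spec": {"hostNetwork": true}}'
)

print(uses_host_network(custom_pod))   # False
print(uses_host_network(default_pod))  # True
```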

  • Check the location and the IP of the OpenShift Router pods:
$ oc get pod -o wide -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE     IP             NODE                                                   NOMINATED NODE   READINESS GATES
router-custom-568c9c455-p47q2     1/1     Running   0          2m3s    10.131.0.120   worker-1.sharedocp4upi45.lab.upshift.rdu2.redhat.com   <none>           <none>
router-default-754bf5f974-rpzq5   1/1     Running   0          6m33s   10.0.92.117    worker-0.sharedocp4upi45.lab.upshift.rdu2.redhat.com   <none>           <none>
router-default-754bf5f974-wtrnb   1/1     Running   0          6m33s   10.0.93.214    worker-1.sharedocp4upi45.lab.upshift.rdu2.redhat.com   <none>           <none>

NOTE: As you may have noticed, pods of the custom and default routers are running on the same OpenShift Node (worker-1 in this case). If you create another route shard using another IngressController, its pods can also be scheduled on the same nodes, so several route shard routers can coexist on each node.

  • Check the OpenShift ingress Services that are created by the Default and Custom IngressControllers:
$ oc get svc -n openshift-ingress
NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
router-internal-custom    ClusterIP   172.30.71.122   <none>        80/TCP,443/TCP,1936/TCP      7m11s
router-internal-default   ClusterIP   172.30.108.18   <none>        80/TCP,443/TCP,1936/TCP      11m
router-nodeport-custom    NodePort    172.30.1.17     <none>        80:32596/TCP,443:32121/TCP   7m11s

As you can see, the default router only has a ClusterIP Service, router-internal-default. A ClusterIP Service exposes the Service on a cluster-internal IP, which makes it reachable only from within the cluster.

On the other hand, the Route Sharding Custom Ingress Controller created two Services, a ClusterIP and a NodePort, because hostNetwork is not enabled, and the OpenShift Routers need to be accessible in some way (through the NodePort instead of HostNetwork).

Remember that a NodePort Service makes the service accessible from outside of the cluster by opening a static port on each Node.
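The PORT(S) column of the Service listing encodes this difference: a ClusterIP entry looks like port/protocol, while a NodePort entry looks like port:nodePort/protocol. As a small illustration, here is a sketch in Python that splits that column; parse_ports is a hypothetical helper written for this post, not part of any OpenShift tooling.

```python
def parse_ports(column):
    """Split a PORT(S) value into (service_port, node_port, protocol) tuples.

    node_port is None for entries without a NodePort, as in a ClusterIP Service.
    """
    parsed = []
    for entry in column.split(","):
        ports, protocol = entry.split("/")
        service_port, _, node_port = ports.partition(":")
        parsed.append((int(service_port), int(node_port) if node_port else None, protocol))
    return parsed


print(parse_ports("80/TCP,443/TCP,1936/TCP"))
# [(80, None, 'TCP'), (443, None, 'TCP'), (1936, None, 'TCP')]
print(parse_ports("80:32596/TCP,443:32121/TCP"))
# [(80, 32596, 'TCP'), (443, 32121, 'TCP')]
```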

  • If we check the router-internal-custom ClusterIP Service in detail, we can see the endpoints of the pods, which are on the pod network instead of the host network:
$ oc describe -n openshift-ingress service router-internal-custom
Name:              router-internal-custom
Namespace:         openshift-ingress
Labels:            ingresscontroller.operator.openshift.io/owning-ingresscontroller=custom
Annotations:       service.alpha.openshift.io/serving-cert-secret-name: router-metrics-certs-custom
                   service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1602996798
                   service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1602996798
Selector:          ingresscontroller.operator.openshift.io/deployment-ingresscontroller=custom
Type:              ClusterIP
IP:                172.30.71.122
Port:              http  80/TCP
TargetPort:        http/TCP
Endpoints:         10.131.0.120:80
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         10.131.0.120:443
Port:              metrics  1936/TCP
TargetPort:        1936/TCP
Endpoints:         10.131.0.120:1936
Session Affinity:  None

If we check the host subnets of the workers, we can see that the endpoint IP 10.131.0.120 belongs to the pod network 10.131.0.0/23, which is the subnet of worker-1, the node where the router-custom pod is running.

$ oc get hostsubnet | grep worker
worker-0.sharedocp4upi43.lab.pnq2.cee.redhat.com   worker-0.sharedocp4upi45.lab.pnq2.cee.redhat.com   10.74.178.241   10.130.0.0/23
worker-1.sharedocp4upi43.lab.pnq2.cee.redhat.com   worker-1.sharedocp4upi43.lab.pnq2.cee.redhat.com   10.74.181.41    10.131.0.0/23
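The subnet membership can be double-checked with a few lines of Python and the standard ipaddress module. This is just a sketch: the short node names and the node_for_pod_ip helper are assumptions for this example, while the subnets come from the hostsubnet output above.

```python
import ipaddress

# Host subnets as listed by `oc get hostsubnet` above.
HOST_SUBNETS = {
    "worker-0": ipaddress.ip_network("10.130.0.0/23"),
    "worker-1": ipaddress.ip_network("10.131.0.0/23"),
}


def node_for_pod_ip(pod_ip):
    """Return the node whose pod subnet contains pod_ip, or None."""
    address = ipaddress.ip_address(pod_ip)
    for node, subnet in HOST_SUBNETS.items():
        if address in subnet:
            return node
    return None


print(node_for_pod_ip("10.131.0.120"))  # worker-1
print(node_for_pod_ip("10.0.93.214"))   # None: a node IP, not a pod network IP
```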
  • On the other hand, we can check the NodePort Service generated for the custom IngressController:
$ oc get svc -n openshift-ingress router-nodeport-custom -o yaml | yq .spec
{
  "clusterIP": "172.30.1.17",
  "externalTrafficPolicy": "Local",
  "ports": [
    {
      "name": "http",
      "nodePort": 32596,
      "port": 80,
      "protocol": "TCP",
      "targetPort": "http"
    },
    {
      "name": "https",
      "nodePort": 32121,
      "port": 443,
      "protocol": "TCP",
      "targetPort": "https"
    }
  ],
  "selector": {
    "ingresscontroller.operator.openshift.io/deployment-ingresscontroller": "custom"
  },
  "sessionAffinity": "None",
  "type": "NodePort"
}

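We can see that the NodePort Service has opened two randomly assigned ports on each Node, taken from the NodePort range (30000-32767 by default), and mapped them to the router’s HTTP and HTTPS ports: 32596 for HTTP (80) and 32121 for HTTPS (443).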
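These node ports are what the external load balancer has to target for the new shard. The sketch below, in Python, derives the backend list from the Service spec. It is an illustration under some assumptions: the lb_backends helper is hypothetical, the node IPs are the worker IPs seen earlier in this post, and the range is the Kubernetes default. Keep in mind that with externalTrafficPolicy: Local, only the nodes that actually run a router pod of this shard should be expected to answer on those ports, so the LB health checks matter.

```python
import json

NODEPORT_RANGE = range(30000, 32768)  # Kubernetes default: 30000-32767

# The relevant part of the Service spec shown above.
SPEC = json.loads("""
{
  "ports": [
    {"name": "http", "nodePort": 32596, "port": 80},
    {"name": "https", "nodePort": 32121, "port": 443}
  ],
  "type": "NodePort"
}
""")


def lb_backends(spec, node_ips):
    """Map each port name to the node_ip:nodePort targets for an external LB."""
    backends = {}
    for port in spec["ports"]:
        if port["nodePort"] not in NODEPORT_RANGE:
            raise ValueError(f"unexpected nodePort {port['nodePort']}")
        backends[port["name"]] = [f"{ip}:{port['nodePort']}" for ip in node_ips]
    return backends


# Node IPs of worker-0 and worker-1, taken from the `oc get pod -o wide` output.
print(lb_backends(SPEC, ["10.0.92.117", "10.0.93.214"]))
```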

And with that, you can create several route shards on the same OpenShift Nodes and avoid dedicating specific nodes to each IngressController (if you want, obviously :D), which consumes fewer resources.

Stay tuned for the next blog post!

NOTE: Opinions expressed in this blog are my own and do not necessarily reflect those of the company I work for.

Happy OpenShifting!