Running a cluster (deprecated)

This document describes how a cluster can be set up between middleware containers.

Concepts

For inter-OX communication over the network, multiple middleware containers (core-mw) can form a cluster. This brings several advantages regarding distribution and caching of volatile data, load balancing, scalability, fail-safety and robustness. Additionally, it provides the infrastructure for upcoming features of OX App Suite. The clustering capabilities of the containers are mainly built on Hazelcast, an open source clustering and highly scalable data distribution platform for Java. The following article provides an overview of the current feature set and configuration options.

Requirements

Synchronized system clock times

It is crucial that all members of a cluster have their system clocks in sync with each other. The system clock in a container is the same as on the host machine, because it is controlled by that machine's kernel. As containers are managed by Kubernetes, please make sure the system clocks of all Kubernetes workers are in sync, e.g. by using an NTP service.
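A quick way to verify this on a worker node, assuming a systemd-based distribution where timedatectl is available (an assumption, not a requirement of the middleware):

# should report "System clock synchronized: yes" and an active NTP service
timedatectl status | grep -i -E "synchronized|ntp"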

HTTP routing

A middleware cluster is always part of a larger picture. Usually there is a front-level loadbalancer (e.g. Istio) as the central HTTPS entry point to the platform. This loadbalancer optionally performs HTTPS termination and forwards HTTP(S) requests to the middleware containers.

A central requirement is session stability: HTTP routing must happen such that all HTTP requests of a session end up on the same middleware container.

If you are using Istio as a loadbalancer and deployed OX App Suite with our stack chart, proper HTTP routing is already in place. The stack chart deploys multiple so-called Destination Rules. A dedicated destination rule for the HTTP API ensures that the first request sets a routing cookie (<RELEASE_NAME>-core-mw-http-api-route). This cookie is then used to route follow-up requests to the same container that processed the initial request.
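To illustrate what cookie-based session affinity looks like in Istio, the following is a minimal DestinationRule sketch; it is not the chart's actual manifest, and the name and host are placeholders (only the cookie name is taken from the text above):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: core-mw-http-api-route            # hypothetical name
spec:
  host: <RELEASE_NAME>-core-mw-http-api   # placeholder service host
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpCookie:
          name: <RELEASE_NAME>-core-mw-http-api-route   # routing cookie mentioned above
          ttl: 0s                                       # session cookie, set on the first request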

There are several reasons why we require session stability in exactly this way. It is needed for horizontal scale-out: while we support transparent resuming/migration of user sessions in the OX cluster without the need for users to re-authenticate, a session wandering around randomly consumes the fixed amount of resources of a running session on each middleware container in the cluster, whereas a session sticky to one middleware container consumes this fixed amount of resources only there. Furthermore, there are mechanisms in OX App Suite, like TokenLogin, which only work if all requests belonging to one sequence get routed to the same middleware container, even if they originate from different machines with different IPs.

Same packages

All middleware containers participating in the Hazelcast cluster need to have the same open-xchange-* packages enabled, so that all dynamically injected class definitions are available during (de-)serialization on all containers. For example, even if a container does not serve requests from the web client, it still needs the realtime packages for collaborative document editing or the packages for the distributed session storage to be enabled.

Configuration

All settings regarding the cluster setup are located in the configuration file hazelcast.properties. The following gives an overview of the most important settings; please refer to the inline documentation of the configuration file for more advanced options. A full list of Hazelcast-related properties is available here.

General

To restrict access to the cluster and to separate the cluster from others in the local network, a group name needs to be defined. The group name and password can be set in your chart's values.yaml:

core-mw:
  hzGroupName: "REPLACE_WITH_HAZELCAST_GROUP_NAME"
  hzGroupPassword: "REPLACE_WITH_HAZELCAST_GROUP_PASSWORD"

Only middleware containers having the same values are able to join and form a cluster.

Network

It's possible to define the network interface that is used for cluster communication via com.openexchange.hazelcast.network.interfaces. By default, the interface is restricted to the IP address of the middleware container.

To form a cluster of multiple middleware containers, different discovery mechanisms can be used. The discovery mechanism is specified via the property com.openexchange.hazelcast.network.join and defaults to kubernetes. This mode requires defining a Kubernetes service (com.openexchange.hazelcast.network.join.k8s.serviceName) that exposes all Hazelcast instances.

The core-mw Helm chart automatically deploys such a Kubernetes service. The service is called <RELEASE_NAME>-core-mw-hazelcast-headless, and all middleware containers labeled with roles.middleware.open-xchange.com/hazelcast-data-holding=true (the default) will be found by this service.
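Put together, a minimal hazelcast.properties sketch for the default Kubernetes-based discovery could look as follows (the service name mirrors the default described above; treat all concrete values as placeholders for your deployment):

# Use the Kubernetes API for member discovery (the default join mechanism)
com.openexchange.hazelcast.network.join=kubernetes
# Headless service that exposes all Hazelcast instances
com.openexchange.hazelcast.network.join.k8s.serviceName=<RELEASE_NAME>-core-mw-hazelcast-headless
# Optional: restrict cluster communication to the container's own IP address (the default behavior)
com.openexchange.hazelcast.network.interfaces=<CONTAINER_IP>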

Advanced Configuration

Lite Members

Lite members in a Hazelcast cluster are members that do not hold any data partitions, i.e. all read and write operations on distributed maps are delegated to non-lite ("full") members. Apart from not holding data partitions, lite members participate in the same way as other members: they can register listeners for distributed topics (e.g. cache invalidation events) or can be addressed for task execution (e.g. during realtime communication).

Similar to using a custom partitioning scheme, separating the containers of a large cluster into few "full" members and many "lite" members helps to minimize the impact of JVM activities of a single container (mainly the garbage collector) on the cluster communication as a whole. Additionally, when starting or stopping lite members, no repartitioning of the distributed cluster data needs to be performed, which significantly decreases the container's startup and shutdown times and reduces the necessary network communication to a minimum.

In medium-sized or larger clusters, it is sufficient to have roughly 10 to 20 percent of the containers configured as "full" members, while all others can be started as "lite" members. Additionally, please note that the configured backup count in the map configurations should always be smaller than the total number of "full" members; otherwise, there may be problems if one of those data-holding containers is shut down. The minimum number of "full" members is therefore implicitly bound to the sum of a map's backupCount and asyncBackupCount properties, plus 1 for the original data partition. For example, with backupCount=1 and asyncBackupCount=0, at least two "full" members are required.

The configured "full" members should preferably not be used to serve client requests (by not adding them as endpoints in the loadbalancer), to ensure they are always responsive. Also, shutdowns and startups of those "full" members should be reduced to a minimum to avoid repartitioning operations.

More general information regarding lite members is available here.

To configure a container as "lite" member, you can specify the role hazelcast-lite-member in your chart's values.yaml:

core-mw:
  scaling:
    nodes:
      http-api:
        replicas: 2
        roles:
          - http-api
          - hazelcast-lite-member

Please refer to the chart's values.yaml for a more detailed example.
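To illustrate the split described above, a hedged values.yaml sketch might keep a small pool of data-holding ("full") members and run the request-serving pool as lite members; the pool names and replica counts are placeholders, and the hazelcast-data-holding role is an assumption mirroring the label mentioned in the Network section:

core-mw:
  scaling:
    nodes:
      hazelcast-data:                  # hypothetical pool of "full" members, kept out of the loadbalancer
        replicas: 3
        roles:
          - hazelcast-data-holding     # assumption: role matching the label from the Network section
      http-api:
        replicas: 10
        roles:
          - http-api
          - hazelcast-lite-member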

Features

The following list gives an overview of different features that were implemented using the cluster capabilities.

Distributed Session Storage

Previously, when an Open-Xchange server was shut down for maintenance, all user sessions that were bound to that machine were lost, i.e. the users needed to log in again. With the distributed session storage, all sessions are backed by a distributed map in the cluster, so that they are no longer bound to a specific container in the cluster. When a container is shut down, the session data is still available in the cluster and can be accessed from the remaining containers. The load-balancing techniques of the webserver then seamlessly route the user session to another container, with no session expired errors. The distributed session storage comes with the package open-xchange-sessionstorage-hazelcast. It's recommended to enable this optional package in all clustered environments with multiple middleware containers. The package can be enabled in your chart's values.yaml:

core-mw:
  packages:
    status:
      open-xchange-sessionstorage-hazelcast: enabled

Notes:

  • While there's some kind of built-in session distribution among the containers in the cluster, this should not be seen as a replacement for session-stickiness between the loadbalancer and middleware containers, i.e. one should still configure the loadbalancer to use sticky sessions for performance reasons.
  • The distributed session storage is still an in-memory storage. While the session data is distributed and backed up on multiple containers in the cluster, shutting down multiple or all containers at the same time will lead to loss of the distributed data. To avoid such data loss when shutting down a container, please follow the guidelines in section "Updating a Cluster".

Depending on the cluster infrastructure, different backup-count configuration options might be set for the distributed session storage in the map configuration file sessions.properties in the hazelcast subdirectory:

com.openexchange.hazelcast.configuration.map.backupCount=1

The backupCount property configures the number of containers with synchronized backups. Synchronized backups block operations until the backups are successfully copied and acknowledgements are received. If 1 is set as the backup count, for example, then all entries of the map will be copied to another JVM for fail-safety. 0 means no backup. Any integer between 0 and 6 is allowed; the default is 1, and values greater than 6 have no effect.

com.openexchange.hazelcast.configuration.map.asyncBackupCount=0

The asyncBackupCount property configures the number of containers with asynchronous backups. Async backups do not block operations and do not require acknowledgements. 0 means no backup. Any integer between 0 and 6 is allowed; the default is 0, and values greater than 6 have no effect.

Since session data is continuously backed up by multiple containers in the cluster by default, the steps described in Session_Migration to explicitly trigger session migration to other containers are obsolete and no longer needed with the distributed session storage.

Normally, sessions in the distributed storage are not evicted automatically, but are only removed when they're also removed from the session handler, either due to a logout operation or when exceeding the long-term session lifetime as configured by com.openexchange.sessiond.sessionLongLifeTime in sessiond.properties. Under certain circumstances, e.g. when the session is no longer accessed by the client and the middleware container hosting the session in its long-life container is shut down, the remove operation from the distributed storage might not be triggered. Therefore, a maximum idle time for map entries can additionally be configured for the distributed sessions map via

com.openexchange.hazelcast.configuration.map.maxIdleSeconds=640000

To avoid unnecessary eviction, the value should be higher than the configured com.openexchange.sessiond.sessionLongLifeTime in sessiond.properties.
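As a hedged sketch, the two settings can be aligned as follows; the concrete numbers are only illustrative (a one-week long-term lifetime is assumed), so check sessiond.properties for the value and unit actually configured in your deployment:

# sessiond.properties: long-term session lifetime (illustrative value, one week in milliseconds assumed)
com.openexchange.sessiond.sessionLongLifeTime=604800000
# hazelcast/sessions.properties: only evict distributed entries after the long-term lifetime has passed,
# i.e. choose an idle time (in seconds) that covers at least the lifetime configured above
com.openexchange.hazelcast.configuration.map.maxIdleSeconds=640000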

Remote Cache Invalidation

For faster access, groupware data is held in different caches by the server. Formerly, the caches utilized the TCP Lateral Auxiliary Cache plug-in (LTCP) of the underlying JCS caches to broadcast updates and removals to caches on other middleware containers in the cluster. This could potentially lead to problems when remote invalidation was not working reliably due to network discovery problems. As an alternative, remote cache invalidation can also be performed using reliable publish/subscribe events built on Hazelcast topics. This can be configured in the cache.properties configuration file, where the 'eventInvalidation' property can either be set to 'false' for the legacy behavior or 'true' for the new mechanism:

com.openexchange.caching.jcs.eventInvalidation=true

All containers participating in the cluster should be configured equally.

Internally, if com.openexchange.caching.jcs.eventInvalidation is set to true, LTCP is disabled in the JCS caches. Instead, an internal mechanism based on distributed Hazelcast event topics is used to invalidate data throughout all containers in the cluster after local update and remove operations. Put operations aren't propagated (and haven't been with LTCP either), since all data put into caches can be loaded/evaluated locally on each container from the persistent storage layer.

Using Hazelcast-based cache invalidation also makes further configuration of the JCS auxiliaries in the cache.ccf configuration file obsolete. In that case, all jcs.auxiliary.LTCP.* configuration settings are effectively ignored. However, it's still required to mark caches that require cluster-wide invalidation via jcs.region.<cache_name>=LTCP, just as before. So basically, when using the new default setting com.openexchange.caching.jcs.eventInvalidation=true, it's recommended to just use the stock cache.ccf file, since no further LTCP configuration is required.
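For illustration, a region entry in cache.ccf that is marked for cluster-wide invalidation simply carries the LTCP marker; no jcs.auxiliary.LTCP.* tuning is needed when event invalidation is enabled (the cache name is a placeholder):

# cache.ccf: flag the region for cluster-wide invalidation; the actual invalidation
# events are delivered via Hazelcast topics when eventInvalidation=true
jcs.region.<cache_name>=LTCP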

Administration / Troubleshooting

Hazelcast Configuration

The underlying Hazelcast library can be configured using the file hazelcast.properties.

Important: The Hazelcast JMX MBean can be enabled or disabled with the property com.openexchange.hazelcast.jmx. The properties com.openexchange.hazelcast.mergeFirstRunDelay and com.openexchange.hazelcast.mergeRunDelay control the run intervals of the so-called Split Brain Handler of Hazelcast that initiates the cluster join process when a new container is started. More details can be found at http://www.hazelcast.com/docs/2.5/manual/single_html/#NetworkPartitioning.

The port ranges used by Hazelcast for incoming and outgoing connections can be controlled via the configuration parameters com.openexchange.hazelcast.networkConfig.port, com.openexchange.hazelcast.networkConfig.portAutoIncrement and com.openexchange.hazelcast.networkConfig.outboundPortDefinitions.
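For illustration, the JMX switch and the port-related settings might look like this in hazelcast.properties (5701 is the well-known Hazelcast default port; the other values are purely illustrative, not necessarily the shipped defaults):

# Expose the Hazelcast JMX MBean
com.openexchange.hazelcast.jmx=true
# Incoming port, auto-increment if the port is already in use, and outbound port range
com.openexchange.hazelcast.networkConfig.port=5701
com.openexchange.hazelcast.networkConfig.portAutoIncrement=true
com.openexchange.hazelcast.networkConfig.outboundPortDefinitions=33000-35000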

Commandline Tool

To print out statistics about the cluster and the distributed data, the showruntimestats commandline tool can be executed with the clusterstats ('c') argument. This provides an overview of the runtime cluster configuration of the container, other members in the cluster and the distributed data structures.

JMX

In the Open-Xchange server Java process, the MBean com.hazelcast can be used to monitor and manage different aspects of the underlying Hazelcast cluster. The com.hazelcast MBean provides detailed information about the cluster configuration and distributed data structures.

Hazelcast Errors

When experiencing Hazelcast-related errors in the logfiles, most likely different versions of the packages are installed, leading to different message formats that can't be understood by containers using another version. Examples of such errors are exceptions in Hazelcast components regarding (de-)serialization or other message processing. This may happen when performing a consecutive update of all containers in the cluster, where containers with a heterogeneous setup temporarily try to communicate with each other. If the errors don't disappear after all containers in the cluster have been updated to the same package versions, it might be necessary to shut down the cluster completely, so that all distributed data is cleared.

Cluster Discovery Errors

  • If the started middleware containers don't form a cluster, please double-check your configuration in hazelcast.properties
  • It's important to have the same cluster name defined in hazelcast.properties throughout all containers in the cluster

Disable Cluster Features

The Hazelcast-based clustering features can be disabled with the following property changes (a combined sketch follows after the list):

  • Disable cluster discovery by setting com.openexchange.hazelcast.network.join to empty in hazelcast.properties
  • Disable Hazelcast by setting com.openexchange.hazelcast.enabled to false in hazelcast.properties
  • Disable message based cache event invalidation by setting com.openexchange.caching.jcs.eventInvalidation to false in cache.properties
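Taken together, the corresponding sketch looks like this:

# hazelcast.properties
com.openexchange.hazelcast.enabled=false
com.openexchange.hazelcast.network.join=empty

# cache.properties
com.openexchange.caching.jcs.eventInvalidation=false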

Updating a Cluster

Running a cluster means built-in failover on the one hand, but it might require some attention when it comes to upgrading the services on all containers in the cluster. This chapter gives an overview of general concepts and hints for updating the cluster.

The Big Picture

Updating an OX App Suite cluster is possible in several ways. The involved steps always include

  • Update the software by updating the container images
  • Update the database schemas (so-called update tasks)

There are some precautions required, though.

Update Tasks Management

It is a feature of the middleware to automatically start update tasks on a database schema when a user whose context lives on that schema tries to log in. For installations beyond a certain size, if you just update the middleware without special handling of the update tasks, user logins will trigger an uncontrolled storm of update tasks on the databases, potentially leading to resource contention, unnecessarily long update task runtimes, excessive load on the database server, and maybe even service outages.

We describe the update strategy in more detail in the next section. Note that this is still a high-level outline of the actual procedure, which requires additional details with regard to Hazelcast, given further down below.

Rolling strategy

It is possible to execute the update tasks decoupled from the real update of the rest of the cluster, days or even weeks ahead of time, with the following approach:

  • Start a new middleware container with the updated version that is not part of your cluster, configured identically to the other middleware containers.
  • Execute all update tasks from the update container.

In the last step, users from affected schemas will be logged out and denied service while the update tasks are running on their database schema. This is typically a short unavailability (some minutes) for a small part (1000 to 7000 users, depending on the installation) of the user base.

This way you end up with the production cluster running on the old version of OX App Suite, with the database already being upgraded to the next version. This is explicitly a valid and supported configuration. This approach offers the advantage that update tasks can be executed in advance, instead of doing them while the whole system is in a full maintenance downtime. Since update tasks can take some time, this is a considerable advantage.

For the actual upgrade of the production cluster, the remaining step is to upgrade your deployment to the latest stack chart version. This includes an update of the core-mw chart with updated container images.

Hazelcast will ensure that sessions from containers which get restarted are taken over by other containers in the cluster, so ideally this step works without losing user sessions.

Rolling Upgrade in Kubernetes

This guide assumes that you are already familiar with Kubernetes and have a running full stack deployment, e.g. deployed using the App Suite stack chart.

The procedure consists of a pre-update, where one update container that does not participate in the cluster executes the database update tasks, and the real update, where all middleware containers of the cluster get updated to the new image version of the software.

Pre-Update

In this phase, you will have to create a new Helm release of the core-mw chart with slightly modified values compared to your full stack deployment. This will create a Kubernetes job that starts a middleware container and runs all the update tasks.

Preparations

  • Take backups of as much as possible (databases, OX configuration files, etc.).

Step-by-Step Guide

To create the Kubernetes job that runs the update tasks, follow these steps:

  • Download the desired version of the core-mw chart from registry.open-xchange.com and extract it to a new folder.
  • Create a new values file (e.g. upgrade-values.yaml) in the extracted directory.
  • Copy the core-mw configuration of your existing full stack deployment into the new file to ensure that the upgrade container has the same configuration in order to execute all necessary update tasks.
  • Adjust the values for the new version if necessary (please see the chart's release notes).
  • Modify values in upgrade-values.yaml:

    • Set enableInitialization: false to prevent initialization.
    • Remove the existing mysql section and add a global one that references your existing Kubernetes secret containing the access information:

      global:
        mysql:
          existingSecret: "<RELEASE>-core-mw-mysql"
      
    • Add globalDBID and set the global database ID of your deployment.

    • Enable the update section. This ensures that no deployment and only the upgrade job will be created. Optionally define the database schemata to reduce the load on the database:

      update:
        enabled: true
        schemata: "<YOUR_DATABASE_SCHEMATA>"
        image:
          tag: "<IMAGE_TAG>"
      
  • Install a new Helm release of the core-mw chart with your modified upgrade-values.yaml into the same namespace as your full stack deployment, so that the upgrade container can find your database and establish a connection. The Helm release will create a Kubernetes job and run the update tasks. Wait for the job to complete (a command sketch follows after this list).

  • If the job has succeeded, uninstall the Helm release, which will remove the Kubernetes job, and proceed with the real update. If the job failed, manually resolve the situation, then remove the Kubernetes job via kubectl and optionally re-run the update tasks via helm upgrade. This will create another job that executes the update tasks again.
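A hedged command sketch for this part, with release name, namespace, chart location and job name as placeholders (the actual job name is deployment-specific and can be looked up with kubectl get jobs):

# install the upgrade release next to the existing deployment
helm install <UPGRADE_RELEASE> ./core-mw -f upgrade-values.yaml -n <NAMESPACE>

# wait for the update-task job to finish
kubectl -n <NAMESPACE> wait --for=condition=complete job/<UPDATE_JOB_NAME> --timeout=2h

# on success, remove the upgrade release (and with it the job) again
helm uninstall <UPGRADE_RELEASE> -n <NAMESPACE>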

Real Update

The real update is simpler: update the core-mw chart version of your existing full stack deployment to the new version and run helm upgrade.
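For example (release name, chart reference and namespace are placeholders; the exact chart reference depends on how the stack was installed):

helm upgrade <RELEASE> <CHART_REFERENCE> --version <NEW_CHART_VERSION> -f values.yaml -n <NAMESPACE>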

Note: Be cautious with major chart updates, since they might require adjusting the chart values. Please see the chart's release notes for more information.

Reference Documentation

Limitations

While in most cases a seamless, rolling upgrade of all containers in the cluster is possible, there may be situations where containers running a newer version are not able to communicate with older containers in the cluster, i.e. they can't access distributed data or consume incompatible event notifications - especially when the underlying Hazelcast library is part of the update, since Hazelcast does not support this scenario at the moment. In such cases, the release notes will contain corresponding information, so please have a look there before applying an update.

Additionally, there may always be some kind of race condition during an update, i.e. client requests that can't be completed successfully or internal events not being delivered to all containers in the cluster. That's why the following information should only serve as a best-practices guide to minimize the impact of upgrades on the user experience.

Upgrades of the Hazelcast library

In case an upgrade includes a major update of the Hazelcast library, a newly upgraded container will usually not be able to connect to the containers running the previous version. In this case, volatile cluster data is lost after all containers in the cluster have been updated, including sessions held in the distributed session storage. As outlined above, the release notes will contain a corresponding warning in such cases. Starting with v7.10.3, separation of the clusters during rolling upgrades is enforced by appending a version suffix to the cluster group name.

Besides upgraded containers not being able to access distributed data of the legacy cluster, this also means that new data is not available in the legacy cluster, which may cause trouble if the updated backend version needs to perform database update tasks. Database update tasks usually operate in a "blocking" way, and all contexts associated with the schema being upgraded are disabled temporarily. Since context data itself is held in caches on potentially every container in the cluster, the affected cache entries are invalidated during the database update. And since cluster-wide cache invalidations again utilize Hazelcast functionality (section "Remote Cache Invalidation"), such invalidations normally won't be propagated to containers running a previous version of the Hazelcast library.

Other Considerations

  • Do not stop a container while update tasks are running.
  • In case of trouble, i.e. a container refuses to join the cluster again after a restart, consult the logfiles first for any hints about what is causing the problem - both on the disconnected container and on the other containers in the cluster.
  • If there are general incompatibilities between two revisions of the middleware that prevent operation in a cluster (see release notes), it's recommended to choose another cluster name for the containers with the new version. This will temporarily lead to two separate clusters during the rolling upgrade, with the old cluster finally being shut down completely after the last container has been updated to the new version. While distributed data can't be migrated from one server version to another in this scenario due to the incompatibilities, the uptime of the system itself is not affected, since the containers in the new cluster are able to serve new incoming requests directly.
  • When updating only UI plugins without also updating to a new version of the core UI, you also need to perform the additional step from Updating UI plugins.