Active-Active

To support very large setups where multiple active sites serve client traffic simultaneously, App Suite middleware has been enhanced with different techniques. Since the middleware is only one part of a larger overall architecture, this article covers the middleware-specific topics only.

Request Routing

For the Advanced Routing Service targeting Active/Active deployments, one needs to decide to which site / data center an incoming request is routed. To that end, the middleware evaluates so-called segment markers for incoming requests, from which the targeted segment (i.e., the site) can be derived. If possible, the segment marker is determined based on the groupware database schema that contains the data the incoming request is associated with.

For this kind of request analysis, a REST servlet is available at /request-analysis/v1/analyze, to which the data of the incoming client request can be sent and from which the corresponding segment marker is returned. The following example shows a request to analyze an OpenID Connect login:

POST /request-analysis/v1/analyze
{
    "method": "GET",
    "url": "https://ox.example.org/ajax/login?sessionToken=57d27fe8ec724e2091757f8287d5e91b-8fb31f05dd40444bb53be4add50be320&action=oidcLogin&client=open-xchange-appsuite",
    "headers": [
        {
            "name": "User-Agent",
            "value": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
        }
    ],
    "remoteIP": "82.63.204.174"
}

The middleware responds with HTTP 200 OK including the marker for the segment the user is associated with, and some additional metadata:

{
    "type": "SUCCESS",
    "headers": {
        "x-ox-context-id": 463,
        "x-ox-user-id": 4,
        "x-ox-login": "user@ox.example.org"
    },
    "marker": "eyJzY2hlbWEiOiJveHVzZXJkYl81In0="
}

Full documentation for the analyze endpoint is available here.
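
As a usage illustration, the following minimal Python sketch posts an analysis request and decodes the returned marker. The host name mw.example.org and the use of the requests package are assumptions of this example, not part of the product:

import base64
import json

import requests

# Hypothetical middleware host; adjust to the actual deployment.
ANALYZE_URL = "https://mw.example.org/request-analysis/v1/analyze"

payload = {
    "method": "GET",
    "url": "https://ox.example.org/ajax/login?action=oidcLogin&client=open-xchange-appsuite",
    "headers": [
        {"name": "User-Agent", "value": "Mozilla/5.0"}
    ],
    "remoteIP": "82.63.204.174",
}

response = requests.post(ANALYZE_URL, json=payload, timeout=10)
response.raise_for_status()
result = response.json()

# The marker is Base64-encoded JSON naming the groupware database schema;
# the marker from the response above decodes to {"schema": "oxuserdb_5"}.
marker = json.loads(base64.b64decode(result["marker"]))
print(result["type"], marker)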

Context Pre-Assembly

Pre-assembly allows pre-populating the databases with skeletons for new accounts during times of lower load (at night or during the weekend) and activating them on demand later. This improves the performance of provisioning operations. Additionally, it is a requirement for the Active/Active operation mode, so that administrative calls can be routed reliably to the designated site, including the very first call that provisions a new context.

For this context pre-assembly, a separate documentation article is available at Pre-assembled contexts.

Propagation to Remote Sites

During a failover or planned maintenance, all traffic that would normally reach a certain data center is re-routed to the remaining data centers, i.e. the segment association changes, and afterwards all requests of a logged-in client suddenly appear at a different site. In order to support this kind of re-segmentation without interrupting the end-user experience, certain bits of data representing volatile state are always made available on the other sites, too - besides any kind of persistent data for which a low-level replication between the storages (database, filestore, etc.) is enabled anyway.

For the middleware, this additionally propagated data includes the user sessions and the states of certain SSO flows. In order to achieve this, endpoints to Redis on the remote sites have to be configured, via which PUT and DELETE operations on such data are replayed there. All relevant properties to configure the Redis connector can therefore be specified with an arbitrary property name infix for each remote site, e.g. as follows:

com.openexchange.redis.site2.mode = standalone
com.openexchange.redis.site2.hosts = redis.dc2.example.org
com.openexchange.redis.site3.mode = standalone
com.openexchange.redis.site3.hosts = redis.dc3.example.org

Additionally, remote site awareness needs to be enabled explicitly, and all used infixes that identify the remote sites need to be declared, e.g. as follows:

com.openexchange.redis.sites.enabled = true
com.openexchange.redis.sites = site2,site3
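
The replay itself is performed internally by the middleware; the following Python sketch (using the redis-py package, with hypothetical host names mirroring the property examples above) only illustrates the principle of applying each PUT and DELETE locally and replaying it to every configured remote site:

import redis

# Hypothetical endpoints mirroring the property examples above.
LOCAL = redis.Redis(host="redis.dc1.example.org", port=6379)
REMOTE_SITES = {
    "site2": redis.Redis(host="redis.dc2.example.org", port=6379),
    "site3": redis.Redis(host="redis.dc3.example.org", port=6379),
}

def put(key: str, value: bytes) -> None:
    """Apply a PUT locally, then replay it to every remote site."""
    LOCAL.set(key, value)
    for client in REMOTE_SITES.values():
        client.set(key, value)

def delete(key: str) -> None:
    """Apply a DELETE locally, then replay it to every remote site."""
    LOCAL.delete(key)
    for client in REMOTE_SITES.values():
        client.delete(key)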

Besides the active propagation of data that should be available across all sites, there is also a built-in fallback lookup for sessions, where the configured remote sites are queried in case of a session miss. The remote lookup is guarded by a rate-limit check; see the property documentation for further details.
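
Conceptually, the fallback behaves like the following Python sketch; the concrete rate-limit value and the simple interval check are purely illustrative and not the middleware's actual mechanism:

import time

import redis

# Hypothetical endpoints, as in the previous sketch.
LOCAL = redis.Redis(host="redis.dc1.example.org", port=6379)
REMOTE_SITES = [
    redis.Redis(host="redis.dc2.example.org", port=6379),
    redis.Redis(host="redis.dc3.example.org", port=6379),
]

_MIN_INTERVAL = 0.1  # illustrative limit: at most ten remote lookups per second
_last_remote_lookup = 0.0

def get_session(session_id: str):
    """Return session data from the local instance, falling back to remote sites."""
    global _last_remote_lookup
    data = LOCAL.get(session_id)
    if data is not None:
        return data
    now = time.monotonic()
    if now - _last_remote_lookup < _MIN_INTERVAL:
        return None  # rate limit exceeded; treat as a miss
    _last_remote_lookup = now
    for client in REMOTE_SITES:
        data = client.get(session_id)
        if data is not None:
            return data
    return None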

Cache Invalidation

Whenever traffic is re-routed to a site that has been offline before, volatile data in caches may have become stale due to changes that have been performed through another data center. Therefore, when a site comes back and the re-segmentation of the user base takes place implicitly, the volatile data held in caches can be invalidated explicitly.

For that purpose, a REST servlet is exposed at /segmenter/v1/changed, to which such events can be sent, e.g.:

POST /segmenter/v1/changed
[
    { "id": "site1", "availability": 1 },
    { "id": "site2", "availability": 1 },
    { "id": "site3", "availability": 1 }
]

See also the documentation for the segmenter endpoint here.
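
Such a notification can be sent by whatever component tracks site availability; the following minimal Python sketch again assumes a hypothetical middleware host and the requests package:

import requests

# Hypothetical middleware host; adjust to the actual deployment.
SEGMENTER_URL = "https://mw.example.org/segmenter/v1/changed"

events = [
    {"id": "site1", "availability": 1},
    {"id": "site2", "availability": 1},
    {"id": "site3", "availability": 1},
]

response = requests.post(SEGMENTER_URL, json=events, timeout=10)
response.raise_for_status()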

When such a notification contains the local site and this site is considered "back up again", any data that might have become stale is invalidated. This includes both data held in node-local in-memory data structures and distributed cache data in Redis. For the latter, however, this must not include important data like user sessions. Therefore, it is possible (and recommended) to configure a separate Redis instance for volatile cache data by using "cache" as the infix of the property base names. Awareness of this special Redis instance for cache data is enabled through com.openexchange.redis.cache.enabled, e.g. as in the following example:

com.openexchange.redis.cache.enabled = true
com.openexchange.redis.cache.mode = standalone
com.openexchange.redis.cache.hosts = redis_cache:6379

See also the Redis-related property documentation.

If such a dedicated cache instance is configured, allowing a clear separation of the volatile data, this volatile part can be invalidated in the most performant way by issuing the FLUSHDB Redis command against it.

Depending on the underlying Redis deployment, this potentially dangerous command may need to be enabled manually first; see, for example, the documentation for the commonly used Bitnami chart for Redis.
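
Assuming FLUSHDB has been enabled and the dedicated cache instance from the example above is in place, the flush could be issued as in this Python sketch using the redis-py package (host and port are illustrative):

import redis

# Hypothetical endpoint of the dedicated cache instance configured above.
cache = redis.Redis(host="redis_cache", port=6379)

# Drops all keys of the selected database on this instance only; safe here
# because the instance holds volatile cache data exclusively.
cache.flushdb()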