Data export (deprecated)
The data export feature allows users to request an export of their personal data, packed into one or more ZIP archives. The actual export runs as a background task that completes at some point in the future. Once the export is ready, the user is notified via E-Mail that the packages are available for download.
As mentioned before, a data export is designed as a background task. When triggered by the user, it collects user-related data, which can be a lengthy as well as an I/O- and compute-intensive operation. For this reason, the data export feature provides several options to distribute and schedule this operation as sensibly as possible. The most important options are listed below; a short configuration sketch follows the list:
- Only certain nodes handle the processing of triggered data exports, controlled via com.openexchange.gdpr.dataexport.active. These could be dedicated nodes in a cluster that do not serve regular calls.
- Processing of data exports takes place only at certain times, controlled via com.openexchange.gdpr.dataexport.schedule.
- The number of data exports processed in parallel is controlled by com.openexchange.gdpr.dataexport.numberOfConcurrentTasks.
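For illustration, a minimal sketch of how these properties could be combined; the values shown are assumptions, not shipped defaults. A node dedicated to data export processing might use

com.openexchange.gdpr.dataexport.active=true
com.openexchange.gdpr.dataexport.schedule=Mon-Fri 0-6; Sat,Sun 0-8
com.openexchange.gdpr.dataexport.numberOfConcurrentTasks=2

while nodes serving regular calls would simply set com.openexchange.gdpr.dataexport.active=false.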
Since the artefacts and result files are typically quite big, the configuration for the data export feature requires specifying a dedicated file storage that is used to manage those files. This way the common file storage associated with a context or user is not burdened, and quota is unaffected as well.
Furthermore, scheduling of data export tasks within an Open-Xchange installation is a global task. The common payload databases are therefore not suitable for storing data export information, since determining the next adequate data export task to execute would require looking up every payload database on every registered database host. For this reason, the data export feature requires the global database to be set up as a central database for fast and easy look-up.
Currently, the Open-Xchange Middleware ships with 5 standard providers for the data export:
- E-Mails
- Calendar
- Contacts
- Tasks
- Drive
Prerequisites
- A special transport must be configured for data export notifications. This transport is configured in noreply.properties. All properties therein are config-cascade capable, so their values can be sensitive to the current user or context.
- Choose the appropriate file storage to use for storing export artefacts and result files (neither accounting towards quota):
  - Either register a dedicated file storage via registerfilestore (see the sketch after this list), or
  - Select an existing file storage with enough capacity, since artefacts/result files are typically quite big files
- Enable the global database as explained here
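A sketch of how a dedicated file storage could be registered; the URI and size are placeholders, and it is assumed that registerfilestore takes the store URI via -t and the maximum size in MB via -s, as in the standard provisioning tooling (double-check against the installed version):

$ registerfilestore -A oxadminmaster -P secret -t file:/var/opt/dataexport_filestore -s 1000000

The identifier returned by this call is the value to configure for com.openexchange.gdpr.dataexport.fileStorageId below.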
Installation and Configuration
As a first step, install the open-xchange-gdpr-dataexport package on all middleware nodes.
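For example, on a Debian-based node the package would typically be installed via the distribution's package manager (the exact command depends on the platform and repository setup):

$ apt-get install open-xchange-gdpr-dataexport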
Once installed, set the following configuration properties:
- com.openexchange.gdpr.dataexport.fileStorageId: Specifies the identifier of the file storage to use. This configuration property is config-cascade aware.
- com.openexchange.gdpr.dataexport.enabled: Whether the feature is available for a user. This configuration property is config-cascade aware.
- com.openexchange.gdpr.dataexport.active: Whether processing of data export tasks is enabled on this node.
- com.openexchange.gdpr.dataexport.schedule: The pattern specifying when processing of data export tasks is allowed. It is a comma-separated listing of tokens, where each token consists of a weekday and an optional time range. A weekday is one of Mon, Tue, Wed, Thu, Fri, Sat and Sun; weekday ranges such as Mon-Fri are also accepted. A time range is an hour-of-day range on the 24h clock, e.g. 0-6 for 0 to 6 in the morning or 22:30-24 for 22:30h until midnight; comma-separated pairs such as 0-6,22:30-24 are accepted as well. A more complex example: Mon 0:12-6:45; Tue-Thu 0-7:15; Fri 0-6,22:30-24; Sat,Sun 0-8
- com.openexchange.gdpr.dataexport.numberOfConcurrentTasks: How many tasks are allowed to be processed concurrently on a node.
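For illustration, a sketch of how a complete configuration could look in the middleware's properties files; the file storage identifier and all other values are examples only and must be adapted to the actual installation:

com.openexchange.gdpr.dataexport.enabled=true
com.openexchange.gdpr.dataexport.fileStorageId=34
com.openexchange.gdpr.dataexport.active=true
com.openexchange.gdpr.dataexport.schedule=Tue-Thu 0-7:15; Fri 0-6,22:30-24; Sat,Sun 0-8
com.openexchange.gdpr.dataexport.numberOfConcurrentTasks=1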
Please find further configuration properties in the configuration documentation section.
Command-line tools
The data export feature ships with a set of command-line tools to
- List the status of submitted data export tasks. Either all, by context or by user
- Request cancelation for certain data export tasks by user or context
- Check and repair consistency of the associated file storage
List data export tasks via listdataexports
Usage
listdataexports [-c <contextId>] [-i <userId>] -A <masterAdmin | contextAdmin> -P <masterAdminPassword |
contextAdminPassword> [-p <RMI-Port>] [-s <RMI-Server>] [--responsetimeout <responseTimeout>] |
[-h]
-A,--adminuser <adminUser> Admin username
-c,--context <contextId> The context identifier. If context identifier is given, only data export tasks
associated with denoted context are listed; otherwise all data export tasks are
listed.
-h,--help Prints this help text
-i,--userid <userId> The user identifier. If user identifier is given, only the data export task associated
with denoted user is listed. This user-sensitive output also includes the
task-associated work items and result files.
-p,--port <rmiPort> The optional RMI port (default:1099)
-P,--adminpass <adminPassword> Admin password
--responsetimeout <timeout> The optional response timeout in seconds when reading data from server (default: 0s;
infinite)
-s,--server <rmiHost> The optional RMI server (default: localhost)
Examples
List all data exports:
$ listdataexports -A oxadminmaster -P secret
List all data exports of a certain context:
$ listdataexports -c 1234 -A oxadminmaster -P secret
List the data export of a certain user. This output includes task-associated work items and result files (if any):
$ listdataexports -c 1234 -i 3 -A oxadminmaster -P secret
Request cancelation for data export tasks via canceldataexports
Usage
canceldataexports -c <contextId> [-i <userId>] -A <masterAdmin | contextAdmin> -P <masterAdminPassword |
contextAdminPassword> [-p <RMI-Port>] [-s <RMI-Server>] [--responsetimeout <responseTimeout>] |
[-h]
-A,--adminuser <adminUser> Admin username
-c,--context <contextId> The context identifier. If only context identifier is given, all data export tasks
associated with denoted context are requested for being canceled
-h,--help Prints this help text
-i,--userid <userId> The user identifier. If this option is set, only the data export task associated with
that user is canceled
-p,--port <rmiPort> The optional RMI port (default:1099)
-P,--adminpass <adminPassword> Admin password
--responsetimeout <timeout> The optional response timeout in seconds when reading data from server (default: 0s;
infinite)
-s,--server <rmiHost> The optional RMI server (default: localhost)
Examples
Request to cancel all data exports for a certain context:
$ canceldataexports -c 1234 -A oxadminmaster -P secret
Request to cancel the data export for a certain user:
$ canceldataexports -c 1234 -i 3 -A oxadminmaster -P secret
List (& repair) orphaned file storage entities or references to non-existent file storage entities via dataexportconsistency
Usage
dataexportconsistency -f <filestoreId_1> ... [-r] -A <masterAdmin> -P <masterAdminPassword> [-p <RMI-Port>] [-s
<RMI-Server>] [--responsetimeout <responseTimeout>] | [-h]
-A,--adminuser <adminUser> Admin username
-f,--filestores <filestoreIds> Accepts one or more file storage identifiers
-h,--help Prints this help text
-p,--port <rmiPort> The optional RMI port (default:1099)
-P,--adminpass <adminPassword> Admin password
-r,--repair Repairs orphaned files and export task items.
--responsetimeout <timeout> The optional response timeout in seconds when reading data from server (default: 0s;
infinite)
-s,--server <rmiHost> The optional RMI server (default: localhost)
Examples
List orphaned file storage entities or references to non-existent file storage entities:
$ dataexportconsistency -f 34 -A oxadminmaster -P secret
Repair orphaned file storage entities or references to non-existent file storage entities:
$ dataexportconsistency -f 34 --repair -A oxadminmaster -P secret