1 Introduction
This document describes the data storage handling features of the Ericsson Centralized User Database (CUDB).
1.1 Scope
The purpose of this document is to describe the architecture, feature interactions, and main functions of the data storage handling features, including the various data backup, restore, and reconciliation procedures. The document also provides information on the operation and maintenance of the feature.
1.3 Target Groups
This document is intended for system administrators and users working with the data storage handling features, and performing data backups, restore operations, or data reconciliation.
1.4 Prerequisites
Users of this document must have basic knowledge and experience of the following:
1.5 Typographic Conventions
Typographic Conventions can be found in the following document:
2 Overview
CUDB is a distributed, highly resilient, in-memory database, providing a seamless, geographically-distributed database system intended as a common repository in the network to store subscriber data for diverse applications.
As a physically-distributed system, CUDB is made up of a number of nodes. The CUDB node is the main architectural unit of a CUDB system. Each CUDB node is deployed in a single network location or site, for redundancy reasons.
All CUDB nodes are interconnected through a reliable IP transport network. Data stored in the CUDB system is exposed as a Lightweight Directory Access Protocol (LDAP) Directory Information Tree (DIT). Refer to CUDB LDAP Interwork Description for more information on data structures within the LDAP tree.
2.1 Prerequisites
This section is not applicable to this feature.
2.2 Architecture
From a logical point of view, the architecture of a CUDB system is made up of two data layers, as shown in Figure 1.
The Data Store (DS) Layer spans all the CUDB nodes in the system, while the Processing Layer (PL) does not necessarily span all of them. Components of each layer can communicate with other components of the same layer, even if they are located in different network locations. The exception is CUDB nodes without a PL, which communicate with a PL in the same site. The PL elements can only access DS Layer components located in the same node. For example, if a PL element in CUDB node A needs resources of a DS Layer component in CUDB node B, it must communicate with the PL elements in CUDB node B, which access the DS Layer component locally and return the data. See the subsections below for more information on the two layers.
2.2.1 Data Store Layer
The DS Layer is the main data repository in CUDB. The entire subscriber dataset stored in CUDB is partitioned into several Data Store Unit Groups (DSGs), each storing a disjoint partition of subscriber data (see Figure 3).
For each of those data partitions, CUDB allows one, two, or three instances (or replicas) to be configured, as described in CUDB Technical Product Description. Each replica of a DSG is stored in a different CUDB node and typically in a different site. The number of DSGs in a CUDB system, the number of replicas, and the distribution of replicas across the CUDB system depend on the system configuration. Refer to CUDB Data Distribution for more information.
2.2.2 Processing Layer
In some configurations, the PL is not present on all the CUDB nodes. The PL supports the LDAP protocol handling, and has the application logic to resolve any request. For this purpose, the PL on each node includes a pool of LDAP Front Ends (FEs), and an instance of the Processing Layer Database (PLDB).
Refer to CUDB Data Distribution for more information on PLDB.
2.2.3 Geographical Redundancy and Replication
For each of the data partitions (PLDB and the DSGs), one of the replicas is considered the authoritative source of data. Such replicas are called master replicas, and all other replicas of that data partition, if any, are called slave replicas. All write operations are addressed to master replicas and then replicated asynchronously to the slave replicas. If the master replica of the PLDB or of any DSG fails or becomes unreachable for any reason, another replica of the corresponding data partition, if any is available, is appointed as the new master replica. Refer to CUDB High Availability for more information.
For redundancy purposes, two replication channels connect the master replicas to the slave replicas. These channels work in active-standby redundancy mode.
For more information on replication mechanisms and the handling of master and slave replicas, refer to CUDB High Availability.
2.3 Description
This section contains detailed information about data storage handling in CUDB.
2.3.1 Data Storage
CUDB Data Storage is realized primarily through the Storage Components acting as a repository and database in CUDB. Storage Component units are in-memory cluster database systems made up of a set of blades or Virtual Machines (VMs) behaving as a unit, and include all the PLDB and DSG databases of the CUDB system. For a simplified physical and logical view of the CUDB node, see Figure 2.
CUDB supports defining Binary Large Objects (BLOB) in the data model. It is possible to select whether each type of BLOB is stored in the memory or the disk storage system. Refer to CUDB Binary Large Object Attributes Management for more information.
A PLDB is a variable-size cluster running on 2 to 16 database blades or VMs. Database clusters storing DSG data are called DS Units. A DS unit is a fixed-size cluster, running on two database blades or VMs.
The number of DS units in a CUDB node is determined by the total number of blades or VMs in the node and the number of blades or VMs used by the PLDB (0 in nodes without a PLDB). A CUDB node can therefore host as many DS units as there are remaining pairs of payload blades or VMs, that is, excluding the two System Controllers (SCs).
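The arithmetic above can be sketched as follows. This is an illustrative example only; the function name, parameters, and the 14-blade node in the example are assumptions for demonstration, not part of any CUDB interface.

```python
def max_ds_units(total_blades: int, pldb_blades: int) -> int:
    """Estimate how many DS units a CUDB node can host.

    Each DS unit is a fixed-size cluster of two blades or VMs, and the
    two System Controllers (SCs) do not host payload. Illustrative
    sketch of the rule described above, not a CUDB API.
    """
    SYSTEM_CONTROLLERS = 2
    payload = total_blades - SYSTEM_CONTROLLERS - pldb_blades
    return payload // 2  # one DS unit per remaining pair of blades or VMs

# A hypothetical 14-blade node with a 4-blade PLDB leaves 8 payload
# blades, that is, 4 DS units; without a PLDB it would host 6.
print(max_ds_units(14, 4))  # -> 4
print(max_ds_units(14, 0))  # -> 6
```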
The CUDB Storage Component is characterized by the following attributes:
2.3.2 Processing Layer Database Distribution
The PLDB can be configured to be hosted in all the nodes of the system or only in some of them, but at least one replica of the PLDB must be configured in a node for every site in the system. The advantage of not configuring the PLDB in all the nodes of the site is to free up some blades or VMs for extra DSGs, so this scenario is mainly used for large deployments. Refer to CUDB Node Configuration Data Model Description for more information on PLDB configuration.
2.3.3 Data Storage Distribution
A DS Unit can be configured to host data for any single DSG. As shown in Figure 3, DS Unit 2 hosts data for DSG 2 in CUDB Node 1, but the same DSG 2 is hosted by DS Unit 1 in CUDB Node 5. DSGs do not need to be hosted in all CUDB nodes; the data stored in all the DSGs is nevertheless accessible from any CUDB node in the system.
As shown in Figure 4, nodes without PLDB can host more DS units than CUDB nodes with PLDB: the blades or VMs hosting the PLDB are converted to host DS units. The currently valid DS unit distribution rules for the CUDB nodes must be followed for these new DS units.
Internally, each data partition is organized in a relational model as provided by the Storage Component. Externally, the data storage is accessed with the LDAP protocol as a Directory Information Tree (DIT). This internal relational model assigns memory chunks to individual tables indexed by object classes.
The overall memory occupation levels in the PLDB and DS Units are monitored. When the occupation levels reach a certain threshold, an alarm is raised.
These alarms are raised if the warning level (configured with the memoryWarningThreshold parameter) and the overall maximum level (configured with the memoryFullThreshold parameter) are reached. It is strongly recommended not to change the full threshold value. The warning threshold values can be freely configured to satisfy safety requirements. See Fault Management for more details.
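The threshold logic described above can be sketched as follows. The parameter names mirror the memoryWarningThreshold and memoryFullThreshold configuration parameters mentioned above; the default percentage values in the sketch are illustrative assumptions, not CUDB defaults.

```python
def memory_alarm_state(occupation_pct: float,
                       memory_warning_threshold: float = 80.0,
                       memory_full_threshold: float = 95.0) -> str:
    """Return the alarm state for a PLDB or DS unit occupation level.

    Default thresholds are assumed values for illustration only; in
    CUDB they come from the memoryWarningThreshold and
    memoryFullThreshold configuration parameters.
    """
    if occupation_pct >= memory_full_threshold:
        return "FULL"      # overall maximum level reached
    if occupation_pct >= memory_warning_threshold:
        return "WARNING"   # warning level reached
    return "OK"            # below both thresholds, no alarm

print(memory_alarm_state(72.5))  # -> OK
print(memory_alarm_state(83.0))  # -> WARNING
print(memory_alarm_state(96.1))  # -> FULL
```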
In addition, batches of subscriber data can be moved from one DSG to a different DSG to provide a more balanced memory occupation or a more balanced load distribution across DSGs. This process is known as subscription reallocation, or simply "reallocation". Refer to CUDB Subscription Reallocation for more information.
Data can also be distributed among DSGs according to geographical zones. Refer to CUDB Multiple Geographical Areas for more information.
2.3.4 Data Storage Management
The configuration model of a CUDB node includes information about PLDB, all replicas of PLDB in the CUDB system, all DSGs and all their replicas (DS Units) in the CUDB system. Refer to CUDB Node Configuration Data Model Description for more details on this configuration model. Data store management is carried out both through changes in the configuration data model and by executing CLI commands.
When managing data partitions, the following procedures must be considered:
See Operation and Maintenance for details about these management tasks.
2.3.5 Data Storage Scalability
The number of database blades or VMs used for the PLDB determines its capacity and cannot be changed later. It is therefore crucial to dimension the PLDB properly during initial installation to accommodate both the current and the future required capacity.
Storage space for subscriber data and processing capacity to handle that extra data can be increased by adding new DSGs, along with the corresponding DS Units, to the CUDB system.
2.3.6 Reconciliation
Reconciliation is a system-wide process whose goal is to detect and correct discrepancies between the PLDB and the DSGs. Since CUDB is an application-agnostic system, the discrepancies that the reconciliation process can detect and correct are limited to entries stored in a DSG that have no matching Distribution Entry (DE) in the PLDB as an ancestor. Such a discrepancy is known as an inconsistency in CUDB. Reconciliation fixes inconsistencies by deleting the offending data from the DSG where it is hosted.
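The ancestor check above can be illustrated with a small in-memory model. The data structures, DNs, and function names below are hypothetical and exist only to show the principle: a DSG entry is consistent only if some DE in the PLDB is its ancestor in the LDAP tree.

```python
def find_inconsistencies(pldb_des, dsg_entries):
    """Return the DSG entry DNs with no matching DE ancestor in the PLDB.

    Illustrative sketch only: in LDAP string notation, an ancestor DN
    is a suffix of the entry DN, so the check is a suffix comparison.
    """
    def has_de_ancestor(dn):
        return any(dn == de or dn.endswith("," + de) for de in pldb_des)
    return [dn for dn in dsg_entries if not has_de_ancestor(dn)]

# Hypothetical example data: carol has no DE in the PLDB, so her DSG
# entry is an inconsistency that reconciliation would delete.
pldb = {"uid=alice,dc=operator", "uid=bob,dc=operator"}
dsg = ["cn=hlr,uid=alice,dc=operator",
       "cn=hss,uid=carol,dc=operator"]
print(find_inconsistencies(pldb, dsg))
# -> ['cn=hss,uid=carol,dc=operator']
```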
| Note: |
Reconciliation runs as a low priority background task to
minimize its impact on traffic processing. It is executed in nodes
where there are DSG master replicas which are suspected to be inconsistent
with PLDB. If there are several DSG master replicas to be reconciled
in a CUDB node, they are handled one by one. |
2.3.6.1 Reconciliation Tasks
The reconciliation process runs on every CUDB node, and executes the following four main tasks:
2.3.6.2 Reconciliation Scenarios
Due to asynchronous replication and the possibility of performing a group data restore, discrepancies between the PLDB and the DSGs can appear in a CUDB system. These discrepancies are usually side effects of the layered data distribution of CUDB (the PLDB in the PL, and the DSGs in the DS Layer), the geographical redundancy of the stored data, and the asynchronous nature of replication in the underlying database engine.
Reconciliation is needed in the following five scenarios:
Reconciliation can also be executed manually by the cudbReconciliationMgr command. Refer to CUDB Node Commands and Parameters for more information.
| Note: |
In case of group data restore and system data restore operations,
reconciliation must be scheduled manually, unless the automatic execution
of Selective Replica Check and Data Repair processes is disabled by
configuration. Check the details of the CudbDsGroupRepairAndResync class in
CUDB Node Configuration Data Model Description for
more information about configuring these processes. |
2.3.6.3 Reconciliation Output Files
2.3.6.4 Reconciliation Management
Reconciliation is automatically handled by the CUDB system and requires no administrative actions other than managing raised alarms.
The cudbReconciliationMgr command is available to interact with the reconciliation feature, by offering the following functions:
Refer to CUDB Node Commands and Parameters for more information on this command.
2.3.7 Data Backup and Restore
Data Backup dumps the contents of the databases into files, which can later be used to recover the contents of the databases when needed. Data backups can be created for a particular database partition (unit data backup) or for the complete CUDB system (system data backup).
A CUDB system data backup (that is, a backup of all the data in the CUDB system) is composed of single backups of every data partition inside the CUDB, that is, PLDB and all DSGs. Unit data backups are useful in specific recovery situations.
It is recommended to create a system data backup whenever any major system change takes place.
The restore types are similar to the backup types: CUDB offers the restoration of the complete CUDB system (system data restore), of all replicas of a data partition (group data restore), and of a particular database unit (unit data restore).
The backup and restore functions are described below in more detail.
2.3.7.1 Unit Data Backup
Backing up a single database unit is known as a unit data backup, and it must be ordered on the node that holds the master PLDB or DSG replica of the unit.
| Note: |
Before initiating a backup, operators must locate the master
replica of the unit by using the cudbSystemStatus command. Refer to
CUDB System Administrator Guide for more information. |
Once the node where the backup must be initiated is chosen, the backup is executed using the cudbManageStore command.
The output of the unit backup consists of three files for each participating data node in the cluster (two data nodes in case of DSGs, 2 to 16 data nodes in case of PLDB, unless any of the data nodes in the cluster is down - refer to CUDB High Availability for more information). These files are as follows:
The files generated by the unit backups are stored within the local disk storage system of each participating blade or VM. The files are located in the following directory:
/local/cudb/mysql/ndbd/backup/BACKUP/BACKUP-YYYY-MM-DD_HH-mm, where YYYY, MM, DD, HH, and mm stand for the year, month, day, hour, and minute of backup execution, respectively.
| Note: |
To conserve space in the local disk storage system, the blades
or VMs keep only the files of the last three unit backups. Subsequent
unit backups ordered on the same unit result in deleting the oldest
unit backup files to make room for the new backup files. |
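The rotation rule in the note above can be sketched as follows. The directory names are hypothetical examples; only the BACKUP-YYYY-MM-DD_HH-mm naming and the keep-last-three rule come from the description above.

```python
import re

def backups_to_delete(dir_names, keep=3):
    """Return the backup directories displaced by the rotation rule.

    Illustrative sketch: of the BACKUP-YYYY-MM-DD_HH-mm directories
    found under the unit backup path, only the `keep` most recent are
    retained. The timestamp format sorts chronologically as text, so
    a plain lexicographic sort is sufficient.
    """
    pat = re.compile(r"^BACKUP-\d{4}-\d{2}-\d{2}_\d{2}-\d{2}$")
    valid = sorted(d for d in dir_names if pat.match(d))
    return valid[:-keep] if len(valid) > keep else []

dirs = ["BACKUP-2024-01-10_02-00", "BACKUP-2024-02-11_02-00",
        "BACKUP-2024-03-12_02-00", "BACKUP-2024-04-13_02-00"]
print(backups_to_delete(dirs))  # -> ['BACKUP-2024-01-10_02-00']
```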
Figure 5 shows the inside of the DSG1 master replica unit (part of Node 1 shown in Figure 3), and how both blades or VMs hold the complete data set in the unit. If both blades or VMs are active, each one dumps one part of the complete data set into the backup files, so the overhead caused by the backup operation is evenly spread among both. The colored ovals represent the sub-sets into which the data set (held by the unit) was split.
However, it can happen that when the backup is ordered, one of the blades or VMs running the unit marked for backup is down. In that case, the inactive blade or VM does not generate any files; instead, its peer dumps the complete data set, including the part held by the inactive blade or VM, into the backup files it generates (since each blade or VM contains the whole data set of the replica). Figure 6 shows what happens in such a case.
In case of problems during the backup procedure, the alarm corresponding to the PLDB and DSG backup is raised. See Backup Alarms for more information.
2.3.7.2 System Data Backup
Backing up the data of the complete CUDB system can be ordered on any CUDB node with the cudbDataBackup command. A system data backup is made of a unit data backup of the PLDB and a unit data backup of each of the DSGs in the system. The clusters where the data unit backups are taken are chosen according to the following rules:
System data backup cannot be initiated in the following cases:
Since CUDB is a highly-distributed database system and, as mentioned above, the system-level backup consists of a series of individual data backups for the PLDB and every DSG database, the output of this backup is a number of individual backups spread over the participating CUDB nodes of the system. However, the individual unit backups of a system-level backup are not interchangeable with the single database unit backups described in Unit Data Backup. They differ from single database unit backups in the following:
Every TAR file generated for each participating blade or VM is placed in the /home/cudb/systemDataBackup directory within the NFS storage. The format of the TAR filename is as follows:
BACKUP-<DateTime>-<CUDBNodeId>.<Unit>.<ClusterNodeId>.tar
The individual filename elements in the backup filename stand for the following:
| DateTime |
Contains the date and time when the backup was executed. It follows the YYYY-MM-DD_HH-mm format, where YYYY, MM, DD, HH, and mm stand for the year, month, day, hour, and minutes, respectively. |
|
| CUDBNodeId |
The CUDB Node Identifier of the node where the backup was executed. |
|
| Unit |
The unit number of the PLDB or the DSG to which the DS belongs. Its value is 0 for PLDB and 1-255 in case of DSGs. |
|
| ClusterNodeId |
The database cluster node ID, used to identify the data node within the cluster. |
|
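The filename format above can be parsed as follows. The node, unit, and cluster node IDs in the example are made up for demonstration; only the filename structure comes from the description above.

```python
import re

# Pattern for BACKUP-<DateTime>-<CUDBNodeId>.<Unit>.<ClusterNodeId>.tar,
# where DateTime follows YYYY-MM-DD_HH-mm.
PATTERN = re.compile(
    r"^BACKUP-(?P<DateTime>\d{4}-\d{2}-\d{2}_\d{2}-\d{2})"
    r"-(?P<CUDBNodeId>[^.]+)\.(?P<Unit>\d+)\.(?P<ClusterNodeId>\d+)\.tar$"
)

def parse_backup_filename(name):
    """Split a system data backup filename into its named fields.

    Illustrative sketch based on the filename format documented above.
    """
    m = PATTERN.match(name)
    if not m:
        raise ValueError(f"not a system data backup file: {name}")
    fields = m.groupdict()
    fields["IsPLDB"] = fields["Unit"] == "0"  # unit 0 is the PLDB, 1-255 are DSGs
    return fields

# Hypothetical example filename:
info = parse_backup_filename("BACKUP-2024-05-01_03-30-cudb01.2.3.tar")
print(info["CUDBNodeId"], info["Unit"], info["IsPLDB"])  # -> cudb01 2 False
```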
| Note: |
To conserve space in the disk storage system, the system
keeps only the files of the last three system backups in the shared
directory of the CUDB node. Subsequent system backups result in deleting
the oldest system backup to make room for the new backup files. |
During the execution of the system data backup, the information about the chosen DSG database replicas is printed to the standard output of the SSH session through which the system data backup command was entered. It is recommended to keep this information, since administrators need it to collect the backup files from the different CUDB nodes before initiating a system restore. See System Data Restore for more details.
As explained in System Data Restore, system data restore results in the loss of all changes in the CUDB database occurred between the creation of the system data backup and the execution of the system data restore. Therefore, administrators must be completely sure that the restoration is really necessary.
Individual PLDB and DSG database backups in a system data backup are not synchronized. This means that data in those individual backups might not be fully aligned if provisioning is on while the system data backup is being created (for example, in case new subscriber information is added to the PLDB and the DSGs). Therefore, operators may choose to stop provisioning during system data backup. For more information, refer to CUDB Backup and Restore Procedures.
Figure 7 shows how a complete system data backup is executed in the entire CUDB system, and what the outcome of the backup execution is.
System data backups have some interactions that must be considered in order to minimize inconsistencies in the content of backup files. See Dependencies and Interactions for further details.
In case of problems during the backup procedure, the alarm corresponding to the PLDB and DSG backup is raised. See Backup Alarms for more information.
2.3.7.3 Periodic System Data Backup
System data backups can be automated by creating periodical system data backup tasks, using the Linux Distribution Extension (LDE) cron resource. This way, administrators can ensure that a recent backup is always available without the need of manual intervention. Refer to CUDB Backup and Restore Procedures for more information.
The alarms that apply to the periodic backup are the same as the normal backup of a complete CUDB system. See Backup Alarms for more information.
2.3.7.4 Unit Data Restore
The restore of a single database unit is called the unit data restore, and makes it possible to recover a unit data backup into a single database replica (see Unit Data Backup for more information on unit data backups). The operation is executed with the cudbManageStore command.
| Note: |
It is strongly recommended to use only the latest unit data backup for restore operations.
Using obsolete unit data backups for unit restore operations can make it impossible
to establish replication, which can only be fixed by creating a new unit data backup
and restoring it again. Refer to CUDB Backup and Restore Procedures for more information. |
Before executing the restore operation, the output files of the latest unit data backup must be copied to the destination blades or VMs of the replica. As described in Unit Data Backup, the backup files are located by default in the following directory of the local disk storage system of the blade or VM:
/local/cudb/mysql/ndbd/backup/BACKUP/BACKUP-YYYY-MM-DD_HH-mm
Then the files must be copied to the blades or VMs where the cluster data nodes of the unit to restore are running. In case not all the blades or VMs of a unit are available, refer to CUDB Backup and Restore Procedures for more information.
After the successful execution of the restore operation, the replication is started automatically. Refer to CUDB Backup and Restore Procedures for the exact steps of performing unit data restore.
| Note: |
Stored SQL procedures issued in the database for different
purposes (such as performance management) are neither backed up, nor
restored. Therefore, stored SQL procedures intended to run in the
CUDB system must be recovered individually at every unit. Refer to
CUDB Backup and Restore Procedures for
more details. |
In case of problems during the restore procedure, the corresponding DS and PLDB alarms are raised. See Restore Alarms for more details.
2.3.7.5 PLDB Group Data Restore
If all PLDB replicas must be restored at once, the state of the PLDB data can differ from the state of the DSG data after a successful restoration, the latter being more recent.
In such case, when reconciliation is executed either automatically or manually, it might remove the recently added subscriber data, thereby restoring all the DSG replicas back to the state when the PLDB was backed up. See Reconciliation for more details on reconciliation.
| Note: |
If the Selective Replica Check and Data Repair processes
are enabled, then reconciliation must be scheduled manually for all
DSGs after a PLDB group data restore. Check the details of the CudbDsGroupRepairAndResync class in
CUDB Node Configuration Data Model Description for
more information about configuring the Selective Replica Check and
Data Repair processes. |
2.3.7.6 DSG Group Data Restore
The restore of a complete DSG database is also known as the group data restore. It consists of the unit data restore of all the DSG database replicas.
Replication must be started manually on all DSG slave replicas. Refer to CUDB Backup and Restore Procedures for more details on the process, and the exact steps to follow.
| Note: |
Group data restore initiated for a DSG does not affect the
rest of the DSGs in the CUDB system. However, following the master
election of the restored group, a reconciliation procedure is executed
automatically, which removes the inconsistent entries of the restored
group from the system (since the restored data is likely outdated
compared to the current PLDB). See Reconciliation for more information about reconciliation. |
Similarly to the unit data restore procedure, the stored SQL procedures must be recovered individually at every unit. Refer to CUDB Backup and Restore Procedures for more details.
In case of problems during the restore procedure, the corresponding DSG alarm is raised. See Restore Alarms for more details.
| Note: |
If the Selective Replica Check and Data Repair processes
are enabled, then reconciliation must be scheduled manually for the
restored DSG after a DSG group data restore. Check the details of
the CudbDsGroupRepairAndResync class in
CUDB Node Configuration Data Model Description for
more information about configuring the Selective Replica Check and
Data Repair processes. |
2.3.7.7 System Data Restore
The restore of the entire CUDB system is also known as the CUDB system data restore. This type of restore requires a previous CUDB system data backup, and is equivalent to the individual restoration of all database system units (that is, the restoration of all PLDB replicas and all DSGs database replicas). System data restore can only be executed on a system that is exactly the same as the system from which the backup was taken.
System data restore results in the loss of all changes in the CUDB database occurred between the creation of the system data backup and the execution of the system data restore. Therefore, administrators must be completely sure that the restoration is really necessary.
| Note: |
Restoring a complete CUDB system wipes out all the changes
in the system since the last system backup. |
The detailed steps of performing a complete system restore are available in CUDB Backup and Restore Procedures. The following list provides an overview of the steps to perform during the process.
Steps
Troubleshooting:
In case of problems during the restore procedure, the corresponding PLDB and DS restore alarms are raised. See Restore Alarms for more details.
2.3.7.8 Combined Unit Data Backup and Restore
The automated procedure for a single database unit backup and restore makes it possible to execute these two steps with a single command, using the cudbUnitDataBackupAndRestore script. The command takes a unit data backup on the master replica and restores this data backup into the specified slave replica.
This procedure can be initiated by specifying the database unit (PLDB or a DSG) and the target node for the restoration.
The command generates backup files, copies the files to the target node, and removes the backup files at the end of the restore procedure.
For further information, refer to CUDB Backup and Restore Procedures and CUDB Node Commands and Parameters.
2.3.7.9 Automated Procedure for Complete CUDB System Backup and Restore
The cudbSystemDataBackupAndRestore command automates the backup and restore procedure for a complete CUDB system. In this backup type all information stored in CUDB is affected. A CUDB system data backup is composed of single backups of each individual database unit inside CUDB.
The following have to be considered when performing a system data backup and restore:
For further information, refer to CUDB Backup and Restore Procedures and CUDB Node Commands and Parameters.
2.3.8 Detecting Inconsistencies between Replicas
Due to the nature of asynchronous replication, if there is a replication lag between the master DS unit and the slave DS unit of a DSG, the data content of the master and the slave can differ at any given time. This "temporary" inconsistency is normal.
However, under normal operational conditions the replication lag is small and causes only minimal differences in data content. Based on this assumption, CUDB offers a solution that performs lightweight replica consistency checking based on the difference in table sizes between the master and slave replicas.
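The size-based check can be sketched as follows. The tolerance value, table names, and row counts are illustrative assumptions, not CUDB configuration; only the idea of flagging tables whose sizes diverge by more than a small margin comes from the description above.

```python
def suspect_tables(master_sizes, slave_sizes, tolerance=100):
    """Flag tables whose row counts differ between replicas by more
    than `tolerance` rows.

    Lightweight sketch of size-based consistency checking: small
    differences are expected under normal replication lag, so only
    larger deviations are reported as suspects.
    """
    suspects = []
    for table in sorted(set(master_sizes) | set(slave_sizes)):
        delta = abs(master_sizes.get(table, 0) - slave_sizes.get(table, 0))
        if delta > tolerance:
            suspects.append((table, delta))
    return suspects

# Hypothetical row counts: the small subscriber difference is within
# normal lag, while the service table deviates enough to be flagged.
master = {"subscriber": 1_000_000, "service": 250_000}
slave = {"subscriber": 999_950, "service": 249_000}
print(suspect_tables(master, slave))  # -> [('service', 1000)]
```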
The feature checks the PLDB and all DSG slave replicas at configurable intervals, and it can also be executed manually using the cudbCheckConsistency command. For more information on this command, see CUDB Node Commands and Parameters.
Inconsistencies can also be introduced by other factors, such as replication channel problems caused by network disturbances, or by human errors (for example inappropriate backup handling or maintenance operation). To handle such inconsistencies, CUDB offers a solution called Consistency Check, which performs a consistency check that can detect inconsistency between master and slave replicas by performing a scan of the object classes and attributes in LDAP entries. For more information about Consistency Check, see CUDB Consistency Check.
The cudbConsistencyMgr command can order consistency tasks to check that a master and a slave replica of the same DSG or PLDB are consistent. It is also possible to list each pending and running consistency check task within all sites of the CUDB system. For more information on this command, refer to CUDB Node Commands and Parameters.
2.3.8.1 Consistency Check Output Files
The Consistency Check tasks produce several log files, which contain information about the task execution and the found inconsistencies. For more information about these logs, refer to CUDB Consistency Check.
2.3.9 Automatic Handling of Network Isolation
This section contains information about the Automatic Handling of Network Isolation function in CUDB.
The Automatic Handling of Network Isolation function maximizes service availability in network isolation scenarios, removing the potential impact of partial loss of service in an isolated site or set of sites during a network incident. The function ensures that in case of an even, symmetrical split, both halves of the system continue to provide service: this is achieved by making the CUDB system work in a "multi-master" mode during network incidents. In this mode, both halves execute changes but cannot replicate their local changes to the other half, temporarily behaving as two diverging databases. When the network recovers, Automatic Handling of Network Isolation ensures that all the modifications executed during the "multi-master" mode are automatically identified, merged into the master (that is, "repaired"), and kept.
During the execution of Automatic Handling of Network Isolation, the Selective Replica Check and then the Data Repair processes are run to achieve the final result.
2.3.9.1 Selective Replica Check
The Selective Replica Check process is started on a database cluster that was formerly a master, but is now a slave and is unable to sync with the current master. It attempts to identify the changes on the database cluster that may not have been replicated to the current master (a side effect of asynchronous replication) by reading the operational logs of that database cluster on the payload blades or VMs.
2.3.9.2 Data Repair
The Data Repair process is started as part of Automatic Handling of Network Isolation if and when Selective Replica Check has been completed.
Data Repair takes the Selective Replica Check output as its input, containing information about the LDAP entries. For each LDAP entry, the timestamp of the last update on the former master is recorded, together with the type of operation and the latest contents of the entry on the former master.
Data Repair compares the LDAP entries in the output of Selective Replica Check with data in the current master replica. Then, based on this information, on an entry-by-entry basis, Data Repair decides whether to keep the data in the current master replica as it is or to update/delete the LDAP entry on the current master replica, based on the information from the former master replica. Each entry in the input is logged to the Data Repair output logs, either to the repaired entries log or the unrepaired entries log.
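The entry-by-entry decision can be illustrated with the following sketch. The "newest timestamp wins" rule, the tuple layouts, and the example data are assumptions made for illustration; the source only states that the decision is based on the recorded timestamp, operation type, and entry contents, not the exact merge algorithm.

```python
def repair_decision(former, current):
    """Decide how to repair one LDAP entry on the current master.

    `former` is the Selective Replica Check record for the entry:
    (timestamp, operation, contents) from the former master.
    `current` is (timestamp, contents) on the current master, or None
    if the entry does not exist there. The newest-timestamp-wins rule
    below is an illustrative assumption, not the documented CUDB
    algorithm.
    """
    f_ts, f_op, f_contents = former
    if current is None:
        if f_op == "delete":
            return ("keep", None)       # already absent on the current master
        return ("add", f_contents)      # re-create the missing entry
    c_ts, c_contents = current
    if c_ts >= f_ts:
        return ("keep", c_contents)     # current master state is newer
    if f_op == "delete":
        return ("delete", None)         # former master deleted it later
    return ("update", f_contents)       # apply the newer former-master state

# The former master updated the entry at t=20; the current master last
# changed it at t=10, so the former-master state wins.
print(repair_decision((20, "update", {"msisdn": "46700"}),
                      (10, {"msisdn": "46701"})))
# -> ('update', {'msisdn': '46700'})
```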
The corresponding alarms are raised to inform the operator about repaired and/or unrepaired entries.
2.3.9.3 Logs
For the format and interpretation of the output logs of the Automatic Handling of Network Isolation processes, refer to CUDB Automatic Handling of Network Isolation Output Description.
2.3.10 Self-Ordered Backup and Restore
The Self-Ordered Backup and Restore process is started on a slave database cluster if it is unable to sync with the current master. The process tries to recover the original geographical redundancy configuration (1+1 double or 1+1+1 triple geographical redundancy) by creating an individual unit data backup on the current master PLDB or DSG replica and restoring that unit data backup on the local slave replica, without any manual intervention.
2.4 Dependencies and Interactions
This section provides information on the various dependencies and interactions.
2.4.1 Consistency Check Interactions
The Consistency Check task does not start, and the ongoing ones stop, if any of the following conditions are met:
A new task can be started for other non-degraded DSs if an ongoing task was interrupted due to degradation. A new task does not start on the degraded DS.
It is possible to stop an ongoing replica consistency check task manually during the execution, or to remove it from the Pending Task List.
The root cause of the stopped or failed replica consistency check tasks is logged locally in the given site, which can be checked with the cudbConsistencyMgr -t command.
During a consistency check task, it is not allowed to perform any LDAP schema change. It is also not allowed to run a consistency check task during an upgrade.
The consistency check process tries to minimize the impact on the traffic handling capacity of the CUDB system. It may cause some bandwidth usage increase.
2.4.2 Backup Interactions
In general, system-level and unit-level backup operations do not necessarily interact with other CUDB system functions. However, when a PLDB or DS replica is backed up, a negative performance impact is expected due to the constant information flow between the memory and the local disk storage system at every blade or VM involved in the backup. If the backed-up replica is a master replica, the performance impact also affects the number of traffic operations that the CUDB system can handle at a time in each unit. In contrast, the backup of a DSG master replica only impacts the traffic operations that are directed to the set of subscribers allocated to the DSG that the master replica belongs to.
It is strongly recommended to observe the following rules for backup executions:
The execution of a backup does not affect regular traffic operations and traffic operations do not affect the backup. Provisioning might have minor impacts on system data backups. For more information, refer to CUDB Backup and Restore Procedures.
2.4.3 Restore Interactions
During the execution of a PLDB or DSG unit data restore, the PLDB and DSG replicas affected by the restoration are unavailable. The PLDB and DSG replica downtime ends when the restore process finishes successfully. The downtime of a PLDB replica results in its whole node becoming unavailable. Refer to CUDB High Availability for more information.
During the execution of a DSG group data restore, the affected DSG is unavailable as all replicas are down at the same time, and does not become available until the individual unit data restore operation successfully finishes in at least one of the replicas, and a new master replica for that DSG is selected. After a group data restore in a DSG, a reconciliation process takes place in that DSG. See Reconciliation for reconciliation details.
During a PLDB group data restore, the PLDB is unavailable as all replicas are down at the same time. The PLDB does not become available until the unit data restore operation successfully finishes in at least one of the replicas, and a new master replica for the PLDB is selected. As an unavailable PLDB replica takes down the whole node where it is hosted, a PLDB group data restore takes down the whole CUDB system, therefore no service is provided. CUDB nodes come back online as restore operations for the PLDB finish. Master replicas for DSGs are chosen as CUDB nodes come back online and DSG replicas become available. After a PLDB group data restore, a reconciliation process is started for all DSGs in the system. See Reconciliation for reconciliation details.
During the execution of a system data restore, the entire CUDB system is unavailable, therefore no service is provided at all. Once all the individual unit data restore processes finish, new master selections take place, and the system gradually starts providing service again. See CUDB High Availability for more information.
2.4.4 Provisioning Assurance Interactions
If the Provisioning Assurance feature is configured in the CUDB system, and a Provisioning Assurance replay is taking place, the execution of Selective Replica Check, Data Repair and reconciliation tasks will be delayed. These tasks will start when the Provisioning Gateway finishes the reprovisioning operations. For configuring the Provisioning Assurance feature, refer to CUDB High Availability.
Note: A mastership change in DSG <X> will interrupt an ongoing reconciliation task on DSG <X>, irrespective of whether Provisioning Assurance is enabled or not.
2.4.5 Automatic Handling of Network Isolation Interactions
If the Selective Replica Check and Data Repair processes are enabled, the interactions in the following subsections apply. In addition, reconciliation is not automatically triggered after a group data restore for a DSG or PLDB, nor after a system data restore.
2.4.5.1 Selective Replica Check Interactions
Selective Replica Check will not start if the Provisioning Assurance task is running, and it will only start on the database cluster if it is in ready mode.
Selective Replica Check can run even if the local PLDB replica is down or the node does not have a local PLDB. If there are any other issues that would prevent LDAP queries on the database cluster where Selective Replica Check is started, the Selective Replica Check execution fails. Do not launch the Backup and Restore activity manually on the database cluster where Selective Replica Check runs; launch it only after Selective Replica Check has finished and the results are properly logged.
Note: If Backup and Restore is started in any way, it destroys the existing operational logs which are required by Selective Replica Check.
An ongoing Selective Replica Check process is gracefully stopped if it exceeds a 24-hour time limit.
2.4.5.2 Data Repair Interactions
The following considerations apply to Data Repair interactions:
2.4.6 Self-Ordered Backup and Restore Interactions
If the Automatic Handling of Network Isolation function is enabled, the Self-Ordered Backup and Restore process is started once a Selective Replica Check subtask has completed, irrespective of its outcome. The Data Repair and Self-Ordered Backup and Restore tasks run in parallel without conflict.
The Self-Ordered Backup and Restore process creates an individual unit data backup on the current master PLDB or DSG replica, and restores it on its local slave replica. For more information about backup and restore interactions, refer to Backup Interactions and Restore Interactions.
Manual Backup and Restore activity will fail on a database cluster if Self-Ordered Backup and Restore is already running.
3 Operation and Maintenance
This section provides information related to the operation and maintenance of the data storage handling features of CUDB.
3.1 Configuration
This section provides information on the configuration of the CUDB data storage and restore features.
3.1.1 Data Store Configuration
Data store management is configured through model-driven actions over the CUDB configuration model. In this model, each CUDB node reflects both its own detailed configuration and an overview of the rest of the CUDB nodes in the system, through the CudbLocalNode and CudbRemoteNode Management Object Classes, respectively. This model structure is the starting point of the data store configuration in CUDB. Refer to CUDB Node Configuration Data Model Description for more details on the configuration model.
3.1.2 Data Storage Parameters
As part of the CUDB internal configuration model, the following Management Object Classes are used to manage the DSs:
Refer to CUDB Node Configuration Data Model Description for more details about the CUDB internal configuration model.
3.1.3 System Restore Configuration
Following a successful system data restore, the stored SQL procedures must be restored manually. This process requires a proper username and password to access the database clusters involved in the SQL procedure restore. Refer to CUDB Backup and Restore Procedures for more information on recovering stored SQL procedures, and CUDB Security and Privacy Management for more information on usernames and passwords.
3.1.4 Miscellaneous Tasks
This section describes miscellaneous tasks related to the operation and maintenance of the data storage handling features.
3.1.4.1 Data Storage Tasks
The miscellaneous data storage tasks associated with data storage handling include the following:
3.1.4.2 Backup and Restore Tasks
The tasks related to the backup and restore procedures are as follows:
3.1.5 Consistency Check Configuration
Consistency Check tasks have configurable parameters which can be specified with the cudbConsistencyMgr command. For more information about the cudbConsistencyMgr command, refer to CUDB Node Commands and Parameters.
3.2 Fault Management
This section provides detailed information about the fault management of the CUDB data handling features.
3.2.1 Data Storage Management Alarms
The CUDB system can raise the following alarms during data storage management:
3.2.2 Backup Alarms
The CUDB system can raise the following alarms when performing a backup operation:
3.2.3 Restore Alarms
The CUDB system can raise the following alarms when performing a restore operation:
3.2.4 Reconciliation Alarms
The CUDB system can raise the following alarms when performing reconciliation:
3.2.5 Replica Inconsistency Alarms
The CUDB system can raise the following replica inconsistency related alarms:
3.2.6 Replication Alarms
The CUDB system can raise the following alarms related to replication channel state:
3.2.7 Troubleshooting
Refer to CUDB Troubleshooting Guide for more information on fault management.
3.3 Performance Management
This section provides detailed information on the performance management services related to the data storage handling features.
Data Storage Management Counters
The following cumulative counters are associated with data storage management:
For more information, refer to CUDB Counters List.
3.4 Security
This section contains information on the security measures associated with the data storage handling features of CUDB.
Automatic Handling of Network Isolation
Root permission or cudbadmin group membership is needed to access the Automatic Handling of Network Isolation output logs.
Backup and Restore Security Management
The following security-related information applies to the backup and restore procedures:
Access Control: Only root or users in the cudbadmin group can access the backup files.

Protocols: Backup and restore procedures are executed through Secure Shell (SSH) sessions. The outcome of these operations is also checked through this protocol.

Roles: Backup and restore procedures in CUDB can only be executed by users with system administration privileges. System administration privileges are also required to read from and write to the persistent disk storage system of the blades or VMs, as well as to the NFS storage of the CUDB nodes.

User Authentication: The user setting up an SSH session to the CUDB system is authenticated by means of a username and a password.

Media: Files generated during unit backups and system backups are not encrypted. When safekeeping those files on media external to the CUDB system, care must be taken to prevent unauthorized parties from accessing the files.
Consistency Check
Root permission is required to run a consistency check. Users belonging to the cudbadmin group can run it through sudo (without a password).
Export Security Management
Access Control: Only root or users in the cudbadmin group can access the export files.
3.5 Logging
This section provides information on the logging services related to the data storage handling features.
3.5.1 Backup and Restore Log Management
In addition to the related specific logs and alarms, the execution of backup and restore commands also generates console output containing useful information for the further management of backup files. This section only describes the most relevant output in more detail: the system data backup output.
System Data Backup Standard Output
When executing a system backup, the following information is produced through the standard output channel of the SSH session, over which the backup order was initiated:
Example 1 shows what the standard output of a system data backup looks like on the shell where the backup command was initiated. The CUDB system in the example below consists of three nodes, where the PLDB master replica resides in Node 1 and runs on four blades or VMs, while the DSG replicas selected for backup reside in Nodes 1-3 respectively, and each of them runs on two blades or VMs.
Example 1 Example Output of System Data Backup
cudb_backup ver(1.2.18)
BEGIN MANAGEMENT FOR LAST BACKUP
Backup 2010-02-24_11-42 finished successfully in :
  PLDB in CUDB node 1
    NDB node PL_2_3
    NDB node PL_2_4
    NDB node PL_2_5
    NDB node PL_2_6
  DSG#1 in CUDB node 1
    NDB node PL_2_7
    NDB node PL_2_8
  DSG#2 in CUDB node 2
    NDB node PL_2_9
    NDB node PL_2_10
  DSG#3 in CUDB node 3
    NDB node PL_2_11
    NDB node PL_2_12
Note: The PL_x_y resources in the output above are the names of the payload blades or VMs running the units.
Based on the output above, the NFS media of Node 1 must contain six files (one on each blade or VM for the PLDB master replica, and one on each blade or VM for the DSG1 replica residing in Node 1). An example of the file names can be found below.
Note: PLDB backup file names carry number 0 as the DSG ID value.
PLDB backup files:
BACKUP-2010-02-24_11-42-1.0.3.tar
BACKUP-2010-02-24_11-42-1.0.4.tar
BACKUP-2010-02-24_11-42-1.0.5.tar
BACKUP-2010-02-24_11-42-1.0.6.tar

DSG1 replica backup files:
BACKUP-2010-02-24_11-42-2.1.3.tar
BACKUP-2010-02-24_11-42-2.1.4.tar
At the same time, the NFS media of Node 2 contains two files (one on each blade or VM for the DSG2 replica residing in Node 2). An example of the file names can be found below.
DSG2 replica backup files:
BACKUP-2010-02-24_11-42-3.2.3.tar
BACKUP-2010-02-24_11-42-3.2.4.tar
Finally, the NFS media of Node 3 contains two files as well (one on each blade or VM for the DSG3 replica residing in Node 3). An example of the file names can be found below.
DSG3 replica backup files:
BACKUP-2010-02-24_11-42-1.3.3.tar
BACKUP-2010-02-24_11-42-1.3.4.tar
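The file names above follow a BACKUP-&lt;timestamp&gt;-&lt;a&gt;.&lt;DSG ID&gt;.&lt;b&gt;.tar pattern, where only the middle number is documented here: it is the DSG ID, with 0 denoting the PLDB. The sketch below parses that pattern; the regular expression is inferred from the examples, and the meaning of the first and last numbers is deliberately left unspecified.

```python
import re

# Pattern inferred from the example file names; only the middle number
# (the DSG ID, 0 for PLDB) is documented.
BACKUP_NAME = re.compile(
    r"BACKUP-(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}-\d{2})"
    r"-(?P<first>\d+)\.(?P<dsg_id>\d+)\.(?P<last>\d+)\.tar")

def classify_backup(filename):
    """Return (timestamp, replica) for a backup file name, where replica
    is 'PLDB' for DSG ID 0 and 'DSG<n>' otherwise; None if no match."""
    m = BACKUP_NAME.fullmatch(filename)
    if not m:
        return None
    dsg_id = int(m.group("dsg_id"))
    replica = "PLDB" if dsg_id == 0 else f"DSG{dsg_id}"
    return m.group("stamp"), replica
```

A helper like this could, for example, group the files on the NFS media of each node by backup timestamp and replica before archiving them to external media.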
3.5.2 Reconciliation Log Management
The reconciliation procedure logs the following events:
For further information about the logs provided by this procedure, refer to CUDB Node Logging Events.
Reference List
- CUDB LDAP Interwork Description
- CUDB Technical Product Description
- CUDB Data Distribution
- CUDB High Availability
- CUDB Binary Large Object Attributes Management
- CUDB Node Configuration Data Model Description
- CUDB Subscription Reallocation
- CUDB Multiple Geographical Areas
- CUDB Node Commands and Parameters
- CUDB System Administrator Guide
- CUDB Backup and Restore Procedures
- CUDB Consistency Check
- CUDB Automatic Handling of Network Isolation Output Description
- Storage Engine, Execution of Selective Replica Check Failed, PLDB, Major
- Storage Engine, Execution of Selective Replica Check Failed, DS, Major
- Storage Engine, Data Inconsistency between Replicas Repaired, PLDB
- Storage Engine, Data Inconsistency between Replicas Repaired, DS
- Storage Engine, Unrepaired Data Inconsistency between Replicas, PLDB
- Storage Engine, Unrepaired Data Inconsistency between Replicas, DS
- Storage Engine, Unable to Synchronize Cluster in DS, Warning
- Storage Engine, Unable to Synchronize Cluster in PLDB, Warning
- Storage Engine, Unable to Synchronize Cluster in DS, Major
- Storage Engine, Unable to Synchronize Cluster in PLDB, Major
- CUDB Security and Privacy Management
- Server Platform, Blade Replacement
- Virtualized CUDB Virtual Machine Recovery
- Storage Engine, DS Cluster Down
- Storage Engine, PLDB Cluster Down
- Storage Engine, DS Cluster in Maintenance Mode
- Storage Engine, PLDB Cluster in Maintenance Mode
- Storage Engine, Memory Usage Too High In DS, Warning Threshold Reached
- Storage Engine, Memory Usage Too High In DS, Full Threshold Reached
- Storage Engine, Out Of Memory In DS
- Storage Engine, Memory Usage Too High In PLDB, Warning
- Storage Engine, Memory Usage Too High In PLDB, Major
- Storage Engine, Out Of Memory In PLDB
- Storage Engine, High Load In DS
- Storage Engine, High Load In PLDB
- Storage Engine, Backup Fault In PLDB
- Storage Engine, Backup Fault In DS
- Storage Engine, Backup Notification Failure to Provisioning Gateway
- Storage Engine, Restore Fault in PLDB
- Storage Engine, Restore Fault in DS
- Storage Engine, Temporary Data Inconsistency
- Storage Engine, Deleted Data Due to Reconciliation
- Storage Engine, Potential Data Inconsistency between Replicas Found in PLDB
- Storage Engine, Potential Data Inconsistency between Replicas Found in DS
- Storage Engine, Data Inconsistency between Replicas Found in DS, Minor
- Storage Engine, Data Inconsistency between Replicas Found in DS, Major
- Storage Engine, Data Inconsistency between Replicas Found in PLDB, Minor
- Storage Engine, Data Inconsistency between Replicas Found in PLDB, Major
- Storage Engine, Replication Stopped Working in PLDB
- Storage Engine, Replication Stopped Working in DS
- Storage Engine, Replication Channels Down in DS
- Storage Engine, Replication Channels Down in PLDB
- Storage Engine, Replication Delay Too High In DS
- Storage Engine, Replication Delay Too High In PLDB
- Storage Engine, Automatic Handling of Network Isolation not Completed for DS
- Storage Engine, Automatic Handling of Network Isolation not Completed for PLDB
- CUDB Troubleshooting Guide
- CUDB Counters List
- CUDB Node Logging Events
- CUDB Glossary of Terms and Acronyms