1 Introduction
This instruction describes the handling of the Unrepaired Data Inconsistency Between Replicas, Storage Engine, DS alarm.
1.1 Alarm Description
This alarm is raised when the Data Repair procedure has been invoked for the current Data Store Unit Group (DSG) master replica, but some of the inconsistencies between the current and the former master replicas have not been successfully repaired.
The alarm is issued in the following situations:
- A Data Repair task has been completed, and there are unrepaired entries recorded in the output log.
The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| Some of the detected data inconsistencies between the current and former DSG master replicas could not be repaired. | Data Repair was executed for the LDAP entries identified as possibly not correctly replicated between the former and current master replicas, and it failed to repair some of these entries. These LDAP entries are recorded in the unrepaired log. | The repair of an entry may fail for different reasons. The reason why Data Repair was not able to repair a particular LDAP entry stored in the DSG is stated in the unrepaired log. | Current and former DSG master replicas. | Some provisioning and traffic data updates may be missing in CUDB. |
Incident timestamp refers to the time when the network incident, for example a network split or a DSG mastership change, happened in the CUDB system. For more information, refer to CUDB Data Storage Handling, Reference [1].
CDC means Collision Detection Counter; for more information, refer to CUDB LDAP Interwork Description, Reference [2].
The following are the consequences for the node if the alarm is not acted upon:
- Traffic updates or provisioning data from the former master (non-replicated transactions due to mastership change or a network incident) may be lost in CUDB, which may have a service impact for certain subscribers.
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | No |
| Module | STORAGE-ENGINE |
| Error Code | 26 |
| Time | Date when the alarm was raised. |
| Resource ID | .1.3.6.1.4.1.193.169.1.2.25.<DG>.<TIMESTAMP> |
| Alarm Model Description | Unrepaired Data Inconsistency Between Replicas, Storage Engine |
| Alarm Active Description | Storage Engine (DS-Group #DG): Unrepaired data inconsistency between replicas, major (task <TASKID>, blade <BLADE>) |
| ITU Alarm Event Type | processingErrorAlarm (4) |
| ITU Alarm Probable Cause | databaseInconsistency (160) |
| ITU Alarm Perceived Severity | Major (4) |
| Originating Source IP | Node IP where the alarm was raised. |
In Table 2, the indicated variables are as follows:
- <DG> is the identifier of the DSG that the DS cluster belongs to.
- <TIMESTAMP> is an integer representing the seconds since the Unix epoch at which the Data Repair task was started (see the example after this list).
- <BLADE> is the identifier of the CUDB blade or Virtual Machine (VM) where the replica is located.
- <TASKID> is the identifier of the Data Repair task.
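As an illustration of the <TIMESTAMP> format, the last component of the Resource ID can be converted to a human-readable date from any shell, assuming a GNU date implementation. The Resource ID below is a made-up example, not output from a real node.

```bash
# Hypothetical Resource ID from an alarm (example values only):
#   .1.3.6.1.4.1.193.169.1.2.25.3.1700000000
# The last component is <TIMESTAMP>, seconds since the Unix epoch.
TIMESTAMP=1700000000

# Convert it to a UTC date (GNU date syntax):
date -u -d "@${TIMESTAMP}"
# Prints: Tue Nov 14 22:13:20 UTC 2023
```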
For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [3]. The alarm must be cleared manually.
For the interpretation of the unrepaired logs, refer to CUDB Automatic Handling of Network Isolation Output Description, Reference [4].
1.2 Prerequisites
This section provides information on the documents, tools, and conditions that apply to the procedure.
1.2.1 Documents
Before starting this procedure, ensure that you have read the following documents:
- CUDB Node Fault Management Configuration Guide, Reference [3], regarding alarm configuration.
- System Safety Information, Reference [6].
- Personal Health and Safety Information, Reference [7].
1.2.2 Tools
Not applicable.
1.2.3 Conditions
Not applicable.
2 Procedure
Do the following:
- Locate and identify the unrepaired log based on the <BLADE> and <TASKID> parameters in the alarm as follows:
  - Log in to the CUDB node that originated the alarm.
  - On the blade or VM <BLADE>, search for the file(s) /local2/cudb/ahsi/replica_repair/datarepair_<TASKID>_unrepaired_*.ldif.gz (see the sketch after this procedure).
  - It is preferred to copy these files from the node to an external machine if further analysis is needed.
- Note:
  - The files must be transferred and stored appropriately, as they may contain confidential subscriber data.
  - Unrepaired data inconsistencies may cause issues for the application Front Ends (FEs) that own the affected data. Consult the appropriate application FE troubleshooting documentation for handling the effects of "Network Isolation in CUDB".
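The following is a minimal sketch of the file handling described above, assuming shell access to the blade or VM <BLADE> and standard GNU/Linux tools. The task ID, remote user, host, and destination directory are placeholders, not values from a real deployment.

```bash
# Substitute the <TASKID> value from the alarm text (placeholder shown).
TASKID=42

# List the unrepaired logs produced by the Data Repair task:
ls -l /local2/cudb/ahsi/replica_repair/datarepair_${TASKID}_unrepaired_*.ldif.gz

# Inspect the recorded LDAP entries without unpacking the files on disk:
zcat /local2/cudb/ahsi/replica_repair/datarepair_${TASKID}_unrepaired_*.ldif.gz | less

# Copy the files to an external machine for further analysis; handle them
# as confidential subscriber data (user, host, and path are placeholders):
scp /local2/cudb/ahsi/replica_repair/datarepair_${TASKID}_unrepaired_*.ldif.gz \
    analyst@external-host:/secure/storage/
```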
Glossary
For the terms, definitions, acronyms, and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [5].
Reference List
| CUDB Documents |
|---|
| [1] CUDB Data Storage Handling. |
| [2] CUDB LDAP Interwork Description. |
| [3] CUDB Node Fault Management Configuration Guide. |
| [4] CUDB Automatic Handling of Network Isolation Output Description. |
| [5] CUDB Glossary of Terms and Acronyms. |

| Other Ericsson Documents |
|---|
| [6] System Safety Information. |
| [7] Personal Health and Safety Information. |
