Storage Engine, Automatic Handling of Network Isolation not Completed for PLDB
Ericsson Centralized User Database

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Actions for Selective Replica Check Task Was Not Completed
2.2   Actions for Data Repair Task Was Not Completed
2.3   Actions for Triggering Reconciliation Task Was Not Completed

Glossary

Reference List

1   Introduction

This document describes the Storage Engine, Automatic Handling of Network Isolation not Completed for PLDB alarm and the troubleshooting steps to take when it is raised.

1.1   Alarm Description

This alarm is raised when the Automatic Handling of Network Isolation process has failed to repair the Processing Layer Database (PLDB) cluster inconsistency between former and current master replica servers.

The alarm is issued in the following situations:

The possible alarm causes and the corresponding fault reasons, fault locations and impacts are described in Table 1.

Table 1    Alarm Causes

Alarm Cause: Selective Replica Check task was not completed.
  Description: Automatic Handling of Network Isolation process was unsuccessful in repairing PLDB cluster inconsistency between former and current master replica servers.
  Fault Reason:
    • Any issue preventing LDAP queries from being performed on the invoked PLDB Cluster.
    • Any other issue preventing access to the operational logs, resulting in a failed execution.
  Fault Location: Slave (Selective Replica Check) replica server.
  Impact: Rescuing non-replicated data from former master has failed.

Alarm Cause: Data Repair task was not completed.
  Description: Automatic Handling of Network Isolation process was unsuccessful in repairing PLDB cluster inconsistency between former and current master replica servers.
  Fault Reason:
    • Any issue preventing LDAP queries from being performed on the invoked PLDB Cluster.
    • Any other issue resulting in a failed execution.
  Fault Location: Master (Data Repair) replica server.
  Impact: Rescuing non-replicated data from former master has failed.

Alarm Cause: Triggering Reconciliation task was not completed.
  Description: Automatic Handling of Network Isolation process was unsuccessful in adding local DS units that were elected master for their DSG to the Reconciliation Pending Task List.
  Fault Reason:
    • Any issue preventing an update of the Reconciliation Pending Task List.
  Fault Location: Master replica server.
  Impact: No data reconciliation process.

Note:  
An alarm can appear as a result of a maintenance activity.

The following are the consequences for the node if the alarm is not solved:

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

Attribute Name                  Attribute Value

Auto Cease                      No
Module                          STORAGE-ENGINE
Error Code                      29
Timestamp First                 Date and time when the alarm was raised for the first time.
Repeated Counter                Number which indicates how many times the alarm was raised.
Timestamp Last                  Date and time of the most recent alarm raise.
Resource ID                     .1.3.6.1.4.1.193.169.1.1.29.<Timestamp>
Alarm Model Description         Automatic Handling of Network Isolation not Completed, Storage Engine.
Alarm Active Description        Storage Engine (PLDB): Automatic Handling of Network Isolation task <add_info> was not completed <add_info2> (task <taskid>, blade <Blade>), uuid: <uuid>
ITU Alarm Event Type            processingErrorAlarm (4)
ITU Alarm Probable Cause        softwareError (163)
ITU Alarm Perceived Severity    (4) – Major
Originating source IP           Node IP where the alarm was raised.
Sequence Number                 Number which indicates the order in which the alarms are raised.

In Table 2, the indicated variables are as follows:

For more information about the alarm attributes, refer to CUDB Node Fault Management Configuration Guide, Reference [1].
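When the alarm text is only available as a string (for example, copied from an alarm log), the task ID, blade, and uuid can be pulled out of the Alarm Active Description template in Table 2 with standard tools. The following is a minimal sketch, not a CUDB tool: the function name parse_alarm_field is hypothetical, and every value in the sample text (including the assumption that <add_info> carries the task name and that <add_info2> is empty) is invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample following the Alarm Active Description template.
# Task name, task id, blade name, and uuid are invented for illustration.
sample_alarm='Storage Engine (PLDB): Automatic Handling of Network Isolation task Data Repair was not completed (task 42, blade PL_2_3), uuid: 123e4567-e89b-12d3-a456-426614174000'

# parse_alarm_field FIELD TEXT
# Prints the value of FIELD (task, blade, or uuid) found in TEXT.
parse_alarm_field() {
    case "$1" in
        # Value between "(task " and the following comma.
        task)  printf '%s\n' "$2" | sed -n 's/.*(task \([^,]*\), blade .*/\1/p' ;;
        # Value between ", blade " and the closing parenthesis.
        blade) printf '%s\n' "$2" | sed -n 's/.*, blade \([^)]*\)).*/\1/p' ;;
        # Everything after "uuid: " up to the end of the line.
        uuid)  printf '%s\n' "$2" | sed -n 's/.*uuid: \(.*\)$/\1/p' ;;
        *)     echo "unknown field: $1" >&2; return 1 ;;
    esac
}

parse_alarm_field task  "$sample_alarm"
parse_alarm_field blade "$sample_alarm"
parse_alarm_field uuid  "$sample_alarm"
```

The patterns anchor on the literal parts of the template ("(task ", ", blade ", "uuid: "), so they do not depend on what <add_info> and <add_info2> contain.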

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read the following documents:

1.2.2   Tools

Not applicable.

1.2.3   Conditions

Not applicable.

2   Procedure

This section describes the procedure to follow when this alarm is received.
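Each alarm cause in Table 1 corresponds to one of the subsections that follow. As a quick aid, the mapping can be sketched as a small shell function; the function name procedure_for_cause is hypothetical, and the matching relies only on the task names used in this document.

```shell
#!/bin/sh
# procedure_for_cause CAUSE_TEXT
# Prints the number of the section to follow for the given alarm cause.
procedure_for_cause() {
    case "$1" in
        *"Selective Replica Check"*)   echo "2.1" ;;
        *"Data Repair"*)               echo "2.2" ;;
        *"Triggering Reconciliation"*) echo "2.3" ;;
        *) echo "unknown alarm cause" >&2; return 1 ;;
    esac
}

procedure_for_cause "Data Repair task was not completed."
```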

2.1   Actions for Selective Replica Check Task Was Not Completed

Do the following:

  1. If the Storage Engine, Unable to Synchronize Cluster in PLDB, Major alarm is raised (which happens when the Self-Ordered Backup and Restore function is not enabled or fails to restore the replication automatically), follow the procedure described in Storage Engine, Unable To Synchronize Cluster In PLDB, Major, Reference [2].
  2. Cease the alarm manually.
    Note:  
    This procedure is intended to fix data inconsistency between master and slave replicas, but it cannot guarantee the full repair of system data.

2.2   Actions for Data Repair Task Was Not Completed

Do the following:

Cease the alarm manually.

Note:  
Full repair of system data cannot be guaranteed in this case.

2.3   Actions for Triggering Reconciliation Task Was Not Completed

Do the following:

  1. Run the following command to establish an admin "CUDB CLI" session towards the CUDB node where the PLDB master replica is located:

    ssh <admin_user>@<CUDB_Node_OAM_IP_Address>

    Refer to CUDB System Administrator Guide, Reference [3] on how to list all master DSG replicas.

  2. Run the following command to check if there is any pending or ongoing reconciliation task for the specific DSG(s) from Step 1:

    cudbReconciliationMgr -l

    If a reconciliation task is pending or ongoing, this command returns the DSG(s); otherwise, it returns nothing. If DSG(s) are returned, exit this procedure. If nothing is returned, continue with the next step. Refer to CUDB Node Commands and Parameters, Reference [4] for further information about this command.

  3. Schedule reconciliation for the specific DSG(s) from Step 1:

    cudbReconciliationMgr -a <dsId>

    If the task for DSG identified with <dsId> is added, the command has no output. Otherwise, the output provides the error(s) fetched from the database. Refer to CUDB Data Storage Handling, Reference [5] for further information about the reconciliation process.
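Steps 2 and 3 above can be combined into a short script to be run inside the session opened in Step 1. The following is a minimal sketch under stated assumptions, not an Ericsson-provided tool: the function name schedule_if_needed and the RECON_CMD variable are hypothetical, and only the -l and -a options described in this procedure are used. Because the output format of cudbReconciliationMgr -l is not described here, the sketch only checks whether the output is empty; the operator still has to confirm that any listed DSG is the one identified in Step 1.

```shell
#!/bin/sh
# RECON_CMD can be overridden (for example, with a stub) to dry-run this sketch.
RECON_CMD="${RECON_CMD:-cudbReconciliationMgr}"

# schedule_if_needed DS_ID
# Schedules reconciliation for DS_ID unless a task is already listed.
schedule_if_needed() {
    ds_id="$1"

    # Step 2: any output from -l means a task is pending or ongoing.
    pending="$("$RECON_CMD" -l)"
    if [ -n "$pending" ]; then
        echo "Reconciliation already pending or ongoing; nothing to schedule:"
        echo "$pending"
        return 0
    fi

    # Step 3: -a prints nothing on success; any output is treated as an error.
    errors="$("$RECON_CMD" -a "$ds_id" 2>&1)"
    if [ -n "$errors" ]; then
        echo "Scheduling reconciliation for DSG $ds_id failed:" >&2
        echo "$errors" >&2
        return 1
    fi
    echo "Reconciliation scheduled for DSG $ds_id"
}
```

A call such as schedule_if_needed <dsId> then either reports the tasks that are already listed, schedules the reconciliation, or prints the error(s) returned by the command and exits with a non-zero status.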


Glossary

For the terms, definitions, acronyms, and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [6].


Reference List

CUDB Documents
[1] CUDB Node Fault Management Configuration Guide.
[2] Storage Engine, Unable To Synchronize Cluster In PLDB, Major.
[3] CUDB System Administrator Guide.
[4] CUDB Node Commands and Parameters.
[5] CUDB Data Storage Handling.
[6] CUDB Glossary of Terms and Acronyms.
Other Ericsson Documents
[7] System Safety Information.
[8] Personal Health and Safety Information.