Storage Engine, Unable to Synchronize Cluster in PLDB, Major
Ericsson Centralized User Database

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites
2   Procedure

Glossary

Reference List

1   Introduction

This instruction describes the alarm handling for the Storage Engine, Unable to Synchronize Cluster in PLDB, Major alarm.

1.1   Alarm Description

This alarm is issued when a slave replica is not able to sync data with the Processing Layer Database (PLDB) group master replica.

Note that if the CUDB system enters a state in which no master replica for the PLDB can be reached from the current node, this alarm is cleared automatically and the Storage Engine, No Available Master Replica for PLDB alarm, Reference [2], is raised instead.

The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.

Table 1    Alarm Causes

Alarm Cause: Replication information in the master replica has been removed or purged.
Description: Operational log index or operational log files in the master replica are missing, or the operational log index is inconsistent.
Fault Reason: Operational log index or operational log files in the master replica have been removed or purged, or the operational log index table was found inconsistent after starting or restarting the master server. The inconsistency could have been caused by a non-graceful stop of the running process or by a file system error.
Fault Location: Master replica server.
Impact: Loss of geographical redundancy.

Alarm Cause: There is a mismatch between the local and the remote replication information.
Description: There is a mismatch between the local operational log time stamp and the remote one.
Fault Reason: A mastership change during a non-replicated transaction (such as traffic updates or provisioning), followed by the reconciliation process on the new master replica server. The replication process cannot start once the former master rejoins (not necessarily working) because the new master server does not have sufficient time stamp information.
Fault Location: Both replica servers.
Impact: Loss of geographical redundancy.

Alarm Cause: Slave server has no replication information about the master server.
Description: The slave replica server has no replication information about the master replica server.
Fault Reason: The daemon serving the remote master replica server is missing or has been killed (both instances).
Fault Location: Slave replica server.
Impact: Loss of geographical redundancy.

Alarm Cause: Automatic Handling of Network Isolation was not executed.
Description: It was not possible to execute the Selective Replica Check task, or the Selective Replica Check failed to retrieve all applicable entries.
Fault Reason: Rescuing non-replicated data from the former master has failed.
Fault Location: Both replica servers.
Impact: Loss of geographical redundancy.

Alarm Cause: Self-Ordered Backup and Restore failed.
Description: It was not possible to restore the replication automatically.
Fault Reason: The automatic backup and restore task failed during the backup creation, the backup transfer, or the slave replica restore.
Fault Location: Both replica servers.
Impact: Loss of geographical redundancy.
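The time-stamp mismatch cause in Table 1 can be illustrated with a minimal sketch. This is illustrative only and does not reflect CUDB's actual replication internals: a slave can resume replication only when its last-applied operational log time stamp agrees with the master's record of it; after a mastership change during a non-replicated transaction, the stamps diverge and replication stays blocked.

```python
from datetime import datetime, timedelta

def can_resume_replication(local_log_ts, remote_log_ts,
                           tolerance=timedelta(0)):
    """Illustrative check: replication can resume only when the slave's
    local operational log time stamp matches the remote (master) one.
    A mismatch corresponds to the alarm condition in Table 1."""
    return abs(local_log_ts - remote_log_ts) <= tolerance

local = datetime(2017, 5, 1, 12, 0, 0)
# Master is 5 s ahead after reconciliation on the new master server:
remote = datetime(2017, 5, 1, 12, 0, 5)

print(can_resume_replication(local, local))   # True: stamps agree
print(can_resume_replication(local, remote))  # False: mismatch, alarm raised
```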

Note:  
An alarm can appear as a result of maintenance activity.

If the alarm is not solved, the consequence for the node is partial or complete loss of geographical redundancy from the time the alarm was raised. Take into account that complete loss of redundancy occurs for Double Geographical Redundancy or Triple Geographical Redundancy if the alarm is raised on both slave replica servers.

The alarm attributes are listed and explained in Table 2.

Table 2    Alarm Attributes

Auto Cease: Yes
Module: STORAGE-ENGINE
Error Code: 1
Timestamp First: Date and time when the alarm was raised for the first time.
Repeated Counter: Number indicating how many times the alarm was raised.
Timestamp Last: Date and time of the most recent alarm raised.
Resource ID: .1.3.6.1.4.1.193.169.1.1.1
Alarm Model Description: Unable to synchronize cluster, Storage Engine.
Alarm Active Description: Storage Engine (PLDB): Synchronization to current master impossible. <add_info> (task <taskid>, time <Timestamp> - <DateTime>).
ITU Alarm Event Type: qualityOfServiceAlarm (3)
ITU Alarm Probable Cause: equipmentMalfunction (514)
ITU Alarm Perceived Severity: (4) – Major
Originating Source IP: Node ID where the alarm was raised.
Sequence Number: Number indicating the order in which alarms were raised.

In Table 2, <add_info>, <taskid>, <Timestamp>, and <DateTime> are variables that are replaced with alarm-specific values.

Note:  
<taskid>, <Timestamp>, and <DateTime> are not shown in the case of Self-Ordered Backup and Restore.
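As an aid to scripted alarm handling, the Alarm Active Description format from Table 2 can be parsed mechanically. The sketch below is an assumption-laden illustration: the regular expression follows only the template shown in Table 2, and the sample field values (task ID, time stamps, additional information) are invented.

```python
import re

# Parser for the Alarm Active Description template shown in Table 2.
# The concrete contents of <add_info>, <taskid>, <Timestamp>, and
# <DateTime> are node-dependent; the sample below is invented.
PATTERN = re.compile(
    r"Storage Engine \(PLDB\): Synchronization to current master impossible\. "
    r"(?P<add_info>.+?) \(task (?P<taskid>\S+), "
    r"time (?P<timestamp>\S+) - (?P<datetime>[^)]+)\)\."
)

sample = ("Storage Engine (PLDB): Synchronization to current master impossible. "
          "Replication mismatch (task 42, time 1493640000 - 2017-05-01T12:00:00).")

m = PATTERN.match(sample)
if m:
    print(m.group("taskid"))    # 42
    print(m.group("datetime"))  # 2017-05-01T12:00:00
```

Note that, as stated above, the task and time fields are absent for Self-Ordered Backup and Restore, so a real parser would need to treat them as optional.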

For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [1].

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read System Safety Information, Reference [6], and Personal Health and Safety Information, Reference [7].

1.2.2   Tools

Not applicable.

1.2.3   Conditions

Not applicable.

2   Procedure

If the Storage Engine, Unable to Synchronize Cluster in PLDB, Major alarm is cleared automatically, check if the Storage Engine, No Available Master Replica for PLDB alarm is raised. If yes, follow the procedure in Storage Engine, No Available Master Replica for PLDB, Reference [2].

If the alarm is not cleared automatically in a short period of time, perform the following steps:

  1. Perform a new backup in the master PLDB of the CUDB node where the master replica is located and restore it in the faulty CUDB node. To find out where the master replicas are, refer to CUDB System Administrator Guide, Reference [3]. For further information about the data backup and restore procedure, refer to CUDB Backup and Restore Procedures, Reference [4].

    Check if there is more than one node in the same site, and if all of those nodes have the Storage Engine, Unable to Synchronize Cluster in PLDB, Major alarm. If the alarm is present on all nodes within one site, recovery must be done node by node: continue with the second node only when the first node is recovered and replication is up and running, and so on.

    Take into account that replication starts automatically after a successful restore.

  2. If the alarm does not cease, consult the next level of maintenance support. Further actions are outside the scope of this Operating Instruction.
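The node-by-node ordering required in step 1 can be sketched as follows. This is a sketch under stated assumptions: `restore_from_master` and `replication_is_up` are hypothetical stand-ins for the backup/restore procedure and the replication status check described in References [3] and [4], not real CUDB commands.

```python
import time

def recover_site(faulty_nodes, restore_from_master, replication_is_up,
                 poll_interval=1.0):
    """Recover alarmed nodes in one site strictly one at a time:
    restore the next node only after the previous node reports that
    replication is up and running (hypothetical helper callables)."""
    recovered = []
    for node in faulty_nodes:
        restore_from_master(node)           # backup on master, restore here
        while not replication_is_up(node):  # replication resumes automatically
            time.sleep(poll_interval)       # after a successful restore
        recovered.append(node)
    return recovered

# Demo with stand-in helpers (the real steps are in References [3]/[4]):
order = []
print(recover_site(["node-1", "node-2"],
                   restore_from_master=order.append,
                   replication_is_up=lambda node: True))
# prints ['node-1', 'node-2']
```

The key design point, matching step 1 above, is that the loop never starts the second restore before the first node's replication is confirmed up.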

Glossary

For the terms, definitions, acronyms, and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [5].


Reference List

CUDB Documents
[1] CUDB Node Fault Management Configuration Guide.
[2] Storage Engine, No Available Master Replica for PLDB.
[3] CUDB System Administrator Guide.
[4] CUDB Backup and Restore Procedures.
[5] CUDB Glossary of Terms and Acronyms.
Other Ericsson Documents
[6] System Safety Information.
[7] Personal Health and Safety Information.


Copyright

© Ericsson AB 2017. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
All trademarks mentioned herein are the property of their respective owners. These are shown in the document Trademark Information.
