Storage Engine, PLDB Cluster Down
Ericsson Centralized User Database

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure
2.1   Actions for the Local Cluster Is Under Maintenance Operation
2.2   Actions for All Management Components of the Local Cluster Are Unreachable
2.3   Actions for All Data Nodes Are Unreachable
2.4   Actions for All Local Master Replication Servers of This Cluster Are Unreachable If the CUDB System Contains at Least Two CUDB Nodes

Glossary

Reference List

1   Introduction

This instruction concerns alarm handling for the Storage Engine, PLDB Cluster Down alarm.

1.1   Alarm Description

This alarm is raised when the cluster is unable to provide service.

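The alarm is issued in the following situations:

- The local cluster is under maintenance operation.
- All management components of the local cluster are unreachable.
- All data nodes are unreachable.
- All local master replication servers of this cluster are unreachable, if the CUDB system contains at least two CUDB nodes.

The alarm itself does not indicate which of these causes triggered it.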

The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.

Table 1    Alarm Causes

| Alarm Cause and Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|
| The local cluster is under maintenance operation. | By explicit order, the cluster is under maintenance (data restoring, initializing, stopped, or restarting) and therefore cannot provide service. | Cluster Supervisors on the System Controllers (SCs). | The cluster cannot provide service until the operation completes. |
| All management components of the local cluster are unreachable. | All management components of the local cluster either fail to start, or have started but cannot be accessed. | Management components on the SCs. | The cluster cannot provide service. |
| All data nodes are unreachable. | The data nodes either cannot start, or have started but do not provide service. The fault can have several causes, for example file system consistency errors due to non-graceful shutdown, uncontrolled crash, or infrastructure errors. | Data nodes on the payload blades or Virtual Machines (VMs) of the cluster. | The cluster cannot provide service, and data redundancy is decreased. |
| All local master replication servers of this cluster are unreachable if the CUDB system contains at least two CUDB nodes. | All local master replication servers of the cluster either fail to start, or have started but are not visible to the management components. | Master replication servers on the first and second payload blades or VMs of the cluster. | The cluster cannot provide service. |

Note: An alarm can appear as a result of maintenance activity.

The alarm attributes are listed and explained in Table 2:

Table 2    Alarm Attributes

| Attribute Name | Attribute Value |
|---|---|
| Module | STORAGE-ENGINE |
| Error Code | 6 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number that indicates how many times the alarm was raised. |
| Timestamp Last | Date and time when the alarm was most recently raised. |
| Resource ID | 1.3.6.1.4.1.193.169.1.1.6 |
| Timestamp | Date when the alarm was raised. |
| Model Description | Cluster down, Storage Engine. |
| Active Description | Storage Engine (PLDB): Storage Engine is down. |
| Event Type | 4 |
| Probable Cause | 546 |
| Severity | Critical |
| Originating Source IP | Node IP where the alarm was raised. |
| Sequence Number | Number that indicates the order in which alarms were raised. |

For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [1].
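As an illustration of how the fixed attribute values in Table 2 can be used, the following minimal Python sketch checks whether an alarm record corresponds to this alarm. The dictionary keys, the example record, and the function name `is_pldb_cluster_down` are hypothetical and do not represent a CUDB interface. Only the values for Module, Error Code, Resource ID, and Severity are taken from Table 2.

```python
# Fixed attribute values of this alarm, taken from Table 2.
# The key names are hypothetical; they are not a CUDB data format.
PLDB_CLUSTER_DOWN = {
    "module": "STORAGE-ENGINE",
    "error_code": 6,
    "resource_id": "1.3.6.1.4.1.193.169.1.1.6",
    "severity": "Critical",
}


def is_pldb_cluster_down(alarm: dict) -> bool:
    """Return True if every fixed attribute matches the Table 2 values."""
    return all(alarm.get(key) == value for key, value in PLDB_CLUSTER_DOWN.items())


# Hypothetical alarm record, for illustration only.
example = {
    "module": "STORAGE-ENGINE",
    "error_code": 6,
    "resource_id": "1.3.6.1.4.1.193.169.1.1.6",
    "severity": "Critical",
    "repeated_counter": 3,
}

print(is_pldb_cluster_down(example))
```

A match on these attributes identifies the alarm only. It does not identify which of the causes in Table 1 applies.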

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read the following documents:

- System Safety Information, Reference [4]
- Personal Health and Safety Information, Reference [5]

1.2.2   Tools

Not applicable.

1.2.3   Conditions

Not applicable.

2   Procedure

This section describes the procedure to follow when this alarm is received.

2.1   Actions for the Local Cluster Is Under Maintenance Operation

If the maintenance state is not intentional, contact the next level of maintenance support.

2.2   Actions for All Management Components of the Local Cluster Are Unreachable

Consult the next level of maintenance support.

2.3   Actions for All Data Nodes Are Unreachable

Restore a previously created backup. For further information about the data backup and restore procedure, refer to CUDB Backup and Restore Procedures, Reference [2].

If the alarm is not cleared automatically after the restore is completed, contact the next level of maintenance support.

2.4   Actions for All Local Master Replication Servers of This Cluster Are Unreachable If the CUDB System Contains at Least Two CUDB Nodes

Restore a previously created backup. For further information about the data backup and restore procedure, refer to CUDB Backup and Restore Procedures, Reference [2].

If the alarm is not cleared automatically after the restore is completed, contact the next level of maintenance support.
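The actions in Sections 2.1 to 2.4 can be summarized as a lookup from alarm cause to first action. The following Python sketch is only an illustration: the `Cause` enum, its member names, and the `first_action` function are hypothetical. Because the alarm does not state its cause, the operator must already have identified the cause by other means. The "wait" action for intended maintenance is an assumption, since the procedure states no action for that case.

```python
from enum import Enum


class Cause(Enum):
    """The four alarm causes from Table 1 (member names are hypothetical)."""
    MAINTENANCE = "The local cluster is under maintenance operation."
    MGMT_UNREACHABLE = "All management components of the local cluster are unreachable."
    DATA_NODES_UNREACHABLE = "All data nodes are unreachable."
    MASTER_REPL_UNREACHABLE = "All local master replication servers of this cluster are unreachable."


ESCALATE = "Contact the next level of maintenance support."
RESTORE = (
    "Restore a previously created backup (Reference [2]). If the alarm is not "
    "cleared automatically after the restore, contact the next level of "
    "maintenance support."
)
# Assumption: the procedure states no action for intended maintenance,
# so waiting for the operation to complete is assumed here.
WAIT = "Wait for the maintenance operation to complete."


def first_action(cause: Cause, intentional: bool = False) -> str:
    """Return the first action for a cause that the operator has identified."""
    if cause is Cause.MAINTENANCE:
        return WAIT if intentional else ESCALATE
    if cause is Cause.MGMT_UNREACHABLE:
        return ESCALATE
    # Sections 2.3 and 2.4 prescribe the same restore-based action.
    return RESTORE


print(first_action(Cause.DATA_NODES_UNREACHABLE))
```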


Glossary

For the terms, definitions, acronyms, and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [3].


Reference List

CUDB Documents
[1] CUDB Node Fault Management Configuration Guide.
[2] CUDB Backup and Restore Procedures.
[3] CUDB Glossary of Terms and Acronyms.
Other Ericsson Documents
[4] System Safety Information.
[5] Personal Health and Safety Information.