1 Introduction
This instruction concerns alarm handling for the Control, Messaging Service Server Down alarm.
1.1 Alarm Description
The alarm is issued when a Messaging Service server is down.
The possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Table 1.
| Alarm Cause | Description | Fault Reason | Fault Location | Impact |
|---|---|---|---|---|
| The System Controller (SC) hosting a Messaging Service server is down. | The SC hosting a Messaging Service server instance is down. | The SC is rebooting or shut down, and cannot provide any service. | The SC holding the Messaging Service server. | Messaging Service server double redundancy is decreased, since the system is running with one less Messaging Service server instance. |
| A Messaging Service server goes down, or becomes unreachable. | The Messaging Service server process is not running. | The process has been stopped or killed, and cannot be started. | The Messaging Service server process running in the SC. | |
| A Messaging Service server does not provide any service. | The Messaging Service server process is running, but is unable to provide any service. | The Messaging Service server process is running, but in an unhealthy state. | The Messaging Service server process running in the SC. | |
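The three alarm causes in Table 1 differ only in the state of the SC and of the Messaging Service server process. The following is a minimal sketch of that distinction. The function name and its three boolean inputs are hypothetical, and how each state is observed on a real node is not covered by this instruction.

```python
def classify_cause(sc_up, process_running, service_healthy):
    """Map observed states to the matching Description from Table 1.

    All three arguments are hypothetical booleans supplied by the operator;
    this sketch does not query a node.
    """
    if not sc_up:
        # Row 1: the SC itself is rebooting or shut down.
        return "The SC hosting a Messaging Service server instance is down."
    if not process_running:
        # Row 2: the SC is up, but the process was stopped or killed.
        return "The Messaging Service server process is not running."
    if not service_healthy:
        # Row 3: the process is running, but in an unhealthy state.
        return ("The Messaging Service server process is running, "
                "but is unable to provide any service.")
    # None of the causes in Table 1 applies.
    return None
```

The checks are ordered from the SC down to the process, because the process state is only meaningful when the SC hosting it is up.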
The alarm attributes are listed and explained in Table 2.
| Attribute Name | Attribute Value |
|---|---|
| Auto Cease | Yes |
| Module | CONTROL |
| Error Code | 7 |
| Timestamp First | Date and time when the alarm was raised for the first time. |
| Repeated Counter | Number which indicates how many times the alarm was raised. |
| Timestamp Last | Date and time of the most recent alarm raised. |
| Resource ID | 1.3.6.1.4.1.193.169.7.7.<IP> |
| Alarm Model Description | Messaging Service Server down, Control |
| Alarm Active Description | Control: Messaging Service Server <IP> is down, uuid: <UUID> |
| ITU Alarm Event Type | processingErrorAlarm (4) |
| ITU Alarm Probable Cause | softwareProgramError (546) |
| ITU Alarm Perceived Severity | (4) – Major |
| Originating Source IP | Node IP where the alarm was raised. |
| Sequence Number | Number which indicates the order in which alarms were raised. |
In Table 2, the indicated variables are as follows:
- <IP> is the Internet Protocol (IP) address of the Messaging Service server that is down.
- <UUID> is the universally unique identifier of the computing resource (blade or VM). It is blank if its value cannot be determined.
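As an illustration of how the <IP> and <UUID> variables appear in the Resource ID and Alarm Active Description attributes, the following sketch extracts them from attribute values. It assumes that the attribute text matches the patterns in Table 2 exactly and that <IP> is written in its usual textual form. The helper names are hypothetical and are not part of any CUDB tool.

```python
import ipaddress
import re

# Resource ID prefix as listed in Table 2; the remainder is the <IP>.
RESOURCE_ID_PREFIX = "1.3.6.1.4.1.193.169.7.7."

# Alarm Active Description pattern as listed in Table 2.
ACTIVE_DESCRIPTION = re.compile(
    r"^Control: Messaging Service Server (?P<ip>\S+) is down, uuid: ?(?P<uuid>.*)$"
)


def ip_from_resource_id(resource_id):
    """Return the server IP embedded in a Resource ID, or None if it does not match."""
    if not resource_id.startswith(RESOURCE_ID_PREFIX):
        return None
    candidate = resource_id[len(RESOURCE_ID_PREFIX):]
    try:
        # Validate that the remainder really is an IP address.
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return None


def parse_active_description(text):
    """Return (ip, uuid) from an Alarm Active Description, or None if it does not match.

    The uuid is None when the field is blank, that is, when the value
    could not be determined.
    """
    match = ACTIVE_DESCRIPTION.match(text.strip())
    if match is None:
        return None
    uuid = match.group("uuid").strip()
    return match.group("ip"), uuid or None
```

For example, `ip_from_resource_id("1.3.6.1.4.1.193.169.7.7.10.22.0.1")` yields `"10.22.0.1"`, where the address itself is a made-up sample value.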
For more information about Messaging Service, refer to CUDB High Availability, Reference [1].
For further information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [2].
1.2 Prerequisites
This section provides information on the documents, tools and conditions that apply to the procedure.
1.2.1 Documents
Before starting this procedure, ensure that you have read the following documents:
- CUDB Node Fault Management Configuration Guide, Reference [2], regarding alarm configuration.
- CUDB Node Logging Events, Reference [3], regarding logs related to the Messaging Service server.
1.2.2 Tools
Not applicable.
1.2.3 Conditions
Not applicable.
2 Procedure
If the alarm is raised, do the following:
- Wait for a short time for the alarm to clear. If the alarm clears, no further action is needed. If it does not clear after a short period of time, continue with the next step.
- If the SC hosting the Messaging Service server that is down is rebooting, wait until the reboot finishes. If the alarm clears, no further action is needed.
- If the SC is permanently down, the alarm SAF, CLM Cluster Node Unavailable is expected to be raised as well. Refer to SAF, CLM Cluster Node Unavailable, Reference [4] for further information.
- If the SC is not down, access the SC hosting the Messaging Service server that is down and try to restart the process manually with the following command:
    cudbManageMsgSrvServer restart
- If the problem is not identified, or the alarm does not cease with the measures taken, consult the next level of maintenance support. Further actions are outside the scope of this instruction.
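The decision flow of the steps above can be sketched as a small function. This is only an illustration of the order of the checks. The `Action` names, the `next_action` function, and its inputs are hypothetical; the sketch does not query or change a node, and the only real command it mentions is the one given in the procedure.

```python
from enum import Enum


class Action(Enum):
    NONE = "no further action is needed"
    WAIT = "wait for a short time for the alarm to clear"
    WAIT_FOR_REBOOT = "wait until the SC reboot finishes"
    CHECK_CLM_ALARM = "refer to SAF, CLM Cluster Node Unavailable, Reference [4]"
    RESTART_PROCESS = "run: cudbManageMsgSrvServer restart"
    ESCALATE = "consult the next level of maintenance support"


def next_action(alarm_active, waited, sc_state, restart_attempted):
    """Return the next step of the procedure for the given (hypothetical) inputs.

    sc_state is assumed to be one of "up", "rebooting", or "down".
    """
    if not alarm_active:
        return Action.NONE             # The alarm cleared (Auto Cease).
    if not waited:
        return Action.WAIT             # First step: give the alarm time to clear.
    if sc_state == "rebooting":
        return Action.WAIT_FOR_REBOOT  # The SC is expected to come back.
    if sc_state == "down":
        return Action.CHECK_CLM_ALARM  # Permanent SC failure has its own alarm.
    if not restart_attempted:
        return Action.RESTART_PROCESS  # The SC is up: restart the process manually.
    return Action.ESCALATE             # Measures taken did not cease the alarm.
```

Calling `next_action` again after each step, with updated inputs, walks through the procedure until it returns either `Action.NONE` or `Action.ESCALATE`.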
Glossary
For the terms, definitions, acronyms and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [5].
Reference List
| CUDB Documents |
|---|
| [1] CUDB High Availability. |
| [2] CUDB Node Fault Management Configuration Guide. |
| [3] CUDB Node Logging Events. |
| [4] SAF, CLM Cluster Node Unavailable. |
| [5] CUDB Glossary of Terms and Acronyms. |
