SAF, LOTC Time Synchronization Failed
Ericsson Centralized User Database

Contents

1   Introduction
1.1   Alarm Description
1.2   Prerequisites

2   Procedure

Glossary

Reference List

1   Introduction

This instruction concerns alarm handling for the SAF, LOTC Time Synchronization Failed alarm.

1.1   Alarm Description

This alarm is related to the Service Availability Forum (SAF). For more information, refer to LOTC Time Synchronization, Reference [3].

The alarm is issued when the Network Time Protocol (NTP) servers cannot be contacted, or when the local time is off by more than the threshold value of 10 seconds.

The alarm has the following severity levels:

  • Minor

  • Major

  • Critical

Depending on severity, the possible alarm causes and the corresponding fault reasons, fault locations, and impacts are described in Section 1.1.1.

Depending on severity, the alarm attributes are listed and explained in Section 1.1.2.

1.1.1   Alarm Causes

Minor severity alarm causes are listed in Table 1:

Table 1    Minor Severity Alarm Causes

Alarm Cause: Not possible to contact one of the NTP servers configured in cluster.conf from a System Controller (SC).
Description: Connectivity from the SCs to the external NTP service cannot be established.
Impact: Loss of NTP service redundancy
Fault Reasons:

  • Network infrastructure misconfiguration.
    Fault Location: Network infrastructure

  • Internal CUDB node network does not allow communication.
    Fault Location: Network infrastructure

  • External NTP server is not available.
    Fault Location: External NTP server

  • Network between the CUDB node and the external NTP server does not allow communication between the two endpoints.
    Fault Location: Datacenter network

Alarm Cause: Not possible to contact one of the SC NTP servers from a payload blade or Virtual Machine (VM).
Description: Connectivity from the SCs to the SC NTP service cannot be established.
Impact: Loss of NTP service redundancy
Fault Reasons:

  • Internal CUDB node network does not allow communication.
    Fault Location: Network infrastructure

  • NTP server in an SC blade is not available, or the SC blade is not available.
    Fault Location: SC

Major severity alarm causes are listed in Table 2:

Table 2    Major Severity Alarm Causes

Alarm Cause: Unusable: The NTP servers provided in cluster.conf cannot be used by the local NTP daemon, ntpd.
Description: A server is listed in the NTP configuration (/etc/ntp.conf), but is not reported in the list of peers provided by the ntpq -p command.
Impact: Loss of NTP service redundancy
Fault Reasons:

  • Server name cannot be resolved into an IP address.
    Fault Location: CUDB configuration

Alarm Cause: Rejected: None of the configured NTP servers can be selected as a current time source (rejected at initial selection).
Description: The time server could not be selected after 60 minutes from start.
Impact: Loss of NTP service redundancy
Fault Location: NTP server or external network
Fault Reasons: The NTP protocol algorithm declares a server rejected for the following reasons:

  • The external server is detected as "insane" as the provided time is too different from the current one.(1)

  • Both SCs have set each other as source as no other NTP source is available.

  • The jitter of the server is perceived as too high by the selection algorithm in the local NTP server.

Alarm Cause: Rejected: None of the configured NTP servers can be selected as a current time source (rejected at reselection).
Description: The time server selection was successful, but the NTP daemon was restarted and the reselection process takes longer than 90 seconds.
Impact, Fault Location, and Fault Reasons: Same as for rejection at initial selection.

Alarm Cause: Unreachable: Not possible to contact any of the NTP servers configured in cluster.conf from an SC.
Description: Connectivity from the SCs to the external NTP service cannot be established.
Impact: Risk of losing consistent time reference
Fault Reasons:

  • Network infrastructure misconfiguration.
    Fault Location: Network infrastructure

  • Internal CUDB node network does not allow communication.
    Fault Location: Network infrastructure

  • External NTP server is not available.
    Fault Location: External NTP server

  • Network between the CUDB node and the external NTP server does not allow communication between the two endpoints.
    Fault Location: Datacenter network

Alarm Cause: Not possible to contact any of the SC blade NTP servers from a payload blade.
Description: Connectivity from the SCs to the SC NTP service cannot be established.
Impact: Risk of losing consistent time reference
Fault Reasons:

  • Internal CUDB node network does not allow communication between the location of the raised alarm and the SCs.
    Fault Location: Network infrastructure

  • The NTP servers in both SCs are not available.
    Fault Location: SC

(1)  This occurs when the time difference between the local and remote servers is greater than 1000 seconds.
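
The "Unusable" condition in Table 2 (a server is listed in /etc/ntp.conf but not reported by ntpq -p) can be illustrated with a small text comparison. The following is a minimal sketch, not the check the platform performs: the function names, the sample configuration, and the sample peer list are hypothetical, and the sketch assumes that the peer list shows the same address strings as the configuration. In practice, ntpq -p can print resolved host names or shortened names, so numeric output (ntpq -pn) is likely easier to compare.

```python
def configured_servers(ntp_conf_text):
    """Return the addresses found on 'server' lines of ntp.conf-style text."""
    servers = []
    for line in ntp_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 2 and fields[0] == "server":
            servers.append(fields[1])
    return servers


def reported_peers(ntpq_output):
    """Return the 'remote' column from ntpq -p style output."""
    peers = []
    in_body = False
    for line in ntpq_output.splitlines():
        if line.startswith("="):  # separator line between header and peer rows
            in_body = True
            continue
        if in_body and line.strip():
            # The first character is the tally code ('*', '+', ...) or a space.
            fields = line[1:].split()
            if fields:
                peers.append(fields[0])
    return peers


def missing_from_peers(ntp_conf_text, ntpq_output):
    """List configured servers that do not appear in the peer list."""
    peers = set(reported_peers(ntpq_output))
    return [s for s in configured_servers(ntp_conf_text) if s not in peers]


# Hypothetical sample data, for illustration only.
SAMPLE_CONF = """\
# hypothetical ntp.conf excerpt
server 10.0.0.1 iburst
server 10.0.0.2 iburst
server ntp.example.invalid
"""

SAMPLE_NTPQ = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.0.1        .GPS.            1 u   33   64  377    0.512   -0.021   0.034
+10.0.0.2        10.0.0.1         2 u   12   64  377    0.610    0.105   0.041
"""

if __name__ == "__main__":
    print(missing_from_peers(SAMPLE_CONF, SAMPLE_NTPQ))
```

With the sample data, the only configured server missing from the peer list is the one given by name, which matches the fault reason in Table 2 (a server name that cannot be resolved).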


Critical severity alarm causes are listed in Table 3:

Table 3    Critical Severity Alarm Causes

Alarm Cause: The time difference between the local system time and the remote time server is greater than the alarm threshold (10 seconds), but smaller than the insane threshold (1000 seconds).
Impact: Inaccurate system time

Description: Between 10 and 1000 seconds time difference between the SC time and the external NTP reference.
Fault Reasons:

  • Time jump in the external NTP server.
    Fault Location: External NTP server

  • The connection towards external NTP servers is re-established after a period of non-connectivity.
    Fault Location: Internal CUDB node network, network infrastructure; Datacenter network

Description: Between 10 and 1000 seconds time difference between the payload blade or VM time and the SC time.
Fault Reasons:

  • Time jump in the SC NTP server.
    Fault Location: SC NTP server

  • The connection towards SC NTP servers is re-established after a period of non-connectivity.
    Fault Location: Internal CUDB node network, network infrastructure
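
The two thresholds described above (10 seconds for the alarm, 1000 seconds for the "insane" limit) divide the measured time difference into three ranges. The following is a minimal sketch of that classification. The function name and the returned labels are hypothetical, and the handling of a difference of exactly 1000 seconds is an assumption, because this instruction only describes values smaller and greater than that limit.

```python
ALARM_THRESHOLD_S = 10     # alarm threshold stated in this instruction
INSANE_THRESHOLD_S = 1000  # "insane" threshold stated in this instruction


def classify_offset(offset_s):
    """Classify a time difference (in seconds) against the two thresholds.

    Returns one of three hypothetical labels:
      "in-sync"  : difference within the alarm threshold
      "critical" : difference in the range that raises the critical alarm
      "insane"   : difference large enough for the server to be rejected
    """
    magnitude = abs(offset_s)  # the sign of the difference does not matter
    if magnitude <= ALARM_THRESHOLD_S:
        return "in-sync"
    if magnitude < INSANE_THRESHOLD_S:
        return "critical"
    # Assumption: exactly 1000 seconds is treated like larger values.
    return "insane"


if __name__ == "__main__":
    for sample in (0.5, -42, 1500):
        print(sample, classify_offset(sample))
```

A difference in the "insane" range does not raise the critical alarm. According to Table 2, it leads to the server being rejected instead, which corresponds to the major severity.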

1.1.2   Alarm Attributes

Minor severity alarm attributes are listed in Table 4:

Table 4    Minor Severity Alarm Attributes

Attribute Name: Attribute Value

Auto Cease: Yes
Module: SAF
Error Code: 11
Timestamp First: Date and time when the alarm was raised for the first time.
Repeated Counter: Number which indicates how many times the alarm was raised.
Timestamp Last: Date and time of the most recent alarm raise.
Resource ID: .1.3.6.1.4.1.193.169.9.5.<length>.<NOI>
Alarm Model Description: LOTC Time Synchronization, SAF
Alarm Active Description: SAF platform: LOTC Time Synchronization, minor, @<NON>
ITU Alarm Event Type: other (1)
ITU Alarm Probable Cause: timingProblemX733 (550)
ITU Alarm Perceived Severity: (5) – Minor
Originating Source IP: Node IP where the alarm was raised.
Sequence Number: Number which indicates the order in which alarms were raised.

Major severity alarm attributes are listed in Table 5:

Table 5    Major Severity Alarm Attributes

Attribute Name: Attribute Value

Auto Cease: Yes
Module: SAF
Error Code: 12
Timestamp First: Date and time when the alarm was raised for the first time.
Repeated Counter: Number which indicates how many times the alarm was raised.
Timestamp Last: Date and time of the most recent alarm raise.
Resource ID: .1.3.6.1.4.1.193.169.9.5.<length>.<NOI>
Alarm Model Description: LOTC Time Synchronization, SAF
Alarm Active Description: SAF platform: LOTC Time Synchronization, major, @<NON>
ITU Alarm Event Type: other (1)
ITU Alarm Probable Cause: timingProblemX733 (550)
ITU Alarm Perceived Severity: (4) – Major
Originating Source IP: Node IP where the alarm was raised.
Sequence Number: Number which indicates the order in which alarms were raised.

Critical severity alarm attributes are listed in Table 6:

Table 6    Critical Severity Alarm Attributes

Attribute Name: Attribute Value

Auto Cease: Yes
Module: SAF
Error Code: 5
Timestamp First: Date and time when the alarm was raised for the first time.
Repeated Counter: Number which indicates how many times the alarm was raised.
Timestamp Last: Date and time of the most recent alarm raise.
Resource ID: .1.3.6.1.4.1.193.169.9.5.<length>.<NOI>
Alarm Model Description: LOTC Time Synchronization, SAF
Alarm Active Description: SAF platform: LOTC Time Synchronization, critical, @<NON>
ITU Alarm Event Type: other (1)
ITU Alarm Probable Cause: timingProblemX733 (550)
ITU Alarm Perceived Severity: (3) – Critical
Originating Source IP: Node IP where the alarm was raised.
Sequence Number: Number which indicates the order in which alarms were raised.

In Table 4, Table 5, and Table 6, the indicated variables are as follows:

For more information about attribute descriptions, refer to CUDB Node Fault Management Configuration Guide, Reference [1].
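
The attribute values that differ between the three severities (Error Code, ITU Alarm Perceived Severity, and the severity word in the Alarm Active Description) can be collected in a small lookup structure. The following is a minimal sketch for illustration: the dictionary, the function name, and the sample value "PL-3" used in place of <NON> are hypothetical and not taken from the product.

```python
import re

# Values taken from Table 4, Table 5, and Table 6 of this instruction.
SEVERITY_ATTRIBUTES = {
    "minor": {"error_code": 11, "itu_perceived_severity": 5},
    "major": {"error_code": 12, "itu_perceived_severity": 4},
    "critical": {"error_code": 5, "itu_perceived_severity": 3},
}

# Pattern for "SAF platform: LOTC Time Synchronization, <severity>, @<NON>".
_ACTIVE_DESCRIPTION = re.compile(
    r"^SAF platform: LOTC Time Synchronization, (minor|major|critical), @(.*)$"
)


def parse_active_description(text):
    """Split an Alarm Active Description into severity and trailing value.

    Returns None when the text does not follow the documented pattern.
    """
    match = _ACTIVE_DESCRIPTION.match(text.strip())
    if match is None:
        return None
    severity, suffix = match.groups()
    return {"severity": severity, "suffix": suffix, **SEVERITY_ATTRIBUTES[severity]}


if __name__ == "__main__":
    # "PL-3" is a made-up placeholder for the <NON> variable.
    print(parse_active_description("SAF platform: LOTC Time Synchronization, major, @PL-3"))
```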

1.2   Prerequisites

This section provides information on the documents, tools, and conditions that apply to the procedure.

1.2.1   Documents

Before starting this procedure, ensure that you have read the following documents:

  • System Safety Information, Reference [4]

  • Personal Health and Safety Information, Reference [5]

1.2.2   Tools

Not applicable.

1.2.3   Conditions

Not applicable.

2   Procedure

If the alarm is raised, do the following:

  1. Follow the instructions specified in LOTC Time Synchronization, Reference [3].
  2. If the alarm does not cease, contact the next level of maintenance support. Further actions are outside the scope of this Operating Instruction.

Glossary

For the terms, definitions, acronyms, and abbreviations used in this document, refer to CUDB Glossary of Terms and Acronyms, Reference [2].


Reference List

CUDB Documents
[1] CUDB Node Fault Management Configuration Guide.
[2] CUDB Glossary of Terms and Acronyms.
Other Ericsson Documents
[3] LOTC Time Synchronization.
[4] System Safety Information.
[5] Personal Health and Safety Information.