1 Introduction
This document describes how to recover a Virtual Machine (VM) in an Ericsson Centralized User Data Base (CUDB) node deployed on a cloud infrastructure.
1.1 Description
1.2 Target Groups
This document is intended for system administrators operating CUDB systems. Some of the actions described in this document also require the cloud administrator role. The cloud administrator is the cloud service provider who delivers the cloud service and executes the required actions on the cloud infrastructure.
1.4 Typographic Conventions
Typographic Conventions can be found in the following document:
2 Reboot and Rebuild the VM
If a VM has issues and still has not recovered after all applicable recovery procedures in the CUDB Troubleshooting Guide have been performed, it can be rebooted from the cloud infrastructure (see Reboot the VM). If rebooting does not solve the issue, or if the VM must be reinstalled, the VM can be rebuilt (see Rebuild the VM).
2.1 Reboot the VM
Perform the following steps to reboot the VM from the cloud infrastructure.
Steps
When using Cloud Execution Environment (CEE), follow the steps below to reboot the VM:
- Log in to the Atlas Dashboard.
- Select the appropriate project in the Current Project field, then select Project in the View field.
- Choose the Instances category.
- Identify the VM to reboot.
- In the Actions column of the identified VM, select Soft Reboot Instance for a graceful shutdown, or Hard Reboot Instance for a non-graceful shutdown. Refer to the "OpenStack End User Guide" or the "Atlas Dashboard End User Guide" in the CEE Customer Product Information (CPI) for more details.
- While the reboot is in progress, the Status column of the instance shows REBOOT. When the status changes to ACTIVE, the processes in the VM begin to start up.
- If the issue still persists after the VM has fully started up, rebuild the VM by performing the steps of Rebuild the VM.
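Where the OpenStack command line tools are available, the Soft Reboot Instance and Hard Reboot Instance actions are assumed to correspond to `openstack server reboot --soft` and `openstack server reboot --hard`. The following Python sketch builds that command and polls the instance status until it reports ACTIVE. It assumes that the `openstack` client is installed and already authenticated against the project hosting the CUDB VMs; the function names, the timeout, and the polling interval are illustrative only.

```python
import subprocess
import time
from typing import Callable, List

# Assumption: the "openstack" CLI is installed and authenticated for the
# project that hosts the CUDB VMs. All helper names are illustrative.


def reboot_command(server: str, hard: bool = False) -> List[str]:
    """CLI equivalent of Soft Reboot Instance (default) or Hard Reboot Instance."""
    return ["openstack", "server", "reboot", "--hard" if hard else "--soft", server]


def cli_status(server: str) -> str:
    """Return the instance status, for example REBOOT or ACTIVE."""
    result = subprocess.run(
        ["openstack", "server", "show", server, "-f", "value", "-c", "status"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_until_active(server: str,
                      get_status: Callable[[str], str] = cli_status,
                      timeout_s: float = 900.0,
                      poll_s: float = 10.0) -> bool:
    """Poll the status until it is ACTIVE; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_status(server) == "ACTIVE":
            return True
        time.sleep(poll_s)
    return False
```

For example, `subprocess.run(reboot_command("cudb-vm-example"), check=True)` followed by `wait_until_active("cudb-vm-example")`, where the instance name is hypothetical. ACTIVE only means that the VM has booted and its processes are starting; it does not confirm that the original issue is resolved.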
After This Task
2.2 Rebuild the VM
VMs are rebuilt during node installation to ensure the automatic recovery of System Controllers (SCs), or when any VM is reinstalled.
After This Task
| Note: |
After the rebuild procedure is finished, the stored procedures are not restored, so it is
recommended to recreate them with the following command:
cudbManageStore -p -o restorestoredprocedures |
When using CEE, refer to the Atlas Dashboard End User Guide in the CEE documentation for more information on how to rebuild VMs. When using a different cloud solution, refer to the solution-specific documentation.
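As a sketch for OpenStack-based infrastructure, a rebuild can also be requested with `openstack server rebuild`, which recreates the instance from its current image unless `--image` is given. The Python helper below only builds the command lines; the instance and image names are hypothetical, and the image to use for a CUDB VM must follow the node installation documentation. The constant holds the stored procedure restore command from the Note above, which is run on the CUDB node once the rebuild has finished.

```python
from typing import List, Optional


def rebuild_command(server: str, image: Optional[str] = None) -> List[str]:
    """Build an "openstack server rebuild" command line.

    Without an image argument, OpenStack rebuilds the instance from the
    image it currently uses.
    """
    cmd = ["openstack", "server", "rebuild"]
    if image is not None:
        cmd += ["--image", image]
    cmd.append(server)
    return cmd


# Command from the Note above; run it on the CUDB node after the rebuild
# has finished to recreate the stored procedures.
RESTORE_STORED_PROCEDURES: List[str] = [
    "cudbManageStore", "-p", "-o", "restorestoredprocedures",
]
```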
3 Actions in the Case of Infrastructure Activities
This section describes how to prepare the CUDB node and system to gracefully handle and recover from both planned and unplanned activities on the infrastructure level.
In general, during infrastructure activities such as a compute host shutdown or reboot, VMs are automatically evacuated to the remaining functional host(s) in their failure domain. The same behavior is expected in the case of a sudden compute host failure. To check whether a VM has been evacuated, check the actions or events listed in the VM information on the cloud infrastructure. For more information about failure domains, refer to the Infrastructure Availability for CUDB Systems Deployed on a Cloud Infrastructure section of CUDB High Availability. For more information about the infrastructure requirements related to VM evacuation, refer to the Other Requirements section of CUDB Virtual Infrastructure Requirements.
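On OpenStack-based infrastructure, the actions recorded for an instance can be listed with `openstack server event list`, and Nova records an evacuation as an action named `evacuate`. The sketch below assumes an authenticated client and that the JSON output contains an `Action` field for each event, as current python-openstackclient releases produce; verify the field names against the client version in use. The helper names are illustrative.

```python
import json
import subprocess
from typing import Callable


def fetch_events(server: str) -> str:
    """Return the JSON event list of an instance from the OpenStack CLI."""
    return subprocess.run(
        ["openstack", "server", "event", "list", server, "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def was_evacuated(server: str, fetch: Callable[[str], str] = fetch_events) -> bool:
    """True if any recorded action of the instance is an evacuation."""
    events = json.loads(fetch(server))
    return any(event.get("Action") == "evacuate" for event in events)
```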
For more information on how to check whether a compute host is running out of resources or underperforming, refer to the documentation provided by the cloud infrastructure. For example, if the infrastructure is CEE, refer to the CEE Troubleshooting Guideline in the Cloud Execution Environment CPI.
If a compute host is underperforming, for example because of hardware faults, shut down the affected VM(s) until the compute host is recovered or replaced.
When using CEE and the underlying compute host is underperforming, perform the following steps to shut down the affected VM:
Steps
- Log in to the Atlas Dashboard.
- Select the appropriate project in the Current Project field, then select Project in the View field.
- Choose the Instances category.
- Identify the VM to shut down.
- In the Actions column of the identified VM, select the action "Shut Off Instance".
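The Shut Off Instance action is assumed to correspond to `openstack server stop`, and `openstack server start` powers the VM on again. The Python sketch below builds both commands and runs the stop command for one VM. It assumes an authenticated `openstack` client; the helper names are illustrative, and the subprocess runner is a parameter only so that the logic can be exercised without a cloud.

```python
import subprocess
from typing import Callable, List


def power_command(server: str, on: bool) -> List[str]:
    """Build "openstack server start" (on=True) or "openstack server stop" (on=False)."""
    return ["openstack", "server", "start" if on else "stop", server]


def shut_off(server: str, run: Callable[..., object] = subprocess.run) -> None:
    """Equivalent of Shut Off Instance for one VM on an underperforming host."""
    run(power_command(server, on=False), check=True)
```

After the compute host has been recovered or replaced, `power_command(vm, on=True)` gives the command that powers the VM on again, which is required before the VM is prepared for operation.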
Results
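Depending on the type of infrastructure activity, proceed as follows if the activity will result, or has already resulted, in the shutdown or reboot of:
- One compute host: see One Compute Host Affected.
- Multiple compute hosts: see Multiple Compute Hosts Affected.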
3.1 One Compute Host Affected
If the infrastructure activity results in the shutdown or reboot of a single compute host, no manual preparation of the VMs is needed.
To recover the evacuated VM(s), perform the following steps:
Steps
- Identify all affected VMs, if they are not known already (see Identify All Affected VMs).
- Prepare the evacuated VMs for operation (see Prepare the VMs for Operation).
3.2 Multiple Compute Hosts Affected
If the infrastructure activity results in the shutdown or reboot of multiple compute hosts, ensure that the activities are executed in an order that respects the cloud administration related security infrastructure requirement. During this process, VMs are automatically evacuated to the remaining functional host(s) in their failure domain. Refer to the Other Requirements section of CUDB Virtual Infrastructure Requirements for more information about that requirement.
Perform the following steps to recover VM groups:
Steps
- Identify the VM groups and prepare the system for parallel recovery (see Recovery of Multiple VMs in Parallel).
- Prepare the evacuated VMs for operation (see Prepare the VMs for Operation).
3.3 Identify All Affected VMs
To identify the VMs that will be or have already been affected by the infrastructure activity, do one of the following:
3.3.1 Identify Affected Infrastructure Segment
If the infrastructure segment has not yet been identified, it can be identified from the infrastructure position of the faulty VMs. To do so, provide the VM instance name(s), the VM instance Universally Unique Identifier(s) (UUIDs), or both to the cloud administrator. This information can be gathered as follows:
When using CEE, identify the instance name, the instance UUID, or both as described below:
Once the instance name(s), instance UUID(s), or both are available, contact the cloud administrator and provide them.
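With the OpenStack command line tools, `openstack server list -f json` returns the name (`Name`) and UUID (`ID`) of each instance in the current project. The Python sketch below picks out the entries of the faulty VMs so that both values can be passed to the cloud administrator. It assumes an authenticated client; the helper names and the VM names in any example are hypothetical.

```python
import json
import subprocess
from typing import Callable, Dict, Iterable


def list_servers() -> str:
    """Return the instance list of the current project as JSON."""
    return subprocess.run(
        ["openstack", "server", "list", "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def names_to_uuids(faulty: Iterable[str],
                   fetch: Callable[[], str] = list_servers) -> Dict[str, str]:
    """Map each faulty VM instance name to its instance UUID."""
    wanted = set(faulty)
    return {s["Name"]: s["ID"] for s in json.loads(fetch()) if s["Name"] in wanted}
```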
3.3.2 Cloud Administration Actions to Identify Infrastructure Position
After receiving the instance name(s), instance UUID(s), or both of the affected VM(s) (as described in Identify Affected Infrastructure Segment), the cloud administrator must identify the infrastructure position. Depending on the user interface used, perform the applicable procedure described in Identifying the VM Infrastructure Position in the Case of Using Atlas Dashboard GUI or Identifying the VM Infrastructure Position in the Case of Using OpenStack Command Line Tools.
| Note: |
In the case of using a cloud solution other than CEE, refer
to the solution-specific documentation for more information on how
to identify the infrastructure position. |
3.3.2.1 Identifying the VM Infrastructure Position in the Case of Using Atlas Dashboard GUI
In the case of using the Atlas Dashboard GUI, identify the VM infrastructure position with the following steps:
Steps
- Log in to the Atlas Dashboard.
- Choose the Instances category, and search for the instance using the provided instance name.
- Find the compute host name in the Host column, to the left of the provided instance name. Ignore the .domain.tld suffix.
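For cloud administrators who prefer the OpenStack command line tools, the host of an instance is exposed by `openstack server show` in the admin-only `OS-EXT-SRV-ATTR:host` field. The Python sketch below extracts that field from the JSON output and drops any domain suffix, mirroring the instruction to ignore `.domain.tld`. It assumes admin credentials; the helper names and host names are hypothetical.

```python
import json
import subprocess
from typing import Callable


def show_server(server: str) -> str:
    """Return the instance details as JSON (admin credentials assumed)."""
    return subprocess.run(
        ["openstack", "server", "show", server, "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def compute_host(server: str, fetch: Callable[[str], str] = show_server) -> str:
    """Return the compute host of the instance without its domain suffix."""
    host = json.loads(fetch(server)).get("OS-EXT-SRV-ATTR:host") or ""
    # "compute-0-3.domain.tld" -> "compute-0-3"; a short name is returned as is.
    return host.split(".", 1)[0]
```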
3.3.2.2 Identifying the VM Infrastructure Position in the Case of Using OpenStack Command Line Tools
In the case of using OpenStack command line tools, perform the following steps in a Cloud Infrastructure Controller (CIC):
Steps
3.3.3 Cloud Administration Actions to Identify Affected VMs
The cloud administrator can identify the affected VMs at the cloud infrastructure level by determining which VMs are hosted on the affected infrastructure segment.
When using CEE, follow the steps below to identify the VMs hosted by the specific infrastructure segment:
| Note: |
In case of using a cloud solution other than CEE, refer to the solution-specific
documentation for more information on how to identify the affected VMs. |
Steps
- Log in to the Atlas Dashboard.
- Choose the Compute Environment category, and search for the identified compute host.
- Choose the identified compute host to see the related details.
- Check the instances running on the selected compute host. This information can be found under the Instances view of the opened compute host details.
- Provide the instance names to the requesting tenants.
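The Instances view of a compute host has a command line counterpart for administrators: `openstack server list --all-projects --host <host>`. The Python sketch below collects the instance names from its JSON output, sorted so that the list can be handed to the requesting tenants. Admin credentials are assumed, and the helper names and host names are hypothetical.

```python
import json
import subprocess
from typing import Callable, List


def list_on_host(host: str) -> str:
    """Return, as JSON, the instances of all projects running on a compute host."""
    return subprocess.run(
        ["openstack", "server", "list", "--all-projects", "--host", host, "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def hosted_instances(host: str, fetch: Callable[[str], str] = list_on_host) -> List[str]:
    """Sorted instance names hosted by the given compute host."""
    return sorted(server["Name"] for server in json.loads(fetch(host)))
```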
3.4 Recovery of Multiple VMs in Parallel
3.4.1 VM Groups
Because the virtualized CUDB node infrastructure is deployed on host aggregates, multiple VMs can be affected by an infrastructure activity. An affected infrastructure segment (that is, a compute host) can host either one SC, or one or more payload VMs of one CUDB node.
Because of the deployment rules described in the Virtual Deployment Considerations section of CUDB Deployment Guide, VMs of another virtualized CUDB node can be hosted on the same infrastructure segment, and can therefore also be affected. Take special care while identifying the affected VMs, and execute the recovery steps of Execute Parallel Recovery on all affected virtualized CUDB nodes.
In the CUDB system, VMs are categorized into three distinct groups: SC, PLDB, and DSG. These groups can be further divided into groups of even-numbered and odd-numbered VMs. Considering the deployment of a typical virtualized CUDB node and the applied failure domains, the affected infrastructure segment is likely to host one of the following groups of VMs:
| Note: |
The type of VMs in the last two groups can vary depending
on the infrastructure availability during the deployment, but the
redundancy over the infrastructure must be satisfied because of the
applied failure domains. This means the following: |
If multiple infrastructure segments fail (that is, more than one of the four groups of VMs above must be recovered), consider the following rules for recovery:
3.5 Prepare the VMs for Operation
This section describes how to prepare the VMs for operation after the infrastructure activities have been performed.
| Note: |
If the VM was shut down because of an underperforming host,
as described in Actions in the Case of Infrastructure Activities,
it must be powered on before any further actions are taken. |
| Note: |
If the VM did not join the cluster automatically after the
infrastructure activity has been performed, refer to the Release Notes
to check if any manual action is needed for the recovery of the network
connectivity of the VMs. Some actions can involve cloud administration. |
3.5.1 Prepare SC VM
| Note: |
When recovering the SC, the SAF, LOTC Disk Replication Consistency Failed alarm might
appear. At the same time, if the evacuation is taking more than 20 minutes to complete,
then the SAF, LOTC Disk Replication Communication Failed alarm might also
appear. These alarms can be expected during the VM recovery procedure on the SC, and
should be automatically cleared when all recovery steps are executed. Refer to the
corresponding alarm OPI for more information on these alarms. |
If the VM to recover is an SC, perform the following steps once the infrastructure activity has finished:
Steps
3.5.2 Prepare PLDB or DSG VM
If replication does not recover automatically, refer to CUDB Backup and Restore Procedures.
Reference List
- CUDB Troubleshooting Guide
- CUDB Backup and Restore Procedures
- CUDB High Availability
- CUDB Virtual Infrastructure Requirements
- CUDB Deployment Guide
- CUDB Node Configuration Data Model Description
- CUDB Users and Passwords
- CUDB Node Commands and Parameters
- CUDB System Administrator Guide
- SAF, LOTC Disk Replication Consistency Failed
- SAF, LOTC Disk Replication Communication Failed
- CUDB Glossary of Terms and Acronyms
- CEE Troubleshooting Guideline