Operating Instructions 4/1543-CSH 109 067/10 Uen M

Server Platform, Blade Replacement

Contents


1 Introduction

This document describes how to replace a blade in an Ericsson Centralized User Data Base (CUDB) node deployed on native BSP 8100.

1.1 Description

This Operating Instruction (OPI) describes how to replace a blade in a CUDB node. A blade replacement must be performed either because of blade fault, or due to a hardware upgrade.

1.2 Revision Information

Rev. A

Rev. B

Rev. C

Rev. D

Rev. E

Rev. F

Rev. G

Rev. H

Rev. J

Rev. K

Rev. L

Rev. M

Other than editorial changes, this document has been revised as follows:

1.3 Typographic Conventions

Typographic Conventions can be found in the following document:

2 Prerequisites

3 Replacing a Blade

This section describes how to identify a faulty blade, how to perform a blade replacement in a CUDB node, and how to prepare the replacement blade for operation.

Also, further actions after physical board replacement are described.

Attention!

If a System Controller (SC) blade is replaced and, as a result of the replacement, the partition type differs between the two SCs, a future upgrade may fail. Therefore the blade replacement procedure must also be performed on the other SC blade to change its partition type. That board does not have to be physically replaced, so its MAC addresses are not modified. This situation can only occur in CUDB nodes that were installed with old releases using the msdos partition type.

Note: When replacing GEP5 SC blades with Generic Ericsson Processor version 7, Low Power (GEP7L) boards, both controllers must be GEP7L; mixed GEP5/GEP7L configurations are not allowed on SCs. Replacing a single GEP7L blade with another GEP7L blade can be done individually on one SC.

3.1 Identifying the Faulty Blade

3.1.1 Identifying the Blade Name

Perform the following steps to identify a faulty blade in a CUDB node:

Steps

  1. Establish a Secure Shell (SSH) session towards the target CUDB node with the following command:
    ssh root@<CUDB_Node_OAM_VIP_Address>
    This session is established to the first or second SC, that is, either to SC_2_1 or SC_2_2.
    Refer to CUDB Users and Passwords for more information on the default root password.

Attention!

If a Data Store Unit Group (DSG) master replica is degraded due to hardware or software failure, the system selects a new master for the DSG, provided that at least one other slave replica of the DSG is available, not degraded, and has a replication delay below 3 seconds; otherwise the smooth mastership change does not happen until these conditions are met. Therefore, if mastership remains on the degraded replica, the blade replacement must be performed in low-traffic periods. Note also that if Automatic Mastership Change (AMC) is disabled, mastership is not returned to the preferred location until AMC is enabled or the mastership change is done manually.

  2. Several methods are available to identify an SC, a DSG, or a PLDB in the /cluster/etc/cluster.conf file. An example is provided below.
    For instance, the node defined below contains 2 SCs and 8 payload blades:
    node 1 control SC_2_1
    node 2 control SC_2_2
    node 3 payload PL_2_3
    node 4 payload PL_2_4
    node 5 payload PL_2_5
    node 6 payload PL_2_6
    node 7 payload PL_2_7
    node 8 payload PL_2_8
    node 9 payload PL_2_9
    node 10 payload PL_2_10
    
    
    host all 10.22.0.1 OAM1
    host all 10.22.0.2 OAM2
    host all 10.22.0.3 PL0
    host all 10.22.0.4 PL1
    host all 10.22.0.5 DS1_0
    host all 10.22.0.6 DS1_1
    host all 10.22.0.7 DS2_0
    host all 10.22.0.8 DS2_1
    ..............
    

    The below list provides some example scenarios for various failing blades:

    • In case the failing blade is Blade 7, with the IP 10.22.0.7 (note the last octet), pay attention to the following two lines of the cluster.conf file:

      node 7 payload PL_2_7
      host all 10.22.0.7 DS2_0
      
      In the above lines, PL_2_7 is the name of the payload blade, the blade number is "7", and the identification number of the associated DS is "2" (DS2_0).
    • In case the failing blade is Blade 3, with the IP 10.22.0.3 (note the last octet), pay attention to the following two lines of the cluster.conf file:

      node 3 payload PL_2_3
      host all 10.22.0.3 PL0
      

      In the above lines, PL_2_3 is the name of the payload blade, "3" is the blade number, and the Processing Layer ID is PL0. If the blade in the cluster needs a reboot, use the following command:

      cluster reboot --node 3

    • In case the failing blade is Blade 1, with the IP 10.22.0.1 (note the last octet), pay attention to the following two lines of the cluster.conf file:

      node 1 control SC_2_1
      host all 10.22.0.1 OAM1
      

      In the above lines, SC_2_1 is the name of the blade, "1" is the blade number, and the SC ID is OAM1.
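
As a convenience, the relevant node and host lines can be extracted directly from cluster.conf. The following is a minimal lookup sketch, assuming blade number 7 and the internal addressing shown in the example above (10.22.0.<blade>):

BLADE=7
grep "^node ${BLADE} " /cluster/etc/cluster.conf                 # blade name, for example PL_2_7
grep "^host all 10\.22\.0\.${BLADE} " /cluster/etc/cluster.conf  # role, for example DS2_0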

3.1.2 Identifying Blade Rack and Subrack Position

To identify the physical blade that has to be replaced, do the following:

Steps

  1. Establish a BSP CLI session:
    ssh advanced@<BSP-NBI-SCX> -p2024
  2. Enter the following commands:
    show-table ManagedElement=1,DmxcFunction=1,Eqm=1,VirtualEquipment=cudb -m Blade -p bladeId,userLabel

Results

The output must be similar to the following example:

=======================
| bladeId | userLabel |
=======================
| 0-1     | SC-1      |
| 0-11    | PL-6      |
| 0-13    | PL-7      |
| 0-15    | PL-8      |
| 0-17    | PL-9      |
| 0-19    | PL-10     |
| 0-21    | PL-11     |
| 0-23    | PL-12     |
| 0-3     | SC-2      |
| 0-5     | PL-3      |
| 0-7     | PL-4      |
| 0-9     | PL-5      |
| 1-1     | PL-13     |
| 1-11    | PL-18     |
| 1-13    | PL-19     |
| 1-15    | PL-20     |
| 1-17    | PL-21     |
| 1-19    | PL-22     |
| 1-21    | PL-23     |
| 1-23    | PL-24     |
| 1-3     | PL-14     |
| 1-5     | PL-15     |
| 1-7     | PL-16     |
| 1-9     | PL-17     |
| 2-1     | PL-25     |
| 2-11    | PL-30     |
| 2-13    | PL-31     |
| 2-15    | PL-32     |
| 2-17    | PL-33     |
| 2-19    | PL-34     |
| 2-21    | PL-35     |
| 2-23    | PL-36     |
| 2-3     | PL-26     |
| 2-5     | PL-27     |
| 2-7     | PL-28     |
| 2-9     | PL-29     |
=======================
Note: LDE and BSP 8100 naming conventions are slightly different; for example, SC_2_1 on LDE level corresponds to SC-1 on BSP 8100, and so on.

The bladeId identifies the blade position in the rack, the first number meaning the subrack and the second meaning the slot within the subrack. For example, PL-14 is in the third slot of subrack 1.

3.1.3 Identifying Blade Hardware Type and Board Revision

To identify the blade hardware type and revision, do the following:

Steps

  1. Establish a BSP CLI session:

    ssh advanced@<BSP-NBI-SCX> -p2024

  2. Enter the following commands:

    show ManagedElement=1,SystemFunctions=1,HwInventory=1,HwItem=blade:<bladeId>,productIdentity

    Note: <bladeId> is the physical position of the blade and can be obtained from the output of the command in Identifying Blade Rack and Subrack Position.

Results

The output must be similar to the following example:

GEP3

productIdentity="ROJ 208 840/3"
     productDesignation="GEP3-HD300"
     productRevision="R4B"

GEP5

 productIdentity="ROJ 208 868/5"
     productDesignation="GEP5-64-400"
     productRevision="R2A"

GEP7

productIdentity="ROJ208864/7"
     productDesignation="GEP7L-64-X16"
     productRevision="R1B"

3.2 Preparing the Blade Replacement

Perform the following steps to prepare the blade replacement:

Note: In the below commands, <name> and <blade> are used to identify blades, where:
  • <blade> is a numeric identifier, for example in SC_2_1 <blade> is 1, in PL_2_3 <blade> is 3.

  • <name> is the controller name (SC_2_<blade>) or the payload blade name (PL_2_<blade>).

Steps

  1. Establish an SSH session towards the target CUDB node with the following command:
    ssh root@<CUDB_Node_OAM_VIP_Address>
    This session is established to the first or second SC, that is either to SC_2_1 or SC_2_2.
    For more information on the default root password refer to CUDB Users and Passwords.
  2. Lock the blade at SAF level with the following command:
    cmw-node-lock <name>
  3. Check if the specific blade is locked at SAF level with the following command:
    cmw-status -v node
    The output must be similar to the following example. Because PL-7 is locked, its AdminState is shown as LOCKED (a filtering sketch is provided after this step list):
     safAmfNode=PL-10,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-11,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-12,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-3,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-4,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-5,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-6,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-7,safAmfCluster=myAmfCluster
     AdminState=LOCKED(2)
     OperState=ENABLED(1)
     safAmfNode=PL-8,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=PL-9,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=SC-1,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
     safAmfNode=SC-2,safAmfCluster=myAmfCluster
     AdminState=UNLOCKED(1)
     OperState=ENABLED(1)
  4. Make a backup of the rpm.conf file of the blade to be replaced as follows:
    1. Make a copy of the file, and rename it to rpm.conf_FULL with the following command:
      cp /cluster/nodes/<blade>/etc/rpm.conf /cluster/nodes/<blade>/etc/rpm.conf_FULL
      The rpm.conf_FULL file now contains all entries of the original rpm.conf file.
    2. Overwrite the contents of the original rpm.conf file, so that it contains only the ldews-control or ldews-payload entries (depending on the type of blade to replace). Use the following command to do so:
      grep -ia 'ldews\|linux' /cluster/nodes/<blade>/etc/rpm.conf_FULL > /cluster/nodes/<blade>/etc/rpm.conf
  5. Replace the blade as described in Replacing GEP Boards.
    Note: When replacing an SC, the alarm SAF, LOTC Disk Replication Consistency Failed might appear. If the physical replacement takes more than 20 minutes, the alarm SAF, LOTC Disk Replication Communication Failed might also appear. These alarms are expected during the blade replacement procedure on an SC and are automatically cleared when all replacement steps have been executed. For more information, follow the corresponding alarm OPI.
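
The check in Step 3 can be narrowed down to a single blade. The following is a minimal filtering sketch, assuming PL-7 is the blade that was locked:

cmw-status -v node | grep -A 2 "safAmfNode=PL-7,"   # prints the AdminState and OperState lines of PL-7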

3.3 Replacing GEP Boards

Refer to the Replace Device (GEP) Blade section of the Manage Blade document in the BSP 8100 CPI for detailed information on the procedure required to physically replace a blade.

3.4 CUDB Node Configuration Changes

This section describes the configuration changes to perform in a CUDB node in case blade replacement is needed.

Note: It is recommended to make a backup of the file to be modified.

3.4.1 Obtaining MAC Addresses for the New Blade

The MAC addresses are used as input to create the cluster.conf file, which is used by LDE. The MAC addresses are also required to configure the Jumpstart server before installing LDE on the SCs, as well as for the blade replacement procedure.

The MAC addresses are fetched through the BSP CLI. The MAC shown there is the base MAC, which is used to obtain the MAC addresses necessary to complete the cluster.conf file generation.

To obtain the MAC addresses, do the following:

Steps

  1. Establish a BSP CLI session:
    ssh advanced@<BSP-NBI-SCX> -p2024
  2. Execute the following commands to show the MAC addresses:
    show-table ManagedElement=1,DmxcFunction=1,Eqm=1,VirtualEquipment=cudb -m Blade -p bladeId,firstMacAddr
    The output must be similar to the below example:
    ===============================
    | bladeId | firstMacAddr      |
    ===============================
    | 0-1     | 90:55:AE:3A:CB:1D |
    | 0-11    | 90:55:AE:3A:CA:5D |
    | 0-13    | 90:55:AE:3A:C9:CD |
    | 0-15    | 90:55:AE:3A:CA:75 |
    | 0-17    | 90:55:AE:3A:C9:9D |
    | 0-19    | 90:55:AE:3A:CA:ED |
    | 0-21    | 90:55:AE:3A:CB:AD |
    | 0-23    | 90:55:AE:3A:CD:5D |
    | 0-3     | 90:55:AE:3A:C9:55 |
    | 0-5     | 90:55:AE:3A:CA:15 |
    | 0-7     | 90:55:AE:3A:C9:FD |
    | 0-9     | 90:55:AE:3A:C9:25 |
    | 1-1     | 90:55:AE:3A:B0:7D |
    | 1-11    | 90:55:AE:3A:BF:C5 |
    | 1-3     | 90:55:AE:3A:C1:45 |
    | 1-5     | 90:55:AE:3A:BF:05 |
    | 1-7     | 90:55:AE:3A:BF:1D |
    | 1-9     | 90:55:AE:3A:BF:35 |
    ===============================

3.4.1.1 Obtaining All MAC Addresses

The MAC shown for each shelf slot in Obtaining MAC Addresses for the New Blade is the base MAC. All the MACs can be obtained by adding a number to the <base mac>, in accordance with the following tables: Table 1 applies to BSP 8100 (GEP3) boards, Table 2 applies to BSP 8100 (GEP5) boards, and Table 3 applies to BSP 8100 (GEP7L) boards.

Table 1   MAC Address Relation to GEP3 Boards

==========================================================
| Address        | Resulting MAC(1)                      |
==========================================================
| <BASE MAC> + 1 | eth3 | Left SCX Backplane Port        |
| <BASE MAC> + 2 | eth4 | Right SCX Backplane Port       |
| <BASE MAC> + 3 | eth2 | ETH-Debug Front Port           |
| <BASE MAC> + 5 | eth0 | ETH-0 Front Port               |
| <BASE MAC> + 6 | eth1 | ETH-1 Front Port               |
| <BASE MAC> + 8 | eth5 | Left SCX 10GbE Backplane Port  |
| <BASE MAC> + 9 | eth6 | Right SCX 10GbE Backplane Port |
==========================================================

(1) The resulting MAC must be in hexadecimal format.
Table 2   MAC Address Relation to GEP5 Boards

==========================================================
| Address        | Resulting MAC(2)                      |
==========================================================
| <BASE MAC> + 1 | eth3 | Left SCX 1GbE Backplane Port   |
| <BASE MAC> + 2 | eth4 | Right SCX 1GbE Backplane Port  |
| <BASE MAC> + 3 | eth2 | ETH-Debug Front Port           |
| <BASE MAC> + 5 | eth5 | Left SCX 10GbE Backplane Port  |
| <BASE MAC> + 6 | eth6 | Right SCX 10GbE Backplane Port |
| <BASE MAC> + 8 | eth0 | ETH-0 Front Port               |
| <BASE MAC> + 9 | eth1 | ETH-1 Front Port               |
==========================================================

(2) The resulting MAC must be in hexadecimal format.
Table 3   MAC Address Relation to GEP7L Boards

==========================================================
| Address        | Resulting MAC(3)                      |
==========================================================
| <BASE MAC> + 1 | eth3 | Left SCX Backplane Port        |
| <BASE MAC> + 2 | eth4 | Right SCX Backplane Port       |
| <BASE MAC> + 7 | eth5 | Left SCX 10GbE Backplane Port  |
| <BASE MAC> + 8 | eth6 | Right SCX 10GbE Backplane Port |
==========================================================

(3) The resulting MAC must be in hexadecimal format.
Note: Ports ETH-0 and ETH-1 are enabled only during the initial software installation phase from the Jumpstart server. After the LDE is installed on the blade, they remain disabled and cannot be used.
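
The offset addition described in the tables above can be done with standard shell arithmetic. The following is a minimal sketch, not part of the product, assuming the base MAC 90:55:AE:3A:B0:7D (taken from the example output above) and an offset of 1:

BASE_MAC="90:55:AE:3A:B0:7D"
OFFSET=1
HEX=$(echo "$BASE_MAC" | tr -d ':')                  # strip the colons
DEC=$((16#$HEX + OFFSET))                            # add the offset to the 48-bit value
printf '%012X\n' "$DEC" | sed 's/../&:/g; s/:$//'    # prints 90:55:AE:3A:B0:7E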

3.4.2 Obtaining Board Revision for the New Blade

In case of SC replacement in BSP 8100 systems with GEP3 hardware, perform the procedure below to check the product revision of the new blade:

Steps

  1. Establish a BSP NBI CLI session:
    ssh advanced@<BSP-NBI-SCX> -p2024 -t -s cli
  2. Execute the following command to show the blade hardware revisions:
    show ManagedElement=1,SystemFunctions=1,HwInventory=1,HwItem=blade:<bladeId>,productIdentity
    The expected output must be similar to the below example:
    productIdentity="ROJ 208 840/3"
    productDesignation="GEP3-HD300"
    productRevision="R4B"
    
    Note: <bladeId> is the physical position of the blade and can be obtained from the output of the command in Identifying Blade Rack and Subrack Position.

3.4.3 Editing the LDE installation.conf File

Edit the installation.conf file only if the blade is replaced with GEP3 or GEP7L hardware.

3.4.3.1 Editing the LDE installation.conf File in Case of GEP3 Hardware

In case of SC replacement in BSP 8100 systems with GEP3 hardware, perform the procedure below to edit the installation.conf file.

Steps

  1. Establish an SSH session towards the target CUDB node with the following command:
    ssh root@<CUDB_Node_OAM_VIP_Address>
    This session is established to the first or second SC, either to SC_2_1 or SC_2_2.
    Refer to CUDB Users and Passwords for more information on the default root password.
  2. Locate the installation.conf file in the following directory:
    /cluster/etc/installation.conf
  3. Edit the file and set parameter value depending on the hardware revision of the new blade obtained in Obtaining Board Revision for the New Blade:
    • If it is lower than R9A, use the following value:

      disk_device_path=/dev/sdb

    • If it is R9A or higher, use the following value:

      disk_device_path=/dev/sda
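
The parameter can also be set non-interactively. The following is a minimal editing sketch, assuming the disk_device_path parameter already exists in the file and the new blade revision is R9A or higher:

cp /cluster/etc/installation.conf /cluster/etc/installation.conf.bak        # keep a backup, as recommended above
sed -i 's|^disk_device_path=.*|disk_device_path=/dev/sda|' /cluster/etc/installation.conf
grep disk_device_path /cluster/etc/installation.conf                        # verify the change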

3.4.3.2 Editing the LDE installation.conf File in Case of GEP7L Hardware

In case of replacement of a GEP5 SC with GEP7L hardware in BSP 8100 systems, perform the procedure below to edit the installation.conf file.

This procedure is required only after the replacement of the first SC, regardless of whether it is SC_1 or SC_2. After that, no additional changes are required.

Note: This procedure must be skipped when a GEP7L blade is replaced with another GEP7L blade.

Steps

  1. Establish an SSH session towards the target CUDB node with the following command:
    ssh root@<CUDB_Node_OAM_VIP_Address>
    This session is established to the first or second SC, either to SC_2_1 or SC_2_2.
    Refer to CUDB Users and Passwords for more information on the default root password.
  2. Replace the /cluster/etc/installation.conf file content with the one suitable for GEP7L blades. Keep the installation.conf file name.
    See the installation.conf for GEP7L blade:
    root_password_hash=$2y$10$T/HbyWltNmKp2R1F.JOj5eQ5SSBrSEMGIIy.LI/T1wZm/PG/CKbXi 
    
     disk physical0 
     option physical0 path=/dev/disk/by-path/pci-0000:06:00.0-sas-phy0-0x4433221100000000-lun-0 
     partition lde-boot-part physical0 
     option lde-boot-part size=4G 
     option lde-boot-part boot 
     partition lde-log-part physical0 
     option lde-log-part size=40G 
     partition lde-drbddata-part physical0 
     option lde-drbddata-part size=700G 
     drbd lde-cluster-drbd lde-drbddata-part 
     option lde-cluster-drbd config=/usr/lib/lde/config-management/drbd-resource-config 
     pv lde-cluster-pv lde-cluster-drbd 
     option lde-cluster-pv tag=shared 
     vg lde-cluster-vg lde-cluster-pv 
     option lde-cluster-vg tag=shared 
     lv lde-cluster-lv lde-cluster-vg 
     option lde-cluster-lv tag=shared 
     option lde-cluster-lv size=50% 
     filesystem lde-boot lde-boot-part 
     option lde-boot fs_type=ext3 
     filesystem lde-log lde-log-part 
     option lde-log fs_type=ext3 
     filesystem lde-cluster lde-cluster-lv 
     option lde-cluster fs_type=ext3 
     option lde-cluster tag=shared 
     map control lde-boot 
     map control lde-log 
     map control lde-cluster 
    
     disk physical1 
     option physical1 path=/dev/disk/by-path/pci-0000:06:00.0-sas-phy1-0x4433221101000000-lun-0 
     partition cudb-local-part physical1 
     option cudb-local-part size=100% 
     filesystem /local cudb-local-part 
     option /local fs_type=ext3 
     map control /local 
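
One possible way to apply the new content while keeping a backup of the previous file is sketched below; /tmp/installation.conf.gep7l is a hypothetical file holding the GEP7L content listed above:

cp /cluster/etc/installation.conf /cluster/etc/installation.conf.gep5_backup   # keep the previous version
cp /tmp/installation.conf.gep7l /cluster/etc/installation.conf                 # keep the original file name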

3.4.4 Editing the LDE cluster.conf File

Perform the following steps to edit the cluster.conf file.

Steps

  1. Establish an SSH session towards the target CUDB node with the following command:
    ssh root@<CUDB_Node_OAM_VIP_Address>
    This session is established to the first or second SC, that is either to SC_2_1 or SC_2_2.
    Refer to CUDB Users and Passwords for more information on the default root password.
  2. Locate the cluster.conf file in the following directory:
    /cluster/etc/cluster.conf
  3. Open the file, and replace the old MACs with the new ones (a replacement sketch is provided after this step list). Use Table 1, Table 2, or Table 3 in Obtaining All MAC Addresses as a means to calculate the actual MAC addresses.
    An example of the LDE cluster.conf file is provided below. The interface number corresponds to the blade number: for example, if payload blade PL_2_5 is replaced, then interface 5 needs MAC address adaptation.

    Example

    # # Example /cluster/etc/cluster.conf
    ########################
    #  
    #  Interface definition
    #  
    
    interface 1 eth3 ethernet 90:55:ae:3a:b0:7e
    interface 1 eth4 ethernet 90:55:ae:3a:b0:7f
    interface 1 eth5 ethernet 90:55:ae:3a:b0:82
    interface 1 eth6 ethernet 90:55:ae:3a:b0:83
    
    interface 2 eth3 ethernet 90:55:ae:3a:c1:46
    interface 2 eth4 ethernet 90:55:ae:3a:c1:47
    interface 2 eth5 ethernet 90:55:ae:3a:c1:4a
    interface 2 eth6 ethernet 90:55:ae:3a:c1:4b
    
    interface 3 eth3 ethernet 90:55:ae:3a:bf:06
    interface 3 eth4 ethernet 90:55:ae:3a:bf:07
    interface 3 eth5 ethernet 90:55:ae:3a:bf:0a
    interface 3 eth6 ethernet 90:55:ae:3a:bf:0b
    
    interface 4 eth3 ethernet 90:55:ae:3a:c9:fe
    interface 4 eth4 ethernet 90:55:ae:3a:c9:ff
    interface 4 eth5 ethernet 90:55:ae:3a:ca:02
    interface 4 eth6 ethernet 90:55:ae:3a:ca:03
    
    interface 5 eth3 ethernet 90:55:ae:3a:c9:26
    interface 5 eth4 ethernet 90:55:ae:3a:c9:27
    interface 5 eth5 ethernet 90:55:ae:3a:c9:2a
    interface 5 eth6 ethernet 90:55:ae:3a:c9:2b
    

  4. Verify the syntax of the cluster.conf file with the following command:
    cluster config -v
    In case of any error message, check the command output and correct the syntax mistakes. Warning messages can be ignored.
  5. Reload the configuration with the following command:
    cluster config --reload --all
    Note: The command fails for the blade currently being replaced (Node X (<name>) not responding, skipped); this is the expected behavior. Continue with the next step.
  6. The new blade(s) start(s) booting from the network.
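
The MAC replacement in Step 3 can also be scripted. The following is a minimal sketch with placeholder values; repeat the substitution for every interface line (eth3, eth4, eth5, and eth6) of the replaced blade:

cp /cluster/etc/cluster.conf /cluster/etc/cluster.conf.bak              # keep a backup first
sed -i 's/90:55:ae:3a:c9:26/<new eth3 MAC>/' /cluster/etc/cluster.conf  # old value taken from the example above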

3.5 System Controller Replacement Steps

This section describes the procedure to finalize the SC blade replacement.

Do!

If GEP5 blades are replaced with GEP7L blades, replace installation.conf accordingly.

The new blade is by default set to boot from the network; the following procedure describes how to set it to boot from the hard disk.

During this procedure, the new SC also synchronizes its replicated storage disk partition with the other SC. This process can take up to one hour, depending on the storage disk partition size and the available network bandwidth. Use the following command on the other SC to check the synchronization status and which node is the DRBD Primary (the first entry listed in the command output is the DRBD status of the current SC):

cat /proc/drbd
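
A minimal monitoring sketch, useful while waiting for the synchronization to finish:

watch -n 60 cat /proc/drbd      # refresh every 60 seconds until both sides report UpToDate/UpToDate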

Do!

If GEP5 blades are replaced with GEP7L blades, to use the whole disk partition, execute the pvresize /dev/drbd0 and lvresize -r -L 320G /dev/lde-cluster-vg/lde-cluster-lv commands on the SC where the DRBD process is Primary.

Perform the following steps to finalize the SC replacement:

Steps

  1. Restore the original rpm.conf file with the following command:
    cp /cluster/nodes/<blade>/etc/rpm.conf_FULL /cluster/nodes/<blade>/etc/rpm.conf
    In the above command, <blade> must be replaced with the blade number. For example, in case of SC_2_2, the blade number is 2.
  2. Set the new SC blade to boot from hard disk. See Changing the Boot Device Order for details.
  3. Reboot the new SC from console interface with the following command:
    reboot

3.6 DSG and PLDB Replacement Steps

If the blade to replace is a DSG or PLDB, then perform the following steps:

Steps

  1. Log in to one of the SCs, and execute the following commands:
    ssh root@<CUDB_Node_OAM_VIP_Address>
    Refer to CUDB Users and Passwords for more information on the default root password.
    cd /opt/ericsson/cudb/OAM/support/bin/
    ./cudbPartTool rebuild -n <blade>
    In the above command, <blade> must be replaced with the blade number. For example, in case of PL_2_5, <blade> is 5.
  2. Check if the partition is created with the following command:
    ./cudbPartTool check -n <blade>
    In the above command, <blade> must be replaced with the blade number. For example, in case of PL_2_5, <blade> is 5.
    The output must be similar to the below example:
    CUDB_2 SC_2_1# ./cudbPartTool check -n 5
    CUDB Partitioning Tool...
    
    Node PL_2_5 report:
     WARNING: "/local" partition is not mounted
     WARNING: "/local2" partition is not mounted
     OK
    
    STATUS: OK
    Done.
    
  3. Restore the original rpm.conf file with the following command:
    cp /cluster/nodes/<blade>/etc/rpm.conf_FULL /cluster/nodes/<blade>/etc/rpm.conf
    In the above command, <blade> must be replaced with the blade number. For example, in case of PL_2_5, <blade> is 5.

3.7 Finalizing Replacement

To finish the blade replacement, perform the following steps, which apply to every blade type (SC, DSG, and PLDB).

Note: In case of SC replacement, crontab jobs and their definitions, or similar tasks that are not deployed by default in CUDB (for example, scheduled data or software backup scripts), are lost. If necessary, redeploy them after the procedure is completed.

Steps

  1. Unlock the blade at SAF level with the following command:
    cmw-node-unlock <name>
    <name> is the name of the replaced blade, for example PL_2_5.
  2. Reboot the newly-installed blade with the following command:
    cluster reboot -n <blade>
    <blade> is the number of the replaced blade, for example 5 for PL_2_5.
  3. Wait until the blade has rebooted and joined the cluster. Use the following command to list the joined blades, and to check the operational states of the SUs:
    cmw-status -v node
    The expected output must be similar to the below example:
    safAmfNode=PL-7,safAmfCluster=myAmfCluster
    AdminState=UNLOCKED(1)
    OperState=ENABLED(1)
  4. Wait until all the processes are started on the blade, and check that the system has recovered without faults with the cudbSystemStatus command. In case of a DS blade, errors related to the DS database can be ignored because of the data restore done later.
    Note: If the status is not correct, stop the procedure, and contact the next level of maintenance support.
  5. Exit the SSH session with the exit command.
  6. Depending on the blade type, do the following:
    • If the replaced blade is an SC, the procedure is finished.

    • If the replaced blade is a DSG blade, to backup and restore a DSG replica, perform the steps described in the CUDB Backup and Restore Procedures.

    • If the replaced blade is a PLDB blade, then to back up and restore the PLDB replica, after the NDBs are started and the mysql server connections are OK, recreate the stored procedures by executing the following command:

      cudbManageStore -p -o restorestoredprocedures

After This Task

Attention!

A software backup created before the blade replacement is not valid after the replacement, since the backup contains an outdated cluster.conf file and therefore the new blade cannot be reached. To create a new software backup, follow the steps described in the CUDB Backup and Restore Procedures.

Refer to the CUDB System Administrator Guide for more information.

3.8 Changing the Boot Device Order

3.8.1 Changing the Boot Device Order on GEP3 Boards

To change the boot device order on the GEP3 boards, connect to the SCXB RS232 connector and to the GEP3 console port at the same time. For these connections, two VT100 Terminals and two serial cables are required.

Note: To ensure correct blade operation, the configured boot device type must be Hard disk for SC boards, and Backplane port for payload blades.

Steps

  1. Open a new connection to the BSP 8100 CLI.
  2. Enter configuration mode:
    configure
  3. Turn on the power of the GEP3 board with the following commands:
    Move to VirtualEquipment=cudb branch:
    (config)>ManagedElement=1,DmxcFunction=1,Eqm=1,VirtualEquipment=cudb
    List the available blades and identify the one that must be unlocked:
    (config-VirtualEquipment=cudb)>show-table -m Blade -p bladeId,userLabel
    ======================= 
    | bladeId | userLabel | 
    ======================= 
    | 0-1     | SC-1      | 
    | 0-3     | SC-2      | 
    | 0-5     | PL-3      | 
    | 0-7     | PL-4      | 
    | 0-9     | PL-5      | 
    ...
    ======================= 
    
    Set the administrative state of the blade 0-1 (SC-1) to unlocked:
    (config-VirtualEquipment=cudb)>Blade=0-1,administrativeState=UNLOCKED
    (config-VirtualEquipment=cudb)>commit
  4. Open a serial connection to GEP3 board.
  5. Enter PBIST mode by pressing F3 during boot up. The screen must be similar to Figure 1.
    Figure 1   GEP3 BIOS Pop-Up
  6. Wait for the prompt to appear, then type 40 to invoke the Unified Extensible Firmware Interface (UEFI) prompt. A screen similar to the below example must appear.
    Note: After typing 40 above, press any key in 10 seconds. If no keys are pressed in 10 seconds, the GEP3 blade starts booting, and the procedure must be restarted from Step 3.
    Figure 2   PBIST Menu
    Figure 3   Press Key
  7. Execute the ipmi -o display command to check the boot configuration. Then use the ipmi -o pop command to erase the boot devices for GEP3. Repeat this step until the list is empty.

    See Figure 4.

    Figure 4   Boot Devices
  8. Execute the ipmi -o display command to check the boot device list. If the list is empty, execute the ipmi -o push 10 command to add the hard disk to the list.
    Figure 5   Hard Disk Device Input
  9. Execute pbist -r command to reboot the blade.
    Figure 6   PBIST Reboot

3.8.2 Changing the Boot Device Order on GEP5 Boards

To change the boot device order on the GEP5 boards, connect to the SCXB RS232 connector and to the GEP5 console port at the same time. For these connections, two VT100 Terminals and two serial cables are required.

Note: To ensure correct blade operation, the configured boot device type must be Hard drive for SC boards, and Ethernet Backplane port for payload blades.

The instructions below apply to SC boards only. If a payload board is configured, the devices pushed to the IPMI boot table in Step 8 must be numbered as 00 and 01.

Steps

  1. Open a new connection to the BSP 8100 CLI.
  2. Enter configuration mode:
    configure
  3. Turn on the power of the GEP5 board with the following commands:
    Move to VirtualEquipment=cudb branch:
    (config)>ManagedElement=1,DmxcFunction=1,Eqm=1,VirtualEquipment=cudb
    List the available blades and identify the one that must be unlocked:
    (config-VirtualEquipment=cudb)>show-table -m Blade -p bladeId,userLabel
    ======================= 
    | bladeId | userLabel | 
    ======================= 
    | 0-1     | SC-1      | 
    | 0-3     | SC-2      | 
    | 0-5     | PL-3      | 
    | 0-7     | PL-4      | 
    | 0-9     | PL-5      | 
    ...
    ======================= 
    
    Set the administrative state of the blade 0-1 (SC-1) to unlocked:
    (config-VirtualEquipment=cudb)>Blade=0-1,administrativeState=UNLOCKED
    (config-VirtualEquipment=cudb)>commit
  4. Open a serial connection to GEP5 board.
  5. Enter PBIST mode by pressing F3 during boot up. The screen must be similar to Figure 7.
    Figure 7   GEP5 PBIST Menu
  6. Wait for the prompt to appear, then type 40 to invoke the UEFI prompt. A screen similar to the below example in Figure 8 must appear.
    Note: After typing 40 above, press any key in 10 seconds. If no keys are pressed in 10 seconds, the GEP5 blade starts booting, and the procedure must be restarted from Step 3.
    Figure 8   Press Any Key in the UEFI Shell
  7. Erase the boot configuration with the following steps:
    1. Type the following command to check the boot configuration:
      ipmi bo display
    2. Erase the boot devices of the GEP5 board with the following command:
      ipmi bo erase
      If you want to delete only one boot device, use the following command:
      ipmi bo pop
    3. Use the ipmi bo display command again to check the boot configuration after the erase.
    See Figure 9 for an example of the above steps.
    Figure 9   Erasing the List of Boot Devices
  8. If the boot device list is empty, run the following commands to add all hard disks to the list:
    ipmi bo push 10
    ipmi bo push 11
    ipmi bo push 12
    See Figure 10 for an example.
    Figure 10   Adding Hard Disks to the Boot Order
  9. Reboot the blade with the following command:
    pbist -r
    An example output is shown in Figure 11.
    Figure 11   Rebooting the GEP5 Board

3.8.3 Changing the Boot Device Order on GEP7L Boards

To change boot device order on GEP7L boards, connect to the SCXB RS232 connector and to the GEP7L console port at the same time. For these connections, two VT100 terminals and two serial cables are required.

Do!

To ensure correct blade operation, the configured boot device type must be "Hard drive" for SC boards, and "Ethernet Backplane port" for payload blades.

Attention!

The instructions below apply to SC boards only. If a payload board is configured, the devices pushed to the IPMI boot table in Step 8 must be numbered as 00 and 01.

Steps

  1. Open a new connection to BSP 8100 CLI.
  2. Enter configuration mode:
    configure
  3. Turn on the power of the GEP7L board with the following commands:
    1. Move to VirtualEquipment=cudb branch:
      (config)> ManagedElement=1,DmxcFunction=1,Eqm=1,VirtualEquipment=cudb
    2. List the available blades and identify the one that must be unlocked:
      (config-VirtualEquipment=cudb)> show-table -m Blade -p bladeId,userLabel

      Example

      ======================= 
      | bladeId | userLabel | 
      ======================= 
      | 0-1     | SC-1      | 
      | 0-3     | SC-2      | 
      | 0-5     | PL-3      | 
      | 0-7     | PL-4      | 
      | 0-9     | PL-5      | 
      ...
      ======================= 
      
    3. Set the administrative state of the blade 0-1 (SC-1) to unlocked:
      (config-VirtualEquipment=cudb)> Blade=0-1,administrativeState=UNLOCKED
      (config-VirtualEquipment=cudb)> commit
  4. Open a serial connection to GEP7L board.
  5. Enter PBIST mode by pressing F3 during boot up. The screen must be similar to Figure 12.
    Figure 12   GEP7L PBIST Menu
  6. Wait for the prompt to appear, then type 40 to invoke the Unified Extensible Firmware Interface prompt.
    Note: After typing 40 above, press any key in 3 seconds to proceed to the Unified Extensible Firmware Interface shell. If no keys are pressed in 3 seconds, the GEP7L blade starts booting, and the procedure must be restarted from Step 3.
    See Figure 13 as an example.
    Figure 13   Unified Extensible Firmware Interface Shell
  7. Erase the boot configuration with the following steps:
    1. Clear the present boot device order with command:
      ipmi oem bo erase
    2. Check the result with command:
      ipmi oem bo display
      It must be empty.
    See Figure 14 as an example for the above steps.
    Figure 14   Erasing the List of Boot Devices
  8. Execute the following commands to add two Internal SAS disks as first and second priority, and Ethernet Backplane Left as third priority to the list:
    ipmi oem bo insert 1 10
    ipmi oem bo insert 2 11
    ipmi oem bo insert 3 00
    See Figure 15 as an example.
    Figure 15   Adding the List of Boot Devices
  9. Reset the board using command:
    reset
    See Figure 16 as an example.
    Figure 16   Rebooting the GEP7L Board

3.9 Replacing Multiple Blades in Parallel

This section provides instructions required to replace multiple blades in parallel on CUDB nodes.

3.9.1 Parallel Blade Replacement Procedure

Only blades belonging to the same group can be replaced in parallel at once. In the CUDB system, blades can be grouped into three distinct groups: SC blades, PLDB blades, and DSG blades. These groups can be further divided into groups of even-numbered and odd-numbered blades, resulting in six distinct groups of blades in total:

  1. SC_2_2

  2. SC_2_1

  3. Odd-numbered PLDB blades

  4. Even-numbered PLDB blades

  5. Odd-numbered DSG blades

  6. Even-numbered DSG blades

Stop!

Do not replace blades in parallel if they belong to different blade groups. Replacing blades belonging to different groups at the same time can cause a major node outage.

Perform the following steps to replace multiple blades in parallel:

Note: To ensure that there is enough traffic handling capacity during the replacement, it is recommended that the number of payload blades replaced in parallel does not exceed the configured value of the redundancyLevel attribute of the CudbLdapAccess class.

If there are more blades to be replaced, this must be done iteratively, so that in each iteration at most N blades from the same group are replaced in parallel, where N is the value of the redundancyLevel attribute (see the batching sketch below). However, if the replacement is done in a low-traffic period or in a maintenance window, when the degraded traffic handling capacity could still be sufficient, it can be decided to replace more than N blades in parallel.
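
The iterative batching can be illustrated with the following minimal sketch; the blade numbers and the value of N are example values only:

FAULTY_BLADES="5 7 9 11"   # example: odd-numbered PLDB blades to be replaced
N=2                        # example: value of the redundancyLevel attribute
set -- $FAULTY_BLADES
while [ $# -gt 0 ]; do
    BATCH=""
    COUNT=0
    while [ $# -gt 0 ] && [ "$COUNT" -lt "$N" ]; do
        BATCH="$BATCH $1"
        COUNT=$((COUNT + 1))
        shift
    done
    echo "Replace in parallel:$BATCH"
done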

Steps

  1. Check the value of the redundancyLevel attribute of the CudbLdapAccess class and take special note of it. For more information, refer to the CUDB Node Configuration Data Model Description.
  2. Identify all faulty blades inside the node, as described in Identifying the Faulty Blade to be able to group them.
  3. Identify the position of all faulty blades, as described in Identifying Blade Rack and Subrack Position.
  4. Prepare for the replacement of all faulty blades, as described in Preparing the Blade Replacement.
  5. In case of replacing SC group(s) or PLDB group(s), force the external applications to move their primary connections to another CUDB node. This applies in case primary connections are established, or the SC or the PLDB blades are affected.
  6. Execute the blade replacement exactly in the following order, skipping any group which has no faulty blades:
    1. SC_2_2
    2. SC_2_1
    3. Odd-numbered PLDB blades
    4. Even-numbered PLDB blades
    5. Odd-numbered DSG blades
    6. Even-numbered DSG blades
    Do!

    Always follow the order of groups exactly.

  7. If the number of blades that have been replaced in parallel was greater than the value of the redundancyLevel parameter, also execute the cudbLdapFeRestart command. For more information, refer to the CUDB Node Commands and Parameters.

Reference List