Suprabha P
Cisco Employee

Introduction

In a customer network with multiple VSM cards, when several VSM cards are reloaded, each VSM reaches the IOS XR RUN state, but the OVA activation can get stuck in the Recovering state, as shown below:

RP/0/RSP0/CPU0:Cluster#show virtual-service list

Fri Mar 4 12:16:11.399 UTC

Virtual Service List:

 

Service Name Status Package Name Node Name

______________________________________________________________________________

cgn123 Recovering asr9k-vsm-cgv6-5.2.4.02. 0/2/CPU0

cgn456 Recovering asr9k-vsm-cgv6-5.2.4.02. 1/2/CPU0

RP/0/RSP0/CPU0:Cluster#

If this happens on a customer network, traffic through the affected VSM services is completely lost. To recover, the service_mgr process has to be restarted. After the restart, the OVA comes up in the activated state. The EEM script described here detects this condition and recovers the VSM OVA and its services automatically.
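To make the failure condition concrete, the following Python sketch parses "show virtual-service list" output like the sample above and lists the nodes whose service is still in the Recovering state. It is an offline illustration only, not part of the EEM script (which is written in Tcl). The function names and the row pattern are assumptions; in particular, the pattern assumes every column is a single token without spaces.

```python
import re

# Sample rows copied from the output above (blank lines omitted).
SAMPLE = """\
Service Name Status Package Name Node Name
______________________________________________________________________________
cgn123 Recovering asr9k-vsm-cgv6-5.2.4.02. 0/2/CPU0
cgn456 Recovering asr9k-vsm-cgv6-5.2.4.02. 1/2/CPU0
"""

# Assumed row layout: service name, status, package name, node location.
# Header, separator, prompt, and timestamp lines do not match this pattern.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(\d+/\d+/CPU\d+)\s*$")


def parse_virtual_services(text):
    """Return one dict per service row found in the CLI output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, status, package, node = match.groups()
            rows.append(
                {"name": name, "status": status, "package": package, "node": node}
            )
    return rows


def stuck_nodes(text):
    """Return the node locations whose service status is 'Recovering'."""
    return [
        row["node"]
        for row in parse_virtual_services(text)
        if row["status"] == "Recovering"
    ]


if __name__ == "__main__":
    print(stuck_nodes(SAMPLE))
```

For the sample above, both 0/2/CPU0 and 1/2/CPU0 are reported as stuck.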

Implementation

This is an event-based EEM script that is triggered when the following syslog event is observed:

RP/0/RSP0/CPU0:Mar 4 11:22:10 : shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:IOS XR RUN

If multiple cards report this event, multiple instances of the script are triggered.
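Because each instance of the script handles one card, it needs the node location from the syslog line that triggered it. The Python sketch below shows one way to match the event and extract the location. It is only an illustration of the matching logic, not the Tcl code of virtual_service.tcl; the name vsm_location and the regular expression are assumptions built around the text "A9K-VSM-500 state:IOS XR RUN".

```python
import re

# Assumed pattern: the node location immediately precedes the card type
# and state text in the shelfmgr message.
EVENT = re.compile(r"(\d+/\d+/CPU\d+) A9K-VSM-500 state:IOS XR RUN")


def vsm_location(syslog_line):
    """Return the VSM node location from a matching syslog line, else None."""
    match = EVENT.search(syslog_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = (
        "RP/0/RSP0/CPU0:Mar 4 11:22:10 : shelfmgr[385]: "
        "%PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : "
        "0/2/CPU0 A9K-VSM-500 state:IOS XR RUN"
    )
    print(vsm_location(line))
```

Lines that mention the state change without the card type, such as the invmgr message, do not match and return None.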

The script checks the output of “show virtual-service list” for the OVA status. If the status is still “Recovering” after more than 20 seconds, the script restarts the service_mgr process. It then checks the OVA status again after another 20 seconds and reports the result.
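The check, wait, restart, and re-check sequence described above can be sketched as follows. This is a simplified Python model of the control flow under stated assumptions, not the actual Tcl script: get_status and restart_service_mgr are hypothetical callables standing in for the CLI commands, the return strings are made up for illustration, and "Activated" is assumed to be the healthy status string.

```python
import time


def recover_if_stuck(get_status, restart_service_mgr, wait=time.sleep, delay=20):
    """Model the recovery flow for one VSM and describe the outcome.

    get_status: hypothetical callable returning the current OVA status.
    restart_service_mgr: hypothetical callable that restarts service_mgr.
    wait: sleep function, injectable so the flow can be tested quickly.
    """
    if get_status() != "Recovering":
        return "no action needed"

    # Give the OVA time to leave the Recovering state on its own.
    wait(delay)
    if get_status() != "Recovering":
        return "recovered without restart"

    # Still stuck: restart the process, then check once more.
    restart_service_mgr()
    wait(delay)
    return "status after restart: " + get_status()


if __name__ == "__main__":
    # Simulated status sequence: stuck twice, then activated after restart.
    statuses = iter(["Recovering", "Recovering", "Activated"])
    restarts = []
    result = recover_if_stuck(
        lambda: next(statuses),
        lambda: restarts.append("service_mgr"),
        wait=lambda seconds: None,  # skip real sleeping in the demo
    )
    print(result, restarts)
```

Passing the status and restart operations in as callables keeps the timing logic separate from the router CLI, which makes the flow easy to exercise with simulated statuses.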

Steps to execute EEM script:

Step 1: Copy the EEM script (virtual_service.tcl) to harddisk:/scripts.

Create a directory scripts under harddisk:
----------------------------------------------------
cd harddisk:
mkdir scripts

Copy the script to the asr9k router
------------------------------------------
Copy the file virtual_service.tcl to the harddisk:/scripts/

Step 2: Configure the script location, the EEM authorization, and the username under which the script is executed.

RP/0/RSP0/CPU0:BRAHMAPUTRA#conf

Sat Mar 12 19:09:38.727 IST

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#event manager directory user policy harddisk:/scripts

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#aaa authorization eventmanager default local

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#commit

Sat Mar 12 19:09:42.028 IST

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#end

RP/0/RSP0/CPU0:BRAHMAPUTRA#conf

Sat Mar 12 19:10:29.886 IST

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#username eem_user

RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)# group root-system

RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)# group cisco-support

RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)#commit

Sat Mar 12 19:10:33.967 IST

RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)#

Step 3: Register the script:

RP/0/RSP0/CPU0:BRAHMAPUTRA#conf

Tue Mar 15 17:39:51.579 IST

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#event manager policy virtual_service.tcl username eem_user persist-time 3600 type user

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#commit

Tue Mar 15 17:40:02.882 IST

RP/0/RSP0/CPU0:Mar 15 17:40:13.064 : eem_policy_dir[197]: %HA-HA_EM-6-FMPD_POLICY_REG_SUCC : fh_reg_unreg_policy: Policy 'virtual_service.tcl' registered successfully, by user eem_user, with persist time 3600 and type 1

RP/0/RSP0/CPU0:Mar 15 17:40:13.180 : config[65853]: %MGBL-CONFIG-6-DB_COMMIT : Configuration committed by user 'lab'. Use 'show configuration commit changes 1000000150' to view the changes.

RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#exit

RP/0/RSP0/CPU0:Mar 15 17:40:21.730 : config[65853]: %MGBL-SYS-5-CONFIG_I : Configured from console by lab

RP/0/RSP0/CPU0:BRAHMAPUTRA#

Note: Check that the log contains “Policy 'virtual_service.tcl' registered successfully”. If it does, the remaining log messages can be ignored.
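If the console log is captured to a file, the check in the note can be automated. The Python sketch below looks for the success message for a given policy name. It is an illustration only; the function name policy_registered is an assumption, and the message text is taken from the log shown above.

```python
def policy_registered(log_text, policy_name):
    """Return True if the log reports successful registration of the policy."""
    needle = "Policy '{}' registered successfully".format(policy_name)
    return any(needle in line for line in log_text.splitlines())


if __name__ == "__main__":
    log = (
        "RP/0/RSP0/CPU0:Mar 15 17:40:13.064 : eem_policy_dir[197]: "
        "%HA-HA_EM-6-FMPD_POLICY_REG_SUCC : fh_reg_unreg_policy: "
        "Policy 'virtual_service.tcl' registered successfully, "
        "by user eem_user, with persist time 3600 and type 1"
    )
    print(policy_registered(log, "virtual_service.tcl"))
```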

Step 4: Check whether the script is registered.

RP/0/RSP0/CPU0:BRAHMAPUTRA#show event manager policy registered

Tue Mar 15 18:24:20.233 IST

No. Class Type Event Type Trap Time Registered Name

1 script user syslog Off Tue Mar 15 17:59:01 2016 virtual_service.tcl

pattern {A9K-VSM-500 state:IOS XR RUN}

nice 0 queue-priority normal maxrun 600.000 scheduler rp_primary Secu none

persist_time: 3600 seconds, username: eem_user

OR

RP/0/RSP0/CPU0:BRAHMAPUTRA#sh run | i event manager

Tue Mar 15 18:25:20.198 IST

Building configuration...

event manager directory user policy harddisk:/scripts

event manager policy virtual_service.tcl username eem_user persist-time 3600 type user

Sample Output:

The script is triggered as soon as it sees the syslog event reporting that the VSM has reached the IOS XR RUN state.

RP/0/RSP0/CPU0:Apr 21 14:04:45.273 : shelfmgr[442]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-VSM-500 state:IOS XR RUN

RP/0/RSP0/CPU0:Apr 21 14:04:45.275 : invmgr[266]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/1/CPU0, state: IOS XR RUN

RP/0/RSP0/CPU0:Apr 21 14:03:49.431 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: VSM AT 0/1/CPU0 IS UP, CHECK STATUS OF CGN OVA

RP/0/RSP0/CPU0:Apr 21 14:05:08.847 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: VSM LOCATION 0/1/CPU0 MATCHES SYSLOG EVENT LOCATION 0/1/CPU0

RP/0/RSP0/CPU0:Apr 21 14:05:08.848 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: OVA IS IN Recovering STATE AT 0/1/CPU0

RP/0/RSP0/CPU0:Apr 21 14:05:28.849 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: CHECK OVA STATUS AGAIN AFTER 20 SECS AT 0/1/CPU0

RP/0/RSP0/CPU0:Apr 21 14:05:50.238 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: OVA IS IN Recovering STATE AT 0/1/CPU0, RESTART service_mgr TO RECOVER

RP/0/RSP0/CPU0:Apr 21 14:05:50.500 : sysmgr_control[65958]: %OS-SYSMGR-4-PROC_RESTART_NAME : User eem_user (vty100) requested a restart of process service_mgr at 0/RSP0/CPU0

RP/0/RSP0/CPU0:Apr 21 14:06:10.548 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: CHECKING OVA STATUS AFTER 20 SECS AT 0/1/CPU0

RP/0/RSP0/CPU0:Apr 21 14:06:11.636 : tclsh[65956]: %HA-HA_EEM-2-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: OVA AT 0/1/CPU0 IS ACTIVATED SUCCESSFULLY AFTER PROCESS RESTART
