on 04-25-2016 05:41 AM
Introduction
In a customer network with multiple VSM cards, if several of the cards are reloaded for some reason, the VSMs reach the IOS XR RUN state, but the OVA activation can get stuck in the Recovering state, as shown below:
RP/0/RSP0/CPU0:Cluster#show virtual-service list
Fri Mar 4 12:16:11.399 UTC
Virtual Service List:
Service Name    Status        Package Name                Node Name
______________________________________________________________________________
cgn123          Recovering    asr9k-vsm-cgv6-5.2.4.02.    0/2/CPU0
cgn456          Recovering    asr9k-vsm-cgv6-5.2.4.02.    1/2/CPU0
RP/0/RSP0/CPU0:Cluster#
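As a rough illustration of the condition being described, the Python sketch below parses text shaped like the output above and lists the nodes whose service is in the Recovering state. It is not part of virtual_service.tcl. The names parse_virtual_services and stuck_nodes are made up for this example, and the parser assumes every data row has exactly four whitespace-separated fields.

```python
SAMPLE = """\
RP/0/RSP0/CPU0:Cluster#show virtual-service list
Fri Mar 4 12:16:11.399 UTC
Virtual Service List:
Service Name Status Package Name Node Name
______________________________________________________________________________
cgn123 Recovering asr9k-vsm-cgv6-5.2.4.02. 0/2/CPU0
cgn456 Recovering asr9k-vsm-cgv6-5.2.4.02. 1/2/CPU0
RP/0/RSP0/CPU0:Cluster#
"""


def parse_virtual_services(output):
    """Return (name, status, package, node) tuples from the CLI text.

    Assumption: data rows follow the underscore rule and contain
    exactly four whitespace-separated fields.
    """
    services = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("_"):
            # The underscore rule separates the header from the rows.
            in_table = True
            continue
        if not in_table or not stripped or stripped.startswith("RP/"):
            # Skip everything before the table, blank lines and the prompt.
            continue
        fields = stripped.split()
        if len(fields) == 4:
            services.append(tuple(fields))
    return services


def stuck_nodes(output):
    """Return the node names whose service status is Recovering."""
    return [
        node
        for _name, status, _package, node in parse_virtual_services(output)
        if status == "Recovering"
    ]
```

Calling stuck_nodes(SAMPLE) on the capture above yields both VSM locations, which is the situation the rest of this document addresses.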
If this happens on a customer network, the traffic will be completely lost. To recover from this scenario, the service_mgr process has to be restarted. After the restart, the OVA comes up in the Activated state. The EEM script described here handles this scenario and recovers the VSM OVA and services.
Implementation
This is an event-based EEM script that is triggered when the following syslog event is observed:
RP/0/RSP0/CPU0:Mar 4 11:22:10 : shelfmgr[385]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/2/CPU0 A9K-VSM-500 state:IOS XR RUN
If multiple cards report this event, multiple instances of the script are triggered.
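The trigger can be pictured as a pattern match on the syslog line, with the card location taken from the matching message. The Python sketch below is only an illustration of that idea and is not taken from virtual_service.tcl. The regular expression and the name vsm_location are assumptions made for this example.

```python
import re

# Assumed pattern: a rack/slot/CPU location followed by the VSM state text.
EVENT_RE = re.compile(r"(\d+/\d+/CPU\d+) A9K-VSM-500 state:IOS XR RUN")


def vsm_location(syslog_line):
    """Return the VSM location from a matching syslog line, else None."""
    match = EVENT_RE.search(syslog_line)
    return match.group(1) if match else None
```

Because each matching message carries its own location, every script instance can restrict its checks to the card that raised the event.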
The script keeps checking the output of “show virtual-service list” for the OVA status. If the status stays in “Recovering” for more than 20 seconds, the script restarts the service_mgr process. It then checks the OVA status again after 20 seconds and reports it.
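The check, wait, restart and re-check sequence can be sketched as follows. This is a simplified Python model of the behaviour described above, not the actual Tcl policy. The function recover_ova and its parameters (get_status, restart_service_mgr, log, sleep) are hypothetical hooks standing in for the CLI and syslog calls the real script makes.

```python
import time


def recover_ova(node, get_status, restart_service_mgr, log,
                wait=20, sleep=time.sleep):
    """Restart service_mgr if the OVA at `node` stays in Recovering.

    get_status(node) returns the status string for that node,
    restart_service_mgr() performs the restart, log(msg) reports progress.
    All three are assumed hooks supplied by the caller.
    Returns True if a restart was requested, otherwise False.
    """
    if get_status(node) != "Recovering":
        log(f"OVA at {node} is not in Recovering state, nothing to do")
        return False

    log(f"OVA is in Recovering state at {node}, checking again in {wait} secs")
    sleep(wait)

    if get_status(node) != "Recovering":
        log(f"OVA at {node} left Recovering state on its own")
        return False

    log(f"OVA is still in Recovering state at {node}, restarting service_mgr")
    restart_service_mgr()
    sleep(wait)

    # Report whatever state the OVA is in after the restart.
    final_status = get_status(node)
    log(f"OVA at {node} is {final_status} after process restart")
    return True
```

Passing the hooks in as arguments keeps the decision logic separate from the router access, so the sequence can be exercised with stub functions and a no-op sleep before anything is run against a real device.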
Steps to execute the EEM script:
Step 1: Copy the EEM script (virtual_service.tcl) to harddisk:/scripts.
Create a directory named scripts under harddisk:
----------------------------------------------------
cd harddisk:
mkdir scripts
Copy the script to the ASR9K router
------------------------------------------
Copy the file virtual_service.tcl to harddisk:/scripts/.
Step 2: Configure the script location, the authorization method, and the username under which the script is executed.
RP/0/RSP0/CPU0:BRAHMAPUTRA#conf
Sat Mar 12 19:09:38.727 IST
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#event manager directory user policy harddisk:/scripts
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#aaa authorization eventmanager default local
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#commit
Sat Mar 12 19:09:42.028 IST
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#end
RP/0/RSP0/CPU0:BRAHMAPUTRA#conf
Sat Mar 12 19:10:29.886 IST
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#username eem_user
RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)# group root-system
RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)# group cisco-support
RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)#commit
Sat Mar 12 19:10:33.967 IST
RP/0/RSP0/CPU0:BRAHMAPUTRA(config-un)#
Step 3: Register the script.
RP/0/RSP0/CPU0:BRAHMAPUTRA#conf
Tue Mar 15 17:39:51.579 IST
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#event manager policy virtual_service.tcl username eem_user persist-time 3600 type user
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#commit
Tue Mar 15 17:40:02.882 IST
RP/0/RSP0/CPU0:Mar 15 17:40:13.064 : eem_policy_dir[197]: %HA-HA_EM-6-FMPD_POLICY_REG_SUCC : fh_reg_unreg_policy: Policy 'virtual_service.tcl' registered successfully, by user eem_user, with persist time 3600 and type 1
RP/0/RSP0/CPU0:Mar 15 17:40:13.180 : config[65853]: %MGBL-CONFIG-6-DB_COMMIT : Configuration committed by user 'lab'. Use 'show configuration commit changes 1000000150' to view the changes.
RP/0/RSP0/CPU0:BRAHMAPUTRA(config)#exit
RP/0/RSP0/CPU0:Mar 15 17:40:21.730 : config[65853]: %MGBL-SYS-5-CONFIG_I : Configured from console by lab
RP/0/RSP0/CPU0:BRAHMAPUTRA#
Note: Check that the log contains “Policy 'virtual_service.tcl' registered successfully”. If it does, the remaining log messages can be ignored.
Step 4: Check whether the script is registered.
RP/0/RSP0/CPU0:BRAHMAPUTRA#show event manager policy registered
Tue Mar 15 18:24:20.233 IST
No.  Class   Type  Event Type  Trap  Time Registered           Name
1    script  user  syslog      Off   Tue Mar 15 17:59:01 2016  virtual_service.tcl
 pattern {A9K-VSM-500 state:IOS XR RUN}
 nice 0 queue-priority normal maxrun 600.000 scheduler rp_primary Secu none
 persist_time: 3600 seconds, username: eem_user
OR
RP/0/RSP0/CPU0:BRAHMAPUTRA#sh run | i event manager
Tue Mar 15 18:25:20.198 IST
Building configuration...
event manager directory user policy harddisk:/scripts
event manager policy virtual_service.tcl username eem_user persist-time 3600 type user
Sample Output:
The script is triggered as soon as it sees the syslog event of a VSM going to the IOS XR RUN state.
RP/0/RSP0/CPU0:Apr 21 14:04:45.273 : shelfmgr[442]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-VSM-500 state:IOS XR RUN
RP/0/RSP0/CPU0:Apr 21 14:04:45.275 : invmgr[266]: %PLATFORM-INV-6-NODE_STATE_CHANGE : Node: 0/1/CPU0, state: IOS XR RUN
RP/0/RSP0/CPU0:Apr 21 14:03:49.431 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: VSM AT 0/1/CPU0 IS UP, CHECK STATUS OF CGN OVA
RP/0/RSP0/CPU0:Apr 21 14:05:08.847 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: VSM LOCATION 0/1/CPU0 MATCHES SYSLOG EVENT LOCATION 0/1/CPU0
RP/0/RSP0/CPU0:Apr 21 14:05:08.848 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: OVA IS IN Recovering STATE AT 0/1/CPU0
RP/0/RSP0/CPU0:Apr 21 14:05:28.849 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: CHECK OVA STATUS AGAIN AFTER 20 SECS AT 0/1/CPU0
RP/0/RSP0/CPU0:Apr 21 14:05:50.238 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: OVA IS IN Recovering STATE AT 0/1/CPU0, RESTART service_mgr TO RECOVER
RP/0/RSP0/CPU0:Apr 21 14:05:50.500 : sysmgr_control[65958]: %OS-SYSMGR-4-PROC_RESTART_NAME : User eem_user (vty100) requested a restart of process service_mgr at 0/RSP0/CPU0
RP/0/RSP0/CPU0:Apr 21 14:06:10.548 : tclsh[65956]: %HA-HA_EEM-6-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: CHECKING OVA STATUS AFTER 20 SECS AT 0/1/CPU0
RP/0/RSP0/CPU0:Apr 21 14:06:11.636 : tclsh[65956]: %HA-HA_EEM-2-ACTION_SYSLOG_LOG_INFO : virtual_service.tcl: OVA AT 0/1/CPU0 IS ACTIVATED SUCCESSFULLY AFTER PROCESS RESTART