
Cisco EEM script for 4G/5G mobile network stability

crunchyrolls
Level 1

I'm just posting this here for the future benefit of the Cisco community, and mainly to make this solution visible on Google searches.

Background information: Anybody who operates Cisco cellular equipment at scale will know that no matter how much effort you put into improving the uptime of your mobile/cellular WANs, there will always be hard outages where a router needs a power cycle to recover. These can have numerous causes (Cisco bugs, long-running LTE backoff, carrier service anomalies in a geographic service area, roaming issues, etc.).

The Cellular Modem Link Recovery feature was introduced many years ago for exactly these issues, and is enabled by default in Cisco IOS. In most cases, that feature does a decent job of recovering cellular connectivity. However, when it fails, the last resort is often to ask somebody on site to power cycle the device, which completely disconnects the cellular modem from the mobile network and reboots it.
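
For reference, a sketch of where the link recovery settings live. This assumes the modem's controller is Cellular 0/1/0 (matching the interface used in the second script below), and the values shown are what I understand to be the documented defaults, so verify them against your own running config and IOS release before relying on them:

```
! Assumption: controller name and default values; check "show running-config | section controller Cellular"
controller Cellular 0/1/0
 lte modem link-recovery rssi onset-threshold -110
 lte modem link-recovery monitor-timer 20
 lte modem link-recovery wait-timer 10
 lte modem link-recovery debounce-count 6
```

If these timers are tuned to be longer, the 10 minute wait in the watchdog scripts may need to be lengthened as well so that link recovery gets its chance first.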

Supplemental cellular watchdog script: Below are two sample EEM scripts that operate as a supplemental fallback to the Cellular Modem Link Recovery feature. My opinion is that Cisco really needs to add these (or something similar) to IOS by default, so that customers can choose to use them or delete them. After spending some time optimizing the script, I wanted to share this solution because hard outages on our cellular WANs have almost entirely disappeared.

Basic principles of the script:

1) The script monitors syslog for a hardcoded message about the targeted cellular interface and starts a 10-minute countdown timer when an outage occurs.

2) We limit EEM to a maximum of 2 applets running at one time (by default, EEM allocates 32 scheduler threads for applets). This is a precaution, so that a rapidly flapping cellular interface cannot cause a race condition (or overutilization of memory/CPU on lower-end hardware). If you need more applets to run concurrently, just increase the number.

3) At this point, the Cellular Modem Link Recovery feature (if enabled) should in theory kick in and recover cellular connectivity while the EEM countdown timer waits for 10 minutes.

4) After 10 minutes, the EEM script queries the status (up or down) of the cellular interface using its SNMP OID, and if link recovery was unable to restore connectivity, the router is reloaded. You'll need to use the correct SNMP OID for your hardware (see point 7).

5) The key is to use the EEM reload action, because it's able to forcibly interrupt conflicting IOS daemons and initiate a graceful shutdown of the cellular modem and a reload of the router. (Scripting a CLI "reload" command does not work, because that method fails to interrupt locking processes such as the Cellular Modem Link Recovery feature.)

6) You should be running IOS XE 17.9.1 or later, because cellular link uptime was improved by 20%. Also use Install Mode (rather than Bundle Mode), since reload time is much faster.

7) Note that the SNMP OIDs in the script must be customized to match the exact Cisco hardware that you're installing the script on (they are very different from platform to platform).
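
A sketch of how the OID for point 7 can be looked up, assuming a standard IF-MIB layout: the operational status column (ifOperStatus) is 1.3.6.1.2.1.2.2.1.8, the last number of the OID is the interface's ifIndex, and a returned value of 1 means "up". The interface names here are the ones used in the scripts below, so substitute your own:

```
! Exec mode: print the ifIndex assigned to each monitored interface
show snmp mib ifmib ifindex Cellular0/1/0
show snmp mib ifmib ifindex Tunnel0

! Config mode (assumption: worth verifying on your release): keep ifIndex
! values stable across reloads so the hardcoded OID stays valid
snmp ifmib ifindex persist
```

The OID to put into the applet is then 1.3.6.1.2.1.2.2.1.8 followed by a dot and the ifIndex that the show command reports.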

 

This script is for Verizon Wireless Private Network (monitors the Mobile IP tunnel):

event manager scheduler applet thread class default number 2
event manager applet mip-watchdog authorization bypass
event syslog pattern "%IPMOBILE-5-MIP_TUNNELDELETE: Mobile IP tunnel Tunnel0 deleting" maxrun 900
action 001 syslog msg "Starting mip-watchdog EEM script"
action 002 wait 600
action 003 info type snmp oid 1.3.6.1.2.1.2.2.1.8.14 get-type exact
action 004 syslog msg "The SNMP OID value for interface Tunnel0 is $_info_snmp_value"
action 005 if $_info_snmp_value ne "1"
action 006 syslog msg "Interface Tunnel0 is currently down, rebooting router."
action 007 cli command "enable"
action 008 cli command "term exec prompt timestamp"
action 009 cli command "term length 0"
action 010 reload
action 011 else
action 012 syslog msg "Interface Tunnel0 is currently up, taking no action."
action 013 syslog msg "Stopping mip-watchdog EEM script"
action 014 end


This script is for various providers such as Verizon, AT&T, T-Mobile (monitoring the desired cellular interface):

event manager scheduler applet thread class default number 2
event manager applet cellular-watchdog authorization bypass
event syslog pattern "%LINEPROTO-5-UPDOWN: Line protocol on Interface Cellular0/1/0, changed state to down" maxrun 900
action 001 syslog msg "Starting cellular-watchdog EEM script"
action 002 wait 600
action 003 info type snmp oid 1.3.6.1.2.1.2.2.1.8.6 get-type exact
action 004 syslog msg "The SNMP OID value for interface Cellular0/1/0 is $_info_snmp_value"
action 005 if $_info_snmp_value ne "1"
action 006 syslog msg "Interface Cellular0/1/0 is currently down, rebooting router."
action 007 cli command "enable"
action 008 cli command "term exec prompt timestamp"
action 009 cli command "term length 0"
action 010 reload
action 011 else
action 012 syslog msg "Interface Cellular0/1/0 is currently up, taking no action."
action 013 syslog msg "Stopping cellular-watchdog EEM script"
action 014 end
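
After pasting either applet, a few exec commands can confirm that it registered and show whether it has fired. This is a minimal check, not part of the original scripts; the "watchdog" filter simply matches the applet names and syslog messages used above:

```
! Confirm the applet is registered with the expected syslog pattern and maxrun
show event manager policy registered

! List recent EEM event triggers and how each run ended
show event manager history events

! Find the applet's own start/stop/reboot messages in the log buffer
show logging | include watchdog
```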

 

Good Luck, hopefully this is helpful to folks in the future.

2 Replies

Leo Laohoo
Hall of Fame

Don't forget to upgrade the Sierra Wireless cellular modem's firmware.  

Correct, and in fact both the firmware and the carrier PRI. In my experience, check those especially on RMA replacement hardware (which can sit around in Cisco inventory for a while):

https://www.cisco.com/c/dam/en/us/td/docs/routers/access/interfaces/firmware/Cisco-Firmware-Upgrade-Guide-for-4G-LTE-and-5G-Cellular-Modems.pdf

 
