EEM Applet to augment IOS sip-ua response to SIP registration failures with multiple redundant SIP Proxies.
The EEM applet described below is used in the IOS configuration of a Cisco IAD2431 Voice Gateway that connects a PBX to a SIP-based telephone network. The SIP User Agent (sip-ua) built into the 2431's IOS performs well in normal operation and in some error cases where it fails to register with the first of its 3 available SIP proxies. For example, when the IOS sip-ua receives no response at all to its SIP REGISTER requests, it successfully rotates among the 3 SIP proxies returned by its DNS query until it receives a 200 OK response from any one of them. However, if the IOS sip-ua receives repeated SIP error responses to its REGISTER messages, such as a 403 Forbidden, a 408 Request Timeout, a 480 Temporarily Unavailable or a 500 Internal Server Error, it does not try to register with a different redundant SIP proxy. Instead, it waits for a configurable time, such as 3 minutes, during which it cannot provide telephone service to the PBX, and then attempts to register again with the same SIP Proxy IP address. The 2431 can repeat this behavior indefinitely, even though one or both of the other 2 proxies returned by the DNS query may be able to handle its SIP registration request and let it resume normal call processing for the PBX.
Also, the 2431 does not re-register with its SIP proxies if the IP address it received via DHCP expires and DHCP assigns it a new, different IP address. Under normal circumstances, the 2431 is able to renew its DHCP lease for the same IP address it previously held. However, network administrators must periodically perform a 'Re-IP' procedure to maintain the network's IP address ranges, which results in a new DHCP IP address assignment to the 2431. This could cause a lengthy telephone service interruption for the 2431's PBX if the 2431 does not promptly register its new IP address with its SIP proxy.
The EEM applet described below enhances the 2431's behavior in these unusual situations so that it automatically recovers its SIP registration with a working SIP proxy and quickly restores telephone service to the PBX. The applet has been extensively tested and is deployed in the network. It is triggered when any of several SNMP counter OIDs in the IOS SIP MIB, configured as EEM events, is incremented. These counters are incremented when the IOS sip-ua receives a 403, 408, 480 or 500 error respectively, which could be in response to a REGISTER request or to any other SIP request such as an INVITE, BYE, CANCEL, etc. The applet also watches a sip-ua counter OID that is incremented by SIP INVITE retries, as well as a routing event that fires when a DHCP discover results in the addition of a new IP route.
When the EEM applet is triggered, it first checks whether the 2431 was assigned a new IP address by DHCP, and if so it immediately executes the CLI command sequence described below, which performs a new SIP registration. If DHCP did not trigger the applet, it next checks the SIP registration status by issuing the IOS CLI command 'show sip-ua register status' and looking for the string 'yes' in the response. If the SIP registration has expired, the applet immediately executes a series of IOS commands that flushes the DNS hosts cache, enters global configuration mode, enters the sip-ua config mode, issues the 'no registrar' command, waits 2 seconds, and then re-issues the normal 'registrar' command containing the dns:<sip-proxy fqdn> argument. As a result of this CLI sequence, the 2431's sip-ua first de-registers from the last SIP Proxy IP address it had resolved, then performs a DNS query for the SIP Proxy FQDN, and finally issues a SIP REGISTER request to the 1st IP address returned by the DNS query.
The 2431's DNS server is configured to rotate the order of the SIP Proxy IP addresses in its replies to each successive DNS query it receives from any of the multiple IAD2431s deployed in the network with the same SIP Proxy FQDN. Thus the 2431's query has a probability of 2/3 of receiving a different SIP Proxy IP address as first choice in the DNS reply. The 2431 IOS sip-ua then performs a new SIP REGISTER to the new 1st choice proxy. The EEM applet then waits 10 seconds, issues the 'show sip-ua register status' CLI command again and looks for 'yes' in the response. If 'yes' is not found, the applet repeats the CLI command sequence up to 6 times before going into a randomized backoff timer loop of 200 to 400 seconds.
If the SIP registration was not expired when the applet was first triggered, the applet just increments a 'leaky bucket' error count by 4. Then the applet reads other SNMP OIDs that record successful calls and decrements the bucket count by 2 for each successful call since the last time the applet was executed. Finally, the applet checks whether the bucket count has reached a threshold of 40, and if so, it executes the CLI sequence described above to clear the hosts cache and perform a new SIP REGISTER.
Below is the RegBucket applet listing, followed by the companion InitVars applet listing, both of which are entered into the IOS configuration. Below the applets are the normal 'voice service voip' and 'sip-ua' sections of the IOS configuration for reference.
event manager applet RegBucket
description RegBucket performs Re-Registration with multiple SIP Proxys upon various SIP errors using a leaky bucket algorithm.
event tag 403 snmp oid 126.96.36.199.188.8.131.52.184.108.40.206.7.0 get-type exact entry-op ge entry-val "1" entry-type increment poll-interval 30 maxrun 7200
event tag 408 snmp oid 220.127.116.11.18.104.22.168.22.214.171.124.17.0 get-type exact entry-op ge entry-val "1" entry-type increment poll-interval 30
event tag 480 snmp oid 126.96.36.199.188.8.131.52.184.108.40.206.33.0 get-type exact entry-op ge entry-val "1" entry-type increment poll-interval 30
event tag 500 snmp oid 220.127.116.11.18.104.22.168.22.214.171.124.1.0 get-type exact entry-op ge entry-val "1" entry-type increment poll-interval 30
event tag InviteRetries snmp oid 126.96.36.199.188.8.131.52.184.108.40.206.1.0 get-type exact entry-op ge entry-val "2" entry-type increment poll-interval 30
event tag dhcp routing network 0.0.0.0/0 type add protocol connected ge 1
trigger
correlate event 403 or event 408 or event 480 or event 500 or event InviteRetries or event dhcp
attribute tag 403 occurs 1
attribute tag 408 occurs 1
attribute tag 480 occurs 1
attribute tag 500 occurs 1
attribute tag InviteRetries occurs 1
attribute tag dhcp occurs 1
action 010 handle-error type ignore
action 015 context retrieve key SIPCTXT1 variable "Bucket"
action 020 context retrieve key SIPCTXT2 variable "LastCalls"
action 025 context retrieve key SIPCTXT3 variable "Flush"
action 030 handle-error type exit
action 035 cli command "en"
action 040 if $_event_type_string eq routing
action 050 snmp-trap strdata "action 050: Triggered by addition of new default route usually due to DHCP discover, starting re-registration actions."
action 060 if $_event_type_string eq routing goto 400
action 070 end
action 071 cli command "show sip-ua register status | include yes"
action 072 string first "yes" "$_cli_result"
action 073 if $_string_result le 0
action 074 snmp-trap strdata "action 074: SIP registration renewal failed, expires in <10 minutes, or expired already, starting re-registration actions."
action 075 if $_string_result le 0 goto 400
action 077 elseif $_snmp_oid eq 220.127.116.11.18.104.22.168.22.214.171.124.7.0
action 078 snmp-trap strdata "action 078: Triggered by SIP 403 error, starting re-registration actions."
action 079 if $_snmp_oid eq 126.96.36.199.188.8.131.52.184.108.40.206.7.0 goto 400
action 081 elseif $_snmp_oid eq 220.127.116.11.18.104.22.168.22.214.171.124.17.0
action 082 syslog priority error msg "action 082: Triggered by SIP 408 error"
action 084 elseif $_snmp_oid eq 126.96.36.199.188.8.131.52.184.108.40.206.33.0
action 085 syslog priority error msg "action 085: Triggered by SIP 480 error"
action 087 elseif $_snmp_oid eq 220.127.116.11.18.104.22.168.22.214.171.124.1.0
action 088 syslog priority error msg "action 088: Triggered by SIP 500 error"
action 090 elseif $_snmp_oid eq 126.96.36.199.188.8.131.52.184.108.40.206.1.0
action 091 syslog priority errors msg "action 091: Triggered by INVITE Timeout Retries"
action 092 end
action 100 multiply $_snmp_oid_delta_val $ErrWeight
action 102 if $_snmp_oid eq 220.127.116.11.18.104.22.168.22.214.171.124.1.0
action 104 divide $_result 2
action 106 end
action 110 add $Bucket $_result
action 120 set Bucket "$_result"
action 130 syslog priority error msg "Bucket filled to $Bucket from delta $_snmp_oid_delta_val"
action 200 info type snmp oid 126.96.36.199.188.8.131.52.184.108.40.206.2.1 get-type exact
action 210 set Calls "$_info_snmp_value"
action 220 syslog priority info msg "Successful Calls Out Oid = $Calls"
action 230 info type snmp oid 220.127.116.11.18.104.22.168.22.214.171.124.4.1 get-type exact
action 240 add $Calls $_info_snmp_value
action 250 set Calls "$_result"
action 260 syslog priority info msg "Successful Calls In Oid = $_info_snmp_value, Total Successful Calls = $Calls"
action 270 subtract $Calls $LastCalls
action 280 subtract $Bucket $_result
action 290 set Bucket "$_result"
action 300 syslog priority info msg "Bucket leaked to $Bucket by ($Calls - $LastCalls)"
action 310 set LastCalls "$Calls"
action 320 if $Bucket lt 0
action 330 set Bucket "0"
action 340 set Flush "0"
action 350 end
action 360 if $Bucket lt $ErrThld goto 900
action 380 snmp-trap strdata "action 380: Error Bucket = $Bucket exceeds threshold = $ErrThld, starting re-registration actions."
action 400 syslog priority error msg "ReRegistration Initiated with Flush = $Flush"
action 500 if $Flush gt 0
action 510 cli command "clear host all *"
action 520 end
action 530 increment Flush 1
action 600 cli command "config t"
action 610 cli command "sip-ua"
action 620 cli command "no registrar"
action 630 wait 2
action 640 cli command "registrar dns:ims.eng.rr.com expires 600 auth-realm ims.eng.rr.com"
action 650 cli command "end"
action 660 wait 10
action 700 cli command "show sip-ua register status | include yes"
action 710 string first "yes" "$_cli_result"
action 730 if $_string_result gt 0 goto 800
action 750 if $Flush lt 6 goto 500
action 760 cli command "show clock"
action 762 regexp "..:..:..\.(...)" "$_cli_result" time ms
action 764 set i "20"
action 766 divide $ms 50
action 768 add $i $_result
action 770 set i "$_result"
action 772 multiply $i 10
action 774 snmp-trap strdata "action 774: SIP registration failed 3+ times in a row, backing off for $_result seconds."
action 780 while $i gt 0
action 782 wait 10
action 784 subtract $i 1
action 786 set i "$_result"
action 788 end
action 790 set Flush "0"
action 792 cli command "show sip-ua register status | include yes"
action 794 string first "yes" "$_cli_result"
action 796 if $_string_result le 0 goto 500
action 800 set Bucket "0"
action 810 snmp-trap strdata "action 810: SIP re-registration succeeded, exiting applet"
action 900 context save key SIPCTXT1 variable "Bucket"
action 910 context save key SIPCTXT2 variable "LastCalls"
action 920 context save key SIPCTXT3 variable "Flush"
action 930 syslog msg "Exiting with Bucket = $Bucket, LastCalls = $LastCalls, Flush = $Flush"
action 999 cli command "event manager scheduler clear all"
event manager applet InitVars
event syslog pattern "SYS-5-RESTART"
action 01 handle-error type ignore
action 02 context retrieve key SIPCTXT1 variable "Bucket"
action 03 context retrieve key SIPCTXT2 variable "LastCalls"
action 04 context retrieve key SIPCTXT3 variable "Flush"
action 05 handle-error type exit
action 10 set Bucket "0"
action 20 set LastCalls "0"
action 30 set Flush "0"
action 40 context save key SIPCTXT1 variable "Bucket"
action 50 context save key SIPCTXT2 variable "LastCalls"
action 60 context save key SIPCTXT3 variable "Flush"
action 99 syslog msg "EEM variables initialized: Bucket = $Bucket, LastCalls = $LastCalls, and Flush = $Flush for RegBucket"
event manager detector routing bootup-delay 120
voice service voip
no ip address trusted authenticate
fax protocol none
sip
outbound-proxy dns:pcscf-gm.ims.rr.com reuse
associate registered-number 9193030000
no call service stop
preloaded-route sip-server service-route
sip-ua
credentials number 9193030000 username firstname.lastname@example.org password bcpri1234 realm ims.eng.rr.com
authentication username email@example.com password bcpri1234 realm ims.rr.com
set pstn-cause 3 sip-status 503
set pstn-cause 44 sip-status 480
retry invite 2
retry register 2 exhausted minimum 3 maximum 6
timers register 1000
registrar dns:ims.rr.com expires 600 auth-realm ims.rr.com
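The RegBucket listing reads $ErrWeight and $ErrThld, but neither is set by an action in either applet, so presumably they are defined as EEM environment variables elsewhere in the configuration. A minimal sketch, assuming the values given in the description above (an error weight of 4 and a threshold of 40); the values on a deployed gateway may well differ:

```
! Assumed values, taken from the description above rather than from the deployed configuration
event manager environment ErrWeight 4
event manager environment ErrThld 40
```

Once these are entered in global configuration mode, 'show event manager environment all' should list both variables.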
I just googled 'eem applet reboot', and the 2nd result shows the 'action nnn reload' command. Of course you need to use the appropriate 'if' or other condition testing command to determine when and whether to reboot. From my applet above, you could use:
action 700 cli command "show sip-ua register status | include yes"
action 710 string first "yes" "$_cli_result"
action 730 if $_string_result gt 0 goto nnn
Of course the details of these and other commands are documented in the EmbeddedEventManagerCommandReference-eem-cr-book.pdf and the Writing-EEM-Policies-Using-CLI-15.0-nm_eem_policy_cli.pdf documents, both available on the Cisco web site.
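As a rough illustration of the reboot idea, the registration check above can be wrapped in an 'if' block around 'action reload'. This is an untested sketch, not part of the deployed configuration: the applet name RebootIfUnregistered, the 300 second watchdog timer, the action labels and the syslog text are all placeholders chosen for the example.

```
! Hypothetical applet: reload the gateway when no registration shows 'yes'.
! The 300 second watchdog interval is an arbitrary placeholder.
! 'string first' sets $_string_result to -1 when the string is not found.
event manager applet RebootIfUnregistered
event timer watchdog time 300
action 700 cli command "enable"
action 705 cli command "show sip-ua register status | include yes"
action 710 string first "yes" "$_cli_result"
action 730 if $_string_result le 0
action 740 syslog priority errors msg "SIP registration missing, reloading"
action 750 reload
action 760 end
```

A reload drops every active call, and a check this simple could reboot the gateway over and over if registration never succeeds, so a real policy would want a guard such as a failure counter or a longer delay after startup before it is allowed to reload.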