SLA Monitor Issue

jtabasz
Cisco Employee

I'm using SLA monitor to monitor two interfaces on a failover pair of ASA 5525s. The target, 25.25.25.154, is reachable through both interfaces and is not the gateway for either ISP. Here are the SLA monitor config lines:

sla monitor 10
 type echo protocol ipIcmpEcho 25.25.25.154 interface outside
 num-packets 100
 timeout 1500
 threshold 100
 frequency 15
sla monitor schedule 10 life forever start-time now
sla monitor 50
 type echo protocol ipIcmpEcho 25.25.25.154 interface outsidebk
 num-packets 100
 timeout 1500
 threshold 100
 frequency 15
sla monitor schedule 50 life forever start-time now

Based on my understanding of the documentation, every 15 seconds 100 ICMP requests are sent to the target IP. If at least one comes back within 1.5 seconds, the process sleeps for another 15 seconds, then sends another 100 packets, and so on.
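The success rule described above can be written as a small Python sketch. It encodes only the reading of the documentation given in this post (an operation succeeds if at least one of the num-packets echo replies arrives within timeout milliseconds); it is not taken from ASA internals, and `operation_succeeds` is a hypothetical name.

```python
from typing import Optional, Sequence


def operation_succeeds(rtts_ms: Sequence[Optional[float]], timeout_ms: float) -> bool:
    """Return True if at least one echo reply arrived within the timeout.

    rtts_ms has one entry per echo request sent (num-packets entries);
    None means no reply came back for that request.
    This models the poster's reading of the docs, not verified ASA behavior.
    """
    return any(rtt is not None and rtt <= timeout_ms for rtt in rtts_ms)


timeout_ms = 1500  # "timeout 1500" from the config above
one_late_reply = [None] * 99 + [1400.0]  # 99 lost, one reply at 1400 ms
all_too_slow = [1600.0] * 100            # every reply slower than the timeout
print(operation_succeeds(one_late_reply, timeout_ms))  # True
print(operation_succeeds(all_too_slow, timeout_ms))    # False
```

Under this reading, a single reply out of 100 is enough to keep the route installed, so an operation only fails when essentially the whole batch is lost or late.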

I'm getting log messages with a timestamp difference of less than 15 seconds. In other words, a message comes out saying a route has been removed, and less than 15 seconds later another message comes saying the same route is added back to the routing table.

The docs say that frequency is the length of time, in seconds, that the process waits after sending out num-packets packets before sending the next batch.
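The documented schedule can be sketched in Python as a list of operation windows. This is a simplification under stated assumptions: all num-packets requests are treated as leaving at the start of the operation, and replies are treated as counting only until start + timeout. The function name is hypothetical and the ASA's real scheduler may differ.

```python
def operation_windows(frequency_s: float, timeout_ms: int, count: int) -> list[tuple[float, float]]:
    """Return (start, reply_deadline) pairs, in seconds, for the first `count` operations.

    Assumes one batch per frequency period and a reply window of timeout_ms
    measured from the start of each batch (hypothetical model).
    """
    timeout_s = timeout_ms / 1000  # timeout is configured in milliseconds
    return [(k * frequency_s, k * frequency_s + timeout_s) for k in range(count)]


# frequency 15, timeout 1500 from the sla monitor config above
for start, deadline in operation_windows(15, 1500, 3):
    print(f"send at t={start:5.1f}s, replies accepted until t={deadline:5.1f}s")
```

If the model held, reachability could only change at these window boundaries, so a removed route should not return sooner than the next window, roughly one frequency period later.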

Samples from the log messages:



10 seconds between messages:
Dec 25 05:19:10 fxccsmasa1 : %ASA-6-622001: Removing tracked route 0.0.0.0 0.0.0.0 58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk
Dec 25 05:19:20 fxccsmasa1 : %ASA-6-622001: Adding tracked route 0.0.0.0 0.0.0.0 58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk


Exactly 15 seconds difference here:
Jan 12 09:46:38 fxccsmasa1 : %ASA-6-622001: Removing tracked route 0.0.0.0 0.0.0.0 116.6.108.129, distance 1, table Default-IP-Routing-Table, on interface outside
Jan 12 09:46:53 fxccsmasa1 : %ASA-6-622001: Adding tracked route 0.0.0.0 0.0.0.0 116.6.108.129, distance 1, table Default-IP-Routing-Table, on interface outside

10 seconds difference:
Jan 15 09:27:31 fxccsmasa1 : %ASA-6-622001: Removing tracked route 0.0.0.0 0.0.0.0 58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk
Jan 15 09:27:41 fxccsmasa1 : %ASA-6-622001: Adding tracked route 0.0.0.0 0.0.0.0 58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk
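The gaps can be measured directly from the syslog timestamps. The Python sketch below parses the six %ASA-6-622001 messages quoted above and pairs each "Removing" with the next "Adding" on the same interface. The regular expression and helper names are illustrative, and the placeholder year 2000 is an assumption: syslog timestamps carry no year, and only differences within the same day matter here.

```python
import re
from datetime import datetime

# The %ASA-6-622001 samples quoted above.
LOG_LINES = [
    "Dec 25 05:19:10 fxccsmasa1 : %ASA-6-622001: Removing tracked route 0.0.0.0 0.0.0.0 "
    "58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk",
    "Dec 25 05:19:20 fxccsmasa1 : %ASA-6-622001: Adding tracked route 0.0.0.0 0.0.0.0 "
    "58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk",
    "Jan 12 09:46:38 fxccsmasa1 : %ASA-6-622001: Removing tracked route 0.0.0.0 0.0.0.0 "
    "116.6.108.129, distance 1, table Default-IP-Routing-Table, on interface outside",
    "Jan 12 09:46:53 fxccsmasa1 : %ASA-6-622001: Adding tracked route 0.0.0.0 0.0.0.0 "
    "116.6.108.129, distance 1, table Default-IP-Routing-Table, on interface outside",
    "Jan 15 09:27:31 fxccsmasa1 : %ASA-6-622001: Removing tracked route 0.0.0.0 0.0.0.0 "
    "58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk",
    "Jan 15 09:27:41 fxccsmasa1 : %ASA-6-622001: Adding tracked route 0.0.0.0 0.0.0.0 "
    "58.250.205.1, distance 254, table Default-IP-Routing-Table, on interface outsidebk",
]

PATTERN = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*%ASA-6-622001: "
    r"(Removing|Adding) tracked route .* on interface (\S+)$"
)


def parse_622001(line: str) -> tuple[datetime, str, str]:
    """Return (timestamp, action, interface) for one 622001 message."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(f"not a 622001 message: {line!r}")
    # Placeholder year: syslog omits it, and only same-day differences are used.
    ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S")
    return ts, m.group(2), m.group(3)


def remove_to_add_gaps(lines: list[str]) -> list[tuple[str, float]]:
    """Pair each Removing message with the next Adding on the same interface."""
    removed_at: dict[str, datetime] = {}
    gaps = []
    for line in lines:
        ts, action, iface = parse_622001(line)
        if action == "Removing":
            removed_at[iface] = ts
        elif iface in removed_at:
            gaps.append((iface, (ts - removed_at.pop(iface)).total_seconds()))
    return gaps


for iface, seconds in remove_to_add_gaps(LOG_LINES):
    print(f"{iface}: route re-added after {seconds:.0f} s")
```

For these samples the gaps come out as 10, 15 and 10 seconds, so two of the three are shorter than the configured frequency of 15 seconds.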

I haven't tried increasing the timeout; 1500 ms should be plenty, given the physical distance between the ASA and the target IP.

The goal is to only have the route change when the link is down hard, not just flapping.

Thanks in advance,

John

9 Replies

Ajay Saini
Level 7

SLA monitoring uses only ICMP echo responses as the criterion for tracking a route. An interface going down is one of the scenarios where it comes in handy.

If I understand the query correctly, you need the route to change only when the interface is hard down. I would suggest having multiple routes with adjusted metric values so that there is always a backup route when the primary route goes down. You can create a static route with a preferred metric for interface A and a less preferred route for interface B, which would kick in if interface A is hard down.
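The backup-route approach relies on the firewall installing the lowest-distance default route whose tracking state is up. The Python sketch below models that selection with the two default routes that appear in the question's logs (distance 1 via 116.6.108.129 on outside, distance 254 via 58.250.205.1 on outsidebk). `Route` and `select_default_route` are hypothetical names, and the model ignores everything else the real routing table considers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    gateway: str
    interface: str
    distance: int     # administrative distance; lower is preferred
    tracked_up: bool  # state of the associated SLA/track object


def select_default_route(routes: Iterable[Route]) -> Optional[Route]:
    """Return the default route that would be installed: lowest distance among usable routes."""
    usable = [r for r in routes if r.tracked_up]
    return min(usable, key=lambda r: r.distance, default=None)


primary = Route("116.6.108.129", "outside", 1, tracked_up=False)
backup = Route("58.250.205.1", "outsidebk", 254, tracked_up=True)
print(select_default_route([primary, backup]).interface)  # outsidebk
```

In this model the switch still happens whenever the primary's tracking state goes down, however briefly, which is the behavior the original poster wants to avoid.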

http://www.cisco.com/c/en/us/td/docs/security/asa/asa-command-reference/I-R/cmdref2/r2.html

HTH

-AJ

I have a route to another ISP, and I see traffic to the internet go through this other path when the primary route is removed. What I'm asking is more fundamental: how does SLA monitor actually work? The Cisco docs say (not very clearly) that the period defined by 'frequency' is the time between sending each batch of 'num-packets' packets, yet I'm seeing the deleted route come back in less time than that. This makes me wonder how closely SLA monitor's actual behavior matches its documentation.

Changing routes doesn't help me, because it causes all devices at HQ to send to the IP address associated with the other ISP, which disrupts services.

First off, your timeout setting is wrong. It is currently set to 1.5 seconds; you need to add one more zero for it to be 15 seconds (15000). Remember that this value is in milliseconds. Try changing that and then test.

Your understanding is correct that if there is no response within the timeout period, the path is considered failed and the default route will be removed.

--

Please remember to select a correct answer and rate helpful posts

Timeout is how long the process waits to hear back from at least one of the num-packets ICMP packets sent.

Frequency is how long between each iteration of sending the packets.

At least, this is what I've gathered from the docs and comments from TAC types.

Also, even though the CLI help for timeout indicates that valid entries are:

sla-monitor-echo mode commands/options:
  <0-604800000>  Timeout in milliseconds

in practice any number up to 4999 works; if 5000 is entered, it doesn't show up in the show run sla mon output.

So, I still have my doubts about this feature. Looking for clarity.

I stand corrected: timeout accepts numbers above 5000, just not 5000 itself for some reason. Could that be the default?

Yes, 5000 is the default.

If you issue the command show sla monitor configuration, you will see it there.
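The disappearing 5000 is consistent with the common Cisco practice of leaving default values out of the running configuration. The Python sketch below illustrates that rendering rule. Only the timeout and threshold defaults (5000 ms) come from this thread; other sub-commands are always printed because their defaults are not stated here, and `render_sla_echo` is a hypothetical helper, not the ASA's actual show running-config logic.

```python
# Defaults stated in this thread; other sub-commands are always printed.
DEFAULTS = {"timeout": 5000, "threshold": 5000}


def render_sla_echo(op_id: int, target: str, interface: str, params: dict[str, int]) -> list[str]:
    """Render sla monitor lines, omitting sub-commands whose value equals the default."""
    lines = [
        f"sla monitor {op_id}",
        f" type echo protocol ipIcmpEcho {target} interface {interface}",
    ]
    for name, value in params.items():
        if DEFAULTS.get(name) == value:
            continue  # a default value is hidden, as with "timeout 5000"
        lines.append(f" {name} {value}")
    return lines


config = {"num-packets": 100, "timeout": 5000, "threshold": 100, "frequency": 15}
for line in render_sla_echo(10, "25.25.25.154", "outside", config):
    print(line)
```

With timeout set to 5000 the timeout line simply does not appear, while any non-default value such as 1500 or 6000 is printed.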

It still looks like some details are missing from the documentation. My log files show a route being added back to the routing table in a shorter amount of time than the timeout is configured for. This has to mean that either:

ICMP requests are being sent continuously, allowing the route to be re-established, or

the timeout configuration is ignored by the SLA monitor process.
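One way to make this comparison concrete is to compute the shortest remove-to-add gap a simple one-batch-per-frequency model would allow, and check the observed gaps against it. The Python sketch assumes (this is not documented ASA behavior) that a failed operation is reported timeout milliseconds after it starts, and that the route can only return once the next operation, one frequency period later, gets a reply. The function names are hypothetical; the observed gaps are the 10, 15 and 10 seconds from the log samples in the question.

```python
def min_gap_s(frequency_s: float, timeout_ms: float) -> float:
    """Shortest Removing->Adding gap under the simplified model.

    Assumed: failure reported timeout_ms after the failed operation starts;
    recovery reported no earlier than the next operation's start.
    """
    return frequency_s - timeout_ms / 1000


def gaps_below_model_floor(gaps_s: list[float], frequency_s: float, timeout_ms: float) -> list[float]:
    """Return the observed gaps that the simplified model cannot explain."""
    floor = min_gap_s(frequency_s, timeout_ms)
    return [gap for gap in gaps_s if gap < floor]


observed = [10.0, 15.0, 10.0]  # gaps from the log samples in the question
print(min_gap_s(15, 1500))                         # 13.5
print(gaps_below_model_floor(observed, 15, 1500))  # [10.0, 10.0]
```

A result like this does not tell the two explanations apart; it only shows that the observed timing does not fit a one-batch-per-frequency model, which is what both explanations are trying to account for.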

I've set the timeout to a more robust amount of 6 seconds on my test bed, we'll see if that reduces the number of times the routing table gets changed.

John

Sorry for the delayed response. Your understanding of all the parameters seems to be correct. I would suggest leaving the timeout and threshold values at their defaults and then testing. Both the timeout and threshold defaults are 5000 ms. You can try setting these values and testing.

http://www.cisco.com/c/en/us/td/docs/security/asa/asa-command-reference/T-Z/cmdref4/t1.html

-AJ

Did you ever get this resolved? I am plagued by this issue in two locations and am still looking for a good configuration that will stop the flapping.
