03-18-2015 04:54 AM - last edited on 03-25-2019 03:43 PM by ciscomoderator
Hello,
I am facing a problem with the IP SLA track mechanism.
I have two ISPs connected to my C881 router.
I am using ISP1 as the primary ISP and tracking reachability of IP address 8.8.4.4 through IP SLA / track 200:
!
ip sla 200
 icmp-echo 8.8.4.4
 request-data-size 200
 timeout 3000
 threshold 1000
 owner SYSADMIN
 frequency 5
 history hours-of-statistics-kept 25
 history distributions-of-statistics-kept 20
 history lives-kept 2
 history buckets-kept 60
 history filter all
!
ip sla schedule 200 life forever start-time now
ip sla enable reaction-alerts
!
track 200 ip sla 200 reachability
 delay down 30 up 180
!
The default route to ISP1 is tracked, and a second default route is configured with a higher administrative distance (250).
This is what my static routing looks like:
!
ip route 0.0.0.0 0.0.0.0 FastEthernet4 1.1.1.1 name ISP1 track 200
ip route 0.0.0.0 0.0.0.0 Vlan20 2.2.2.2 250 name ISP2
ip route 8.8.4.4 255.255.255.255 FastEthernet4 1.1.1.1 name force-ISP1
ip route 8.8.4.4 255.255.255.255 Null0 250 name deny-via-ISP2
!
It works almost as expected:
- when ISP1 goes down (I mean when 8.8.4.4 becomes unreachable via ISP1), the default route points to ISP2 after 30 seconds
- when ISP1 comes back up (8.8.4.4 becomes reachable again via ISP1), the default route points back to ISP1 after 180 seconds
*Mar 14 14:09:52.034: %TRACKING-5-STATE: 200 ip sla 200 reachability Up->Down
*Mar 14 14:12:57.039: %TRACKING-5-STATE: 200 ip sla 200 reachability Down->Up
...but
In some cases (I believe it may happen when ISP1 is down for a longer time), IP SLA/track is unable to detect that ISP1 is up again, and the default route points to ISP2 forever (at least until FastEthernet4 is disconnected and reconnected, or a shut/no shut is applied).
*Mar 17 14:18:13.019: %TRACKING-5-STATE: 200 ip sla 200 reachability Up->Down
This is what some show command outputs look like:
ROUTER-MD#show ip route static
     8.0.0.0/32 is subnetted, 2 subnets
S       8.8.4.4 [1/0] via 1.1.1.1, FastEthernet4
S*   0.0.0.0/0 [250/0] via 2.2.2.2, Vlan20
ROUTER-MD#show ip sla statistics 200 details
IPSLAs Latest Operation Statistics
IPSLA operation id: 200
Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: *12:17:51.494 MET Wed Mar 18 2015
Latest operation return code: Timeout
Over thresholds occurred: FALSE
Number of successes: 0
Number of failures: 31
Operation time to live: Forever
Operational state of entry: Active
Last time this entry was reset: Never
ROUTER-MD#show track 200
Track 200
  IP SLA 200 reachability
  Reachability is Down
    42 changes, last change 22:00:06
  Delay up 180 secs, down 30 secs
  Latest operation return code: Timeout
  Tracked by:
    STATIC-IP-ROUTING 0
But as you can see here, 8.8.4.4 is reachable from the router:
ROUTER-MD#show ip route 8.8.4.4
Routing entry for 8.8.4.4/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 1.1.1.1, via FastEthernet4
      Route metric is 0, traffic share count is 1
ROUTER-MD#ping 8.8.4.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/41/44 ms
While this is happening, I see no ICMP traffic destined to 8.8.4.4 with the "debug ip icmp" command enabled.
The IP SLA and track debug results are here:
ROUTER-MD#show debug
Track debugging is on
IP SLAs:
  TRACE debugging is on for entries: 200
  ERROR debugging is on for entries: 200
*Mar 18 12:40:16.530: IP SLAs(200) Scheduler: saaSchedulerEventWakeup
*Mar 18 12:40:16.530: IP SLAs(200) Scheduler: Starting an operation
*Mar 18 12:40:16.530: IP SLAs(200) echo operation: Sending an echo operation - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:16.530: IP SLAs(200) echo operation: Sending ID: 27
*Mar 18 12:40:19.530: IP SLAs(200) echo operation: Timeout - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:19.530: IP SLAs(200) Scheduler: Updating result
*Mar 18 12:40:19.530: IP SLAs(200) Scheduler: start wakeup timer, delay = 2000
*Mar 18 12:40:21.530: IP SLAs(200) Scheduler: saaSchedulerEventWakeup
*Mar 18 12:40:21.530: IP SLAs(200) Scheduler: Starting an operation
*Mar 18 12:40:21.530: IP SLAs(200) echo operation: Sending an echo operation - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:21.530: IP SLAs(200) echo operation: Sending ID: 27
*Mar 18 12:40:24.530: IP SLAs(200) echo operation: Timeout - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:24.530: IP SLAs(200) Scheduler: Updating result
*Mar 18 12:40:24.530: IP SLAs(200) Scheduler: start wakeup timer, delay = 2000
...etc
I would appreciate any help.
Thank you,
MB
05-05-2015 01:25 PM
Right,
I believe you are hitting bug CSCso46681 "timeout issue on ip sla". Although it does not list your IOS version as a known affected version, your version is not listed as a known fixed version either.
12.4(22)T1 is listed as a known fixed version. If you can upgrade to that and retest, let us know how you get on.
Mario
03-25-2015 01:22 AM
No ideas? :-(
03-25-2015 01:44 PM
As long as FA4 is up the static route to 8.8.4.4 will remain in the table. I believe you should only have the primary and backup default route and remove both static routes to 8.8.4.4. If you have a static route in the table pointing to 8.8.4.4 -> null0 during failover then IPSLA will never be able to come back up (track 200 will never be able to transition from down ->up with this static route).
Source the IPSLA from interface FA4 instead of relying on the static routes pointing directly to 8.8.4.4.
03-26-2015 02:08 AM
>> If you have a static route in the table pointing to 8.8.4.4 -> null0 during failover then IPSLA
>> will never be able to come back up (track 200 will never be able to transition from down ->up
>> with this static route).
I don't understand this...
With my configuration (described above), the routing table points 8.8.4.4 -> Null0 during failover only if FA4 goes down. As soon as FA4 comes up again, the routing table points 8.8.4.4 -> ISP1 and track 200 should be able to transition from down->up. As you can see above, 8.8.4.4 is reachable with "ping" from the router...
The static route for 8.8.4.4 points to Null0 with a higher administrative distance because, if FA4 goes down, 8.8.4.4 would otherwise become reachable via ISP2, and I have no source interface in the IP SLA.
>> Source the IPSLA from Interface FA4 instead of the static routes pointing directly to 8.8.4.4.
My initial configuration was almost what you are describing: two default routes + IP SLA sourced from interface FA4.
But in addition, there was also an 8.8.4.4/32 static route pointing to ISP1's default gateway.
The same problem occurred with that configuration.
Do you think the 8.8.4.4/32 route to ISP1 was the source of my problems in the initial configuration?
03-26-2015 11:34 AM
I would suggest tracking an address that is only reachable through ISP1, like the WAN interface IP of your ISP1 router.
Then you need only one static route for that address on your 881 router, and no Null route.
That way, if the WAN goes down on ISP1, that IP is definitely unreachable.
In case Fa4 goes down instead, and the WAN IP is still reachable through ISP2, just block ICMP on the ISP1 router when it is sourced from ISP2.
03-26-2015 11:41 AM
In fact, mattp0002 already suggested a similar, tidier setup... I would try that.
04-28-2015 02:42 PM
This was going to be my suggestion. Have the SLA target an IP directly related to connectivity to ISP1. Source it that way too. This will keep distant connectivity issues from causing unrelated local topology changes when ISP1's connectivity might be fine. The main point of an SLA is to ensure a level of service is being maintained. Tracking a local interface is also an option for failover. An SLA can still perform measurements and failover if needed. Toss the SLA track and the interface track into a track list and tune everything as needed in order to provide network stability and routing accuracy.
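A minimal sketch of the track-list idea described above, assuming the interface and track 200 from the original post. Track numbers 10 and 20 are hypothetical and chosen only for illustration; the `delay` values already configured on track 200 are left as they are.

```
! Hypothetical track numbers 10 and 20; track 200 is the existing IP SLA track
track 10 interface FastEthernet4 line-protocol
!
! Combined object: up only when both the link and the IP SLA probe are up
track 20 list boolean and
 object 10
 object 200
!
! Point the primary default route at the combined object instead of track 200
no ip route 0.0.0.0 0.0.0.0 FastEthernet4 1.1.1.1 name ISP1 track 200
ip route 0.0.0.0 0.0.0.0 FastEthernet4 1.1.1.1 name ISP1 track 20
```

With a boolean "and" list, the primary route is withdrawn as soon as either the link or the probe fails, so tune the delays on the individual objects to avoid flapping.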
Ian
05-05-2015 07:55 AM
The SLA target IP is not directly related to connectivity to ISP1; it is related to reachability of that IP through ISP1.
The goal is to check whether a remote server (with no relation to ISP1 or ISP2, i.e. somewhere on the public internet) can be reached through ISP1. If this server becomes unreachable, I need to switch traffic to ISP2, and when it is reachable (through ISP1) again, I need to switch traffic back to ISP1.
I can easily handle an outage where the Ethernet link/port goes down, but I don't know a better mechanism than IP SLA for dealing with outages deeper inside ISP1's network.
The point is that my configuration basically works, and switching from ISP1 to ISP2 and back to ISP1 is OK, but in some cases (and I can't work out exactly which), the switchover from ISP2 back to ISP1 fails because IP SLA cannot reach the target IP. At the same time, the target IP can be reached manually (tested with ping) from the same router, through ISP1.
It looks like a software bug - see the last post from marioderosa2008.
03-26-2015 08:39 AM
I do almost the same thing however instead of blackhole-ing that route via the other path using a route to null0 (which is risky as you've seen) I'm simply using an outbound ACL to block icmp echo requests on the egress interface pointing towards that other ISP - and the ACL is locked down to the source of the IP SLA.
You can hard code the ping source by changing
icmp-echo 8.8.4.4
to something like icmp-echo 8.8.4.4 source-ip 1.2.3.4 (whatever IP is on the router, maybe a loopback, etc.)
03-27-2015 02:49 AM
OK. I will try to use source-ip instead of source-interface and remove the 8.8.4.4 blackholing, like this:
conf t
!
no ip route 8.8.4.4 255.255.255.255 Null0 250 name deny-via-ISP2
no ip sla schedule 200 life forever start-time now
no ip sla 200
!
ip sla 200
 icmp-echo 8.8.4.4 source-ip 1.1.1.2
 !...etc
!
ip sla schedule 200 life forever start-time now
!
Hope it will work :-)
It may take several days to get results, but I will let you know here.
Thanks for your suggestions.
04-24-2015 01:57 AM
My problem still occurs :-(
Configuration was changed as you described, but right now, I have the same problem...
ROUTER#show ip sla statistics
IPSLAs Latest Operation Statistics
IPSLA operation id: 200
Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: *10:54:50.747 METDST Fri Apr 24 2015
Latest operation return code: Timeout
Number of successes: 0
Number of failures: 117
Operation time to live: Forever
ROUTER#show track 200
Track 200
  IP SLA 200 reachability
  Reachability is Down
    60 changes, last change 16:09:07
  Delay up 180 secs, down 30 secs
  Latest operation return code: Timeout
  Tracked by:
    STATIC-IP-ROUTING 0
IP SLA sees the IP address as down, so the track has switched the default route to ISP2
...however, the IP address is reachable through ISP1:
ROUTER#ping 8.8.4.4 source fastEthernet 4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/41/44 ms
I don't understand that... Any ideas?
04-24-2015 10:10 AM
Hi,
When ISP1 is down, is the static route to 8.8.4.4 via 1.1.1.1 still in the routing table?
Are you sure that reachability to 8.8.4.4 is actually going through ISP2?
Have you applied an ACL denying ICMP destined to 8.8.4.4 through ISP2, to make sure that 8.8.4.4 is not pingable through ISP2?
thanks
Mario
04-27-2015 01:01 AM
Hi,
>>when ISP 1 is down, is the static route to 8.8.4.4 via 1.1.1.1 still in the routing table?
Unfortunately I cannot catch the situation while ISP1 is down. ISP1 is up now.
But there can be two situations with this configuration:
ip route 8.8.4.4 255.255.255.255 FastEthernet4 1.1.1.1 name force-ISP1
1. If FE4 goes down, the static route is removed from the routing table.
2. If FE4 remains up (but the connection to 8.8.4.4 is broken within ISP1's network), the static route stays in the routing table.
As I can see in the logs, FE4 did not go down, so the route to 8.8.4.4 via ISP1 was in the routing table the whole time.
>> Are you sure that reachability to 8.8.4.4 is actually going through ISP2?
No, reachability to 8.8.4.4 is actually going through ISP1, as configured:
S 8.8.4.4 [1/0] via 1.1.1.1, FastEthernet4
ROUTER#ping 8.8.4.4 source fastEthernet 4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/40/44 ms
My problem is that IP SLA is somehow not seeing this:
ROUTER#show ip sla statistics
IPSLAs Latest Operation Statistics
IPSLA operation id: 200
Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: *09:48:42.553 METDST Mon Apr 27 2015
Latest operation return code: Timeout
Number of successes: 0
Number of failures: 42
Operation time to live: Forever
>> have you applied ACL denying ICMP destined to 8.8.4.4 through ISP2 to make sure that 8.8.4.4 is not pingable through ISP2?
No... I have applied a more specific static route to 8.8.4.4 via ISP1.
Besides that, I have applied the source-ip option under the IP SLA configuration:
ip sla 200
 icmp-echo 8.8.4.4 source-ip 1.1.1.2
Sure, I can try to deny ICMP to 8.8.4.4 through ISP2 as a third measure, and we will see...
What would be better from your point of view? To use an ACL as you mentioned, or to use "ip local policy route-map" as pille1234 mentioned...? Maybe both, to be 100% sure?
04-27-2015 04:57 AM
Hi,
With the policy-based routing option, I am not sure which inbound interface you would apply it to. To my knowledge, it only applies to packets entering the router on the interface where you apply the policy.
Since you are sourcing ICMP from 1.1.1.2, which is the IP of Fa4, I am not sure whether the policy will apply. But I would try it and find out in a lab or something.
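For reference, "ip local policy route-map" is configured globally rather than on an inbound interface, and it applies to packets the router itself originates. Below is a minimal sketch of how the probe could be pinned to ISP1 with it, assuming the addresses used in this thread. The ACL and route-map names are hypothetical, and whether it changes the behavior seen here is untested.

```
! Hypothetical names SLA-PROBE-ISP1 and SLA-VIA-ISP1
ip access-list extended SLA-PROBE-ISP1
 ! Match only the IP SLA echo requests sourced from the ISP1-facing address
 permit icmp host 1.1.1.2 host 8.8.4.4 echo
!
route-map SLA-VIA-ISP1 permit 10
 match ip address SLA-PROBE-ISP1
 ! Force matching packets to the ISP1 gateway
 set ip next-hop 1.1.1.1
!
! Apply the route-map to router-originated traffic
ip local policy route-map SLA-VIA-ISP1
```

As far as I know, if the next hop is not reachable (for example when FastEthernet4 is down), the packet falls back to the normal routing table, so this alone may not stop the probe from leaving via ISP2 in that case.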
This is looking like a bug though to be honest. If your IP SLA is not recognising that 8.8.4.4 is reachable again, then it does sound like a bug.
Can you show debug of the routing table and IP SLA when ISP1 has a failure?
Just to refresh my memory, can you post the running config of the relevant bits so we know what the config looks like as it stands today?
thanks
Mario
04-28-2015 06:11 AM
Current relevant configuration:
!
track 200 ip sla 200 reachability
delay down 30 up 180
!
interface FastEthernet3
description UPLINK-ISP2
switchport access vlan 20
!
!
interface FastEthernet4
description UPLINK-ISP1
ip address 1.1.1.2 255.255.254.0
!
!
interface Vlan20
description ISP2
ip address 2.2.2.2 255.255.255.248
!
ip route 0.0.0.0 0.0.0.0 FastEthernet4 1.1.1.1 name ISP1 track 200
ip route 0.0.0.0 0.0.0.0 Vlan20 2.2.2.1 250 name ISP2
ip route 8.8.4.4 255.255.255.255 FastEthernet4 1.1.1.1 name force-ISP1
!
ip sla logging traps
ip sla 200
icmp-echo 8.8.4.4 source-ip 131.186.118.224
request-data-size 200
timeout 3000
threshold 1000
owner SYSADMIN
frequency 5
history hours-of-statistics-kept 25
history distributions-of-statistics-kept 20
history lives-kept 2
history buckets-kept 60
history filter all
ip sla schedule 200 life forever start-time now
ip sla enable reaction-alerts
!
Strange thing...
I have no idea what the IP address 131.186.118.224 under the ip sla 200 command is.
Source-IP 1.1.1.2 was configured as follows:
ROUTER(config)#ip sla 200
ROUTER(config-ip-sla)#icmp-echo 8.8.4.4 source-ip 1.1.1.2
but in show run, it looks like this:
ROUTER# show run | inc ip sla 200|source-ip
track 200 ip sla 200 reachability
ip sla 200
icmp-echo 8.8.4.4 source-ip 131.186.118.224
Note that 1.1.1.x and 2.2.2.x are not real IPs. They were changed to simplify the configuration here, but 131.186.118.224 is completely different from the original IP address (replaced by 1.1.1.2 in this forum).
After I unconfigured IP SLA and then configured it again (a few minutes ago), IP SLA can now see 8.8.4.4 as reachable via ISP1. This is correct, but it was done manually, by resetting the IP SLA / track mechanism, not by the IP SLA mechanism itself.
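A lighter manual reset than removing and re-creating the operation might be the global "ip sla restart" command. This is only a sketch: the thread confirms that unconfiguring and reconfiguring IP SLA 200 cleared the stuck state, and it is an assumption that a restart would do the same.

```
configure terminal
 ! Restart the existing operation without deleting its configuration
 ip sla restart 200
end
! Check whether the probe and the tracked object recover
show ip sla statistics 200
show track 200
```

Either way, this is a workaround applied by hand, so it does not explain why the operation stops recovering on its own.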