Solved: IP SLA TRACK issue

matejbernat · ‎03-18-2015

Hello,

I am facing a problem with the IP SLA track mechanism.

I have two ISPs connected to my router C881.

ISP1 = primary (connected to FastEthernet4)
ISP2 = backup (connected to FastEthernet3/Vlan20)

I am using ISP1 as primary ISP and tracking reachability of IP address 8.8.4.4 through ip sla track 200:

!
ip sla 200
 icmp-echo 8.8.4.4
 request-data-size 200
 timeout 3000
 threshold 1000
 owner SYSADMIN
 frequency 5
 history hours-of-statistics-kept 25
 history distributions-of-statistics-kept 20
 history lives-kept 2
 history buckets-kept 60
 history filter all
!
ip sla schedule 200 life forever start-time now
ip sla enable reaction-alerts
!
track 200 ip sla 200 reachability
 delay down 30 up 180
!

The default route to ISP1 is tracked, and a second default route to ISP2 is configured with a higher administrative distance (250).
This is what my static routing looks like:

!
ip route 0.0.0.0 0.0.0.0 FastEthernet4 1.1.1.1 name ISP1 track 200
ip route 0.0.0.0 0.0.0.0 Vlan20 2.2.2.2 250 name ISP2
ip route 8.8.4.4 255.255.255.255 FastEthernet4 1.1.1.1 name force-ISP1
ip route 8.8.4.4 255.255.255.255 Null0 250 name deny-via-ISP2
!

It works almost as expected:

- when ISP1 goes down (I mean, if 8.8.4.4 becomes unreachable via ISP1), the default route points to ISP2 after 30 seconds
- when ISP1 comes up again (8.8.4.4 becomes reachable again via ISP1), the default route points back to ISP1 after 180 seconds

*Mar 14 14:09:52.034: %TRACKING-5-STATE: 200 ip sla 200 reachability Up->Down
*Mar 14 14:12:57.039: %TRACKING-5-STATE: 200 ip sla 200 reachability Down->Up

...but

In some cases (I believe it may happen when ISP1 is down for a longer time), IP SLA/track is unable to detect that ISP1 has come UP again, and the default route points to ISP2 forever (at least until FastEthernet4 is disconnected and reconnected, or shut/no shut is applied).

*Mar 17 14:18:13.019: %TRACKING-5-STATE: 200 ip sla 200 reachability Up->Down

This is what some show command outputs look like:

ROUTER-MD#show ip route static
     8.0.0.0/32 is subnetted, 2 subnets
S       8.8.4.4 [1/0] via 1.1.1.1, FastEthernet4
S*   0.0.0.0/0 [250/0] via 2.2.2.2, Vlan20

ROUTER-MD#show ip sla statistics 200 details
IPSLAs Latest Operation Statistics

IPSLA operation id: 200
        Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: *12:17:51.494 MET Wed Mar 18 2015
Latest operation return code: Timeout
Over thresholds occurred: FALSE
Number of successes: 0
Number of failures: 31
Operation time to live: Forever
Operational state of entry: Active
Last time this entry was reset: Never

ROUTER-MD#show track 200
Track 200
  IP SLA 200 reachability
  Reachability is Down
    42 changes, last change 22:00:06
  Delay up 180 secs, down 30 secs
  Latest operation return code: Timeout
  Tracked by:
    STATIC-IP-ROUTING 0

But as you can see here, 8.8.4.4 is reachable from the router:

ROUTER-MD#show ip route 8.8.4.4
Routing entry for 8.8.4.4/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 1.1.1.1, via FastEthernet4
      Route metric is 0, traffic share count is 1

ROUTER-MD#ping 8.8.4.4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/41/44 ms

While this is happening, I see no ICMP traffic destined to 8.8.4.4 with "debug ip icmp" enabled.

Debug IP sla & track results are here:

ROUTER-MD#show debug
Track debugging is on
IP SLAs:
  TRACE debugging is on for entries:
    200
  ERROR debugging is on for entries:
    200

*Mar 18 12:40:16.530: IP SLAs(200) Scheduler: saaSchedulerEventWakeup
*Mar 18 12:40:16.530: IP SLAs(200) Scheduler: Starting an operation
*Mar 18 12:40:16.530: IP SLAs(200) echo operation: Sending an echo operation - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:16.530: IP SLAs(200) echo operation: Sending ID: 27
*Mar 18 12:40:19.530: IP SLAs(200) echo operation: Timeout - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:19.530: IP SLAs(200) Scheduler: Updating result
*Mar 18 12:40:19.530: IP SLAs(200) Scheduler: start wakeup timer, delay = 2000

*Mar 18 12:40:21.530: IP SLAs(200) Scheduler: saaSchedulerEventWakeup
*Mar 18 12:40:21.530: IP SLAs(200) Scheduler: Starting an operation
*Mar 18 12:40:21.530: IP SLAs(200) echo operation: Sending an echo operation - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:21.530: IP SLAs(200) echo operation: Sending ID: 27
*Mar 18 12:40:24.530: IP SLAs(200) echo operation: Timeout - destAddr=8.8.4.4, sAddr=1.1.1.2
*Mar 18 12:40:24.530: IP SLAs(200) Scheduler: Updating result
*Mar 18 12:40:24.530: IP SLAs(200) Scheduler: start wakeup timer, delay = 2000

...etc

I would appreciate any help.

Thank you,

MB

marioderosa2008 · ‎05-05-2015

Right,

I believe you are hitting bug CSCso46681, "timeout issue on ip sla". Your IOS version is not listed as a known affected version, but it is not listed as a known fixed version either.

12.4(22)T1 is listed as a known fixed version. If you can upgrade to that and retest, let us know how you get on.

Mario

matejbernat · ‎03-25-2015

No ideas? :-(

Dan Frey · ‎03-25-2015

As long as FA4 is up the static route to 8.8.4.4 will remain in the table. I believe you should only have the primary and backup default route and remove both static routes to 8.8.4.4. If you have a static route in the table pointing to 8.8.4.4 -> null0 during failover then IPSLA will never be able to come back up (track 200 will never be able to transition from down ->up with this static route).

Source the IPSLA from Interface FA4 instead of the static routes pointing directly to 8.8.4.4.
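
A minimal sketch of that suggestion, assuming the same operation number, target and timers as in the original post. An existing operation usually has to be removed before its icmp-echo line can be changed, and threshold is set before timeout here so that the timeout is never lower than the threshold. Whether source-interface also pins the egress path, or only selects the source address, may depend on the IOS release and is worth verifying:

!
no ip sla 200
ip sla 200
 icmp-echo 8.8.4.4 source-interface FastEthernet4
 threshold 1000
 timeout 3000
 frequency 5
!
ip sla schedule 200 life forever start-time now
!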

matejbernat · ‎03-26-2015

>> If you have a static route in the table pointing to 8.8.4.4 -> null0 during failover then IPSLA

>> will never be able to come back up (track 200 will never be able to transition from down ->up
>> with this static route).

I don't understand this...

With my configuration (described above), the route for 8.8.4.4 points to Null0 during failover only if FA4 goes down. As soon as FA4 comes up again, the route for 8.8.4.4 points to ISP1, and track 200 should be able to transition from down to up. As you can see above, 8.8.4.4 is reachable with "ping" from the router...

The static route for 8.8.4.4 points to Null0 with a higher administrative distance because, if FA4 goes down, 8.8.4.4 would otherwise become reachable via ISP2, and I have no source interface in the IP SLA.

>> Source the IPSLA from Interface FA4 instead of the static routes pointing directly to 8.8.4.4.

My initial configuration was almost as you are describing - two default routes + IPSLA sourced from int FA4.
But in addition, there was also 8.8.4.4/32 static route pointing to ISP1's default-gw.
The same problem occurred with such configuration.

Do you think that the 8.8.4.4/32 route to ISP1 was the source of my problems in the initial configuration?

marioderosa2008 · ‎03-26-2015

I would try tracking an address that is only reachable through ISP1, like the WAN interface IP of your ISP1 router.

Then you can have just one static route for that address on your 881 router and no Null0 route.

That way, if the WAN goes down on ISP1, that IP is definitely unreachable.

In case Fa4 goes down instead, so that the WAN IP is still reachable through ISP2, just block ICMP on the ISP1 router for traffic sourced from ISP2.

marioderosa2008 · ‎03-26-2015

In fact, mattp0002 already suggested a similar, tidier setup... I would try that.

ian.m.covington · ‎04-28-2015

This was going to be my suggestion. Have the SLA target an IP directly related to connectivity to ISP1. Source it that way too. This will keep distant connectivity issues from causing unrelated local topology changes when ISP1's connectivity might be fine. The main point of an SLA is to ensure a level of service is being maintained. Tracking a local interface is also an option for failover. An SLA can still perform measurements and failover if needed. Toss the SLA track and the interface track into a track list and tune everything as needed in order to provide network stability and routing accuracy.
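
A minimal sketch of such a track list, assuming track 200 from the original post is kept as it is. The object numbers 10 and 30 are hypothetical and do not appear anywhere in this thread:

!
track 10 interface FastEthernet4 line-protocol
!
track 30 list boolean and
 object 10
 object 200
!
ip route 0.0.0.0 0.0.0.0 FastEthernet4 1.1.1.1 name ISP1 track 30
!

With boolean and, track 30 is up only while both the interface line protocol and the SLA reachability object are up, so either failure withdraws the ISP1 default route. The existing default route that references track 200 would have to be replaced by the one referencing track 30.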

Ian

matejbernat · ‎05-05-2015

The SLA target IP is not directly related to connectivity to ISP1 itself; it is about the reachability of that IP through ISP1.

The goal is to check whether a remote server (with no relation to ISP1 or ISP2, i.e. somewhere on the public internet) can be reached through ISP1. If this server becomes unreachable, I need to switch traffic to ISP2, but when it is reachable (through ISP1) again, I need to switch traffic back to ISP1.

I can easily handle an outage where the Ethernet link/port goes down, but I don't know of a better mechanism than IP SLA for dealing with outages deeper in the ISP1 network.

The point is that my configuration basically works: switching from ISP1 to ISP2 and back to ISP1 is OK. But in some cases (and I can't work out exactly which), the switchover from ISP2 back to ISP1 fails because IP SLA cannot reach the target IP. At the same time, the target IP can be reached manually (tested with ping) from the same router through ISP1.

It looks like some software bug - see last post from marioderosa2008 .

mattp0002 · ‎03-26-2015

I do almost the same thing. However, instead of blackholing that route via the other path with a route to Null0 (which is risky, as you've seen), I simply use an outbound ACL to block ICMP echo requests on the egress interface pointing towards the other ISP, and the ACL is locked down to the source of the IP SLA.

You can hard-code the ping source by changing

 icmp-echo 8.8.4.4

to something like

 icmp-echo 8.8.4.4 source-ip 1.2.3.4

(whatever IP is on the router, maybe a loopback, etc.)

matejbernat · ‎03-27-2015

OK. I will try to use source-ip instead of source-interface and remove the 8.8.4.4 blackholing, like this:

conf t
!
no ip route 8.8.4.4 255.255.255.255 Null0 250 name deny-via-ISP2
no ip sla schedule 200 life forever start-time now
no ip sla 200
!
ip sla 200
 icmp-echo 8.8.4.4 source-ip 1.1.1.2
 !...etc
!
ip sla schedule 200 life forever start-time now
!

Hope it will work :-)

It may take several days to get results, but I will let you know here.

Thanks for your suggestions.

matejbernat · ‎04-24-2015

My problem still occurs :-(
Configuration was changed as you described, but right now, I have the same problem...

ROUTER#show ip sla statistics
IPSLAs Latest Operation Statistics

IPSLA operation id: 200
        Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: *10:54:50.747 METDST Fri Apr 24 2015
Latest operation return code: Timeout
Number of successes: 0
Number of failures: 117
Operation time to live: Forever

ROUTER#show track 200
Track 200
  IP SLA 200 reachability
  Reachability is Down
    60 changes, last change 16:09:07
  Delay up 180 secs, down 30 secs
  Latest operation return code: Timeout
  Tracked by:
    STATIC-IP-ROUTING 0

IP SLA sees the IP address as down, so the track has switched the default route to ISP2
...however, the IP address is reachable through ISP1:

ROUTER#ping 8.8.4.4 source fastEthernet 4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/41/44 ms

I don't understand that... Any ideas?

marioderosa2008 · ‎04-24-2015

Hi,

When ISP1 is down, is the static route to 8.8.4.4 via 1.1.1.1 still in the routing table?

Are you sure that reachability to 8.8.4.4 is actually going through ISP2?

Have you applied an ACL denying ICMP destined to 8.8.4.4 through ISP2, to make sure that 8.8.4.4 is not pingable through ISP2?

thanks

Mario

matejbernat · ‎04-27-2015

Hi,

>>when ISP 1 is down, is the static route to 8.8.4.4 via 1.1.1.1 still in the routing table?

Unfortunately, I cannot catch the situation when ISP1 is down. ISP1 is UP now.
But there are two possible situations with this configuration:

ip route 8.8.4.4 255.255.255.255 FastEthernet4 1.1.1.1 name force-ISP1

1. If FE4 goes down, static route is removed from the routing table.
2. If FE4 remains up (but connection to 8.8.4.4 is broken within ISP1 network), static route is still in the routing table.

As I can see in logs, FE4 was not down, so route to 8.8.4.4 via ISP1 was in RT all the time.

>> Are you sure that reach ability to 8.8.4.4 is actually going through ISP2?

No, reachability to 8.8.4.4 is actually going through ISP1, as configured:

S       8.8.4.4 [1/0] via 1.1.1.1, FastEthernet4

ROUTER#ping 8.8.4.4 source fastEthernet 4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/40/44 ms

However, my problem is that IP SLA somehow does not see this:

ROUTER#show ip sla statistics
IPSLAs Latest Operation Statistics

IPSLA operation id: 200
        Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: *09:48:42.553 METDST Mon Apr 27 2015
Latest operation return code: Timeout
Number of successes: 0
Number of failures: 42
Operation time to live: Forever

>> have you applied ACL denying ICMP destined to 8.8.4.4 through ISP2 to make sure that 8.8.4.4 is not pingable through ISP2?

No... I have applied a more specific static route to 8.8.4.4 via ISP1.
Besides that, I have applied the source-ip option under the IP SLA configuration:

ip sla 200
 icmp-echo 8.8.4.4 source-ip 1.1.1.2

Sure, I can try to deny ICMP to 8.8.4.4 through ISP2 as a third measure, and we will see...

Which would be better from your point of view: using an ACL as you mentioned, or using "ip local policy route-map" as pille1234 mentioned? Maybe both, to be 100% sure?

marioderosa2008 · ‎04-27-2015

Hi,

With the Policy Based Routing option, I am not sure which inbound interface you would apply this to, as, to my knowledge, it only applies to packets entering the router on the interface where you apply the policy.

Since you are sourcing ICMP from 1.1.1.2, which is the IP of Fa4, I am not sure whether the policy will apply. But I would try it and find out in a lab or something.
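
For reference, the form of policy routing that applies to packets the router itself originates (such as IP SLA probes) is ip local policy route-map, which is configured globally rather than on an inbound interface. A minimal sketch, assuming the probe is sourced from 1.1.1.2; the names SLA-PROBE and SLA-VIA-ISP1 are hypothetical and not from this thread:

!
ip access-list extended SLA-PROBE
 permit icmp host 1.1.1.2 host 8.8.4.4 echo
!
route-map SLA-VIA-ISP1 permit 10
 match ip address SLA-PROBE
 set ip next-hop 1.1.1.1
!
ip local policy route-map SLA-VIA-ISP1
!

Only locally generated ICMP echoes from 1.1.1.2 to 8.8.4.4 match the ACL, so other router-originated traffic keeps following the normal routing table.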

This is looking like a bug though to be honest. If your IP SLA is not recognising that 8.8.4.4 is reachable again, then it does sound like a bug.

Can you show debug of the routing table and IP SLA when ISP1 has a failure?

Just to refresh my memory, can you post the running config of the relevant bits so we know what the config looks like as it stands today?

thanks

Mario

matejbernat · ‎04-28-2015

Current relevant configuration:

!
track 200 ip sla 200 reachability
 delay down 30 up 180
!
interface FastEthernet3
 description UPLINK-ISP2
 switchport access vlan 20
!
!
interface FastEthernet4
 description UPLINK-ISP1
 ip address 1.1.1.2 255.255.254.0
!
!
interface Vlan20
 description ISP2
 ip address 2.2.2.2 255.255.255.248
!
ip route 0.0.0.0 0.0.0.0 FastEthernet4 1.1.1.1 name ISP1 track 200
ip route 0.0.0.0 0.0.0.0 Vlan20 2.2.2.1 250 name ISP2
ip route 8.8.4.4 255.255.255.255 FastEthernet4 1.1.1.1 name force-ISP1
!
ip sla logging traps
ip sla 200
 icmp-echo 8.8.4.4 source-ip 131.186.118.224
 request-data-size 200
 timeout 3000
 threshold 1000
 owner SYSADMIN
 frequency 5
 history hours-of-statistics-kept 25
 history distributions-of-statistics-kept 20
 history lives-kept 2
 history buckets-kept 60
 history filter all
ip sla schedule 200 life forever start-time now
ip sla enable reaction-alerts
!

Strange thing...
I have no idea what the IP address 131.186.118.224 under the ip sla 200 command is.
Source IP 1.1.1.2 was configured as follows:

ROUTER(config)#ip sla 200
ROUTER(config-ip-sla)#icmp-echo 8.8.4.4 source-ip 1.1.1.2

but in show run, it looks like this:

ROUTER# show run | inc ip sla 200|source-ip
track 200 ip sla 200 reachability
ip sla 200
 icmp-echo 8.8.4.4 source-ip 131.186.118.224

Note that 1.1.1.x and 2.2.2.x are not real IPs (they were changed to simplify the configuration here), but 131.186.118.224 is completely different from the original IP address (replaced by 1.1.1.2 in this forum).

After I unconfigured IP SLA and then configured it again (a few minutes ago), IP SLA can now see 8.8.4.4 as reachable via ISP1. This is correct, but it was achieved manually, by resetting the IP SLA / track mechanism, not by the IP SLA mechanism itself.
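
A lighter manual reset than removing and re-entering the whole operation may be the ip sla restart command in global configuration mode. This is only a sketch, assuming the IOS release on the router supports the command; whether it clears this particular stuck state has not been tested in this thread:

conf t
!
ip sla restart 200
!
end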

IP SLA TRACK issue