IP SLA not recovering after failover (depending on...)

Michael Durham
Level 4

Here is a diagram of my current lab, where I am using IP SLA to automatically switch from ISP 1 to ISP 2 should the connection go down (and vice versa).

Current.jpg

My switches are C3550 Layer 3 switches. Both ISPs work, so connectivity is not the problem.

If I shut down the fa0/19 port on SW1, the SLA kicks in and changes my default route to go out via the 10.0.1.0 network without a problem. When I do a no shut, it comes back to the 192.168.10.0 network, just as we would expect. No problem there.

When I disconnect the ISP 1 cell phone, the SLA does switch the default route to the 10.0.1.0 network. Okay, just fine so far. Here is the problem: when I reconnect the cell phone, the SLA does not come back to the 192.168.10.0 network unless I first delete the SLA and then recreate it (on both switches).

Any suggestions?  My complete config files for both switches are attached.

8 Replies

Abzal
Level 7

Hi,

I looked at your configs. Try removing the track object from the backup default route on both switches and giving that route a higher administrative distance instead. Something like this:

SW1:

no ip route 0.0.0.0 0.0.0.0 10.0.1.1 track 20

ip route 0.0.0.0 0.0.0.0 10.0.1.1 250

SW2:

no ip route 0.0.0.0 0.0.0.0 192.168.10.2 track 10

ip route 0.0.0.0 0.0.0.0 192.168.10.2 250

This way, if the primary route fails, the second route through the neighbor switch takes over. Because the backup route has an AD of 250, it is placed in the RIB only when the primary fails. Once the primary comes back online, the backup route is removed from the routing table because of its higher AD.
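The selection rule behind a floating static route can be sketched in a few lines of Python. This is only an illustrative model, not switch configuration: the `StaticRoute` class and `installed_default` function are hypothetical names, and the primary next hop (192.168.10.2) and its track object (10) are assumed for SW1, since the attached configs are not reproduced in the thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StaticRoute:
    next_hop: str
    distance: int = 1            # default administrative distance of a static route
    track: Optional[int] = None  # tracked object number, or None if the route is untracked


def installed_default(routes: List[StaticRoute],
                      track_up: Dict[int, bool]) -> Optional[StaticRoute]:
    """Return the route that wins: lowest distance among routes whose track object is up."""
    usable = [r for r in routes if r.track is None or track_up.get(r.track, False)]
    return min(usable, key=lambda r: r.distance, default=None)


# SW1 after the suggested change.
# ASSUMPTION: the primary next hop and track number below are illustrative only.
routes = [
    StaticRoute("192.168.10.2", distance=1, track=10),  # primary, tied to the IP SLA probe
    StaticRoute("10.0.1.1", distance=250),              # floating backup, untracked
]

print(installed_default(routes, {10: True}).next_hop)   # primary while track 10 is up
print(installed_default(routes, {10: False}).next_hop)  # backup once track 10 goes down
```

The backup never needs its own track object in this model: it is always eligible, and simply loses to the primary whenever the primary is usable.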

Hope it will help.

Best regards,
Abzal

Thank you for the suggestion but it is not the answer I am seeking. 

The problem is that if the cell phone is simply unplugged, I get the following error:

Server_Switch#sh ip sla stat

Round Trip Time (RTT) for       Index 1

        Latest RTT: NoConnection/Busy/Timeout

Latest operation start time: .00:40:16.418 Eastern Fri Jan 11 2013

Latest operation return code: Timeout

Number of successes: 0

Number of failures: 891

Operation time to live: Forever

Round Trip Time (RTT) for       Index 2

        Latest RTT: 4 ms

Latest operation start time: .00:40:17.294 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 368

Number of failures: 0

Operation time to live: Forever

When I reconnect the cell phone, Index 1 should automatically go back to OK, but it does not on either switch.

If one of the switches recovered, then your suggestion might help bring the second one back up too.

Still seeking a suggestion.
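To watch for this stuck state without reading the counters by eye, the `sh ip sla stat` output can be parsed mechanically. The following is a minimal Python sketch, not something from the switches themselves: the function names `parse_sla_stats` and `failing` are made up for illustration, and it assumes the output layout is exactly as pasted above.

```python
import re

# Output copied from the post above (blank lines removed).
SAMPLE = """\
Round Trip Time (RTT) for       Index 1
        Latest RTT: NoConnection/Busy/Timeout
Latest operation start time: .00:40:16.418 Eastern Fri Jan 11 2013
Latest operation return code: Timeout
Number of successes: 0
Number of failures: 891
Operation time to live: Forever
Round Trip Time (RTT) for       Index 2
        Latest RTT: 4 ms
Latest operation start time: .00:40:17.294 Eastern Fri Jan 11 2013
Latest operation return code: OK
Number of successes: 368
Number of failures: 0
Operation time to live: Forever
"""


def parse_sla_stats(text):
    """Map each IP SLA index to its latest return code and success/failure counters."""
    ops = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"Round Trip Time \(RTT\) for\s+Index (\d+)", line)
        if header:
            current = ops.setdefault(int(header.group(1)), {})
            continue
        if current is None:
            continue
        # Split on the first colon only, so timestamps in the value stay intact.
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Latest operation return code":
            current["return_code"] = value
        elif key == "Number of successes":
            current["successes"] = int(value)
        elif key == "Number of failures":
            current["failures"] = int(value)
    return ops


def failing(ops):
    """Return the indexes whose latest return code is anything other than OK."""
    return sorted(i for i, op in ops.items() if op.get("return_code") != "OK")


stats = parse_sla_stats(SAMPLE)
print(failing(stats))
```

Run against the output above, only Index 1 is reported, which matches what the counters show by inspection.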

Hi Michael,

Is the above output from SW1 or SW2?

When the ISP 1 cell phone comes back online, are you able to ping it and 192.168.42.129 from SW1?

Hope it will help.

Best regards,
Abzal

That output was from SW2.  However, it is the same from SW1 even after the cell phone is reconnected. 

Yes, I can ping 192.168.42.129 from both switches even though the SLA still shows it in timeout.

Please remember, if I enter the following commands before I disconnect the cell phone,

SW1(config)# int fa0/19

SW1(config-if)# shut

and later connect the cell phone back to the ICS box and issue these commands:

SW1(config)# int fa0/19

SW1(config-if)# no shut

SLA recovers correctly and I get the following output on both switches:

Server_Switch#sh ip sla stat    

Round Trip Time (RTT) for       Index 1

        Latest RTT: 1 ms

Latest operation start time: .10:10:03.518 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 4

Number of failures: 2673

Operation time to live: Forever

Round Trip Time (RTT) for       Index 2

        Latest RTT: 3 ms

Latest operation start time: .10:10:04.394 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 2154

Number of failures: 0

Operation time to live: Forever

If only there were DSL or cable services in my area, I would not have to be dealing with this issue.

(Server_Switch is SW2 and Office_Switch is SW1)

Hi,

I'm not sure, but when you are ready to test again, try removing port-security from port F0/19 on SW1. Alternatively, when the cell phone goes offline and then comes back online, check the status of the port.

SW1 (status of the port):

show int status

int f0/19

no switchport port-security

no switchport port-security violation protect

no switchport port-security mac-address sticky

Hope it will help.

Best regards,
Abzal

I tried your suggestions but they did not help. Below are my results:

Switch working correctly:

Office_Switch#sh int stat | b /19

FastEthernet0/19

          Switching path    Pkts In   Chars In   Pkts Out  Chars Out

               Processor          0          0       7949     555690

             Route cache          0          0          0          0

                   Total          0          0       7949     555690

Office_Switch#sh ip sla stat

Round Trip Time (RTT) for       Index 1

        Latest RTT: 1 ms

Latest operation start time: .10:52:56.917 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 836

Number of failures: 0

Operation time to live: Forever

Round Trip Time (RTT) for       Index 2

        Latest RTT: 8 ms

Latest operation start time: .10:52:56.945 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 2638

Number of failures: 1

Operation time to live: Forever

AFTER cell phone is disconnected (NOT by shutting down the port):

FastEthernet0/19

          Switching path    Pkts In   Chars In   Pkts Out  Chars Out

               Processor          0          0       8043     562030

             Route cache          0          0          0          0

                   Total          0          0       8043     562030

Office_Switch#sh ip sla stat    

Round Trip Time (RTT) for       Index 1

        Latest RTT: NoConnection/Busy/Timeout

Latest operation start time: .10:54:26.919 Eastern Fri Jan 11 2013

Latest operation return code: Timeout

Number of successes: 880

Number of failures: 46

Operation time to live: Forever

Round Trip Time (RTT) for       Index 2

        Latest RTT: 12 ms

Latest operation start time: .10:54:26.947 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 2728

Number of failures: 1

Operation time to live: Forever

After cell phone re-connected:

FastEthernet0/19

          Switching path    Pkts In   Chars In   Pkts Out  Chars Out

               Processor          0          0       8172     571170

             Route cache          0          0          0          0

                   Total          0          0       8172     571170

Office_Switch#sh ip sla stat    

Round Trip Time (RTT) for       Index 1

        Latest RTT: NoConnection/Busy/Timeout

Latest operation start time: .10:57:38.923 Eastern Fri Jan 11 2013

Latest operation return code: Timeout

Number of successes: 880

Number of failures: 238

Operation time to live: Forever

Round Trip Time (RTT) for       Index 2

        Latest RTT: 4 ms

Latest operation start time: .10:57:39.951 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 2921

Number of failures: 1

Operation time to live: Forever

Office_Switch#ping 192.168.42.129

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to 192.168.42.129, timeout is 2 seconds:

!!!!!

Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

Office_Switch#

An hour later still no change.  But as soon as I do the following, it works!

Office_Switch(config)#no ip sla 1

Office_Switch(config)#ip sla 1

Office_Switch(config-ip-sla)# icmp-echo 192.168.42.129 source-ip 192.168.10.1

Office_Switch(config-ip-sla-echo)# timeout 500

Office_Switch(config-ip-sla-echo)# frequency 1

Office_Switch(config-ip-sla-echo)#exit

Office_Switch(config)#ip sla schedule 1 life forever start-time now

Office_Switch(config)#end

Office_Switch#sh ip sla stat

Round Trip Time (RTT) for       Index 1

        Latest RTT: 1 ms

Latest operation start time: .11:59:02.607 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 3

Number of failures: 0

Operation time to live: Forever

Round Trip Time (RTT) for       Index 2

        Latest RTT: 7 ms

Latest operation start time: .11:59:03.031 Eastern Fri Jan 11 2013

Latest operation return code: OK

Number of successes: 3000

Number of failures: 5

Operation time to live: Forever
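A quick arithmetic check on the two failing Index 1 snapshots from Office_Switch (taken after the disconnect and after the reconnect) shows how often the probe was actually running in between. The sketch below is only an illustration: `parse_start` and `probe_delta` are hypothetical helper names, the timestamps and counters are copied from the output above, and it assumes both snapshots fall on the same day.

```python
from datetime import datetime


def parse_start(stamp):
    """Parse the time-of-day part of a 'Latest operation start time' value."""
    # Drop the leading '.' that IOS printed, then keep only the HH:MM:SS.mmm field.
    return datetime.strptime(stamp.lstrip(".").split()[0], "%H:%M:%S.%f")


def probe_delta(first, second):
    """Return (elapsed seconds, new failures, new successes) between two snapshots."""
    elapsed = (parse_start(second["start"]) - parse_start(first["start"])).total_seconds()
    return (elapsed,
            second["failures"] - first["failures"],
            second["successes"] - first["successes"])


# Index 1 values copied from the two outputs above.
after_disconnect = {"start": ".10:54:26.919 Eastern Fri Jan 11 2013",
                    "successes": 880, "failures": 46}
after_reconnect = {"start": ".10:57:38.923 Eastern Fri Jan 11 2013",
                   "successes": 880, "failures": 238}

elapsed, new_failures, new_successes = probe_delta(after_disconnect, after_reconnect)
print(f"{new_failures} failures and {new_successes} successes in {elapsed:.3f} s")
```

The difference is 192 new failures and no new successes in roughly 192 seconds, which is consistent with `frequency 1`: the operation still appears to run once per second and time out every time, even while a manual ping to the same address succeeds. That narrows the symptom but does not by itself explain the cause.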

This looks like a bug to me. I've never experienced such a problem. You could try upgrading the IOS image.

Which version of IOS are you running?

Hope it will help.

Best regards,
Abzal

System image file is "flash:c3550-ipservices-mz.122-44.SE6/c3550-ipservices-mz.122-44.SE6.bin"

That is the latest version, from what I have seen.