Solved: IP SLA to perform failover, but can't fail back because of NAT entries?

ejensenscs · ‎02-14-2012

I'm trying to set up a router with two WANs and use IP SLA to fail traffic over between them. There is no static NAT, just dynamic NAT (PAT). I want traffic bound for one destination to use WAN B and all other internet traffic to use WAN A. The SLA pings the gateway on each WAN and moves the traffic to the other WAN when one goes down. The failover itself works perfectly. The problem is the failback. When I plug the 'down' WAN back in, the SLA comes up and the routes switch back, but the traffic does not follow: failing over from WAN A works, but traffic doesn't fail back when WAN A comes back up. When I get on the router and run

clear ip nat trans *

everything comes back up right away. This is an 861 router running c860-universalk9-mz.124-24.T5.bin.

What can I do to clear out the NAT table automatically?

Or can I put a command in the SLA config to issue that

clear ip nat trans *

command?

track 101 ip sla 1 reachability
!
track 102 ip sla 2 reachability
!
ip route 0.0.0.0 0.0.0.0 100.1.1.1 track 101
ip route 209.209.209.209 255.255.255.255 200.1.1.1 track 102
ip route 0.0.0.0 0.0.0.0 200.1.1.1 10 track 102
!
ip sla 1
 icmp-echo 100.1.1.1 source-ip 100.1.1.2
 threshold 3
 frequency 5
ip sla schedule 1 start-time now
!
ip sla 2
 icmp-echo 200.1.1.1 source-ip 200.1.1.2
 threshold 3
 frequency 5
ip sla schedule 2 start-time now
ip sla enable reaction-alerts

Michael Couture · ‎02-14-2012

When the link comes back up is the route populated back into the routing table? And what do you have for a NAT statement?

ejensenscs · ‎02-14-2012

The route statements disappear from the

show ip route

output correctly. The traffic routes correctly and the IP SLA is working just fine. It's the NAT part that is sticking.

Everything works after failover or failback as soon as I issue the

clear ip nat trans *

command.

ip nat inside source route-map priISP interface Vlan3 overload
ip nat inside source route-map secISP interface FastEthernet4 overload
!
route-map priISP permit 10
 match ip address 103
 match interface Vlan3
!
route-map secISP permit 10
 match ip address 103
 match interface FastEthernet4
!
access-list 103 permit ip 192.168.1.0 0.0.0.255 any

rizwanr74 · ‎02-14-2012

Please change this and try...

track 101 ip sla 1 reachability
 delay down 10 up 10
!
ip sla 1
 icmp-echo 100.1.1.1 source-ip 100.1.1.2
 timeout 20000
 threshold 3
 frequency 5
ip sla schedule 1 start-time now

----------------------------------------------

Remove these lines...

ip route 209.209.209.209 255.255.255.255 200.1.1.1 track 102
track 102 ip sla 2 reachability
!
ip sla 2
 icmp-echo 200.1.1.1 source-ip 200.1.1.2
 threshold 3
 frequency 5
ip sla schedule 2 start-time now
ip sla enable reaction-alerts

Let me know the results.

Thanks,

Rizwan Rafeek

ejensenscs · ‎02-14-2012

If I take out the route statement, all my traffic goes out WAN 1, which defeats the whole purpose of the project.

I need more than just primary WAN failover; I also need the second WAN to fail over to the primary WAN if it goes down. It does work this way too, so it seems possible. My problem seems to be entirely NAT-related. When I test the failover of each WAN, it fails over correctly.

After I fail the secondary wan, I can clear the NAT table and it works. Then I fail it back, links come up, clear the NAT and traffic flows properly again. If I fail the primary WAN the main traffic switches over (after clearing the NAT), the little bit of secondary traffic never stops because that link didn't fail and the NAT doesn't need to be cleared. After the primary comes up, the general traffic fails just like always, until I enter

clear ip nat trans *

and then we're back to square one and everything is working fine.

Can IP SLA issue a custom command after an event like packet loss or an ICMP failure?

Michael Couture · ‎02-14-2012

Check your outside interfaces and make sure IP NAT OUTSIDE is assigned to both of them. I ran a simulation and the only way I was able to replicate the problem you are describing is when I had only one of the outside interfaces configured as an IP NAT outside interface.
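
A minimal sketch of what that check looks for, using the WAN interface names posted earlier in this thread. Vlan1 as the inside (LAN) interface is an assumption for illustration; substitute your own.

```
! Vlan1 is an assumed LAN interface name
interface Vlan1
 ip nat inside
!
! Primary WAN (used by route-map priISP)
interface Vlan3
 ip nat outside
!
! Secondary WAN (used by route-map secISP)
interface FastEthernet4
 ip nat outside
```

Running show ip nat statistics afterwards lists the inside and outside interfaces NAT is actually using, which makes a missing ip nat outside easy to spot.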

ejensenscs · ‎02-14-2012

Yes, NAT outside is on both interfaces. I think everybody is missing the fact that this works in every way I want it to, but requires a small amount of manual intervention (getting on the console and issuing the

 clear ip nat trans *

command).

If I clear out the existing NAT table, the traffic starts using the next interface just fine, but I have to manually clear out the table. If I bring the NAT timeout to a really small number, I think that would disrupt normal traffic by resetting the connections too frequently when no failover conditions are needed. For example, if you clear out a NAT table, some Remote Desktop connections will disconnect and reconnect. Happens automatically, but if it was constantly happening all day it would be bad. So I feel like the NAT timeout is not the answer.

lucentmoon · ‎02-14-2012

Your config looks fine. I can find two options:

1) Change the NAT timeout per protocol as you see fit for your network and let traffic fail back naturally as the entries time out. NAT translations only time out when they are idle (by default after 24 hours for TCP), so you won't interrupt any traffic as long as your NAT timeouts are longer than any protocol/connection keepalives on your network.

Pros: nobody's traffic is disconnected, since you aren't manually clearing the NAT table. Cons: you don't get the instant failover you want.

See the command

ip nat translation timeout

http://www.cisco.com/en/US/docs/ios/ipaddr/configuration/guide/iadnat_addr_consv_ps6350_TSD_Products_Configuration_Guide_Chapter.html#wp1056211

2) An EEM/Tcl script can automatically execute the command when specific criteria are matched. The criterion could be something basic, like "when interfaceX goes up/up, execute the command"

clear ip nat trans *

I believe there is a running EEM/Tcl topic somewhere in these Cisco forums.

Pros: instant failover, and you don't have to touch the router for this issue again. Cons: you might interrupt some people's connections/traffic.

See the EEM/Tcl sections:

http://www.cisco.com/en/US/docs/ios/netmgmt/configuration/guide/12_4t/nm_12_4t_book.html
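
As a rough sketch of option 1, the per-protocol NAT timers could be shortened along these lines. The values are illustrative assumptions, not recommendations from this thread; choose ones longer than the keepalive intervals of the applications on your network.

```
! Illustrative values only (assumptions); all timers are in seconds
ip nat translation tcp-timeout 1800
ip nat translation udp-timeout 120
ip nat translation icmp-timeout 30
```

show ip nat translations verbose displays the time left on each entry, which helps when judging whether the chosen timers are too aggressive.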

ejensenscs · ‎02-15-2012

Nicholas May! Scripting was the answer; I don't think I even knew that existed! I have a working solution: when the SLA takes a route down, the NAT table gets cleared and traffic is re-established over the changed routes. Then when the link comes back up, the NAT table gets cleared again and traffic is balanced out again. I used the track state "any" instead of up or down, so the applet runs on either state change. I'm pretty impressed at how seamlessly this works now. I think the last tweak will be to take the sensitivity down a bit (allow more missed pings, etc.) so it won't flip-flop all day.

Here's the code I used: (needed 2 instances to watch the 2 SLA tracks)

event manager applet clear_nat_1
 event track 101 state any
 action 0.0 cli command "enable"
 action 1.0 cli command "clear ip nat trans *"
 action 3.0 syslog msg "WAN failover, cleared NAT"
event manager applet clear_nat_2
 event track 102 state any
 action 0.0 cli command "enable"
 action 1.0 cli command "clear ip nat trans *"
 action 3.0 syslog msg "WAN failover, cleared NAT"
!
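
The sensitivity tweak mentioned above could be sketched with a delay on each track object, so that a state change is only reported once it has persisted for a while. The timer values here are assumptions for illustration; with the applets above, the NAT table would then be cleared only after the delayed state change.

```
! Delay values are in seconds and illustrative only (assumptions)
track 101 ip sla 1 reachability
 delay down 15 up 30
!
track 102 ip sla 2 reachability
 delay down 15 up 30
```

A longer up delay than down delay is one way to avoid failing back onto a link that is still flapping.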

D00677281 · ‎06-23-2015

When running a constant ping during failover, it drops out and doesn't come back unless I issue the

 clear ip nat trans *

statement manually. Is it because the enable account requires a password?

Here is what I have:

track 1 ip sla 1 reachability

ip route 0.0.0.0 0.0.0.0 x.x.x.x track 1
ip route 0.0.0.0 0.0.0.0 x.x.x.x 10

ip sla 1
icmp-echo 4.2.2.2 source-interface GigabitEthernet0/0
frequency 5000
ip sla schedule 1 life forever start-time now

route-map internet-2 permit 10
match ip address internet
match interface GigabitEthernet0/1
!
route-map internet permit 1
match ip address internet
match interface GigabitEthernet0/0


event manager applet NAT_CLEAR
event track 1 state any
action 0.0 cli command "enable"
action 1.0 cli command "clear ip nat trans *"
action 3.0 syslog msg "FAIL OVER JUST OCCURED"

Thanks for all your insight!
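
A few standard show and debug commands may help narrow this down by checking whether the track object changes state and whether the applet actually fires. This is only a diagnostic sketch, not a confirmed fix. One thing worth checking in the config above: frequency under ip sla is given in seconds, so frequency 5000 probes roughly every 83 minutes, which would delay detection of both the failure and the recovery.

```
! Does the track object change state when the link fails and recovers?
show track 1
show ip sla statistics 1
! Is the applet registered, and has it been triggered?
show event manager policy registered
show event manager history events
! Watch the CLI actions as the applet runs them
debug event manager action cli
```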