06-03-2010 04:27 PM - edited 03-06-2019 11:25 AM
Hello,
I am trying to use policy routing in a setup to override the routing table and maintain constant connectivity across a network topology. The topology looks like this:
server1---router1---wireless link A---router2---server2
                \---wireless link B---/
That is, two routers are connected to each other via two wireless links, and on the LAN side of each router sits a server.
Between the two routers I am running RIP with the timers extremely shortened so failover between the links happens as fast as possible. RIP is set up to prefer link A over link B by using distance commands (see router configs below).
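For reference, the RIP side on router2 looks roughly like the sketch below (mirrored on router1). I am paraphrasing rather than pasting; the timer values and network statements are placeholders and the attached configs are authoritative. Only the two administrative distances (100 and 110) and the neighbor addresses match what shows up in the debug output further down.
router rip
 version 2
 network 10.0.0.0
 network 172.16.0.0
 no auto-summary
 ! aggressively shortened update/invalid/holddown/flush timers (placeholder values)
 timers basic 5 15 15 20
 ! prefer routes learned from the link A neighbor (AD 100) over link B (AD 110)
 distance 100 10.0.2.1 0.0.0.0
 distance 110 10.0.3.1 0.0.0.0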
Between each server and its router I have set up two GRE tunnels, which terminate on the router's outside interfaces. On each tunnel interface on the router I have applied a route policy that matches any packet arriving on that tunnel, sets the next hop to the address of the router interface at the opposite end of the wireless link, and also sets the outgoing interface (for good measure).
The goal of this setup is for traffic sourced from one GRE tunnel on the server to use link A and traffic sourced from the other GRE tunnel to use link B. My application uses the GRE tunnels in this way to balance traffic between the two links.
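To make that concrete, the policy for the link A side on router2 is roughly the following; the route-map name and tunnel addressing are paraphrased from memory (the attached configs have the exact version), and Tunnel2 is identical except that it points at 10.0.3.1 and the link B interface.
interface Tunnel1
 ! GRE tunnel to server2, pinned to wireless link A
 ! (tunnel source/destination lines omitted; as described above, the tunnel
 !  terminates on the outside router interface)
 ip address 10.10.20.1 255.255.255.252
 ip policy route-map PIN-TO-LINK-A
!
route-map PIN-TO-LINK-A permit 10
 ! no match clause, so every packet arriving on Tunnel1 is policy routed;
 ! the next hop is router1's interface across wireless link A, and the
 ! egress interface is router2's own link A facing interface
 set ip next-hop 10.0.2.1
 set interface Vlan2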
My problem with this setup is that if the primary link (that is, the link whose route is currently in the routing table) goes down, for example because I shut down the interface that plugs into the radio, traffic will not pass over the secondary link until the routing protocol adapts. This seems strange to me, since I thought the policy on the interface would override the routing table and send the traffic regardless. If I do the opposite (i.e. drop the secondary link while the primary is up), traffic running across the primary is unaffected by the routing changes.
I can easily illustrate this by turning on debug ip policy and debug ip routing on both routers and then, from each server, pinging the opposite server's IP with the GRE tunnel as the source interface. The pings look like this:
Server 1:
tunnel1 (connected to link A) pings:
...
1450 bytes from 172.16.250.2: icmp_seq=25 ttl=62 time=6.02 ms
1450 bytes from 172.16.250.2: icmp_seq=26 ttl=62 time=7.36 ms
1450 bytes from 172.16.250.2: icmp_seq=27 ttl=62 time=4.21 ms
1450 bytes from 172.16.250.2: icmp_seq=28 ttl=62 time=3.80 ms
1450 bytes from 172.16.250.2: icmp_seq=29 ttl=62 time=4.40 ms
1450 bytes from 172.16.250.2: icmp_seq=30 ttl=62 time=3.99 ms
1450 bytes from 172.16.250.2: icmp_seq=31 ttl=62 time=6.09 ms
(here is where I disconnected the interface on the opposite end)
tunnel2 (connected to link B) pings:
...
1450 bytes from 172.16.250.2: icmp_seq=25 ttl=62 time=4.36 ms
1450 bytes from 172.16.250.2: icmp_seq=26 ttl=62 time=4.70 ms
1450 bytes from 172.16.250.2: icmp_seq=27 ttl=62 time=3.80 ms
1450 bytes from 172.16.250.2: icmp_seq=28 ttl=62 time=5.39 ms
1450 bytes from 172.16.250.2: icmp_seq=29 ttl=62 time=4.99 ms
(here is where I disconnected the interface; there is only a small "blip")
1450 bytes from 172.16.250.2: icmp_seq=32 ttl=62 time=5.52 ms
1450 bytes from 172.16.250.2: icmp_seq=33 ttl=62 time=5.87 ms
1450 bytes from 172.16.250.2: icmp_seq=34 ttl=62 time=5.96 ms
1450 bytes from 172.16.250.2: icmp_seq=35 ttl=62 time=7.05 ms
Server 2:
tunnel1 (connected to link A) pings:
...
1450 bytes from 172.16.254.200: icmp_seq=32 ttl=62 time=4.41 ms
1450 bytes from 172.16.254.200: icmp_seq=33 ttl=62 time=3.87 ms
1450 bytes from 172.16.254.200: icmp_seq=34 ttl=62 time=4.82 ms
1450 bytes from 172.16.254.200: icmp_seq=35 ttl=62 time=4.77 ms
1450 bytes from 172.16.254.200: icmp_seq=36 ttl=62 time=4.47 ms
1450 bytes from 172.16.254.200: icmp_seq=37 ttl=62 time=5.43 ms
(here is where I disconnected the link on the router on this side)
tunnel2 (connected to link B) pings:
...
1450 bytes from 172.16.254.200: icmp_seq=31 ttl=62 time=6.45 ms
1450 bytes from 172.16.254.200: icmp_seq=32 ttl=62 time=7.65 ms
1450 bytes from 172.16.254.200: icmp_seq=33 ttl=62 time=3.86 ms
1450 bytes from 172.16.254.200: icmp_seq=34 ttl=62 time=5.31 ms
(here is where I disconnected the primary link... note the huge gap in icmp_seq; with one-second pings this is a LONG time)
1450 bytes from 172.16.254.200: icmp_seq=57 ttl=62 time=4.21 ms
1450 bytes from 172.16.254.200: icmp_seq=58 ttl=62 time=4.41 ms
1450 bytes from 172.16.254.200: icmp_seq=59 ttl=62 time=7.36 ms
1450 bytes from 172.16.254.200: icmp_seq=60 ttl=62 time=7.56 ms
The debug output on router 2 shows that the policy is no longer being hit, which makes me think the GRE tunnel itself might have a problem during the transition:
*Jun 3 22:50:16.270: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:16.722: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:16.726: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, g=10.0.2.1, len 1470, FIB policy routed
*Jun 3 22:50:17.270: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:17.270: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:17.722: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:17.722: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, g=10.0.2.1, len 1470, FIB policy routed
router2(config-if)# (*** here is where I issue the shut command ***)
*Jun 3 22:50:18.270: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:18.270: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:18.722: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:18.722: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, g=10.0.2.1, len 1470, FIB policy routed
*Jun 3 22:50:19.270: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:19.270: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:19.722: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:19.722: IP: s=10.10.20.2 (Tunnel1), d=172.16.254.200, g=10.0.2.1, len 1470, FIB policy routed
*Jun 3 22:50:19.742: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan2, changed state to down
*Jun 3 22:50:19.742: is_up: 0 state: 4 sub state: 1 line: 0 has_route: True
*Jun 3 22:50:19.742: RT: del 10.10.10.0/30 via 10.0.2.1, rip metric [100/15]
*Jun 3 22:50:19.742: RT: delete subnet route to 10.10.10.0/30
*Jun 3 22:50:19.742: RT: NET-RED 10.10.10.0/30
*Jun 3 22:50:19.742: RT: del 10.10.10.4/30 via 10.0.2.1, rip metric [100/15]
*Jun 3 22:50:19.742: RT: delete subnet route to 10.10.10.4/30
*Jun 3 22:50:19.742: RT: NET-RED 10.10.10.4/30
*Jun 3 22:50:19.742: RT: del 10.10.10.8/30 via 10.0.2.1, rip metric [100/15]
*Jun 3 22:50:19.742: RT: delete subnet route to 10.10.10.8/30
*Jun 3 22:50:19.742: RT: NET-RED 10.10.10.8/30
*Jun 3 22:50:19.742: RT: del 10.10.10.12/30 via 10.0.2.1, rip metric [100/15]
*Jun 3 22:50:19.742: RT: delete subnet route to 10.10.10.12/30
*Jun 3 22:50:19.742: RT: NET-RED 10.10.10.12/30
*Jun 3 22:50:19.742: RT: del 172.16.254.0/24 via 10.0.2.1, rip metric [100/15]
*Jun 3 22:50:19.742: RT: delete subnet route to 172.16.254.0/24
*Jun 3 22:50:19.742: RT: NET-RED 172.16.254.0/24
*Jun 3 22:50:19.742: RT: interface Vlan2 removed from routing table
*Jun 3 22:50:19.742: RT: del 10.0.2.0/24 via 0.0.0.0, connected metric [0/0]
*Jun 3 22:50:19.742: RT: delete subnet route to 10.0.2.0/24
*Jun 3 22:50:19.742: RT: NET-RED 10.0.2.0/24
*Jun 3 22:50:20.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:20.282: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:20.486: RT: SET_LAST_RDB for 10.0.2.0/24
NEW rdb: via 10.0.3.1
*Jun 3 22:50:20.486: RT: add 10.0.2.0/24 via 10.0.3.1, rip metric [110/15]
*Jun 3 22:50:20.486: RT: NET-RED 10.0.2.0/24
*Jun 3 22:50:20.486: RT: SET_LAST_RDB for 10.10.10.0/30
NEW rdb: via 10.0.3.1
*Jun 3 22:50:20.486: RT: add 10.10.10.0/30 via 10.0.3.1, rip metric [110/15]
*Jun 3 22:50:20.486: RT: NET-RED 10.10.10.0/30
*Jun 3 22:50:20.486: RT: SET_LAST_RDB for 10.10.10.4/30
NEW rdb: via 10.0.3.1
*Jun 3 22:50:20.486: RT: add 10.10.10.4/30 via 10.0.3.1, rip metric [110/15]
*Jun 3 22:50:20.486: RT: NET-RED 10.10.10.4/30
*Jun 3 22:50:20.486: RT: SET_LAST_RDB for 10.10.10.8/30
NEW rdb: via 10.0.3.1
*Jun 3 22:50:20.486: RT: add 10.10.10.8/30 via 10.0.3.1, rip metric [110/15]
*Jun 3 22:50:20.486: RT: NET-RED 10.10.10.8/30
*Jun 3 22:50:20.486: RT: SET_LAST_RDB for 10.10.10.12/30
NEW rdb: via 10.0.3.1
*Jun 3 22:50:20.486: RT: add 10.10.10.12/30 via 10.0.3.1, rip metric [110/15]
*Jun 3 22:50:20.486: RT: NET-RED 10.10.10.12/30
*Jun 3 22:50:20.486: RT: SET_LAST_RDB for 172.16.254.0/24
NEW rdb: via 10.0.3.1
*Jun 3 22:50:20.486: RT: add 172.16.254.0/24 via 10.0.3.1, rip metric [110/15]
*Jun 3 22:50:20.486: RT: NET-RED 172.16.254.0/24
*Jun 3 22:50:20.742: %LINK-5-CHANGED: Interface FastEthernet2, changed state to administratively down
*Jun 3 22:50:21.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:21.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:21.742: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet2, changed state to down
*Jun 3 22:50:22.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:22.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:23.282: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:23.282: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:24.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:24.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:25.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
*Jun 3 22:50:25.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, g=10.0.3.1, len 1470, FIB policy routed
*Jun 3 22:50:26.278: IP: s=10.10.20.6 (Tunnel2), d=172.16.254.200, len 1470, FIB policy match
Attached are the complete configs from both routers. These routers are 1811s, and to get more IP interfaces I have set up VLANs on the 8-port switch. The extra tunnel interfaces are for expanding this rig to use four links, but they aren't used right now.
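(A note on reading the attached configs: each radio-facing port is just a switch port placed in its own VLAN with a routed SVI on top, roughly as sketched below. The Fa2/Vlan2 pairing is what the debug output above suggests for link A on router2; the address shown is paraphrased, the configs have the real one.)
interface FastEthernet2
 ! switch port that plugs into the link A radio
 switchport access vlan 2
!
interface Vlan2
 ! routed interface for link A; router1's far end is 10.0.2.1
 ip address 10.0.2.2 255.255.255.0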
Can anyone shed some light on why the GRE tunnel for the active side stops sending traffic while routing changes are being made, or why the policy isn't taking effect?
06-04-2010 06:40 AM
Hi Claude,
Why don't you just load-balance the traffic over both links by letting RIP learn the same prefix over both links with the same metric? Both routes will be installed in the routing table and the CEF per-session load-balancing algorithm will be used.
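Something along these lines, as a rough sketch on router2 (and similarly on router1); the distance values and neighbor addresses come from your debug output, which also shows both paths already carrying the same hop count, so removing the per-neighbor preference should be enough for both routes to be installed:
router rip
 ! remove the per-neighbor distance preference so both learned routes
 ! have the same administrative distance
 no distance 100 10.0.2.1 0.0.0.0
 no distance 110 10.0.3.1 0.0.0.0
Afterwards "show ip route 172.16.254.0" should list both next hops. CEF shares the load per source/destination pair by default; "ip load-sharing per-packet" on the outgoing interfaces is an option if you want finer-grained balancing.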
HTH
Laurent.
06-04-2010 06:50 AM
That does seem like a viable alternative solution and I'll try it out, thanks. However, at this point I am more interested in understanding why my Cisco router is behaving this way when it shouldn't be. Is there something I am missing about how GRE tunnels work on a Cisco router? Or is it something to do with policy routing's priority over the default routing table?