Question about OSPF failover

Andrew Cormier
Level 1

Good day!

I have a switch in my server room

I also have a switch at a secondary server room at a site 5 miles away

I have two separate LANEXs (1Gb each) connecting the switches. This is for fault tolerance if someone pulls down the fibre (as happened a couple of years ago).

I am using OSPF to route between the two switches. It seems to be balancing the traffic between the two links (not a bad thing).

The default route on the switch in my server room points at both endpoints at the other location (since that is where our internet connection is):

ip route 0.0.0.0 0.0.0.0 172.18.10.2 5

ip route 0.0.0.0 0.0.0.0 172.18.11.2 5

(172.18.10.1 and 172.18.11.1 are at this end.)

When I unplugged one of the links to test, I lost connectivity to the other site. I only left it unplugged for maybe 5-10 seconds (an "oh s#it!" moment), since I had expected traffic to fail over to the second route.

Shouldn't OSPF detect fairly quickly that the other route is unavailable and increase the cost? Or is it the fact that I am specifying the cost that screws up the failover? Or am I just a newb and impatient?

2 Accepted Solutions

Hi,

A neighbour is declared down only after the dead interval (hold time) expires. By default that is 4 times the hello interval, so with the default 10-second hello it can take up to 40 seconds for the neighbour to be declared down.

You can either configure sub-second hellos with the ip ospf dead-interval minimal hello-multiplier command under the interfaces,

or you can use BFD along with OSPF: http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fs_bfd.html

Either should reduce the convergence time needed for traffic to be switched to the one remaining interface.
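
As a rough sketch of the two options above, the interface-level configuration could look like the following. The VLAN interface names are taken from the routing tables posted in this thread. The multiplier and BFD timer values are illustrative assumptions, not recommendations. BFD support depends on platform and IOS release, so it may not be available on the switches in question. Either option has to be applied on both ends of each link.

```
! Option 1: OSPF fast hellos (dead interval 1 second, 4 hellos per second)
interface Vlan101
 ip ospf dead-interval minimal hello-multiplier 4
interface Vlan102
 ip ospf dead-interval minimal hello-multiplier 4
!
! Option 2: BFD for OSPF (timers in milliseconds; example values only)
interface Vlan101
 bfd interval 300 min_rx 300 multiplier 3
 ip ospf bfd
interface Vlan102
 bfd interval 300 min_rx 300 multiplier 3
 ip ospf bfd
```

With the example BFD values, a failure would be detected after roughly 3 x 300 ms, compared with the 40-second default dead interval.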

Regards.

Alain

Don't forget to rate helpful posts.


Correct, you can do that with the statics in place still, checking the ospf database for the new defaults.  If they are there you can remove your statics.
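
One possible way to run that check on the local switch is sketched below, assuming the new default arrives as an OSPF external (type 5) LSA. These are standard IOS show commands; the exact output format varies by release.

```
! Is a default route present in the OSPF database as an external LSA?
show ip ospf database external 0.0.0.0
! Which default route is currently installed in the routing table?
show ip route 0.0.0.0
```

While the static defaults with administrative distance 5 are still configured, they win over OSPF (distance 110), so the routing table keeps showing the statics. The database check is what confirms that the OSPF default has arrived.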


27 Replies

andrew.prince
Level 10

It all depends on the timers - what timers have you configured?


The 5 you specified belongs to the static routes (it is their administrative distance), not to OSPF routes. OSPF has nothing to do with those routes whatsoever, unless you have redistributed the static routes you posted into OSPF. In that case they would normally sit on the internet side of the link, pointing to your edge device, and be advertised via OSPF to the site you are at.

Do a show ip route from the device in your server room, and post the output.

Thanks! Here it is:

MTL-STACK3750-12-1#sho ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is 172.18.11.2 to network 0.0.0.0


S    192.168.15.0/24 [1/0] via 10.30.10.177
     64.0.0.0/32 is subnetted, 1 subnets
S       64.230.170.178 [1/0] via 10.30.10.12
     209.167.212.0/32 is subnetted, 1 subnets
O E2    209.167.212.154 [110/1000] via 172.18.11.2, 03:48:54, Vlan101
                        [110/1000] via 172.18.10.2, 03:48:54, Vlan102
     172.17.0.0/24 is subnetted, 4 subnets
C       172.17.4.0 is directly connected, Vlan34
C       172.17.2.0 is directly connected, Vlan32
     172.16.0.0/24 is subnetted, 21 subnets
     172.18.0.0/24 is subnetted, 2 subnets
C       172.18.11.0 is directly connected, Vlan101
C       172.18.10.0 is directly connected, Vlan102
     172.22.0.0/32 is subnetted, 1 subnets
S       172.22.21.166 [1/0] via 10.30.10.12
     10.0.0.0/8 is variably subnetted, 9 subnets, 3 masks
O E2    10.10.10.0/24 [110/1000] via 172.18.11.2, 03:48:56, Vlan101
                      [110/1000] via 172.18.10.2, 03:48:56, Vlan102
C       10.30.0.0/16 is directly connected, Vlan1
O       10.31.0.0/16 [110/20] via 172.18.11.2, 03:48:56, Vlan101
                     [110/20] via 172.18.10.2, 03:48:56, Vlan102
O E2    10.35.0.0/16 [110/1000] via 172.18.11.2, 03:48:56, Vlan101
                     [110/1000] via 172.18.10.2, 03:48:56, Vlan102
O E2    10.38.1.0/24 [110/1000] via 172.18.11.2, 03:48:56, Vlan101
                     [110/1000] via 172.18.10.2, 03:48:56, Vlan102
S       10.37.2.0/24 [1/0] via 10.30.10.12
O E2    10.37.1.0/24 [110/1000] via 172.18.11.2, 03:48:56, Vlan101
                     [110/1000] via 172.18.10.2, 03:48:56, Vlan102
O E2    10.36.0.0/16 [110/1000] via 172.18.11.2, 03:48:56, Vlan101
                     [110/1000] via 172.18.10.2, 03:48:56, Vlan102
O E2    10.10.10.61/32 [110/1000] via 172.18.11.2, 03:48:56, Vlan101
                       [110/1000] via 172.18.10.2, 03:48:56, Vlan102

S    192.168.1.0/24 [1/0] via 10.30.10.66
S*   0.0.0.0/0 [5/0] via 172.18.11.2
               [5/0] via 172.18.10.2

Maybe this is better?

MTL-STACK3750-12-1#sho ip route ospf

     10.0.0.0/8 is variably subnetted, 9 subnets, 3 masks
O E2    10.10.10.0/24 [110/1000] via 172.18.11.2, 03:54:26, Vlan101
                      [110/1000] via 172.18.10.2, 03:54:26, Vlan102
O       10.31.0.0/16 [110/20] via 172.18.11.2, 03:54:26, Vlan101
                     [110/20] via 172.18.10.2, 03:54:26, Vlan102
O E2    10.35.0.0/16 [110/1000] via 172.18.11.2, 03:54:26, Vlan101
                     [110/1000] via 172.18.10.2, 03:54:26, Vlan102
O E2    10.38.1.0/24 [110/1000] via 172.18.11.2, 03:54:26, Vlan101
                     [110/1000] via 172.18.10.2, 03:54:26, Vlan102
O E2    10.37.1.0/24 [110/1000] via 172.18.11.2, 03:54:26, Vlan101
                     [110/1000] via 172.18.10.2, 03:54:26, Vlan102
O E2    10.36.0.0/16 [110/1000] via 172.18.11.2, 03:54:26, Vlan101
                     [110/1000] via 172.18.10.2, 03:54:26, Vlan102
O E2    10.10.10.61/32 [110/1000] via 172.18.11.2, 03:54:26, Vlan101
                       [110/1000] via 172.18.10.2, 03:54:26, Vlan102

      

from running config

!
router ospf 1
 log-adjacency-changes
 auto-cost reference-bandwidth 10000
 redistribute static metric 1000 subnets
 network 10.0.0.0 0.255.255.255 area 0
 network 172.16.0.0 0.0.255.255 area 0
 network 172.17.0.0 0.0.255.255 area 0
 network 172.18.10.1 0.0.0.0 area 0
 network 172.18.11.1 0.0.0.0 area 0
!

Could you post the other side's routing table as well?

Also post show ip protocol and show ip ospf nei from both devices.

I am cleaning out a few irrelevant entries for security/brevity, but the important stuff should still be here:

     172.17.0.0/24 is subnetted, 5 subnets
C       172.17.16.0 is directly connected, Vlan46
O       172.17.4.0 [110/20] via 172.18.11.1, 04:11:06, Vlan101
                   [110/20] via 172.18.10.1, 04:11:06, Vlan102
O       172.17.1.0 [110/20] via 172.18.11.1, 04:11:06, Vlan101
                   [110/20] via 172.18.10.1, 04:11:06, Vlan102
O       172.17.3.0 [110/20] via 172.18.11.1, 04:11:06, Vlan101
                   [110/20] via 172.18.10.1, 04:11:06, Vlan102
O       172.17.2.0 [110/20] via 172.18.11.1, 04:11:06, Vlan101
                   [110/20] via 172.18.10.1, 04:11:06, Vlan102
     172.16.0.0/24 is subnetted, 21 subnets

O       172.16.15.0 [110/20] via 172.18.11.1, 04:11:07, Vlan101
                    [110/20] via 172.18.10.1, 04:11:07, Vlan102
O       172.16.9.0 [110/20] via 172.18.11.1, 04:11:07, Vlan101
                   [110/20] via 172.18.10.1, 04:11:07, Vlan102
O       172.16.10.0 [110/20] via 172.18.11.1, 04:11:07, Vlan101
                    [110/20] via 172.18.10.1, 04:11:07, Vlan102
O       172.16.11.0 [110/20] via 172.18.11.1, 04:11:07, Vlan101
                    [110/20] via 172.18.10.1, 04:11:07, Vlan102

     172.18.0.0/24 is subnetted, 2 subnets
C       172.18.11.0 is directly connected, Vlan101
C       172.18.10.0 is directly connected, Vlan102
     172.22.0.0/32 is subnetted, 1 subnets
O E2    172.22.21.166 [110/1000] via 172.18.11.1, 04:11:07, Vlan101
                      [110/1000] via 172.18.10.1, 04:11:07, Vlan102
     10.0.0.0/8 is variably subnetted, 9 subnets, 3 masks
S       10.10.10.0/24 [1/0] via 10.31.0.1
O       10.30.0.0/16 [110/20] via 172.18.11.1, 04:11:07, Vlan101
                     [110/20] via 172.18.10.1, 04:11:07, Vlan102
C       10.31.0.0/16 is directly connected, Vlan1
S       10.35.0.0/16 [1/0] via 10.30.10.12
S       10.38.1.0/24 [1/0] via 10.31.0.1
O E2    10.37.2.0/24 [110/1000] via 172.18.11.1, 04:11:07, Vlan101
                     [110/1000] via 172.18.10.1, 04:11:07, Vlan102
S       10.37.1.0/24 [1/0] via 10.31.0.1
S       10.36.0.0/16 [1/0] via 10.31.0.1
O E2 192.168.1.0/24 [110/1000] via 172.18.11.1, 04:11:07, Vlan101
                    [110/1000] via 172.18.10.1, 04:11:07, Vlan102
S*   0.0.0.0/0 [1/0] via 10.31.0.1

What about the output of show ip protocols and show ip ospf nei?

But if he is ECMPing, as he is, he wouldn't have totally lost connection to the site as he described.

Here is the info you are looking for, Chris. I am fully game to wait 40 seconds, but I cannot do it until the weekend (and even then I have to do a CMR). I should mention that I was not able to ping servers at the other site when I unplugged the redundant link (in the test it was VLAN 101's link). Oddly, no one complained (400 users all connecting to Exchange and VoIP at the other site), but as I mentioned, it was a 10-second blip, so...

MTL-CISCO3750MCI-1#show ip ospf nei

Neighbor ID     Pri   State           Dead Time   Address         Interface
172.18.11.1        1   FULL/BDR        00:00:35    172.18.11.1      Vlan101
172.18.11.1        1   FULL/BDR        00:00:35    172.18.10.1      Vlan102
172.17.254.5      1   FULL/BDR        00:00:35    172.17.254.5    Vlan104
172.17.254.5      1   FULL/BDR        00:00:35    172.17.254.1    Vlan103

Routing Protocol is "ospf 1"
  Outgoing update filter list for all interfaces is not set
  Incoming update filter list for all interfaces is not set
  Router ID 172.18.11.2
  It is an autonomous system boundary router
  Redistributing External Routes from,
    static with metric mapped to 1000, includes subnets in redistribution
  Number of areas in this router is 1. 1 normal 0 stub 0 nssa
  Maximum path: 4
  Routing for Networks:
    10.0.0.0 0.255.255.255 area 0
    172.18.10.2 0.0.0.0 area 0
    172.18.11.2 0.0.0.0 area 0
  Routing Information Sources:
    Gateway         Distance      Last Update
    172.18.11.1           110      04:48:57
    172.18.10.1           110      1y35w
  Distance: (default is 110)

And along the same lines here is the other info for the timers

Timer intervals configured, Hello 10, Dead 40, Wait 40, Retransmit 5
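
That line comes from the per-interface OSPF output. To compare both inter-site links on each switch, something like the following should work (interface names taken from the outputs above). With Hello 10 and Dead 40, a neighbour that goes silent while the local interface stays up is only declared dead after up to 40 seconds.

```
show ip ospf interface Vlan101 | include Timer
show ip ospf interface Vlan102 | include Timer
```

The hello and dead values need to match on both ends of a link for the adjacency to form.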

Well, I don't see anything wrong with your setup, unless those static routes in your local server room are messing with ECMP for some reason. Remove the static default routes, since you are advertising a static default route into OSPF from the remote server room, and try your test again before tuning your timers.
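
For reference, a minimal sketch of that change follows. One caveat worth checking first: on IOS, `redistribute static` by itself does not inject 0.0.0.0/0 into OSPF, so the remote switch would normally also need `default-information originate`. Which device should originate the default is an assumption here, based on the remote routing table showing a static default via 10.31.0.1.

```
! Remote server room (holds the static default toward the internet edge)
router ospf 1
 default-information originate
!
! Local server room, only once the OSPF default is confirmed to be present
no ip route 0.0.0.0 0.0.0.0 172.18.10.2 5
no ip route 0.0.0.0 0.0.0.0 172.18.11.2 5
```

Removing the statics before an OSPF-learned default exists would leave the local switch with no default route at all, so the order matters.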

chris.monk
Level 1

I disregarded the most basic questions...

Was the device you were pinging from (to the servers) also a part of VLAN 101? If so, is your default gateway local to your site, or is it at the remote site?

If your testing node was part of VLAN 101 and your default gateway was on the remote side of the link, then when you unplugged the link you wouldn't be able to ping anything on the remote side, but members of VLAN 102 would have no problems. This wouldn't be a matter of routing convergence, but a layer 2 issue.

Thanks Chris! I have scheduled a maintenance window for 6 a.m. tomorrow and will know more then.

VLANs 101 and 102 are only used to route traffic between the sites. My test was pinging from another VLAN here (where the users reside) to a different VLAN at the other site (actually pinging the mail server).

Before I do anything else I would like to understand the timers better and digest what has been said above. Thanks for all your help!

Drew
