Solved: OSPF reconvergence after fail over

Dennis Mink · ‎08-02-2011

I am implementing a solution for a customer that has dual WAN links for redundancy. To achieve this (and for security purposes) I have implemented 2 OSPF processes (one of which is on a VRF).

The subnet that I am testing for fail over is 10.233.20.0/24. Under normal circumstances that route is in the routing table via OSPF 1:

N_Primary#sh ip route 10.233.20.0
Routing entry for 10.233.20.0/24
  Known via "ospf 11", distance 110, metric 110
  Tag Complete, Path Length == 1, AS 65031, , type extern 1
  Last update from 10.133.1.250 on Ethernet0/3, 00:04:14 ago
  Routing Descriptor Blocks:
  * 10.133.1.250, from 10.133.1.250, 00:04:14 ago, via Ethernet0/3
      Route metric is 110, traffic share count is 1
      Route tag 3489725959

When I fail the primary link, the secondary link kicks in, using OSPF 11 to populate the routing table (higher metric of 4010):

NOC_Primary#sh ip route 10.233.20.0
Routing entry for 10.233.20.0/24
  Known via "ospf 11", distance 110, metric 4010
  Tag Complete, Path Length == 1, AS 65031, , type extern 1
  Last update from 10.133.1.250 on Ethernet0/3, 00:01:13 ago
  Routing Descriptor Blocks:
  * 10.133.1.250, from 10.133.1.250, 00:01:13 ago, via Ethernet0/3
      Route metric is 4010, traffic share count is 1
      Route tag 3489725959

As can be seen below, both processes have the 10.233.20.0 subnet in their DB, with different metrics as the only difference

N_Primary#sh ip ospf 11 database external 10.233.20.0 (FROM SECONDARY)

OSPF Router with ID (10.133.1.254) (Process ID 11)

Type-5 AS External Link States

Routing Bit Set on this LSA
LS age: 98
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 10.233.20.0 (External Network Number )
Advertising Router: 10.133.1.250
LS Seq Number: 80000001
Checksum: 0x4DCD
Length: 36
Network Mask: /24
        Metric Type: 1 (Comparable directly to link state metric)
        TOS: 0
        Metric: 4000        <-------------------------HIGHER METRIC
        Forward Address: 0.0.0.0
        External Route Tag: 3489725959

N_Primary#sh ip ospf 1 database external 10.233.20.0 (FROM PRIMARY LINK)

OSPF Router with ID (10.133.2.1) (Process ID 1)

Type-5 AS External Link States

Routing Bit Set on this LSA
LS age: 1120
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 10.233.20.0 (External Network Number )
Advertising Router: 10.133.0.2
LS Seq Number: 80000001
Checksum: 0x623D
Length: 36
Network Mask: /24
        Metric Type: 1 (Comparable directly to link state metric)
        TOS: 0
        Metric: 1   <--------LOWEST, MOST PREFERRED
        Forward Address: 0.0.0.0
        External Route Tag: 65530

The problem is that when I bring the PRIMARY link back up, the routing table on my router will not converge back to the original route with the cheaper metric, not even after the 30-minute periodic LSA recomputation interval. Any suggestions on how to force fallback to the more favourable route with the lower metric via the primary WAN?

Please remember to rate useful posts, by clicking on the stars below.

lgijssel · ‎08-02-2011

Perhaps I misunderstand you, but the output provided looks different:

The subnet that I am testing for fail over is 10.233.20.0/24. under normal circumstances that route is in the routing table in OSPF 1:

N_Primary#sh ip route 10.233.20.0

Routing entry for 10.233.20.0/24

Known via "ospf 11", distance 110, metric 110

....

regards,

Leo

Dennis Mink · ‎08-02-2011

You are spot on, I pasted the wrong output in.

So in the normal scenario, when the primary WAN is up, the route should show the following:

N_Primary#sh ip route 10.233.20.0
Routing entry for 10.233.20.0/24
  Known via "ospf 1", distance 110, metric 21
  Tag 65530, type extern 1
  Last update from 10.133.2.125 on Ethernet0/0.102, 00:00:08 ago
  Routing Descriptor Blocks:
  * 10.133.2.125, from 10.133.0.2, 00:00:08 ago, via Ethernet0/0.102
      Route metric is 21, traffic share count is 1
      Route tag 65530

lgijssel · ‎08-02-2011

Please check the following link:

http://www.cisco.com/en/US/tech/tk365/technologies_white_paper09186a0080531fd2.shtml#topic2

It explains redistribution between two OSPF processes.

Your problem is quite similar and has to do with the administrative distance.

Perhaps it will work when you reduce the AD on OSPF process 1.

regards,

Leo

Dennis Mink · ‎08-02-2011

Thanks mate, I had it fixed before I read your post, but you are spot on again. I changed the AD of one of the processes to 120, and now fail over and fall back are working properly.

I guess that with fall back, although the primary link re-populates OSPF 1, the router keeps the route it already installed even though new LSAs arrive; the LSAs from one process are not taken into account by another process. Tweaking the AD changes that behaviour.
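
For anyone finding this later, the AD change could look roughly like the sketch below. The thread does not record which process was changed, so this assumes the backup process (OSPF 11) was raised to 120 while OSPF 1 stays at the default of 110. Treat the choice of process as an assumption, and match the "router ospf" line to whatever is already in the running config (for example, if that process is tied to a VRF).

```text
! Sketch only: assumes the backup process (11) is the one being changed.
router ospf 11
 ! Raise the administrative distance of all routes learned by process 11
 ! from the default 110 to 120, so that process 1 (still at 110) is
 ! preferred whenever it also has the prefix.
 distance 120
```

With the distances no longer equal, "sh ip route 10.233.20.0" should show the route as known via "ospf 1" whenever the primary link is up, and fall back to "ospf 11" only when process 1 loses the prefix.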

Thanks again

Peter Paluch · ‎08-02-2011

Leo,

Exactly. I've just built a simple three-router scenario in which two routers advertised the same external network to the third one, each with a different total metric. The routing table on the third router always contained the first advertised alternative, and that route was removed (and replaced) only if it became unreachable. The metric did not have any influence; modifying the administrative distance solved this.

Good point!

Best regards,

Peter