Nexus 7000 and VSS OSPF issue

trondaker · ‎04-12-2011

Hi,

We have two 6500s in a VSS deployment in the core. To that cluster we connect four Nexus 7018s, each with uplinks to both 6500s in a single port-channel. We run the port-channel as a routed port and use one subinterface per VRF. The problem is that when we turn off one of the chassis in the VSS, the Nexus takes down the OSPF adjacency and rebuilds it. The FIB is also wiped and rebuilt. This happens immediately when one of the VSS chassis is turned off or reloaded.

Is there some special consideration we have to take into account when it comes to port-channels and OSPF? Our understanding is that when we kill one chassis, its links go down and are removed from the port-channel bundle. The traffic is then redistributed over the remaining links. NSF should take care of forwarding until the new active supervisor in the VSS has built its FIB.

NSF never kicks in: 'show ip ospf' on the Nexuses never shows the grace period in effect.

hbruyere · ‎04-15-2011

Hello,

Jumping straight to a possible cause here: can you make sure that the IOS routers are configured with 'nsf ietf'? NX-OS does IETF NSF only, not Cisco NSF.
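
As a minimal sketch of what that would look like on the IOS side, assuming OSPF process ID 1 and a VRF named RED (both are placeholders, not taken from your setup):

```
router ospf 1 vrf RED
 nsf ietf
```

The same line would be needed under each per-VRF OSPF process on the VSS.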

Regards,

Herve

trondaker · ‎04-15-2011

Hi Herve,

Thanks for your response. You're on to something: we changed it to ietf on the Nexuses from the cisco default, and it seems like NSF is working now. We verified through 'show ip ospf' on the Nexuses that the grace period was in effect. However, traffic is still being dropped for about 16 seconds. When we reload or shut down the active chassis, traffic keeps being forwarded for a while (indicating NSF is working), but after 10 seconds or so it stops for 16 seconds. We're doing some more tests now to see what happens to the forwarding tables on the Nexuses, but this isn't quite right, is it? If NSF were working 100%, [almost] no traffic should be dropped, right?

Br,

Trond

hbruyere · ‎04-18-2011

Indeed, with NSF there should be no drops. What grace period is configured on the Nexus? The default is 60 seconds.
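
For reference, a rough sketch of where this is set on NX-OS, assuming OSPF instance tag 1 and a VRF named RED (both placeholders):

```
router ospf 1
  vrf RED
    graceful-restart
    graceful-restart grace-period 60
```

'show ip ospf vrf RED' should then report the graceful-restart configuration and the grace period for that VRF.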

Regards,

Herve

trondaker · ‎04-18-2011

Hi Herve,

Yeah, 60 seconds. It turns out that layer 3 port-channels with a subinterface for each VRF don't work for this. We created a layer 2 trunk with SVIs instead and it works like a charm, maybe a 1-2 second drop but that's it.
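
A sketch of the two variants on NX-OS, for anyone comparing. The port-channel number, VLAN ID, VRF name, OSPF instance tag and addressing are all placeholders for illustration, not the actual config:

```
! Before: routed port-channel with one dot1q subinterface per VRF
interface port-channel10
  no switchport
interface port-channel10.101
  encapsulation dot1q 101
  vrf member RED
  ip address 10.0.101.2/30
  ip router ospf 1 area 0.0.0.0
  no shutdown

! After: layer 2 trunk carrying one VLAN per VRF, with an SVI per VLAN
feature interface-vlan
vlan 101
interface port-channel10
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 101
interface Vlan101
  vrf member RED
  ip address 10.0.101.2/30
  ip router ospf 1 area 0.0.0.0
  no shutdown
```

The VSS side would need the matching trunk and SVIs in the same VRFs.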

Thanks for your help Herve!

Br,

Trond

lukaszkhalil · ‎01-09-2012

Hello,

I have the same problem, and unfortunately I cannot migrate to SVIs.

Do you know why there is a problem with NSF on layer 3 port-channels with subinterfaces, and what can be done to fix it?

Thanks for any help.

Regards

Lukas

trondaker · ‎01-09-2012

Hi Lukas,

There was some talk about internal state being very slow to react to changes when using L3 port-channel subinterfaces, and there was no workaround at that time. Unfortunately, I don't have this from Cisco. Maybe open an informational TAC case? Sorry that I can't be more specific :-/

Br,

Trond