Re: OSPF adjacency not forming

m.glosson · 10-09-2012

Greetings all. I know the answer to this issue is "MTU mismatch," but in this case that is not the problem. Here is the output from the new 2951 router:

#sho ip ospf nei
Neighbor ID     Pri   State           Dead Time   Address         Interface
<snip>
10.100.49.6       1   FULL/BDR        00:00:36    10.10.70.253    GigabitEthernet0/1

10.200.49.5       1   EXCHANGE/DR     00:00:37    10.10.70.252    GigabitEthernet0/1

Here is the output from the Nexus it should be neighboring with:

# show ip ospf nei
 OSPF Process ID 100 VRF default
 Total number of neighbors: 57
 Neighbor ID     Pri State            Up Time  Address         Interface
 <snip>
 10.10.10.250      1 FULL/DROTHER     33w0d    10.10.10.250    Vlan1
 10.10.10.251      1 EXSTART/DROTHER  01:17:21 10.10.10.251    Vlan1

Here is a config snippet from the Nexus:

router ospf 100
  router-id 10.200.49.5
  network 10.100.0.0/16 area 0.0.0.0
  network 10.200.0.0/16 area 0.0.0.0
interface Vlan1
  ip address 10.10.70.252/16
  ip router ospf 100 area 0.0.0.0

Here is a config snippet from the 2951:

router ospf 100
 router-id 10.10.10.251
 network 10.0.10.0 0.0.0.3 area 100
 network 10.0.255.1 0.0.0.0 area 0
 network 10.10.0.0 0.0.255.255 area 0
interface GigabitEthernet0/1
 ip address 10.10.10.251 255.255.0.0

Any thoughts, or would you like to see more of the config? I did not configure the Nexus. Had I done so, I don't think I would have included the "network" statements under the instance section, since my understanding is that OSPF on the Nexus is enabled on the interfaces. Could those statements be what's screwing this up?
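As a rough cross-check of the two snippets above, here is a minimal Python sketch that tests which interface addresses fall under which "network" statements. The helper names (wildcard_match, matching_areas, covered_by) are made up for illustration, and the sketch only models the address match, not how either platform actually processes these statements.

```python
import ipaddress

def wildcard_match(address, base, wildcard):
    """Return True if address matches an IOS-style base/wildcard pair."""
    addr = int(ipaddress.IPv4Address(address))
    net = int(ipaddress.IPv4Address(base))
    # Wildcard bits set to 1 are "don't care"; invert to get the bits that must match.
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (net & care)

# (base, wildcard, area) copied from the 2951 snippet
STATEMENTS_2951 = [
    ("10.0.10.0", "0.0.0.3", "100"),
    ("10.0.255.1", "0.0.0.0", "0"),
    ("10.10.0.0", "0.0.255.255", "0"),
]

# Prefixes copied from the Nexus snippet (both area 0.0.0.0)
PREFIXES_NEXUS = ["10.100.0.0/16", "10.200.0.0/16"]

def matching_areas(address, statements):
    """Areas of every wildcard statement that covers the address."""
    return [area for base, wc, area in statements
            if wildcard_match(address, base, wc)]

def covered_by(address, prefixes):
    """Prefixes (in CIDR form) that contain the address."""
    ip = ipaddress.ip_address(address)
    return [p for p in prefixes if ip in ipaddress.ip_network(p)]

print(matching_areas("10.10.10.251", STATEMENTS_2951))  # Gi0/1 on the 2951
print(covered_by("10.10.70.252", PREFIXES_NEXUS))       # Vlan1 on the Nexus
```

On these inputs, Gi0/1 on the 2951 lands in area 0 through the third statement, and the Vlan1 address on the Nexus is outside both prefixes, so Vlan1 appears to be in area 0.0.0.0 only because of the interface-level "ip router ospf 100 area 0.0.0.0" line. Both ends agree on area 0, which suggests the "network" statements are probably not what is holding the adjacency back.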

Thanks!

Peter Paluch · 10-09-2012

Hello,

This really looks like an MTU mismatch. Have you tried using the ip ospf mtu-ignore command on the 2951 and a similar command on the Nexus? (I am not well versed in the Nexus CLI.)

The question is - how come your router and Nexus are using different MTUs, if at all? Please check the show ip interface gi0/1 on the 2951 and a similar command on the Nexus. If it was a Catalyst, I would be interested in seeing the show system mtu command - the routing MTU would be the one that would be of interest.

Best regards,

Peter

m.glosson · 10-09-2012

I mentioned it was not an MTU mismatch in my first e-mail, but just so that you'll believe me, I'll post the output:

2951

#sh int g0/1
GigabitEthernet0/1 is up, line protocol is up
  Hardware is CN Gigabit Ethernet, address is fc99.47e8.2741 (bia fc99.47e8.2741)
  Internet address is 10.10.10.251/16
  MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,

Nexus

# sh int vl1
Vlan1 is up, line protocol is up
  Hardware is EtherSVI, address is  4055.3904.a341
  Internet Address is 10.10.70.252/16
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,

And I had already tried "ip ospf mtu-ignore" on the 2951, even though I knew it wasn't an MTU mismatch.

Here is the output from "debug ip ospf adj" on the 2951:

Oct  9 21:22:43.030: OSPF-100 ADJ   Gi0/1: Rcv DBD from 10.200.49.5 seq 0x34115C17 opt 0x42 flag 0x7 len 32  mtu 1500 state EXCHANGE
Oct  9 21:22:43.030: OSPF-100 ADJ   Gi0/1: Send DBD to 10.200.49.5 seq 0x34115C17 opt 0x52 flag 0x2 len 1252

And here it is on the Nexus:

2012 Oct  9 16:24:21.422408 ospf: 100 [4913] (default) Nbr 10.10.10.251: EXSTART --> EXSTART, event HELLORCVD
2012 Oct  9 16:24:21.422436 ospf: 100 [4913] (default) Nbr 10.10.10.251: EXSTART --> EXSTART, event TWOWAYRCVD
2012 Oct  9 16:24:24.152338 ospf: 100 [4913] (default) Sending DBD to 10.10.10.251 on Vlan1
2012 Oct  9 16:24:24.152383 ospf: 100 [4913] (default) Sent DBD with 0 entries to 10.10.10.251 on Vlan1
2012 Oct  9 16:24:24.152401 ospf: 100 [4913] (default)   mtu 1500, opts: 0x42, ddbits: 0x7, seq: 0x34115c17
2012 Oct  9 16:24:29.292289 ospf: 100 [4913] (default) Sending DBD to 10.10.10.251 on Vlan1
2012 Oct  9 16:24:29.292334 ospf: 100 [4913] (default) Sent DBD with 0 entries to 10.10.10.251 on Vlan1
2012 Oct  9 16:24:29.292352 ospf: 100 [4913] (default)   mtu 1500, opts: 0x42, ddbits: 0x7, seq: 0x34115c17

Yes, 10.10.0.0/16 is a big network, but it's left over from about 18 years ago when IP was first introduced into this network and the guy used a huge number. It was never completely moved off that, so, unusual as it might be, it is a perfectly legitimate number.

The Nexus's monster number of adjacencies is with one other Nexus, across all the accompanying VLANs. Of course most of them should have ip ospf passive-interface enabled, but someone else configured these things--I am just trying to bring the 2951 into the mix. There is an existing 3825 that, amusingly enough, has established a relationship with the Nexus that's not working, while it is stuck in EXCHANGE with the other Nexus (the BDR)!

Peter Paluch · 10-09-2012

Hi,

Please don't get me wrong. I know that you said this is not an MTU mismatch, but as a troubleshooter, I must cross-check that statement. You did not post any supporting output in your first post, so the claim about this not being an MTU issue was, from my viewpoint, unwarranted. Even here, the output of your show commands is not what I was looking for, because you are looking at the generic interface MTU, not at the IP MTU specifically. The show interfaces output does not say anything about the IP MTU, which can be vastly different. You need to check show ip interface to see the IP MTU. The debug outputs currently say that the Nexus is using an MTU of 1500 bytes. It is not certain what MTU is being used by the 2951.

Anyway, what is more interesting, and Karthic has pointed that out as well, is the fact that the Nexus switch does not display any DBD packets coming from the 2951 - as if they never arrived. The ExStart/Exchange/Loading phases are done using unicast packets, so checking the unicast communication between these two devices is called for.
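To read the flag values in the posted debugs, a small Python sketch can decode the DBD flag field. The bit positions are the I, M, and MS bits from the Database Description packet format in RFC 2328; the function name decode_dbd_flags is just an illustrative helper.

```python
# DBD flag bits per the Database Description packet format in RFC 2328
DBD_FLAGS = [
    (0x4, "I (Init)"),
    (0x2, "M (More)"),
    (0x1, "MS (Master)"),
]

def decode_dbd_flags(value):
    """Return the names of the flag bits set in a DBD flags byte."""
    return [name for bit, name in DBD_FLAGS if value & bit]

# Values taken from the debug output in this thread
print("Nexus sends 0x7:", decode_dbd_flags(0x7))
print("2951 sends 0x2:", decode_dbd_flags(0x2))
```

Read this way, the Nexus is sending an initial DBD claiming the master role (I, M, and MS all set), and the 2951 answers with only M set, echoing the same sequence number 0x34115C17, which is what a slave does once it has accepted the master. The Nexus then repeats the same initial DBD with the same sequence number every few seconds, which is consistent with the 2951's reply never reaching it.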

Please keep us posted.

Best regards,

Peter

Giuseppe Larosa · 10-09-2012

Hello M.glosson,

an OSPF neighborship stuck at EXSTART is likely an MTU mismatch

you can use

show ip ospf interface

to check MTU on C2951 and hopefully also on the Nexus

you can use

debug ip ospf adj

on the C2951 to see the messages related to the OSPF activity; this should give enough information on what is going wrong.

A final note:

Why are you using a /16 subnet mask?

I find it quite uncommon.

The Nexus has 57 neighbors; is the C2951 the only one facing trouble? If so, we cannot say it (the Nexus) is working badly.

Hope to help

Giuseppe

skarthic · 10-09-2012

Can you post the entire output of "debug ip ospf adj" and the "show run int "? Also check for pings with and without the df-bit set between them.

From the debugs provided, here is what I see:

i) OSPF is at Exchange on the router (2900), and I see from the debugs that it received the DBD.

ii) Also, it looks like the Nexus is not receiving the DBDs.

Check that unicast connectivity is fine between them before you proceed. Let us know how things go.
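When picking sizes for those ping tests, the "len" values in the posted debugs give a hint. The Python sketch below assumes that "len" is the OSPF packet length and uses the standard header sizes (24-byte OSPF header, 8 bytes of fixed DBD fields, 20-byte LSA headers, 20-byte IPv4 header without options, 8-byte ICMP header). The function names are hypothetical helpers.

```python
OSPF_HEADER = 24   # common OSPF packet header
DBD_FIXED = 8      # MTU, options, flags, and sequence number fields
LSA_HEADER = 20    # one LSA header carried in a DBD
IPV4_HEADER = 20   # assumes no IP options
ICMP_HEADER = 8    # ICMP echo header

def dbd_lsa_headers(ospf_len):
    """Number of LSA headers carried in a DBD of the given OSPF length."""
    body = ospf_len - OSPF_HEADER - DBD_FIXED
    if body < 0 or body % LSA_HEADER:
        raise ValueError("length does not fit a whole number of LSA headers")
    return body // LSA_HEADER

def ip_datagram_size(ospf_len):
    """Size of the IPv4 datagram that carries the OSPF packet."""
    return ospf_len + IPV4_HEADER

def icmp_payload_for(ip_size):
    """ICMP echo payload that yields an IPv4 datagram of ip_size bytes."""
    return ip_size - IPV4_HEADER - ICMP_HEADER

for ospf_len in (32, 1252):  # the "len" values from the 2951 debug
    print(ospf_len, dbd_lsa_headers(ospf_len), ip_datagram_size(ospf_len))
print(icmp_payload_for(1272), icmp_payload_for(1500))
```

Under those assumptions, the Nexus's empty DBD (len 32) travels in a 52-byte IP packet, while the 2951's reply (len 1252) needs a 1272-byte IP packet. A ping that succeeds at small sizes but fails at around 1272 bytes and above with the df-bit set would point at the path between the two devices rather than at the OSPF configuration. Note that some platforms take the ping size as the whole IP datagram and others as the ICMP payload, so check which one applies before comparing numbers.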

Regards,

Karthic

skarthic · 10-10-2012

Were you able to solve this? If so, please let us know the solution.

m.glosson · 10-14-2012

The adjacency has formed, but I wonder whether it has led to another problem. Let me explain... The router had been connected (by someone else) through a very old Nortel 450 10/100 switch. Once we moved it onto a Nexus 2K FEX module (connected to Nexus1), the full neighbor relationship formed. The 450 was not configured in any kind of funky manner, but who knows...

There are two Nexus switches that the router should "neighbor-up" with. However, one of them is in a constant EXSTART state with all three routers on the segment. The following is the output from Nexus2 (the BDR):

 10.0.255.1        1 EXSTART/DROTHER  4d15h    10.10.10.251    Vlan1
 10.10.10.250      1 EXSTART/DROTHER  1w5d     10.10.10.250    Vlan1
 10.10.50.5        1 EXSTART/DROTHER  8w0d     10.10.50.5      Vlan1

Nexus1 shows all these as healthy connections. I have a feeling we would be getting into another monster can of worms with this information. The only neighbor this Nexus has paired up with is the other Nexus:

10.200.49.5 1 FULL/DR 36w0d 10.10.70.252 Vlan1

The two Nexus switches are connected together via a direct 4-port 10 Gbps (40 Gbps total) EtherChannel.

Thanks for your insight.

williamvinson · 03-20-2019

Had the same issue using Nexus. The problem was that the switches were in a vPC, so only one of the routers was forming neighbors. I had to add the mac address command to the OSPF interface, using the MAC address of the switch, and then all neighbors came up. The vPC was confusing the OSPF process into treating both switches as one switch instead of two different switches; adding the mac address command on the OSPF interface corrected the issue.

cequint_soksavik · 06-19-2019

Thanks for bringing that up; I'm facing what I believe is the same issue. I thought this was also why Cisco recommends a dedicated L3 interface between two switches in a vPC pair. I'm thinking I should try to force all my OSPF adjacencies across that interface, basically by doing "passive-interface default" and only excepting my L3 link.

May still need MAC addresses on that link, though. Wish I had a non-production Nexus pair to try this out on.