
OSPF Adjacency stuck in EXCHANGE/EX-START states

bobbydazzler
Level 1

Hello!

 

I am facing a problem with an OSPF link between an ASR and a 6509. It had been working fine for months, and all of a sudden the adjacency went down and is now stuck in the EXCHANGE/EXSTART states. I have another OSPF link (for redundancy) between these two devices that is still working fine.

 

Interface on the ASR:

interface GigabitEthernet0/0/2
 description -- 6509 Gi3/1
 ip address xxx.xxx.247.243 255.255.255.254
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip nat outside
 ip portbundle outside
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 7 0607032E444F071C11
 ip ospf network point-to-point
 load-interval 30
 negotiation auto
end

Interface on the 6509:

interface GigabitEthernet3/1
 description -- ASR1002 Gi0/0/2
 ip address xxx.xxx.247.242 255.255.255.254
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 7 0607032E444F071C11
 ip ospf network point-to-point
 ip ospf priority 128
end

I can ping between the two without problems (with different MTU sizes and the df-bit set).

 

From the debug logs, it seems that the ASR receives the DBD from the 6509 and replies to it, but the 6509 does not seem to receive the reply and keeps retransmitting its own DBD.

 

ASR log:

Dec 29 12:07:17.840: OSPF-65535 ADJ   Gi0/0/2: Rcv DBD from xxx.xxx.247.255 seq 0x159E opt 0x50 flag 0x7 len 32  mtu 1500 state EXCHANGE
Dec 29 12:07:17.840: OSPF-65535 ADJ   Gi0/0/2: Send DBD to xxx.xxx.247.255 seq 0x159E opt 0x50 flag 0x2 len 192
Dec 29 12:07:17.840: OSPF-65535 ADJ   Gi0/0/2: Send with youngest Key 1

6509 log:

Dec 29 12:06:54.202: OSPF: Send DBD to xxx.xxx.247.254 on GigabitEthernet3/1 seq 0x159E opt 0x50 flag 0x7 len 32
Dec 29 12:06:54.202: OSPF: Send with youngest Key 1
Dec 29 12:06:54.202: OSPF: Retransmitting DBD to xxx.xxx.247.254 on GigabitEthernet3/1 [12]
Dec 29 12:06:56.410: OSPF: Send with youngest Key 1
Dec 29 12:06:58.714: OSPF: Send DBD to xxx.xxx.247.254 on GigabitEthernet3/1 seq 0x159E opt 0x50 flag 0x7 len 32
Dec 29 12:06:58.714: OSPF: Send with youngest Key 1
Dec 29 12:06:58.714: OSPF: Retransmitting DBD to xxx.xxx.247.254 on GigabitEthernet3/1 [13]
Dec 29 12:06:59.306: OSPF: Send with youngest Key 1
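
To make the flag values in the two logs easier to read, here is a minimal Python sketch that decodes the DBD flag byte. The bit values (I = 0x04, M = 0x02, MS = 0x01) come from the Database Description packet format in RFC 2328; the function name is only illustrative and is not part of any Cisco tool.

```python
# DBD flag bits as defined for OSPFv2 Database Description packets (RFC 2328).
DBD_FLAGS = {
    0x04: "I (Init)",
    0x02: "M (More)",
    0x01: "MS (Master)",
}


def decode_dbd_flags(flag):
    """Return the names of the bits set in a DBD flag byte."""
    return [name for bit, name in DBD_FLAGS.items() if flag & bit]


# Flag values taken from the debug output above.
print("6509 sends 0x7:", decode_dbd_flags(0x7))
print("ASR  sends 0x2:", decode_dbd_flags(0x2))
```

Read this way, the 6509 keeps sending its initial DBD (Init, More and Master all set, seq 0x159E), while the ASR answers with the same sequence number and the Master bit cleared, which is what a slave does once it has accepted the neighbor as master. The 6509 retransmitting the 0x7 packet is consistent with it never processing the ASR's reply.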

Any suggestions or pointers on what I should check are more than welcome.

 

Best!

18 Replies

Hi Bob,

It has been a sheer pleasure for me!

I think we are only halfway there - we need to identify the reason why the 6509 starts dropping the unicast OSPF packets, as this is something entirely unexpected.

I am not sure if you can share the complete detailed configuration - I understand that due to the public nature of these forums and the confidential nature of your configuration, this might not be possible. If you have a support contract, then I would definitely encourage you to open a TAC case and point the engineer to this thread.

If not, please let me know - I will think of something else to help you. We have to get this thing going :)

Best regards,
Peter

Hi Bob,

I just wanted to check with you whether you have been able to find out what was causing the problems with OSPF adjacencies, namely, the 6509 dropping incoming OSPF packets instead of forwarding them to the supervisor CPU.

Thank you!

Best regards,
Peter

Hello Peter,

Thank you for following up on my case. I have found out what was causing the problem, but not why.
According to our sysadmin, one instance of our virtualisation backup system had silently crashed. It was still doing its job; only the management of the NAS was affected.
Two days ago, the OSPF sessions came back up. Today our sysadmin told me about this storage problem, and I realized that the OSPF sessions came back up when he rebooted the backup NAS. Looking at the logs now, I don't see any reason why it would affect the 6509 routing process (the NAS units are connected to a 10Gb line card on a dedicated VLAN, no routing involved).
I don't see any errors on the line card or the ports connected to the NAS, and nothing in the 6509 logs either.
I have never come across this kind of issue, and neither has my sysadmin who handles the storage system (storage-wise).

Thanks again for your help, and I hope I didn't make you lose your time. In any case, this thread has some interesting debugging posts.
Best regards

Hi Bob,

You are very much welcome, and please rest assured that this case was as interesting for me as it was for you, and I certainly do not consider the time spent on this case to be lost at all!

With the explanation you have just provided, I can only think of one thing: If the crashed virtualisation backup system instance, or the rebooted NAS, originated a significant amount of traffic that would be hitting the Cat6509 and forwarded to the CPU for whatever reason, the amount of the CPU-bound traffic could have been hitting a limit after which additional traffic would be dropped. If that traffic was in the same class as the OSPF traffic, this could have caused the OSPF traffic to be lost.
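
To illustrate why a flood in a shared class could starve OSPF, here is a rough Python sketch. It assumes a single limiter that admits at most `limit_pps` packets per second to the CPU and drops the excess without regard to protocol. All numbers and names are hypothetical and do not describe the actual limiters on the 6509.

```python
def ospf_delivery_ratio(limit_pps, flood_pps, ospf_pps):
    """Expected fraction of OSPF packets that reach the CPU.

    Assumes OSPF and the flood share one limiter and that drops are
    spread evenly over all packets in the class (an assumption, not a
    description of real hardware behaviour).
    """
    total = flood_pps + ospf_pps
    if total <= limit_pps:
        return 1.0  # limiter not exceeded, nothing is dropped
    return limit_pps / total


# Hypothetical figures: a 1000 pps limit and a handful of OSPF packets.
for flood in (0, 500, 100_000):
    ratio = ospf_delivery_ratio(limit_pps=1000, flood_pps=flood, ospf_pps=5)
    print(f"flood={flood:>7} pps -> OSPF delivered: {ratio:.1%}")
```

Under these assumptions, once the flood is far above the limit, only a small fraction of the OSPF packets get through, which would be enough to keep DBD exchanges from ever completing.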

Unfortunately, at this time, I don't know of a way of proving this hypothesis anymore.

Either way, thank you very much for sharing your update - it was a pleasure to look into this for you!

Best regards,
Peter
