Solved: No Neighbor Restoration Message after OSPF state change

trey.s.grunDD · ‎10-16-2015

All,

Our monitoring team received a trap for an "ospf message", and opened a ticket for it. These are the corresponding logs from the switch that generated the trap:

Oct 17 00:19:41: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on GigabitEthernet1/1/2 from 2WAY to DOWN, Neighbor Down: Dead timer expired
Oct 17 00:20:47: %PIM-5-NBRCHG: neighbor 144.45.239.11 DOWN on interface GigabitEthernet1/1/2 non DR
Oct 17 00:21:17: %PIM-5-NBRCHG: neighbor 144.45.239.11 UP on interface GigabitEthernet1/1/2

The are the last messages in the buffer, so I was expecting that neighbor to still show down (output truncated):

dal-do-warehouse-sw1#show ip ospf neigh

Neighbor ID Pri State Dead Time Address Interface
144.45.239.11 1 2WAY/DROTHER 00:00:33 144.45.239.11 GigabitEthernet1/1/2

Connected to Gi1/1/2 is a 4500. He recorded the event in his logs as:

Oct 16 19:19:06: %LINEPROTO-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/5, changed state to down
Oct 16 19:19:07: %LINK-3-UPDOWN: Interface TenGigabitEthernet1/5, changed state to down
Oct 16 19:19:43: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on Vlan102 from FULL to DOWN, Neighbor Down: Dead timer expired
Oct 16 19:20:48: %LINK-3-UPDOWN: Interface TenGigabitEthernet1/5, changed state to up
Oct 16 19:20:49: %PIM-5-NBRCHG: neighbor 144.45.239.11 DOWN on interface Vlan102 non DR
Oct 16 19:20:49: %LINEPROTO-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/5, changed state to up
Oct 16 19:21:18: %PIM-5-NBRCHG: neighbor 144.45.239.11 UP on interface Vlan102
Oct 16 19:21:21: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on Vlan102 from DOWN to INIT, Received Hello
Oct 16 19:21:21: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on Vlan102 from INIT to 2WAY, 2-Way Received
Oct 16 19:21:21: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on Vlan102 from 2WAY to EXSTART, AdjOK?
Oct 16 19:21:26: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on Vlan102 from EXSTART to EXCHANGE, Negotiation Done
Oct 16 19:21:26: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on Vlan102 from EXCHANGE to LOADING, Exchange Done
Oct 16 19:21:26: %OSPF-5-ADJCHG: Process 1, Nbr 144.45.239.11 on Vlan102 from LOADING to FULL, Loading Done

The interface T1/5 is connected to a device that advertises itself in CDP as a "LAB" switch. So it would appear, someone was messing with a "lab" switch that is participating in OSPF, and that's a set of administrative headaches that I'm really not interested in discussing here.

I'm trying to understand, why we didn't get an OSPF neighbor restoration log. Gi1/1/2 is an L3 interface:

dal-do-warehouse-sw1#show run int g1/1/2
Building configuration...

Current configuration : 181 bytes
!
interface GigabitEthernet1/1/2
description to dal-do-core-b
no switchport
ip address 144.45.239.12 255.255.255.192
ip pim sparse-mode
ip igmp version 3
auto qos trust
end

So by my reckoning, one of 2 things is happening here: We didn't REALLY lose the neighbor in question, and that log was generated due to a bug (and would explain why there's no "up" alert)

or

The neighbor relationship is being created by a method I am not understanding and there's a bug related to the lack of "up" information regarding the neighbor relationship.

I should mention that the port we connect to on the 4507 is an access port in vlan102, so at this time, there's clearly an L2 connection between the neighbors.

Current configuration : 91 bytes
!
interface TenGigabitEthernet1/1
switchport access vlan 102
switchport mode access
end

I did a bug search for "no up message OSPF, 3650, IOS-XE", etc.. but all I got was a bunch of Prime/Monitoring system bugs. Can anyone tell me if I'm misinterpreting information here? Why wouldn't we have received a neighbor up message in this case?

Peter Paluch · ‎10-16-2015

Hi Trey and Reza,

Please allow me to join.

By default, logging messages about OSPF adjacency state changes are generated only when either the adjacency enters the FULL state, or when it is entirely lost (the DOWN state). Intermediate states and transitions between them (Init, Attempt, 2Way, ExStart, Exchange, Loading) are not logged.

So what you saw is entirely right and expected. A neighborship went down from 2Way to Down state, and that was intended to be logged. After the neighborship came back up and entered the 2Way state, no message was logged because by default, intermediate states are not logged. This is also explained in the command reference for the log-adjacency-changes OSPF command that says:

The log-adjacency-changes command is on by default but only up/down (full/down) events are reported, unless the detail keyword is also used.

It would perhaps be worth considering adding the detail keyword to the configuration of log-adjacency-changes. While it will generate more messages about OSPF neighbor state transitions, it won't let you in the dark like it happened right now. Of course, this has to be tested in a lab first, as the logs will become quite chatty.

To sum it up, though - you have not experienced a bug, rather a feature :-P

Best regards,
Peter

View solution in original post

trey.s.grunDD · ‎10-16-2015

I should also add, the OSPF config is very basic. There are no commands that might suppress logging under the ospf process:

router ospf 1
area 0.0.1.8 stub
passive-interface Vlan100
passive-interface Vlan200
network 10.212.0.0 0.0.255.255 area 0.0.1.8
network 144.45.0.0 0.0.255.255 area 0.0.1.8

Reza Sharifi · ‎10-16-2015

The logs from the first switch is from Oct 16 and the logs from the 4500 is Oct 17. The time difference is approximately 5 hours. Does that mean the OSPF neighbors were down for about 5 hours?

Can you clarify?

trey.s.grunDD · ‎10-16-2015

That confused me too - on the 3650, the logs are timestamped GMT, on the 4500, the logs are timestamped with localtime.

Don't blame me. I just troubleshoot this stuff, I don't configure it...

Peter Paluch · ‎10-16-2015

Hi Trey and Reza,

Please allow me to join.

By default, logging messages about OSPF adjacency state changes are generated only when either the adjacency enters the FULL state, or when it is entirely lost (the DOWN state). Intermediate states and transitions between them (Init, Attempt, 2Way, ExStart, Exchange, Loading) are not logged.

So what you saw is entirely right and expected. A neighborship went down from 2Way to Down state, and that was intended to be logged. After the neighborship came back up and entered the 2Way state, no message was logged because by default, intermediate states are not logged. This is also explained in the command reference for the log-adjacency-changes OSPF command that says:

The log-adjacency-changes command is on by default but only up/down (full/down) events are reported, unless the detail keyword is also used.

It would perhaps be worth considering adding the detail keyword to the configuration of log-adjacency-changes. While it will generate more messages about OSPF neighbor state transitions, it won't let you in the dark like it happened right now. Of course, this has to be tested in a lab first, as the logs will become quite chatty.

To sum it up, though - you have not experienced a bug, rather a feature :-P

Best regards,
Peter

trey.s.grunDD · ‎10-16-2015

And for clarity:

router ospf 1
router-id 10.212.0.253
log-adjacency-changes detail
area 0.0.1.8 stub
area 0.0.1.8 range 10.212.0.0 255.254.0.0
summary-address 10.212.0.0 255.254.0.0
redistribute connected subnets
redistribute static subnets
network 10.212.0.206 0.0.0.0 area 0
network 10.0.0.0 0.255.255.255 area 0.0.1.8
network 144.45.0.0 0.0.255.255 area 0.0.1.8
neighbor 222.18.65.1

That is the OSPF configuration on the 4500 that gave us verbose output. I guess I should've checked that, but I thought that was default. Thanks for your input!