10-14-2012 10:32 AM - edited 03-04-2019 05:51 PM
The scenario is an MPLS/LDP/ISIS/BGP network, with two PE routers connected together by two switches between them: PE1 - SW1 - SW2 - PE2
Weird flaps are happening between PE1 and PE2, and I can't work out why from the messages logged on the routers. Can someone please explain what these syslog messages mean?
FYI: no devices have rebooted, no interfaces have gone down, nothing is unplugged, CPU is below 40% on all devices, memory usage is less than 50% on all devices, and no interface errors or drops are being counted when this happens. I can't work it out. The network was quiet when this happened.
From PE1:
We see an HSRP state change on an interface facing PE2 (which was the Active HSRP device), then all the BGP sessions to other PEs and CEs drop:
Oct 14 17:32:38.592: %HSRP-5-STATECHANGE: GigabitEthernet0/0.555 Grp 1 state Standby -> Active
Oct 14 17:32:38.608: %BGP-5-ADJCHANGE: neighbor 192.168.1.1 vpn vrf VRF1 Down BGP Notification sent
Oct 14 17:32:38.608: %BGP-3-NOTIFICATION: sent to neighbor 192.168.1.1 4/0 (hold time expired) 0 bytes
Oct 14 17:32:38.608: %BGP-5-ADJCHANGE: neighbor 10.0.234.135 Down BGP Notification sent
Oct 14 17:32:38.608: %BGP-3-NOTIFICATION: sent to neighbor 10.0.234.135 4/0 (hold time expired) 0 bytes
Oct 14 17:32:38.608: %BGP-5-ADJCHANGE: neighbor 10.0.234.133 Down BGP Notification sent
Oct 14 17:32:38.608: %BGP-3-NOTIFICATION: sent to neighbor 10.0.234.133 4/0 (hold time expired) 0 bytes
...Lots more of the above for all the peerings. ISIS also goes down:
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to p01 (GigabitEthernet0/1.6) Down, neighbor forgot us
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to cr01 (GigabitEthernet0/1.6) Down, neighbor forgot us
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to pe02 (GigabitEthernet0/0.555) Down, hold time expired
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/2) Down, hold time expired
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to pe02 (GigabitEthernet0/2) Down, hold time expired
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/2) Down, hold time expired
...and LDP neighbours are lost:
Oct 14 17:32:38.864: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.133:0 (3) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:38.864: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.145:0 (5) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:38.864: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.138:0 (6) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:38.864: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.135:0 (1) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:38.868: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/1.6) Down, neighbor forgot us
...then it all starts to come back:
Oct 14 17:32:39.272: %CLNS-5-ADJCHANGE: ISIS: Adjacency to cr01 (GigabitEthernet0/1.6) Up, new adjacency
Oct 14 17:32:39.272: %CLNS-5-ADJCHANGE: ISIS: Adjacency to p01 (GigabitEthernet0/1.6) Up, new adjacency
Oct 14 17:32:39.272: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/1.6) Up, new adjacency
Oct 14 17:32:39.900: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/2) Up, new adjacency
Oct 14 17:32:39.900: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/2) Up, new adjacency
Oct 14 17:32:46.104: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.138:0 (1) is UP
Oct 14 17:32:46.508: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.145:0 (3) is UP
Oct 14 17:32:46.916: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.141:0 (2) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:46.916: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.133:0 (5) is UP
Oct 14 17:32:46.920: %BGP-5-ADJCHANGE: neighbor 10.0.224.138 Down BGP Notification sent
Oct 14 17:32:46.920: %BGP-3-NOTIFICATION: sent to neighbor 10.0.224.138 4/0 (hold time expired) 0 bytes
Oct 14 17:32:49.384: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.129:0 (4) is DOWN (TCP connection closed by peer)
Oct 14 17:32:50.088: %HSRP-5-STATECHANGE: GigabitEthernet0/0.555 Grp 1 state Standby -> Active
Oct 14 17:32:50.756: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.141:0 (6) is UP
...Why do I see these "no supported AFI/SAFI" notifications? No new neighbours have been configured; it's just existing ones coming back up,
so I don't see why one neighbour would propose something in BGP that the other doesn't support.
Oct 14 17:32:51.332: %BGP-3-NOTIFICATION: sent to neighbor 10.0.224.138 passive 2/8 (no supported AFI/SAFI) 3 bytes 000180
Oct 14 17:32:51.332: %BGP-3-NOTIFICATION: sent to neighbor 10.0.224.137 passive 2/8 (no supported AFI/SAFI) 3 bytes 000180
Oct 14 17:32:51.332: %BGP-3-NOTIFICATION: received from neighbor 78.33.30.53 active 2/8 (no supported AFI/SAFI) 3 bytes 000000
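The two numbers after the neighbour address are the BGP NOTIFICATION error code and subcode. Code 4 is Hold Timer Expired in RFC 4271; subcode 8 under code 2 (OPEN Message Error) is not one of the OPEN subcodes RFC 4271 defines, so its meaning here comes only from the Cisco message text. Below is a minimal Python sketch for decoding these lines. It assumes (the logs don't confirm this) that the 3 data bytes on the 2/8 lines are a 2-byte AFI followed by a 1-byte SAFI; under that assumption 000180 would read as AFI 1 (IPv4) / SAFI 128 (MPLS-labeled VPN), and 000000 as no AFI/SAFI at all. The lookup tables only cover a few values.

```python
import re

# BGP NOTIFICATION error codes (RFC 4271, section 4.5).
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Partial AFI/SAFI tables: only values relevant to these logs.
AFI_NAMES = {0: "none/reserved", 1: "IPv4", 2: "IPv6"}
SAFI_NAMES = {0: "none/reserved", 1: "unicast", 2: "multicast", 128: "MPLS-labeled VPN"}

# Matches the %BGP-3-NOTIFICATION lines quoted in this thread.
NOTIF_RE = re.compile(
    r"%BGP-3-NOTIFICATION: (?P<dir>sent to|received from) neighbor (?P<peer>\S+)"
    r"(?: active| passive)? (?P<code>\d+)/(?P<sub>\d+) \((?P<text>[^)]*)\)"
    r" (?P<nbytes>\d+) bytes ?(?P<data>[0-9A-Fa-f]*)"
)


def decode(line):
    """Return a dict describing a NOTIFICATION log line, or None if it isn't one."""
    m = NOTIF_RE.search(line)
    if m is None:
        return None
    code, sub = int(m["code"]), int(m["sub"])
    result = {
        "direction": m["dir"],
        "peer": m["peer"],
        "error": ERROR_CODES.get(code, f"unknown code {code}"),
        "subcode": sub,
        "text": m["text"],
    }
    data = bytes.fromhex(m["data"])
    # ASSUMPTION: for 2/8 the data is a 2-byte AFI followed by a 1-byte SAFI.
    if code == 2 and sub == 8 and len(data) == 3:
        afi = int.from_bytes(data[:2], "big")
        safi = data[2]
        result["afi_safi"] = (AFI_NAMES.get(afi, str(afi)), SAFI_NAMES.get(safi, str(safi)))
    return result


if __name__ == "__main__":
    for line in [
        "Oct 14 17:32:38.608: %BGP-3-NOTIFICATION: sent to neighbor 192.168.1.1 4/0 (hold time expired) 0 bytes",
        "Oct 14 17:32:51.332: %BGP-3-NOTIFICATION: sent to neighbor 10.0.224.138 passive 2/8 (no supported AFI/SAFI) 3 bytes 000180",
    ]:
        print(decode(line))
```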
PE2 is logging the same kind of events:
Oct 14 17:32:53.064: %BGP-3-NOTIFICATION: sent to neighbor 10.255.0.14 4/0 (hold time expired) 0 bytes
Oct 14 17:32:53.064: %BGP-5-ADJCHANGE: neighbor 10.0.224.137 Down BGP Notification sent
Oct 14 17:32:53.064: %BGP-3-NOTIFICATION: sent to neighbor 10.0.224.137 4/0 (hold time expired) 0 bytes
Oct 14 17:32:53.064: %CLNS-5-ADJCHANGE: ISIS: Adjacency to pe01 (GigabitEthernet0/0.555) Down, hold time expired
Oct 14 17:32:53.064: %CLNS-5-ADJCHANGE: ISIS: Adjacency to p01 (GigabitEthernet0/1) Down, hold time expired
Oct 14 17:32:53.180: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.133:0 (4) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:53.180: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.141:0 (1) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:53.180: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.138:0 (2) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:53.180: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.134:0 (3) is DOWN (Session KeepAlive Timer expired)
Oct 14 17:32:54.808: %CLNS-5-ADJCHANGE: ISIS: Adjacency to pe01 (GigabitEthernet0/0.555) Up, new adjacency
Oct 14 17:32:54.808: %CLNS-5-ADJCHANGE: ISIS: Adjacency to p01 (GigabitEthernet0/1) Up, new adjacency
Oct 14 17:33:01.176: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.133:0 (1) is UP
Oct 14 17:33:01.816: %LDP-5-NBRCHG: LDP Neighbor 10.0.224.141:0 (2) is UP
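To line up the two routers' logs, here is a minimal Python sketch that merges per-device syslog excerpts into one timeline and reports how far each device's first event trails the earliest one. The comparison only means something if both routers' clocks are synchronised (e.g. via NTP), which the thread doesn't state, and the PE2 excerpt above may not start at its very first event. IOS timestamps omit the year, so 2012 is assumed from the post date.

```python
import re
from datetime import datetime

# Timestamp plus message mnemonic, e.g. "Oct 14 17:32:38.592: %HSRP-5-STATECHANGE:".
# search() rather than match() tolerates stray characters before the timestamp.
TS_RE = re.compile(r"([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): %(\S+?):")


def parse(line, year=2012):
    """Return (timestamp, mnemonic) or None. The year is an assumption."""
    m = TS_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S.%f")
    return ts, m.group(2)


def merged_timeline(device_logs):
    """device_logs maps a device name to its raw syslog lines."""
    events = []
    for device, lines in device_logs.items():
        for line in lines:
            parsed = parse(line)
            if parsed:
                events.append((parsed[0], device, parsed[1]))
    return sorted(events)


def first_event_lag(device_logs):
    """Seconds by which each device's first logged event trails the earliest one."""
    firsts = {}
    for ts, device, _ in merged_timeline(device_logs):
        firsts.setdefault(device, ts)  # timeline is sorted, so first seen = earliest
    earliest = min(firsts.values())
    return {d: (t - earliest).total_seconds() for d, t in firsts.items()}


if __name__ == "__main__":
    logs = {
        "PE1": ["Oct 14 17:32:38.592: %HSRP-5-STATECHANGE: GigabitEthernet0/0.555 Grp 1 state Standby -> Active"],
        "PE2": ["Oct 14 17:32:53.064: %BGP-3-NOTIFICATION: sent to neighbor 10.255.0.14 4/0 (hold time expired) 0 bytes"],
    }
    print(first_event_lag(logs))
```

With the first lines of the two excerpts, PE2 trails PE1 by 14.472 s, which matches the "several seconds behind" observation if the clocks agree.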
If anyone can elaborate on possible causes, I'm all ears.
10-15-2012 12:54 AM
Can you please let us know how both PEs are connected to the WAN portion? If the WAN portion is a Gig interface, then we can't see any flaps, because a Gig interface never goes down. Because of that, I can see HSRP flapped first: if the LAN is OK, then it would be because of the WAN, if tracking is enabled.
Please let us know how the PE communicates with the WAN side. Also, if it is connected through a Gig interface, you can use IP SLA so that it generates logs if any ping fails.
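IP SLA itself is configured on the router. As a rough illustration of the idea (probe periodically and log whenever a ping fails), here is a minimal Python sketch that could run from a host on the same segment. It is not IP SLA, its log format is made up, and `ping_once` assumes a Linux iputils `ping` (`-c` count, `-W` timeout in seconds).

```python
import subprocess
from datetime import datetime, timedelta


def ping_once(host, timeout_s=1):
    """Send one ICMP echo via the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe_log(results, target, start, interval_s=5):
    """Turn a sequence of probe outcomes (True = reply received) into log
    lines: one per failed probe, plus one when replies resume."""
    lines = []
    failing = False
    for i, ok in enumerate(results):
        stamp = (start + timedelta(seconds=i * interval_s)).strftime("%b %d %H:%M:%S")
        if not ok:
            lines.append(f"{stamp}: probe to {target} FAILED")
            failing = True
        elif failing:
            lines.append(f"{stamp}: probe to {target} recovered")
            failing = False
    return lines


if __name__ == "__main__":
    # Offline example with simulated outcomes; a live run would feed
    # ping_once(target) results collected every interval_s seconds.
    start = datetime(2012, 10, 14, 17, 32, 30)
    for line in probe_log([True, False, False, True], "10.0.224.138", start):
        print(line)
```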
Regards,
Amit
10-15-2012 07:13 AM
Hi Amit,
Thanks for your input. Can you explain please what you mean by "Gig interface never goes down"? All interfaces in the test are gig.
Here is an ASCII diagram of part of the network (the best I can do):
p01
|
|
|-------------------------------SW3----------------------------------|
| |
gi0/1 |
PE1 gi0/0.555 --- SW1 --- SW2 --- gi0/0.555 PE2 |
gi0/2 gi0/1 |
| | |
| | |
------------------SW4 ------------------------------------------| |
| |
ar01----------------------------------------------|
Giuseppe, yes, there are switches between all the routers, giving multiple adjacencies rather than lots of p2p links.
I have looked through the switches, though, and there are no log entries at all; the last STP state change was several weeks ago. No interface errors were logged and there were no interface up/down events. Could it have been an IOS error on PE1? That is the device that logged the errors I posted above first; the rest are several seconds behind (waiting for dead timers, I guess).
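On the dead-timer point: a "hold time expired" message is logged when the timer runs out, so the last packet from that neighbour arrived roughly one hold time before the message. A minimal Python sketch of that subtraction is below. The hold values are assumed IOS defaults (BGP 180 s, LDP session 180 s, ISIS 30 s for a non-DIS neighbour, HSRP 10 s); the configured or negotiated timers aren't shown in the thread and would change the result.

```python
from datetime import datetime, timedelta

# ASSUMED IOS default hold times in seconds; real values depend on config
# and, for BGP, on what the two peers negotiated.
ASSUMED_HOLD_S = {
    "BGP": 180,  # keepalive 60 s, hold 180 s
    "LDP": 180,  # LDP session holdtime
    "ISIS": 30,  # 10 s hello x multiplier 3 (a DIS neighbour advertises less)
    "HSRP": 10,  # 3 s hello, 10 s hold
}


def last_heard(expiry, protocol, hold_s=None):
    """Approximate time the last hello/keepalive was received, given the
    time a hold-timer expiry was logged."""
    hold = ASSUMED_HOLD_S[protocol] if hold_s is None else hold_s
    return expiry - timedelta(seconds=hold)


if __name__ == "__main__":
    # Expiry times taken from PE1's log (year assumed from the post date).
    expiries = {
        "HSRP": datetime(2012, 10, 14, 17, 32, 38, 592000),  # Standby -> Active
        "BGP": datetime(2012, 10, 14, 17, 32, 38, 608000),
        "ISIS": datetime(2012, 10, 14, 17, 32, 38, 612000),
        "LDP": datetime(2012, 10, 14, 17, 32, 38, 864000),
    }
    for proto, ts in expiries.items():
        print(proto, "last heard ~", last_heard(ts, proto).time())
```

With these defaults the BGP and LDP windows reach back about three minutes, while the ISIS and HSRP windows reach back only 30 s and 10 s; comparing those windows against switch logs is one way to narrow down when traffic actually stopped.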
Many thanks!
10-15-2012 03:07 AM
Hello Jwbensley,
I suppose you are working on a lab setup.
We see that almost at the same time ISIS adjacencies on three different interfaces have gone down.
>> Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to p01 (GigabitEthernet0/1.6) Down, neighbor forgot us
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to cr01 (GigabitEthernet0/1.6) Down, neighbor forgot us
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to pe02 (GigabitEthernet0/0.555) Down, hold time expired
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/2) Down, hold time expired
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to pe02 (GigabitEthernet0/2) Down, hold time expired
Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/2) Down, hold time expired
We note that ISIS adjacencies on three different interfaces (gi0/0.555, gi0/1.6 and gi0/2) went down at the same time.
We also note that the links used to build the POP infrastructure are not direct p2p links; they are probably built using one or more L2 switches in the middle.
(PE01 was adjacent to p01 and cr01 on gi0/1.6, and to pe02 and ar01 on gi0/2.) You already explained at the beginning that there are two switches, SW1 and SW2, in the middle between PE01 and PE02.
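Giuseppe's point that adjacencies on gi0/0.555, gi0/1.6 and gi0/2 dropped together can be checked mechanically. Below is a minimal Python sketch that clusters ISIS adjacency-down messages falling within a small window of each other (50 ms here, an arbitrary choice) and lists the neighbours, interfaces and reasons in each cluster. The year is assumed from the post date because IOS timestamps omit it.

```python
import re
from datetime import datetime, timedelta

ADJ_DOWN_RE = re.compile(
    r"([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): %CLNS-5-ADJCHANGE: ISIS: "
    r"Adjacency to (\S+) \((\S+)\) Down, (.+)$"
)


def simultaneous_downs(lines, window_ms=50, year=2012):
    """Group ISIS adjacency-down events lying within window_ms of the first
    event in their cluster. Up events and other messages are ignored."""
    events = []
    for line in lines:
        m = ADJ_DOWN_RE.search(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S.%f")
            events.append((ts, m.group(2), m.group(3), m.group(4)))
    events.sort()
    clusters = []
    for ev in events:
        if clusters and ev[0] - clusters[-1][0][0] <= timedelta(milliseconds=window_ms):
            clusters[-1].append(ev)
        else:
            clusters.append([ev])
    return [
        {
            "start": c[0][0],
            "neighbours": sorted({e[1] for e in c}),
            "interfaces": sorted({e[2] for e in c}),
            "reasons": sorted({e[3] for e in c}),
        }
        for c in clusters
    ]


if __name__ == "__main__":
    pe1 = [
        "Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to p01 (GigabitEthernet0/1.6) Down, neighbor forgot us",
        "Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to pe02 (GigabitEthernet0/0.555) Down, hold time expired",
        "Oct 14 17:32:38.612: %CLNS-5-ADJCHANGE: ISIS: Adjacency to ar01 (GigabitEthernet0/2) Down, hold time expired",
    ]
    for cluster in simultaneous_downs(pe1):
        print(cluster)
```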
All the LDP, ISIS and BGP failures are symptoms of something else that happened; the same goes for the HSRP messages.
In short, I would look at SW1 and SW2 to see if any STP activity happened on multiple VLANs (those used to build all the logical links involved in the flaps).
The output of show spanning-tree detail on SW1 and SW2 may be useful.
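When comparing that output across several switches and VLANs, a small parser can flag VLANs with a recent topology change. This is a minimal Python sketch that assumes PVST-style output containing lines of the form "VLANxxxx is executing ..." and "Number of topology changes N last change occurred AGE ago", with AGE either hh:mm:ss or a compact form like 3w2d or 1d02h. The exact wording may differ between platforms and IOS versions, and the one-hour threshold is arbitrary.

```python
import re

VLAN_RE = re.compile(r"^\s*(VLAN\d+) is executing")
TC_RE = re.compile(r"Number of topology changes (\d+) last change occurred (\S+) ago")


def age_seconds(age):
    """Convert '00:01:23' or compact '3w2d' / '1d02h' style ages to seconds."""
    if ":" in age:
        h, m, s = (int(x) for x in age.split(":"))
        return h * 3600 + m * 60 + s
    units = {"w": 604800, "d": 86400, "h": 3600}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([wdh])", age))


def recent_topology_changes(output, max_age_s=3600):
    """Return (vlan, change_count, age_s) for each VLAN whose last topology
    change is no older than max_age_s."""
    hits = []
    vlan = None
    for line in output.splitlines():
        m = VLAN_RE.match(line)
        if m:
            vlan = m.group(1)
            continue
        m = TC_RE.search(line)
        if m and vlan is not None:
            age = age_seconds(m.group(2))
            if age <= max_age_s:
                hits.append((vlan, int(m.group(1)), age))
    return hits


if __name__ == "__main__":
    # Hypothetical excerpt, not taken from SW1 or SW2.
    sample = (
        " VLAN0555 is executing the ieee compatible Spanning Tree protocol\n"
        "  Number of topology changes 9 last change occurred 00:01:23 ago\n"
    )
    print(recent_topology_changes(sample))
```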
Hope to help
Giuseppe