11-03-2021 10:08 PM - edited 11-03-2021 10:09 PM
Hi,
I have a pair of 6807s in a VSS (VSL) pair with two upstream routers in a typical failover config.
ASR1 goes to SW1 and ASR2 goes to SW2 in the core pair.
Not sure exactly when, but about a month ago we started to see drops during the day. Our Cisco VoIP users noticed first, because the OSPF reconvergence makes their connection to the CM at a remote site (across the WAN) fail and drop.
After considerable troubleshooting (first Layer 1, then possible loops or other L2 problems), we ended up back at the same spot.
Every 30 to 200 minutes, very randomly, I get the same OSPF drop, and then the adjacency re-establishes after what seems to be a few seconds.
As a side note, I do notice some HOSTFLAPPING on the standard switchports, but not on the WAN uplinks to the OSPF neighbors. It might just be coincidence, but it's something I noticed and can't pinpoint either. The OSPF drops are definitely causing issues, though. Any help would be appreciated, of course. Thanks.
Nov 3 21:29:45 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.0.0.65 on GigabitEthernet2/3/1 from FULL to DOWN, Neighbor Down: Dead timer expired
Nov 3 21:29:45 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.1.0.65 on GigabitEthernet1/4/1 from FULL to DOWN, Neighbor Down: Dead timer expired
Nov 3 21:29:45 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.7.36 on GigabitEthernet1/3/1 from FULL to DOWN, Neighbor Down: Dead timer expired
Nov 3 21:29:46 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.9.65 on GigabitEthernet2/3/1 from LOADING to FULL, Loading Done
Nov 3 21:29:46 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.7.36 on GigabitEthernet1/3/1 from LOADING to FULL, Loading Done
Nov 3 21:29:46 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.7.65 on GigabitEthernet1/4/1 from LOADING to FULL, Loading Done
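To see which neighbors and interfaces go down together, it can help to pull the timestamp, neighbor, interface, and state transition out of each %OSPF-5-ADJCHG line. The following is a minimal Python sketch, assuming the log format exactly as pasted above; the helper names (parse_adjchg, downs_by_interface, downs_by_timestamp) and the SAMPLE_LOG list are illustrative, not part of any Cisco tool.

```python
import re
from collections import Counter

# IOS OSPF adjacency-change syslog format as it appears in this thread, e.g.
# "Nov 3 21:29:45 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.0.0.65 on
#  GigabitEthernet2/3/1 from FULL to DOWN, Neighbor Down: Dead timer expired"
ADJCHG_RE = re.compile(
    r"^(?P<ts>\w+ +\d+ [\d:]+) \w+: %OSPF-\S*ADJCHG: "
    r"Process (?P<proc>\d+), Nbr (?P<nbr>[\d.]+) on (?P<intf>\S+) "
    r"from (?P<old>\S+) to (?P<new>\S+), (?P<reason>.+)$"
)

# The log excerpt from the original post, used as illustrative input.
SAMPLE_LOG = [
    "Nov 3 21:29:45 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.0.0.65 on GigabitEthernet2/3/1 from FULL to DOWN, Neighbor Down: Dead timer expired",
    "Nov 3 21:29:45 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.1.0.65 on GigabitEthernet1/4/1 from FULL to DOWN, Neighbor Down: Dead timer expired",
    "Nov 3 21:29:45 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.7.36 on GigabitEthernet1/3/1 from FULL to DOWN, Neighbor Down: Dead timer expired",
    "Nov 3 21:29:46 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.9.65 on GigabitEthernet2/3/1 from LOADING to FULL, Loading Done",
    "Nov 3 21:29:46 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.7.36 on GigabitEthernet1/3/1 from LOADING to FULL, Loading Done",
    "Nov 3 21:29:46 PDT: %OSPF-SW1-5-ADJCHG: Process 1, Nbr 10.2.7.65 on GigabitEthernet1/4/1 from LOADING to FULL, Loading Done",
]


def parse_adjchg(lines):
    """Return one dict per ADJCHG line; lines that don't match are skipped."""
    events = []
    for line in lines:
        m = ADJCHG_RE.match(line.strip())
        if m:
            events.append(m.groupdict())
    return events


def downs_by_interface(events):
    """Count transitions into DOWN per interface."""
    return Counter(e["intf"] for e in events if e["new"] == "DOWN")


def downs_by_timestamp(events):
    """Group interfaces that went DOWN by timestamp, to spot simultaneous drops."""
    groups = {}
    for e in events:
        if e["new"] == "DOWN":
            groups.setdefault(e["ts"], []).append(e["intf"])
    return groups


if __name__ == "__main__":
    events = parse_adjchg(SAMPLE_LOG)
    print(downs_by_interface(events))
    print(downs_by_timestamp(events))
```

On the excerpt above, all three DOWN events share the single timestamp 21:29:45, which fits the question later in the thread about whether every adjacency drops at once. Run over a longer syslog capture, the same grouping would show whether that pattern holds for every drop or whether one interface (for example Gi1/4/1) goes down on its own.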
11-04-2021 12:41 AM
Hello,
On the interfaces configured for OSPF, do you see any packet drops (sh interfaces x)?
You might want to turn on:
debug ip ospf adj
and post the results here.
11-04-2021 09:17 AM
I enabled it to see what happens. Thanks.
11-04-2021 09:41 AM
I didn't think of that. Here is what I have for the 4 OSPF links.
I guess I should look at the one with over a thousand drops.
Should I clear the counters on each of the interfaces? Is that just clear counters int x/x?
I checked the other side of 1/4/1, which is the router, and it doesn't show input queue drops; instead it has lots of unknown protocol drops.
I'd better clear it all out, since the counters might be months or years old. But there certainly is a pattern on that one link.
--------------
GigabitEthernet0/0/2 is up, line protocol is up
Hardware is 6XGE-BUILT-IN, address is 00d7.8fa5.8702 (bia 00d7.8fa5.8702)
Description: SW2-G1-4-1
Internet address is 10.10.0.157/30
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive not supported
Full Duplex, 1000Mbps, link type is auto, media type is SX
output flow-control is on, input flow-control is on
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:01, output 00:00:13, output hang never
Last clearing of "show interface" counters never
Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 2523000 bits/sec, 659 packets/sec
30 second output rate 7794000 bits/sec, 1251 packets/sec
4005640361 packets input, 3466425004245 bytes, 0 no buffer
Received 6 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 11116591 multicast, 0 pause input
5841044750 packets output, 4168885168986 bytes, 0 underruns
0 output errors, 0 collisions, 2 interface resets
297304 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
-------------
Neighbor ID Pri State Dead Time Address Interface
10.9.7.36 1 FULL/BDR 00:00:03 10.10.0.153 GigabitEthernet2/4/1
10.3.7.65 1 FULL/BDR 00:00:03 10.10.0.149 GigabitEthernet2/3/1
10.2.7.65 1 FULL/BDR 00:00:03 10.10.0.157 GigabitEthernet1/4/1
10.2.3.36 1 FULL/BDR 00:00:03 10.10.0.145 GigabitEthernet1/3/1
#sh int gi2/4/1 | i Input queue
Input queue: 0/75/0/1 (size/max/drops/flushes); Total output drops: 0
#sh int gi2/3/1 | i Input queue
Input queue: 0/75/0/2 (size/max/drops/flushes); Total output drops: 0
#sh int gi1/4/1 | i Input queue
Input queue: 0/75/1025/39 (size/max/drops/flushes); Total output drops: 0
#sh int gi1/3/1 | i Input queue
Input queue: 0/75/0/3 (size/max/drops/flushes); Total output drops: 0
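The four numbers in each Input queue line are size/max/drops/flushes, so only Gi1/4/1 shows input-queue drops (1025, against a queue max of 75). If, as is typical on IOS, that queue holds traffic waiting for the CPU, OSPF hellos included, these drops may be worth correlating with the adjacency losses. A small Python sketch that pulls the fields out and flags non-zero drops; the helper names and the POSTED dict are illustrative, filled in with the output above.

```python
import re

# Matches e.g. "Input queue: 0/75/1025/39 (size/max/drops/flushes); Total output drops: 0"
INPUT_QUEUE_RE = re.compile(
    r"Input queue: (\d+)/(\d+)/(\d+)/(\d+) \(size/max/drops/flushes\)"
)

# "sh int X | i Input queue" output for the four OSPF links, as posted.
POSTED = {
    "GigabitEthernet2/4/1": "Input queue: 0/75/0/1 (size/max/drops/flushes); Total output drops: 0",
    "GigabitEthernet2/3/1": "Input queue: 0/75/0/2 (size/max/drops/flushes); Total output drops: 0",
    "GigabitEthernet1/4/1": "Input queue: 0/75/1025/39 (size/max/drops/flushes); Total output drops: 0",
    "GigabitEthernet1/3/1": "Input queue: 0/75/0/3 (size/max/drops/flushes); Total output drops: 0",
}


def parse_input_queue(line):
    """Return (size, max, drops, flushes) as ints, or None if the line doesn't match."""
    m = INPUT_QUEUE_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def interfaces_with_input_drops(outputs):
    """Names of interfaces whose input-queue drop counter is non-zero."""
    flagged = []
    for intf, line in outputs.items():
        parsed = parse_input_queue(line)
        if parsed is not None and parsed[2] > 0:
            flagged.append(intf)
    return flagged


if __name__ == "__main__":
    for intf, line in POSTED.items():
        size, qmax, drops, flushes = parse_input_queue(line)
        print(f"{intf}: {drops} drops, {flushes} flushes (queue max {qmax})")
    print("non-zero input drops:", interfaces_with_input_drops(POSTED))
```

These are lifetime counters (never cleared), so a non-zero value only says drops happened at some point; sampling the same line repeatedly is what shows whether they are still climbing.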
11-04-2021 10:24 AM
Hello,
The 'unknown protocol' drops are typically caused by things like DTP. How are your trunks set up?
Actually, if you can post the full configs of all 4 devices, that would make troubleshooting easier...
11-04-2021 11:30 AM
Hello
I wouldn't read too much into those unknown protocol drops, as they are most probably DTP, which can be negated with switchport nonegotiate. However, the first thing is to clear the interface counters: they have never been cleared, so you may be looking at historical information.
Post the output of debug ip ospf adj, and also confirm how you are peering with those routers.
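Because the counters have never been cleared, another option (instead of, or before, clear counters) is to take two snapshots of the same show interface output some minutes apart and compare them. Only counters that are still increasing point at a current problem. A minimal Python sketch, assuming the show interface text is pasted in as strings; the tracked-counter list and helper names are illustrative, and the second snapshot's values are hypothetical.

```python
import re

# A few "show interface" counters to track; each appears as "<number> <name>".
COUNTERS = ("input errors", "CRC", "unknown protocol drops", "interface resets")


def snapshot(show_interface_text):
    """Extract the tracked counters from one 'show interface' output."""
    snap = {}
    for name in COUNTERS:
        m = re.search(r"(\d+) " + re.escape(name) + r"\b", show_interface_text)
        if m:
            snap[name] = int(m.group(1))
    return snap


def counter_deltas(before, after):
    """Counters that changed between two snapshots.

    Positive values are increases; a decrease (e.g. counters cleared in
    between) is reported as None because the delta is meaningless.
    """
    deltas = {}
    for name, new in after.items():
        diff = new - before.get(name, 0)
        if diff > 0:
            deltas[name] = diff
        elif diff < 0:
            deltas[name] = None
    return deltas


if __name__ == "__main__":
    # First snapshot: lines from the router's Gi0/0/2 output in this thread.
    before = snapshot(
        "0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored\n"
        "0 output errors, 0 collisions, 2 interface resets\n"
        "297304 unknown protocol drops\n"
    )
    # Second snapshot: hypothetical values, for illustration only.
    after = snapshot(
        "0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored\n"
        "0 output errors, 0 collisions, 2 interface resets\n"
        "297410 unknown protocol drops\n"
    )
    print(counter_deltas(before, after))  # {'unknown protocol drops': 106}
```

Comparing snapshots keeps the long-term totals intact, which can be useful if you want to see whether drops line up with the times of the OSPF log entries.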
11-04-2021 04:42 PM
Not sure why I am not getting any output from the OSPF debug...?
sh debug ospf ?
11-05-2021 01:32 AM
Hello
@rikdrt1 wrote:
not sure why i am not getting any output from the debug OSPF ... ?
sh debug ospf ?
You may not have logging enabled for either the console or the monitor; try the following:
terminal monitor < if remotely connected to the device
conf t
logging console < if physically connected to the device
11-04-2021 01:26 AM
Hello
@rikdrt1 wrote:
ASR1 goes to SW1 and ASR2 goes to SW2 in the core-pair.
If those routers are in a vPC pairing, this could be the issue: you should not run L3 over vPC towards routers, since vPC is an L2 feature for loop prevention. Connections to routers running an IGP should be via individual links; then you also get correct ECMP load balancing.
11-04-2021 02:53 AM - edited 11-04-2021 02:55 AM
At this point it is hard to say. You need to look at the logs on all the devices. Is OSPF going down everywhere at the same time? Per the logs, it all goes down and comes back up at the same time (and you confirmed there is no physical issue).
Is the VSS configured with NSF for OSPF? Please refer to the document below:
How is your OSPF configured: on Layer 3 physical interfaces or on Layer 3 SVIs?
If this happens after some time, I am sure some topology change is causing the issue. Set up a syslog server to capture the logs; just before OSPF goes down you may see some other event that you can correlate with the issue.
11-04-2021 09:36 PM
11-04-2021 09:52 PM
Thanks for that info. We haven't had any topology change recently, and we have tight control over what changes and where.
The first thing I thought of was that someone plugged in an unauthorized switch somewhere, but we have been going through all the access switches to eliminate that too. Very strange.
11-05-2021 01:27 AM
From one of your posts:
#sh int gi1/4/1 | i Input queue
Input queue: 0/75/1025/39 (size/max/drops/flushes); Total output drops: 0
I would suggest making the OSPF interfaces point-to-point here, if you are not peering in a mesh.
Can you post the OSPF running config and the interface config from both sides?
11-05-2021 07:19 AM - edited 11-05-2021 07:25 AM
SW side:
router ospf 1
router-id 10.2.15.4
auto-cost reference-bandwidth 100000
redistribute static metric-type 1 subnets
passive-interface default
no passive-interface GigabitEthernet1/3/1
no passive-interface GigabitEthernet1/4/1
no passive-interface GigabitEthernet2/3/1
no passive-interface GigabitEthernet2/4/1
network 10.0.0.0 0.255.255.255 area 0
SW Gi1/4/1
interface GigabitEthernet1/4/1
description ASR2 G2
no switchport
ip address 10.10.0.158 255.255.255.252
ip ospf hello-interval 1
load-interval 30
service-policy input LAN-CoS-Ingress
end
Router OSPF
router ospf 1
router-id 10.2.79.6
auto-cost reference-bandwidth 100000
redistribute static metric-type 1 subnets
passive-interface default
no passive-interface GigabitEthernet0/0/0
no passive-interface GigabitEthernet0/0/1
no passive-interface GigabitEthernet0/0/2
no passive-interface GigabitEthernet0/0/3
Router-INT
interface GigabitEthernet0/0/2
description SW2-G1-4-1
bandwidth 1000000
ip address 10.10.0.157 255.255.255.252
ip ospf hello-interval 1
load-interval 30
negotiation auto
cdp enable
All the other 3 interfaces are basically the same setup, just different IPs. Super simple: like I said, I set this up about 6 years ago and it all worked fine until recently, and I really can't tell what changed, which is why this is so confusing. The setup is relatively simple, and I have the same thing in other buildings: the same 4 uplinks to dual cores.
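Both ends of the SW Gi1/4/1 to router Gi0/0/2 link set ip ospf hello-interval 1 and no explicit dead interval. On IOS the dead interval then defaults to four times the hello, i.e. 4 seconds. That fits the 00:00:03 Dead Time in the neighbor table, and it means the adjacency drops after roughly four hellos in a row are not received or not processed. The following Python sketch derives (hello, dead) from each side's interface config and checks that they match, since OSPF requires matching hello and dead intervals. The variable names are illustrative, and it assumes the broadcast network type (10 s default hello) and only the plain numeric form of ip ospf dead-interval.

```python
import re

HELLO_RE = re.compile(r"^\s*ip ospf hello-interval (\d+)\s*$", re.MULTILINE)
# Only the numeric form; "dead-interval minimal hello-multiplier N" is not handled.
DEAD_RE = re.compile(r"^\s*ip ospf dead-interval (\d+)\s*$", re.MULTILINE)

# Assumption: broadcast network type, where the IOS default hello is 10 s
# and the dead interval defaults to 4 x hello when not set explicitly.
DEFAULT_HELLO = 10
DEAD_MULTIPLIER = 4


def ospf_timers(intf_config):
    """Return (hello, dead) in seconds for one interface's config text."""
    m = HELLO_RE.search(intf_config)
    hello = int(m.group(1)) if m else DEFAULT_HELLO
    m = DEAD_RE.search(intf_config)
    dead = int(m.group(1)) if m else DEAD_MULTIPLIER * hello
    return hello, dead


# Interface configs as posted in this thread.
SW_GI1_4_1 = """interface GigabitEthernet1/4/1
 description ASR2 G2
 no switchport
 ip address 10.10.0.158 255.255.255.252
 ip ospf hello-interval 1
 load-interval 30
 service-policy input LAN-CoS-Ingress
"""

RTR_GI0_0_2 = """interface GigabitEthernet0/0/2
 description SW2-G1-4-1
 bandwidth 1000000
 ip address 10.10.0.157 255.255.255.252
 ip ospf hello-interval 1
 load-interval 30
 negotiation auto
 cdp enable
"""

if __name__ == "__main__":
    sw, rtr = ospf_timers(SW_GI1_4_1), ospf_timers(RTR_GI0_0_2)
    print("switch (hello, dead):", sw)
    print("router (hello, dead):", rtr)
    print("timers match:", sw == rtr)
    hello, dead = sw
    print(f"about {dead // hello} consecutive hellos lost -> dead timer expires")
```

With a 4-second dead timer, even a short burst of dropped or delayed hellos (for example while the input queue is full) is enough to take the adjacency down, so the timers and the input-queue drops on the same link are worth looking at together.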