10-26-2021 03:06 AM
Maybe someone can help me out here.
I'm using a Cisco ASR 1002-X with two SPA-1X10GE-L-V2 cards, one for WAN and one for my LAN.
However, I am randomly losing bandwidth and pings to my BGP peer on the WAN side. The provider states the problem is not on their end.
This is what I have so far:
No links go down when it happens. No alarms are triggered either.
I have a 5G ESP installed in the Cisco, and the maximum TX bandwidth I've pushed in all directions is 3 Gbps.
When it does drop, the whole network goes down completely, and my SNMP traffic capture shows the drop in bandwidth on both interfaces. These are the only two interfaces I use on the router. The drops are brief, about 3 minutes each. Sometimes it can go a week with no issues; other times it happens multiple times per day. I've checked all physical connections, the switch on the LAN side, and of course the subslots and transceivers.
Here are the outputs of show interface (IP addresses excluded):
WAN interface:
TenGigabitEthernet0/2/0 is up, line protocol is up
Hardware is SPA-1X10GE-L-V2, address is c471.fe9e.9220 (bia c471.fe9e.9220)
Description: PFN-WAN-IN
MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 3/255, rxload 20/255
Encapsulation ARPA, loopback not set
Keepalive not supported
Full Duplex, 10000Mbps, link type is force-up, media type is 10GBase-SR
output flow-control is on, input flow-control is on
ARP type: ARPA, ARP Timeout 04:00:00
Last input 03:41:05, output 03:41:05, output hang never
Last clearing of "show interface" counters 1w5d
Input queue: 5/375/80301/29822 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 791908000 bits/sec, 81378 packets/sec
5 minute output rate 130106000 bits/sec, 36984 packets/sec
151441936953 packets input, 195026467118484 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
52466060 input errors, 0 CRC, 0 frame, 52466060 overrun, 0 ignored
0 watchdog, 6190 multicast, 0 pause input
61089778297 packets output, 16363564168894 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
LAN Interface:
TenGigabitEthernet0/3/0 is up, line protocol is up
Hardware is SPA-1X10GE-L-V2, address is c471.fe9e.9230 (bia c471.fe9e.9230)
MTU 1500 bytes, BW 10000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 19/255, rxload 3/255
Encapsulation 802.1Q Virtual LAN, Vlan ID 1., loopback not set
Keepalive not supported
Full Duplex, 10000Mbps, link type is force-up, media type is 10GBase-SR
output flow-control is on, input flow-control is on
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 08:47:54
Input queue: 0/375/0/1418 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 155048000 bits/sec, 43271 packets/sec
5 minute output rate 752332000 bits/sec, 82386 packets/sec
1662017565 packets input, 578807299248 bytes, 0 no buffer
Received 3730125 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 1673756 multicast, 0 pause input
3613821370 packets output, 4459177693037 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
120451 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
Here are the transceiver statuses
The Transceiver in slot 0 subslot 2 port 0 is enabled.
Module temperature = 29.382 C
Transceiver Tx bias current = 10110 uAmps
Transceiver Tx power = -2.2 dBm
Transceiver Rx optical power = -5.5 dBm
The Transceiver in slot 0 subslot 3 port 0 is enabled.
Module temperature = 31.082 C
Transceiver Tx bias current = 10122 uAmps
Transceiver Tx power = -2.2 dBm
Transceiver Rx optical power = -5.5 dBm
And the QFP utilization (the max I've ever seen this is 45%):
CPP 0:                       5 secs        1 min        5 min       60 min
Input:  Total (pps)          139846       122885       126820       123881
              (bps)      1038519704    901127984    938178976    923344608
Output: Total (pps)          139571       122152       126145       123164
              (bps)      1040124864    902409456    939363024    924495968
Processing: Load (pct)           17           15           16           15
At this point I just don't know where to look. I did check my core switch, but I'm also losing the connection to the outside world from the Cisco itself, so the problem sits upstream of the LAN anyway.
Any help or a finger in any direction would be great!
Thanks!
10-26-2021 03:36 AM
Where is the WAN interface connected? That is what needs to be investigated.
See the input queue and drops below:
Input queue: 5/375/80301/29822 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 791908000 bits/sec, 81378 packets/sec
5 minute output rate 130106000 bits/sec, 36984 packets/sec
151441936953 packets input, 195026467118484 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
52466060 input errors, 0 CRC, 0 frame, 52466060 overrun, 0 ignored
Troubleshooting guide:
https://www.cisco.com/c/en/us/support/docs/routers/10000-series-routers/6343-queue-drops.html
10-26-2021 03:48 AM
The WAN interface is connected to our ISP's switch.
They state there are no issues with that switch.
I don't understand how I can have overruns on a 10G interface in and 10G out when I'm only using 2G.
My provider has me capped at 3 Gbps, and I'm not anywhere near that.
10-26-2021 03:37 AM
Hello,
WAN interface:
TenGigabitEthernet0/2/0 is up, line protocol is up
Hardware is SPA-1X10GE-L-V2, address is c471.fe9e.9220 (bia c471.fe9e.9220)
Description: PFN-WAN-IN
--> Queueing strategy: fifo
--> 52466060 input errors, 0 CRC, 0 frame, 52466060 overrun, 0 ignored
I am not sure if the 10G interfaces support anything other than 'fifo', but check if 'weighted fair' is available under the interface.
The input errors and overruns are typically caused by microbursts: for short periods, more traffic arrives than the interface can handle.
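To put the WAN counters in proportion, here is a rough Python sketch that uses only the numbers from the posted output. It assumes the overrun counter can be compared directly against "packets input", which is only an approximation, since overrun frames are dropped before they are counted as received.

```python
# Counters copied from the posted WAN "show interface" output.
packets_input = 151_441_936_953
bytes_input = 195_026_467_118_484
overruns = 52_466_060

# Approximate share of traffic lost to overruns (assumption: overruns are
# compared against successfully received packets, not total arrivals).
overrun_share = overruns / packets_input

# Average size of a received packet over the counter period.
avg_packet_bytes = bytes_input / packets_input

print(f"overruns per received packet: {overrun_share:.4%}")
print(f"average input packet size:    {avg_packet_bytes:.0f} bytes")
```

A small overall share is consistent with short bursts of loss rather than a constantly overloaded interface, but it does not prove it.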
Post the output of:
show buffers
10-26-2021 03:46 AM
I don't see "weighted fair" available under the interface.
But here is the buffer output:
Buffer elements:
2126 in free list
2922586376 hits, 0 misses, 1242 created
Public buffer pools:
Small buffers, 104 bytes (total 1200, permanent 1200, peak 1310 @ 7w0d):
1196 in free list (200 min, 2500 max allowed)
2160215107 hits, 357 misses, 189 trims, 189 created
23 failures (0 no memory)
Middle buffers, 600 bytes (total 900, permanent 900, peak 1400 @ 7w0d):
896 in free list (100 min, 2000 max allowed)
1313222600 hits, 5603 misses, 7536 trims, 7536 created
0 failures (0 no memory)
Big buffers, 1536 bytes (total 900, permanent 900, peak 924 @ 7w0d):
900 in free list (50 min, 1800 max allowed)
2222079823 hits, 36 misses, 24 trims, 24 created
0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 100, permanent 100, peak 104 @ 7w0d):
100 in free list (0 min, 300 max allowed)
77564196 hits, 25 misses, 5 trims, 5 created
25 failures (0 no memory)
Large buffers, 5024 bytes (total 100, permanent 100, peak 101 @ 7w0d):
100 in free list (0 min, 300 max allowed)
31 hits, 0 misses, 1 trims, 1 created
0 failures (0 no memory)
VeryLarge buffers, 8228 bytes (total 100, permanent 100):
100 in free list (0 min, 300 max allowed)
10582 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
Huge buffers, 18024 bytes (total 20, permanent 20, peak 21 @ 7w0d):
20 in free list (0 min, 33 max allowed)
6 hits, 0 misses, 1 trims, 1 created
0 failures (0 no memory)
Interface buffer pools:
CF Small buffers, 104 bytes (total 101, permanent 100, peak 101 @ 7w0d):
101 in free list (100 min, 200 max allowed)
0 hits, 0 misses, 3930 trims, 3931 created
0 failures (0 no memory)
Generic ED Pool buffers, 512 bytes (total 101, permanent 100, peak 101 @ 7w0d):
101 in free list (100 min, 100 max allowed)
0 hits, 0 misses
CF Middle buffers, 600 bytes (total 101, permanent 100, peak 101 @ 7w0d):
101 in free list (100 min, 200 max allowed)
0 hits, 0 misses, 3930 trims, 3931 created
0 failures (0 no memory)
Syslog ED Pool buffers, 600 bytes (total 1057, permanent 1056, peak 1057 @ 7w0d):
1025 in free list (1056 min, 1056 max allowed)
1392357946 hits, 0 misses
EOBC0 buffers, 1524 bytes (total 256, permanent 256):
256 in free list (0 min, 256 max allowed)
237 hits, 0 fallbacks
CF Big buffers, 1536 bytes (total 26, permanent 25, peak 26 @ 7w0d):
26 in free list (25 min, 50 max allowed)
0 hits, 0 misses, 3930 trims, 3931 created
0 failures (0 no memory)
IPC buffers, 4096 bytes (total 1176, permanent 1176):
1159 in free list (392 min, 3920 max allowed)
251 hits, 0 fallbacks, 0 trims, 0 created
0 failures (0 no memory)
CF VeryBig buffers, 4520 bytes (total 3, permanent 2, peak 3 @ 7w0d):
3 in free list (2 min, 4 max allowed)
0 hits, 0 misses, 3930 trims, 3931 created
0 failures (0 no memory)
CF Large buffers, 5024 bytes (total 2, permanent 1, peak 2 @ 7w0d):
2 in free list (1 min, 2 max allowed)
0 hits, 0 misses, 3930 trims, 3931 created
0 failures (0 no memory)
IPC Medium buffers, 16384 bytes (total 2, permanent 2):
2 in free list (1 min, 8 max allowed)
0 hits, 0 fallbacks, 0 trims, 0 created
0 failures (0 no memory)
Private Huge IPC buffers, 18024 bytes (total 1, permanent 0, peak 1 @ 7w0d):
1 in free list (0 min, 4 max allowed)
0 hits, 0 misses, 3930 trims, 3931 created
0 failures (0 no memory)
Private Huge buffers, 65280 bytes (total 1, permanent 0, peak 1 @ 7w0d):
1 in free list (0 min, 4 max allowed)
0 hits, 0 misses, 3930 trims, 3931 created
0 failures (0 no memory)
IPC Large buffers, 65535 bytes (total 17, permanent 16, peak 17 @ 7w0d):
17 in free list (16 min, 16 max allowed)
0 hits, 0 misses, 633761 trims, 633762 created
0 failures (0 no memory)
Header pools:
Header buffers, 0 bytes (total 266, permanent 256, peak 266 @ 7w0d):
10 in free list (10 min, 512 max allowed)
253 hits, 3 misses, 0 trims, 10 created
0 failures (0 no memory)
256 max cache size, 256 in cache
898638885 hits in cache, 0 misses in cache
Particle Clones:
1024 clones, 0 hits, 0 misses
Public particle pools:
F/S buffers, 256 bytes (total 384, permanent 384):
128 in free list (128 min, 1024 max allowed)
256 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
256 max cache size, 256 in cache
0 hits in cache, 0 misses in cache
Normal buffers, 512 bytes (total 512, permanent 512):
384 in free list (128 min, 1024 max allowed)
128 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
128 max cache size, 128 in cache
0 hits in cache, 0 misses in cache
Private particle pools:
lsmpi_rx buffers, 416 bytes (total 8194, permanent 8194):
0 in free list (0 min, 8194 max allowed)
8194 hits, 0 misses
8194 max cache size, 0 in cache
906905822 hits in cache, 0 misses in cache
lsmpi_tx buffers, 416 bytes (total 4098, permanent 4098):
0 in free list (0 min, 4098 max allowed)
4098 hits, 0 misses
4098 max cache size, 4097 in cache
999528638 hits in cache, 0 misses in cache
10-26-2021 04:07 AM
Hello,
The input and overrun errors are often due to microbursts, which you don't see in the interface statistics because they are too short.
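A rough sketch of why such bursts stay invisible: the Python below treats the "5 minute input rate" as a simple mean over 300 seconds (a simplification, IOS uses a weighted average) and uses a hypothetical 512 KB ingress buffer, since the real hardware buffer size is not shown in this thread.

```python
LINE_RATE_BPS = 10_000_000_000      # 10GE line rate
AVERAGING_WINDOW_S = 300            # "5 minute input rate" window (simplified)
HYPOTHETICAL_BUFFER_BYTES = 512 * 1024  # assumption, not taken from the router


def burst_contribution_bps(burst_seconds: float) -> float:
    """Increase in the windowed average caused by one line-rate burst."""
    return LINE_RATE_BPS * burst_seconds / AVERAGING_WINDOW_S


def time_to_fill_s(buffer_bytes: int, drain_bps: float) -> float:
    """Seconds until a buffer fed at line rate and drained at drain_bps is full."""
    if drain_bps >= LINE_RATE_BPS:
        return float("inf")  # buffer never fills
    return buffer_bytes * 8 / (LINE_RATE_BPS - drain_bps)


# A 1 ms burst at line rate barely moves the 5-minute average...
print(f"{burst_contribution_bps(0.001):.0f} bps added to the average")
# ...yet a buffer drained at 3 Gbps would already be full in under 1 ms.
print(f"{time_to_fill_s(HYPOTHETICAL_BUFFER_BYTES, 3e9) * 1000:.2f} ms to fill")
```

With these assumed numbers, a burst long enough to overflow the buffer adds only tens of kbps to the averaged rate, so the interface statistics look healthy.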
Try to tune the buffers globally, start with the values below:
buffers small permanent 2400
buffers small min-free 400
buffers small max-free 5000
!
buffers middle permanent 1800
buffers middle min-free 200
buffers middle max-free 4000
!
buffers big permanent 1800
buffers big min-free 100
buffers big max-free 3600
!
buffers verybig permanent 200
buffers verybig min-free 100
buffers verybig max-free 600
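For reference, the starting values above are roughly double the current pool settings shown in the posted show buffers output. A minimal Python sketch of that doubling follows; the helper name is made up for illustration, and the factor of two is just a starting point, not a recommendation from Cisco.

```python
# (permanent, min-free, max-free) per pool, from the posted "show buffers" output.
current_pools = {
    "small": (1200, 200, 2500),
    "middle": (900, 100, 2000),
    "big": (900, 50, 1800),
    "verybig": (100, 0, 300),
}


def doubled_config(pools: dict) -> list:
    """Return 'buffers ...' config lines with every current value doubled."""
    lines = []
    for name, (permanent, min_free, max_free) in pools.items():
        lines.append(f"buffers {name} permanent {permanent * 2}")
        lines.append(f"buffers {name} min-free {min_free * 2}")
        lines.append(f"buffers {name} max-free {max_free * 2}")
    return lines


for line in doubled_config(current_pools):
    print(line)
```

The only difference from the list above is verybig min-free: doubling the current value of 0 gives 0, whereas the suggestion sets it to 100.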
10-26-2021 04:42 AM
I increased the global buffers and cleared the counters on the interfaces.
What I don't get is the complete loss of traffic. I go from receiving 2 Gbps to nothing, and when it happens I have no connection to the outside world from the Cisco on that WAN interface. It's like I'm losing the WAN connection from my provider (even though they say it's me).
I would imagine a full buffer would drop some packets, not all traffic, right?
It's been running like this for years. But the buffers are definitely still an issue, so I will start there.
Thank you
10-26-2021 08:04 AM
Hello,
There is not much you can do on the router that is hitting these overruns apart from tweaking the interface buffers, and I'm fairly sure that won't fix the issue unless you upgrade. You could try reducing the traffic rate at the other end, if that is under your control.
10-26-2021 10:05 AM
". . . and the maximum TX bandwidth Ive pushed in all directions is 3 Gbps. "
BTW, 10g interfaces always transmit at 10g. 3 Gbps is a usage average over some time period.
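A small Python sketch of that point, ignoring preamble and inter-frame gap for simplicity: every frame goes onto the wire at the full 10 Gbps, so an average of 3 Gbps just means the wire is busy about 30% of the time.

```python
LINE_RATE_BPS = 10_000_000_000  # 10GE always clocks bits out at this rate


def serialization_time_us(frame_bytes: int) -> float:
    """Microseconds one frame occupies the wire (no preamble/IFG counted)."""
    return frame_bytes * 8 / LINE_RATE_BPS * 1_000_000


def wire_busy_fraction(average_bps: float) -> float:
    """Fraction of time the wire carries data for a given average rate."""
    return average_bps / LINE_RATE_BPS


print(f"1500-byte frame: {serialization_time_us(1500):.2f} us on the wire")
print(f"3 Gbps average:  {wire_busy_fraction(3e9):.0%} of the time at line rate")
```

How those busy periods are spread out is what matters for the ingress buffers: evenly spaced frames are harmless, while the same average delivered in back-to-back bursts can overrun them.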
Although you have ingress queue issues on both LAN and WAN interfaces, as noted by the others, your WAN interface is showing the biggest ingress problem.
Raising the ingress queue limit on the WAN interface might help (try 4K, if supported).
What might also help, if not already configured, is enabling PMTUD so that the BGP TCP session uses the path MTU (rather than the minimal 576).
Also BTW, if you wish to better optimize buffer allocation, later IOS releases support an "auto" optimize command for that (although I forget the exact syntax of the command).