Re: strange BUFFER_THRESHOLD_EXCEEDED log on N9K

from88 · 04-06-2021

Hello,

We have this type of device:

Software
  BIOS: version 05.39
  NXOS: version 7.0(3)I7(8)
  BIOS compile time:  08/30/2019
  NXOS image file is: bootflash:///nxos.7.0.3.I7.8.bin
  NXOS compile time:  3/3/2020 20:00:00 [03/04/2020 04:49:49]


Hardware
  cisco Nexus9000 C93240YC-FX2 Chassis 
  Intel(R) Xeon(R) CPU D-1526 @ 1.80GHz with 24571696 kB of memory.
  Processor Board ID FDO24080WJB

  Device name: CORE01
  bootflash:  115805708 kB
Kernel uptime is 259 day(s), 20 hour(s), 12 minute(s), 34 second(s)

Occasionally, log messages like this appear:

%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!

Could the bug described here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/release/notes/70379_nxos_rn.html and here: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvu69850 be the reason why our new 100G link sometimes flaps? Or does this bug have nothing to do with the flapping itself, and the switch just reacts to it incorrectly? Thanks.

marce1000 · 04-06-2021

>....why our new 100G link sometimes flaps

Probably not. Check the logs on the switch when this happens and look for any additional information. As far as

TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED

is concerned, you could try the recommended software version listed in this document:

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/recommended_release/b_Minimum_and_Recommended_Cisco_NX-OS_Releases_for_Cisco_Nexus_9000_Series_Switches.html

Then check whether the problem persists.

M.

-- ' 'Good body every evening' ' this sentence was once spotted on a logo at the entrance of a Weight Watchers Club !

rmikisa1 · 09-27-2021

Hi @marce1000, I have this exact issue even after upgrading NX-OS to 9.3(8) as recommended.

What's the command to see the current configured buffer depth on the interface?

Christopher Hart · 09-27-2021

Hello!

The show hardware internal buffer info pkt-stats command, run after attaching to the relevant module with the attach module <x> command, displays an instantaneous snapshot of each ASIC slice's buffer utilization.

For more information about the BUFFER_THRESHOLD_EXCEEDED syslog, I highly recommend reviewing the Understand the TAHUSD BUFFER_THRESHOLD_EXCEEDED Syslog and Congestion on Nexus 9000 Cloud Scale ASIC NX-OS Switches document. It contains details about what this syslog means and how you can identify congested egress interfaces on Cisco Nexus 9000 Series switches with the Cloud Scale ASIC.
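To see which module and instance the syslog fires for most often, collected log lines can be tallied offline. The following is only a minimal Python sketch: the regular expression is built from the message text quoted in this thread, and the function name count_threshold_events and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the syslog text quoted in this thread.
PATTERN = re.compile(
    r"%TAHUSD-SLOT(?P<slot>\d+)-4-BUFFER_THRESHOLD_EXCEEDED: "
    r"Module (?P<module>\d+) Instance (?P<instance>\d+) "
    r"Pool-group buffer (?P<percent>\d+) percent threshold is exceeded"
)


def count_threshold_events(lines):
    """Count BUFFER_THRESHOLD_EXCEEDED events per (module, instance)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (int(match["module"]), int(match["instance"]))
            counts[key] += 1
    return counts


# Illustrative input: the two message variants posted in this thread.
sample = [
    "%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!",
    "%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 3 Pool-group buffer 90 percent threshold is exceeded!",
    "%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!",
]

for (module, instance), hits in sorted(count_threshold_events(sample).items()):
    print(f"module {module} instance {instance}: {hits} event(s)")
```

The instance with the most hits is a reasonable place to start when looking at the pkt-stats output for that module.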

I hope this helps - thank you!

-Christopher

NetEng_Worky1 · 12-07-2021

Something is definitely up with these later updates. We were running stable on a much older version of NX-OS, but when we updated to 9.3(7) our switches became unstable to the point that traffic wasn't passing as expected, disrupting both services and storage traffic.

We were running stable for up to 600 days on the much older OS version.

9.3(8) is just as unstable as 9.3(7), and something in it is definitely causing our switches to slow down, drop traffic, and spam buffer-full logs. Our network architecture has barely changed since before the update, apart from 10-20 additional 10G links, which we had capacity for in our 2 x 2 side-by-side vPC design.

Because of how my company interacts with Cisco, I can't open a TAC case directly right now, but this needs escalation: the error now appears within hours of a reboot, whereas the first instance showed up 4 months after the first reboot.

franklinb · 03-30-2022

Started seeing this error on C93240YC-FX2 running 9.3(5)

NetEng_Worky1 · 03-30-2022

We updated past 9.3(x), and the issue went from affecting 10 ports to just one.
Our current suspicion is that SDN is causing packet splitting when traffic goes from a 1500 MTU to a 14xx MTU.
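To illustrate why a smaller MTU along the path could multiply the packet count, here is a rough Python sketch of the arithmetic. It assumes the splitting is plain IPv4 fragmentation with a 20-byte header and no IP options, which the thread does not confirm. The function name ipv4_fragment_count is made up, and the 1400 used in the example call is a hypothetical stand-in, since the exact "14xx" value is not given.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options


def ipv4_fragment_count(packet_size, path_mtu):
    """Return the number of IPv4 fragments needed for a packet.

    packet_size is the full IP packet size in bytes (header included),
    path_mtu is the smaller MTU the packet has to cross.
    """
    if packet_size <= path_mtu:
        return 1  # fits as-is, no fragmentation
    # Every fragment except the last must carry a payload that is a
    # multiple of 8 bytes, so round the usable payload down accordingly.
    per_fragment = (path_mtu - IPV4_HEADER_BYTES) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("path_mtu is too small to carry any payload")
    payload = packet_size - IPV4_HEADER_BYTES
    return math.ceil(payload / per_fragment)


# 1400 is a hypothetical value; the post above only says "14xx".
print(ipv4_fragment_count(1500, 1400))
```

Under these assumptions, every full-size 1500-byte packet turns into two packets on the smaller-MTU side, so the packet count for bulk traffic roughly doubles.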

hope this helps!

kwuenP · 06-28-2023

We have the same issue on a Cisco Nexus 9000 C9364C chassis:

Software
BIOS: version 05.44
NXOS: version 9.3(8)
BIOS compile time: 04/02/2021
NXOS image file is: bootflash:///nxos.9.3.8.bin
NXOS compile time: 8/4/2021 13:00:00 [08/05/2021 05:25:26]

Whenever the syslog "%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 3 Pool-group buffer 90 percent threshold is exceeded!" is generated, the switch drops traffic, causing severe traffic loss. We tried reloading the switch, but it still happens afterwards.