We have this type of device:
Software
  BIOS: version 05.39
  NXOS: version 7.0(3)I7(8)
  BIOS compile time: 08/30/2019
  NXOS image file is: bootflash:///nxos.7.0.3.I7.8.bin
  NXOS compile time: 3/3/2020 20:00:00 [03/04/2020 04:49:49]
Hardware
  cisco Nexus9000 C93240YC-FX2 Chassis
  Intel(R) Xeon(R) CPU D-1526 @ 1.80GHz with 24571696 kB of memory.
  Processor Board ID FDO24080WJB
  Device name: CORE01
  bootflash: 115805708 kB
Kernel uptime is 259 day(s), 20 hour(s), 12 minute(s), 34 second(s)
And sometimes logs like this appear:
%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
Could the bug described here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/release/notes/70379_nxos_rn.html and here: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvu69850 be the reason why our new 100G link sometimes flaps? Or does this bug have nothing to do with the flapping itself, and the switch simply does not react to it correctly? Thanks
>....why new 100G link goes sometimes flaps
Probably not. Check the logs on the switch when this happens and look for any additional information. As far as
is concerned, you may try the recommended software version mentioned in this document:
Check whether the problem persists afterwards.
The show hardware internal buffer info pkt-stats command, run after attaching to the relevant module with the attach module <x> command, displays an instantaneous snapshot of each ASIC slice's buffer utilization.
For more information about the BUFFER_THRESHOLD_EXCEEDED syslog, I highly recommend reviewing the Understand the TAHUSD BUFFER_THRESHOLD_EXCEEDED Syslog and Congestion on Nexus 9000 Cloud Scale ASIC NX-OS Switches document. This document contains details about what this syslog means and how you can identify congested egress interfaces on Cisco Nexus 9000 Series switches with the Cloud Scale ASIC.
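If the syslog appears often, it may help to tally which module and instance it names before attaching to the module. The following is only a minimal Python sketch: the regular expression is inferred from the sample messages quoted in this thread (an assumption, not an official format specification), and `tally_buffer_alerts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the sample messages in this thread (assumption,
# not an official description of the syslog format).
BUFFER_RE = re.compile(
    r"%TAHUSD-SLOT(?P<slot>\d+)-4-BUFFER_THRESHOLD_EXCEEDED: "
    r"Module (?P<module>\d+) Instance (?P<instance>\d+) "
    r"Pool-group buffer (?P<percent>\d+) percent threshold is exceeded"
)


def tally_buffer_alerts(lines):
    """Count matching alerts per (module, instance) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = BUFFER_RE.search(line)
        if match:
            key = (int(match["module"]), int(match["instance"]))
            counts[key] += 1
    return counts


# The two messages quoted in this thread, plus a placeholder non-matching line.
sample = [
    "%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 "
    "Pool-group buffer 90 percent threshold is exceeded!",
    "%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 3 "
    "Pool-group buffer 90 percent threshold is exceeded!",
    "placeholder line that is not a buffer alert",
]

for (module, instance), count in sorted(tally_buffer_alerts(sample).items()):
    print(f"module {module} instance {instance}: {count} alert(s)")
```

The module number reported here is the value you would use for <x> in attach module <x>; lines that do not match the pattern are simply skipped.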
I hope this helps - thank you!
Something is definitely up with these later updates. We were running stable on a much older version of NX-OS, but when we updated to 9.3(7) our switches became unstable to the point that traffic wasn't passing as expected, disrupting both services and storage traffic.
We had been running stable for up to 600 days on the much older OS version.
9.3(8) is just as unstable as 9.3(7), and there is definitely something wrong in it that causes our switches to slow down, drop traffic, and spam buffer-full logs. Our network architecture has not changed much since before the update, with the exception of 10-20 additional 10G links, for which we had capacity in our 2 x 2 side-by-side vPC design.
Due to how my company interacts with Cisco, I'm unable to open a TAC case directly right now, but this needs escalation: the error now appears within hours of a reboot, whereas the first instance showed up 4 months after the first reboot.
We have the same issue with a cisco Nexus9000 C9364C Chassis:
BIOS: version 05.44
NXOS: version 9.3(8)
BIOS compile time: 04/02/2021
NXOS image file is: bootflash:///nxos.9.3.8.bin
NXOS compile time: 8/4/2021 13:00:00 [08/05/2021 05:25:26]
Whenever the syslog message "%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 3 Pool-group buffer 90 percent threshold is exceeded!" is generated, the switch drops traffic, causing severe traffic loss. We tried reloading the switch, but it still happens afterwards.