
Strange BUFFER_THRESHOLD_EXCEEDED log on N9K

from88
Participant

Hello,

We have this type of device:

Software
  BIOS: version 05.39
  NXOS: version 7.0(3)I7(8)
  BIOS compile time:  08/30/2019
  NXOS image file is: bootflash:///nxos.7.0.3.I7.8.bin
  NXOS compile time:  3/3/2020 20:00:00 [03/04/2020 04:49:49]


Hardware
  cisco Nexus9000 C93240YC-FX2 Chassis 
  Intel(R) Xeon(R) CPU D-1526 @ 1.80GHz with 24571696 kB of memory.
  Processor Board ID FDO24080WJB

  Device name: CORE01
  bootflash:  115805708 kB
Kernel uptime is 259 day(s), 20 hour(s), 12 minute(s), 34 second(s)

Sometimes logs like this one appear:

%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
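To track how often this syslog fires and on which module, it can help to pull its fields out of each occurrence. The sketch below is a hypothetical Python helper (PATTERN, BufferEvent, and parse_buffer_event are illustrative names, not a Cisco tool). It assumes the message text matches the line above exactly, and it reads the digit after TAHUSD-SLOT1 as the standard NX-OS syslog severity (4 = warning).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed to match the exact wording of the syslog line shown above.
PATTERN = re.compile(
    r"%TAHUSD-SLOT(?P<slot>\d+)-(?P<severity>\d)-BUFFER_THRESHOLD_EXCEEDED:\s+"
    r"Module (?P<module>\d+) Instance (?P<instance>\d+) "
    r"Pool-group buffer (?P<percent>\d+) percent threshold is exceeded!"
)


@dataclass
class BufferEvent:
    slot: int
    severity: int
    module: int
    instance: int
    percent: int


def parse_buffer_event(line: str) -> Optional[BufferEvent]:
    """Return the parsed fields, or None if the line is a different message."""
    m = PATTERN.search(line)  # search() tolerates a timestamp/hostname prefix
    if m is None:
        return None
    return BufferEvent(**{k: int(v) for k, v in m.groupdict().items()})


if __name__ == "__main__":
    sample = ("%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 "
              "Pool-group buffer 90 percent threshold is exceeded!")
    print(parse_buffer_event(sample))
```

Counting parsed events per module and instance over time makes it easier to see whether the threshold crossings line up with the link flaps.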

Could the bug described here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/release/notes/70379_nxos_rn.html and here: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvu69850 be the reason why a new 100G link sometimes flaps? Or is this bug unrelated to the flapping, and the switch just reacts to it incorrectly? Thanks

marce1000
VIP Mentor

>....why new 100G link goes sometimes flaps

Probably not. Check the logs on the switch when this happens and look for any additional information. As far as TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED is concerned, you may try the recommended software version listed in this document:

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/recommended_release/b_Minimum_and_Recommended_Cisco_NX-OS_Releases_for_Cisco_Nexus_9000_Series_Switches.html

Then check whether the problem persists afterwards.

M.

Hi @marce1000, I have this exact issue even after upgrading NX-OS to 9.3(8) as recommended.

What is the command to see the currently configured buffer depth on an interface?

Hello!

After you attach to the relevant module with the attach module <x> command, the show hardware internal buffer info pkt-stats command displays an instantaneous snapshot of each ASIC slice's buffer utilization.

For more information about the BUFFER_THRESHOLD_EXCEEDED syslog, I highly recommend reviewing the "Understand the TAHUSD BUFFER_THRESHOLD_EXCEEDED Syslog and Congestion on Nexus 9000 Cloud Scale ASIC NX-OS Switches" document. It explains what this syslog means and how to identify congested egress interfaces on Cisco Nexus 9000 Series switches with the Cloud Scale ASIC.

I hope this helps - thank you!

-Christopher

Something is definitely up with these later updates. We were running stable on a much older version of NX-OS, but when we updated to 9.3(7) our switches became so unstable that traffic wasn't passing as expected, disrupting both services and storage traffic.

We were running stable for up to 600 days on the much older OS version.

9.3(8) is just as unstable as 9.3(7). Something in it is causing our switches to slow down, drop traffic, and spam buffer-full logs, even though our network architecture has barely changed since before the update, apart from 10-20 additional 10G links, which our 2 x 2 side-by-side vPC design had capacity for.

Due to how my company interacts with Cisco, I'm unable to open a TAC case directly right now, but this needs escalation: the error now appears within hours of a reboot, whereas the first instance showed up 4 months after the first reboot.

We started seeing this error on a C93240YC-FX2 running 9.3(5).

We upgraded past 9.3(x), and the issue went from 10 affected ports down to one.
Our current suspicion is that SDN is causing packet fragmentation when traffic goes from a 1500-byte MTU to a 14xx-byte MTU.
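To make the fragmentation suspicion concrete, the sketch below computes how a single IPv4 packet would be split for a smaller MTU. It assumes plain IPv4 with a 20-byte header and no options, the DF bit cleared, and uses 1400 bytes as a stand-in for the unspecified 14xx MTU; fragment_sizes is a hypothetical helper, not part of any Cisco tooling.

```python
from typing import List

IPV4_HEADER_LEN = 20  # bytes; assumes no IP options


def fragment_sizes(total_len: int, mtu: int, header_len: int = IPV4_HEADER_LEN) -> List[int]:
    """Return the total length (header + payload) of each IPv4 fragment.

    Assumes the DF bit is clear, so the packet may be fragmented.
    """
    if total_len <= mtu:
        return [total_len]  # fits as-is, no fragmentation
    # All fragments except the last must carry a multiple of 8 payload bytes,
    # because the fragment offset field counts in 8-byte units.
    max_chunk = (mtu - header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = total_len - header_len
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(header_len + chunk)  # each fragment gets its own header
        payload -= chunk
    return sizes


if __name__ == "__main__":
    # 1400 stands in for the "14xx" MTU mentioned above (an assumption).
    print(fragment_sizes(1500, 1400))
```

Under those assumptions, fragment_sizes(1500, 1400) returns [1396, 124]: every full-size packet becomes two, doubling the packet count for that traffic.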

Hope this helps!
