TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED (Nexus9000)

Ozy
Level 1

Hello everyone!

I'm not a network engineer, but as a subsystem (kernel + filesystem) engineer I understand the basic networking concepts.

A few months ago I designed a DMZ and configured my 2-node vPC pair with 6 rack switches.

Everything was working smoothly, but about a month ago I started to see the problem below and my life turned into hell:

 

2024 Mar 28 22:48:09 NILE1 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 28 22:50:10 NILE1 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
2024 Mar 28 22:52:10 NILE1 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)


I was running "nxos.9.3.9". I found a bug report whose fix was to upgrade, so I upgraded to "nxos.9.3.13", but that did not solve my problem.

I don't know what the issue is, and I can't dig any deeper because I don't know how to diagnose it.

All I know is that when the buffer error appears, packets are dropped.

 

VPC-SW-2#     show interface counters errors non-zero

--------------------------------------------------------------------------------
Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/1                0          0          0          0          0      369651
Eth1/8                0          0          0          0          0     1968446
Eth1/9                0          0          0          0          0      124332
Eth1/17               0          0          0          0          0      101073
Eth1/18               0          0          0          0          0      102809
Eth1/19               0          0          0          0          0      100208
Eth1/20               0          0          0          0          0      102725
Eth1/21               0          0          0          0          0      102590
Eth1/25               0          0          0          0          0       48752
Eth1/26               0          0          0          0          0      102281
Eth1/27               0          0          0          0          0       70208
Eth1/28               0          0          0          0          0      102652
Eth1/34               0          0          0          0          0      102646
Eth1/35               0          0          0          0          0      102849
Eth1/36               0          0          0          0          0      102430
Eth1/42               0          0          0          0          0     1968435
Eth1/43               0          0          0          0          0     1966384
Eth1/44               0          0          0          0          0     1968448
Eth1/46               0          0          0          0          0       32722
Eth1/47               0          0          0          0          0       45342
Eth1/48               0          0          0          0          0       24724
Eth1/49               0          0          0          0          0      102454
Eth1/50               0          0          0          0          0       99501
Eth1/51               0          0          0          0          0      100564
Eth1/52               0          0          0          0          0      102824
Eth1/53               0          0          0          0          0      102935
Eth1/54               0          0          0          0          0      103074
Po8                   0          0          0          0          0     1968446
Po9                   0          0          0          0          0      124332
Po17                  0          0          0          0          0      101073
Po18                  0          0          0          0          0      102809
Po19                  0          0          0          0          0      100208
Po20                  0          0          0          0          0      102725
Po21                  0          0          0          0          0      102590
Po25                  0          0          0          0          0       48752
Po26                  0          0          0          0          0      102281
Po27                  0          0          0          0          0       70208
Po28                  0          0          0          0          0      102652
Po34                  0          0          0          0          0      102646
Po35                  0          0          0          0          0      102849
Po36                  0          0          0          0          0      102430
Po42                  0          0          0          0          0     1968435
Po43                  0          0          0          0          0     1966384
Po44                  0          0          0          0          0     1968448
Po49                  0          0          0          0          0      102454
Po50                  0          0          0          0          0       99501
Po51                  0          0          0          0          0      100564
Po52                  0          0          0          0          0      102824
Po53                  0          0          0          0          0      102935
Po54                  0          0          0          0          0      103074
Po100                 0          0          0          0          0      102788
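To make the table above easier to read, here is a rough sketch of how the output could be ranked offline. It assumes the text of "show interface counters errors non-zero" has been saved and pasted into a string; the names SAMPLE, parse_out_discards and top_ports are made up for this example and are not part of any Cisco tooling. The Po rows appear to repeat (or sum) the counts of their member ports, so the sketch skips them by default.

```python
import re

# A few rows copied from the output above; paste the full table here.
SAMPLE = """\
Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards
Eth1/1                0          0          0          0          0      369651
Eth1/8                0          0          0          0          0     1968446
Eth1/25               0          0          0          0          0       48752
Po8                   0          0          0          0          0     1968446
"""

# Port name, five counter columns we ignore, then the OutDiscards column.
ROW = re.compile(r"^(?P<port>(?:Eth|Po)\d\S*)\s+(?:\d+\s+){5}(?P<out>\d+)$")


def parse_out_discards(text):
    """Return {port: OutDiscards} for every data row in the pasted table."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and separator lines simply do not match
            result[match.group("port")] = int(match.group("out"))
    return result


def top_ports(discards, n=3, physical_only=True):
    """Return the n ports with the most output discards, highest first."""
    items = [
        (port, count)
        for port, count in discards.items()
        if not physical_only or port.startswith("Eth")
    ]
    return sorted(items, key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    for port, count in top_ports(parse_out_discards(SAMPLE)):
        print(f"{port:10s} {count}")
```

Running the same script on two captures taken a few minutes apart and comparing the counts would show which ports are still discarding now, rather than only the totals since the counters were last cleared.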

What changed? My best guess is that some cabling has gone wrong over time.

I also have some IPMI switches. I have shut down their ports for now while I hunt for the root cause.

My switches are:

VPC-SW-1     : C93180YC-FX3 [ BIOS: version 01.09 | NXOS: version 9.3(13) ]
VPC-SW-2     : C93180YC-FX3 [ BIOS: version 01.09 | NXOS: version 9.3(13) ]
datasw-aa-03: C93180YC-FX   [ BIOS: version 05.51 | NXOS: version 9.3(13) ]
datasw-aa-04: C93180YC-FX   [ BIOS: version 05.51 | NXOS: version 9.3(13) ]
datasw-aa-06: C93180YC-FX   [ BIOS: version 05.51 | NXOS: version 9.3(13) ]
datasw-aa-08: C93180YC-FX3 [ BIOS: version 01.09 | NXOS: version 9.3(13) ] 
datasw-aa-10: C92160YC-X     [ BIOS: version 07.41 | NXOS: version 7.0(3)I3(1) ]
datasw-aa-11: C92160YC-X     [ BIOS: version 07.41 | NXOS: version 7.0(3)I3(1) ]

 

 

VPC-SW-1# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
2024 Mar 28 22:48:09 NILE1 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 28 22:50:10 NILE1 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
2024 Mar 28 22:52:10 NILE1 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
---------------------------------------------------------------------------------------
VPC-SW-2# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
2024 Mar 28 22:48:18 NILE2 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 28 22:49:03 NILE2 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 2 times)
2024 Mar 28 22:51:58 NILE2 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 3 times)
---------------------------------------------------------------------------------------
datasw-aa-03# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
2024 Mar 21 18:30:34 datasw-aa-03 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 27 02:09:17 datasw-aa-03 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 27 02:11:18 datasw-aa-03 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
---------------------------------------------------------------------------------------
datasw-aa-04# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
2024 Mar 27 00:56:36 datasw-aa-04 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
2024 Mar 28 22:49:58 datasw-aa-04 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 28 22:51:58 datasw-aa-04 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
---------------------------------------------------------------------------------------
datasw-aa-06# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
2024 Mar 21 20:53:36 datasw-aa-06 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 21 22:13:36 datasw-aa-06 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 21 22:15:36 datasw-aa-06 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
---------------------------------------------------------------------------------------
datasw-aa-08# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
2024 Mar 27 16:44:26 datasw-aa-08 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
2024 Mar 27 16:46:26 datasw-aa-08 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
2024 Mar 27 16:55:56 datasw-aa-08 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded! (message repeated 1 time)
---------------------------------------------------------------------------------------
datasw-aa-10# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
datasw-aa-10# 
---------------------------------------------------------------------------------------
datasw-aa-11# sh logg |include BUFFER_THRESHOLD_EXCEEDED | last 3
datasw-aa-11# 

 

Interestingly, the only switches where I do not see this issue are datasw-aa-10 and datasw-aa-11, both "C92160YC-X [ BIOS: version 07.41 | NXOS: version 7.0(3)I3(1) ]".
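To see whether the alerts line up in time across switches, the log lines could be bucketed into short time windows. The following is only a sketch under some assumptions: the "sh logg" output from every switch is pasted into one string with each message on a single line, and the 5-minute window is an arbitrary choice. The names SAMPLE, parse_events and hosts_per_window are invented for this example.

```python
import re
from collections import defaultdict
from datetime import datetime

# One line per switch, copied from the logs above (wrapped lines joined).
SAMPLE = """\
2024 Mar 28 22:48:09 NILE1 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 28 22:48:18 NILE2 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 28 22:49:58 datasw-aa-04 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
2024 Mar 21 20:53:36 datasw-aa-06 %TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED: Module 1 Instance 0 Pool-group buffer 90 percent threshold is exceeded!
"""

# Timestamp, then the hostname field, then the message tag.
LINE = re.compile(
    r"^(\d{4} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) "
    r"%TAHUSD-SLOT1-4-BUFFER_THRESHOLD_EXCEEDED:"
)


def parse_events(text):
    """Return a list of (timestamp, hostname) for each buffer alert line."""
    events = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            # Collapse repeated spaces so single-digit days parse as well.
            stamp = " ".join(match.group(1).split())
            when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
            events.append((when, match.group(2)))
    return events


def hosts_per_window(events, minutes=5):
    """Group hostnames by the start of the time window they alerted in."""
    buckets = defaultdict(set)
    for when, host in events:
        start = when.replace(minute=when.minute - when.minute % minutes, second=0)
        buckets[start].add(host)
    return dict(buckets)


if __name__ == "__main__":
    for start, hosts in sorted(hosts_per_window(parse_events(SAMPLE)).items()):
        print(start, sorted(hosts))
```

With the sample lines, NILE1, NILE2 and datasw-aa-04 fall into the same window on Mar 28, while the datasw-aa-06 alert from Mar 21 stands alone. This only shows which switches alert together; it does not identify the traffic source by itself.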

 

Dear experienced network engineers...

Before I found the command "show interface counters errors non-zero", I was struggling along with "sh int | include discard".

As you can see, I don't know how to check logs, monitor ports, and so on.

Please help me find the root cause. What should I do?

 

 

 
