N9K-C9372PX Output Discards

from88
Level 4

Hello,

We're experiencing output discards on a couple of our N9K-C9372PXs. It seems the egress queues fill up, which results in a lot of discarded packets.


All interfaces where discards occur are FEX ports.

show interface ethernet 1/21 | inc "output discard"
0 lost carrier 0 no carrier 0 babble 32043410 output discard

Bandwidth-wise they're OK. I think it's due to bursty traffic from multiple uplinks.

So things like DNS queries, which use UDP, are really hurt by this.

Could you tell me how to check the size of the buffers and how full they are getting?


I'm wondering: is there any workaround for this type of behavior?

Now we're using:

show hardware qos ns-buffer-profile 
NS Buffer Profile: Burst optimized

I see there are commands like this:

hardware qos ns-buffer-profile ultra-burst

Would ultra-burst mode help in our situation? What are the requirements for entering that command? Does it require a reboot?

Or is it possible to set up some kind of queuing that treats UDP packets differently?

We also have a few older N5K-C5672UP and N5K-C5548UP switches, and they are not experiencing this. Maybe they have bigger hardware buffers?

Thanks for any input. 


from88
Level 4

Anyone?

The discards are 0.03% of traffic, which is quite high in terms of lost packets.

Hello,

 

I think buffer and queue tuning options are limited. Try enabling flow control on the interface and check whether that makes a difference:

 

interface Ethernet 1/21
  flowcontrol send on
  flowcontrol receive on

Thank you. I'm wondering how that is supposed to help. The port where the discards occur is a FEX port, so would that interface start listening to pause frames? By the way, I checked the flow control counters and I see lots of TX pause frames on the FEX ports.

 

Nexus3A# show  interface  flowcontrol fex 124

--------------------------------------------------------------------------------
Port         Send FlowControl  Receive FlowControl  RxPause   TxPause  
             admin    oper     admin    oper
--------------------------------------------------------------------------------
Eth124/1/1   on       on       off      off         0         242193068
Eth124/1/2   on       on       off      off         0         266813791
Eth124/1/3   on       on       off      off         0         272267423
Eth124/1/4   on       on       off      off         0         246887448
Eth124/1/5   on       on       off      off         0         251238116
Eth124/1/6   on       on       off      off         0         248983081
Eth124/1/7   on       on       off      off         0         254025228
Eth124/1/8   on       on       off      off         0         245693038
Eth124/1/9   on       on       off      off         0         257724327
Eth124/1/10  on       on       off      off         0         232567506
Eth124/1/11  on       on       off      off         0         176434018
Eth124/1/12  on       on       off      off         0         405106259
Eth124/1/13  on       on       off      off         0         240322964
Eth124/1/14  on       on       off      off         0         227890549

I presume the FEX is sending pause frames to the servers, but why? Is it directly related to the discards I'm seeing?
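The TxPause counters in the table above are cumulative, so a single snapshot does not show whether the FEX is pausing the servers right now. Below is a minimal Python sketch, assuming the table layout shown above, that parses two snapshots and reports the per-port increase. The function names and the values in the second snapshot are invented for illustration.

```python
import re

# Matches data rows of "show interface flowcontrol" as printed above:
# port, four admin/oper columns, then the RxPause and TxPause counters.
ROW = re.compile(r"^(Eth\S+)\s+\S+\s+\S+\s+\S+\s+\S+\s+(\d+)\s+(\d+)$")

def parse_flowcontrol(text):
    """Return {port: (rx_pause, tx_pause)} for every data row in the output."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            port, rx, tx = match.groups()
            counters[port] = (int(rx), int(tx))
    return counters

def tx_pause_delta(before, after):
    """Per-port TxPause increase between two parsed snapshots."""
    return {port: after[port][1] - before[port][1]
            for port in after if port in before}

first = """\
Eth124/1/1   on       on       off      off         0         242193068
Eth124/1/2   on       on       off      off         0         266813791
"""
# Hypothetical second snapshot taken some time later (values invented).
second = """\
Eth124/1/1   on       on       off      off         0         242193168
Eth124/1/2   on       on       off      off         0         266813791
"""

deltas = tx_pause_delta(parse_flowcontrol(first), parse_flowcontrol(second))
for port, delta in deltas.items():
    print(port, delta)
```

A port whose delta stays at zero between snapshots is not currently sending pause frames, whatever its lifetime total says.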

Hello from88,

A device sends IEEE PAUSE frames to ask the other side for a period of silence.

Sending IEEE PAUSE frames should minimize input errors of the ignored or overrun type.

Each PAUSE frame asks the other device for a short interval of silence, expressed in pause quanta (one quantum is the time equivalent of 512 bit times).
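To put the 512 bit times into perspective, here is a small Python sketch that converts a pause request into wall-clock time. It assumes the standard 16-bit pause_time field (0 to 65535 quanta) and a 10 Gb/s link; the function name is my own.

```python
BITS_PER_QUANTUM = 512   # one pause quantum is 512 bit times
MAX_QUANTA = 0xFFFF      # pause_time is a 16-bit field

def pause_duration_seconds(quanta, link_bps):
    """Time a PAUSE frame asks the peer to stay silent, in seconds."""
    if not 0 <= quanta <= MAX_QUANTA:
        raise ValueError("pause_time must fit in 16 bits")
    return quanta * BITS_PER_QUANTUM / link_bps

TEN_GIG = 10_000_000_000  # assumed link speed, bits per second

one_quantum = pause_duration_seconds(1, TEN_GIG)
longest = pause_duration_seconds(MAX_QUANTA, TEN_GIG)
print(f"one quantum at 10G: {one_quantum * 1e9:.1f} ns")
print(f"longest single pause at 10G: {longest * 1e3:.2f} ms")
```

At 10 Gb/s a single quantum is about 51 ns and the longest single pause is a little over 3 ms, so hundreds of millions of pause frames can add up to a lot of silenced time on the server side.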

 

see

http://en.wikipedia.org/wiki/Ethernet_flow_control

 

see also this related thread

 

https://community.cisco.com/t5/switching/pause-output-errors/td-p/1433255

 

I would say that the fact that the FEX interfaces are sending pause frames is a sign of a performance issue.

Note that pause frames move the pressure to the outgoing buffers and queues of the connected device.

 

Note also that all RxPause counters are 0 because receive flow control is disabled on all FEX ports.

 

Edit:

>> about output discards rate :

its 0.03% of traffic. Quite high in terms of lost packets.

A packet loss rate of 3 in 10,000 is only a little above what is acceptable for TCP traffic, which should tolerate a loss rate of 1 in 10,000 without a great performance penalty.
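The comparison can be repeated against live counters with a few lines of Python. This is only a sketch: the discard count is the one posted for ethernet 1/21, but the transmitted packet count is a made-up placeholder (the thread does not give it), and the 1 in 10,000 reference is the rule of thumb quoted above, not a documented limit.

```python
TCP_REFERENCE_LOSS = 1 / 10_000   # rule of thumb from this reply

def discard_ratio(discards, transmitted):
    """Share of offered packets (transmitted + discarded) that were dropped."""
    offered = transmitted + discards
    return discards / offered if offered else 0.0

discards = 32_043_410             # "output discard" counter from the first post
transmitted = 100_000_000_000     # hypothetical output packet count

ratio = discard_ratio(discards, transmitted)
print(f"discard ratio: {ratio:.4%}")
print(f"times the TCP reference: {ratio / TCP_REFERENCE_LOSS:.1f}")
```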

It is interesting that older Nexus devices are not showing any output discards.

 

Hope to help

Giuseppe

 

Thank you, I will dig into the flow control side.

 

OK, I can understand that the FEX is overwhelmed with traffic. But the output discards are occurring on the N9K downlink that points to the FEX uplink.

Also, about TCP: if 0.01% is acceptable for TCP, what is acceptable for UDP? It's hard to ask service owners to switch from UDP to TCP. I don't know if it's even possible.

 

I think these are two different problems:

1. Nexus output discards (which really impact UDP services like DNS).

2. FEX TX pauses. We see them on every FEX interface, even where no discards are present.

 

Could you give any advice on the Nexus output discards?

My last question: since the Northstar interfaces are the 40G ones, does changing to hardware qos ns-buffer-profile ultra-burst make any difference when my traffic passes only over 10G interfaces? Or does it only work when the 40G interfaces are used?
