Solved: Re: Nexus 9K + FEX TX Pause

morabusa · ‎02-02-2021

Hi, I am getting tons of TX Pause frames on some of my Host Interfaces (HIFs), and I am struggling to find the cause. We have 8 Fabric Interfaces and just 11 HIFs (all TenGigabit interfaces), and my uplink interfaces do not appear to exceed 30-40% load. The port-channel hash is source IP - destination IP (same for MAC).

I am thinking that maybe something is going wrong with the load sharing across the FEX uplink interfaces, and one or two of them are occasionally getting congested. Is there a way to troubleshoot the NIFs (the physical uplink interfaces on the FEX)? Thank you very much.

Sergiu.Daniluk · ‎02-02-2021

Hi @morabusa

TX Pause frames on HIF ports mean that the server is sending bursts of traffic, causing the FEX buffer to be exhausted.

This is not related to uplink (NIF) utilization.

To solve the problem you have two options:

1. Enable flow control on the server side and make sure it honours the pause frames received from the FEX.

2. Spread the 11 HIFs across the FEX ports. Basically, avoid having all 11 ports one after another (Eth111/1/10-20), and use something like Eth111/1/1, Eth111/1/5, Eth111/1/9, etc. instead.

You can find more details and the available troubleshooting commands for FEX here:

https://www.cisco.com/c/en/us/support/docs/switches/nexus-2000-series-fabric-extenders/200260-Troubleshooting-Tx-Pauses-on-Nexus-2232.html

https://www.cisco.com/c/en/us/support/docs/switches/nexus-2000-series-fabric-extenders/200265-Troubleshooting-Fabric-Extender-FEX-Pe.html

Stay safe,

Sergiu

morabusa · ‎02-02-2021

Thank you very much. Just one extra question: is there a way to check how FEX ports are pinned to uplink ports? We are using static pinning (max-links 1), and I would like to check which HIF ports are using the same NIF.

EDIT: The uplinks are grouped in a port-channel, so I think all HIF interfaces see the 8 uplinks as a single logical uplink, and all HIFs share the buffer of the 8 NIFs (1 logical interface), right?

Best regards.

Sergiu.Daniluk · ‎02-02-2021

If the FEX fabric ports (the ports on the parent switch) are configured in a port-channel, then there is one logical uplink from the perspective of the FEX, and traffic is load balanced across all NIFs. As a recommendation, never use static pinning mode without a port-channel: it is the most inefficient design. It is also not supported on 7k/9k.

Cheers,

Sergiu

morabusa · ‎02-02-2021

So, just for confirmation: if I am using a port-channel Fabric Interface connection (all NIFs in one port-channel), it is not a problem if I connect the HIFs one after another on an N2K-C2232PP-10GE FEX (for example: Eth111/1/10-20), right? Or is it still a good idea to separate the HIFs? I have not been able to find any information about this for the case of a NIF port-channel. I have read that the N2K-C2232PP-10GE has just one ASIC, so I believe it won't matter much how I connect the HIFs when using a NIF port-channel, but I am not totally sure about this. Thank you very much for the help!

Best regards.

Sergiu.Daniluk · ‎02-02-2021

Hi @morabusa

In the links I referenced in my previous post, there is a diagram showing the internal architecture of your N2K model.

There are 4 subsystems (SS) on the HIF side. You should spread the HIFs across those subsystems.
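To make the idea of spreading HIFs concrete, here is a minimal Python sketch. It only illustrates the round-robin idea. The port-to-subsystem mapping used here (32 HIF ports, 8 contiguous ports per subsystem) is an assumption made for the example and is not taken from the Cisco document, so check the architecture diagram in the linked document for the real grouping on your model. The function names are hypothetical.

```python
from collections import Counter


def spread_hifs(num_hosts, num_ports=32, num_subsystems=4):
    """Choose front-panel port numbers so hosts land round-robin on subsystems.

    ASSUMPTION: ports are grouped contiguously per subsystem
    (1-8, 9-16, 17-24, 25-32). Verify against the vendor diagram.
    """
    if num_hosts > num_ports:
        raise ValueError("more hosts than HIF ports")
    ports_per_ss = num_ports // num_subsystems
    chosen = []
    for i in range(num_hosts):
        subsystem = i % num_subsystems   # rotate through the subsystems
        slot = i // num_subsystems       # next free slot inside that subsystem
        chosen.append(subsystem * ports_per_ss + slot + 1)
    return sorted(chosen)


def hosts_per_subsystem(ports, num_ports=32, num_subsystems=4):
    """Count how many of the given ports fall into each (assumed) subsystem."""
    ports_per_ss = num_ports // num_subsystems
    return dict(Counter((p - 1) // ports_per_ss for p in ports))


if __name__ == "__main__":
    contiguous = list(range(10, 21))   # Eth111/1/10-20
    spread = spread_hifs(11)
    print("contiguous:", hosts_per_subsystem(contiguous))
    print("spread    :", spread, hosts_per_subsystem(spread))
```

Under that assumed grouping, the 11 contiguous ports Eth111/1/10-20 occupy only two subsystems, while the round-robin choice uses all four with at most three hosts on each.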

In the other link, there are additional suggestions on how to avoid traffic drops caused by bursts, specifically on the 2232:

Conclusions and Best Practices

1. TX Pause is a normal operational mechanism for avoiding packet drops on 2232/2248UPQ/B22 FEX.

2. Maximise the number of uplinks between the 2232/2248UPQ/B22 FEX and the parent. This provides more paths towards the network and also helps maximise the buffers available for N2H traffic.

3. If the uplinks between FEX and parent are not evenly used, changing the port-channel hashing can help.

4. Since there is no local switching on the FEX, avoid east-west traffic flow profiles on hosts attached to the FEX.

5. Avoid bursty appliances such as NAS devices and blade chassis on FEXes. These need to be on the parent.

6. The newer 2348UPQ FEX, with a 32 MB shared buffer, has 1 MB of shared buffer per HIF for H2N traffic for better burst absorption. Also, with 40G NIF uplinks, the chances of hash collisions/uplink congestion are greatly reduced.
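Points 3 and 6 are about hash collisions on the uplinks. As a rough illustration, the following Python sketch shows how a source/destination IP hash maps flows onto 8 uplinks. The hash used here (CRC32 modulo the number of links) is a stand-in chosen for the example and is not the algorithm the Nexus hardware uses; the IP addresses and function names are hypothetical too.

```python
import zlib
from collections import Counter
from ipaddress import ip_address


def pick_uplink(src_ip, dst_ip, num_uplinks=8):
    """Map a flow to an uplink index from its source and destination IP.

    ASSUMPTION: CRC32 modulo link count stands in for the real,
    platform-specific port-channel hash.
    """
    key = ip_address(src_ip).packed + ip_address(dst_ip).packed
    return zlib.crc32(key) % num_uplinks


def uplink_distribution(flows, num_uplinks=8):
    """Return the number of flows hashed onto each uplink, by index."""
    counts = Counter(pick_uplink(src, dst, num_uplinks) for src, dst in flows)
    return [counts.get(i, 0) for i in range(num_uplinks)]


if __name__ == "__main__":
    # Hypothetical example: 11 hosts all talking to one storage address.
    flows = [(f"10.0.0.{host}", "10.0.1.50") for host in range(1, 12)]
    print(uplink_distribution(flows))
```

With 11 flows and 8 uplinks, at least one uplink always carries two or more flows regardless of the hash, and because every packet of a flow follows the same link, a few heavy flows can load one NIF while the average utilisation across the port-channel stays low.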

I would suggest you read the full document to understand the reasons behind these suggestions.

Stay safe,

Sergiu