1088 Views | 4 Helpful | 8 Replies

MAC learning issue on Nexus 93180+Nexus 2K FEX

imhessam
Level 1

Hi all — I have a design question + troubleshooting help request.

Topology / environment

  • 2 x Nexus 93180 switches in a vPC pair (acting as distribution).

  • 6 x Nexus 2000 FEX units connected to the 9Ks.

  • Servers are connected to the FEX host ports (no link aggregation on the affected ports). The traffic path is: server → FEX → Nexus 9K → core/router → outside.

  • FEX host-facing (server-facing) ports are configured as trunk ports. (No vPC on the ports with the problem.)

Problem
Some servers’ MAC addresses are not being learned on the parent Nexus 9Ks, and those servers’ networks are effectively down (no L2 reachability). This is intermittent — some servers on the same FEX learn normally, others do not. No obvious physical cabling change.

Questions

  1. Is there a known limitation on:

    • number of FEX units supported per Nexus 93180,

    • number of MAC addresses learned per FEX or per parent 9K,

    • number of VLANs allowed per FEX host port or trunk,
      or anything else that could cause selective MAC learning failure?

  2. What are the most likely causes and the recommended troubleshooting steps?
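For context, the current FEX count and learned-MAC scale can be inventoried with standard NX-OS show commands (the VLAN ID below is just an example):

```
! List the FEX units attached to this parent and their state
show fex
show fex detail

! Count the MAC addresses currently learned, globally and per VLAN
show mac address-table count
show mac address-table count vlan 100
```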

I have checked everything I thought could cause the problem, such as device compatibility, software versions, physical connections, server configurations, and more.

What I suspect / possible causes

  • hardware/resource limits on the 9K (MAC table exhaustion, CPU)

  • spanning-tree blocking on certain host VLANs

  • storm-control or ACLs filtering traffic
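Each of these suspicions can be checked with standard NX-OS show commands (the interface and VLAN IDs below are examples):

```
! MAC table pressure and CPU/memory load
show mac address-table count
show system resources

! STP state for a suspect host VLAN
show spanning-tree vlan 100
show spanning-tree blockedports

! Storm-control and any ACLs applied to a suspect FEX host port
show interface ethernet101/1/1 counters storm-control
show running-config interface ethernet101/1/1
```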

This is my ICAM scale output, in case it helps:

sho icam scale l2-switching
Retrieving data. This may take some time ...
==================================================
Info Threshold     = 80 percent (default)  |
Warning Threshold  = 90 percent (default)  |
Critical Threshold = 100 percent (default) |
All timestamps are in UTC                  |
==================================================

----------------------------------------------------------------------------------------------------------
Scale Limits for L2 Switching
----------------------------------------------------------------------------------------------------------
Feature              Verified Scale   Config Scale   Cur Scale   Cur Util   Threshold Exceeded   Polled Timestamp
----------------------------------------------------------------------------------------------------------
MAC Add                     -               -             -          -            -                    -
 (Mod:1,FE:0)             92000          92000          1811       1.96        None              2025-10-19
MST Instances                64             64             1       1.56        None              2025-10-19
MST vPorts                48000          48000         18431      38.39        None              2025-10-19
RPVST vPorts              48000          48000             0       0.00        None              2025-10-19
RPVST VLANs                3967           3967             0       0.00        None              2025-10-19
VLANs                      3967           3967           509      12.83        None              2025-10-19
Isolated Port*Vlan       190000         190000             0       0.00        None              2025-10-19
RPVST lPorts              22000          22000             0       0.00        None              2025-10-19

 


8 Replies

balaji.bandi
Hall of Fame

You can check the number of MACs learned from the FEX:

show sprom fex xx all

What I suspect / possible causes

hardware/resource limits on the 9K (MAC table exhaustion, CPU)

spanning-tree blocking on certain host VLANs

storm-control or ACLs filtering traffic

You can verify whether there is any STP blocking.

The 9K has a large capacity. How many servers do you have here? You can also look at the MAC address table and count the entries.
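For example, with standard NX-OS commands (the FEX host interface ID below is an example):

```
show mac address-table count
show mac address-table interface ethernet101/1/5
```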

Check the FEX troubleshooting guide; you may find information there to fix the issue:

https://www.cisco.com/c/en/us/support/docs/switches/nexus-2000-series-fabric-extenders/200265-Troubleshooting-Fabric-Extender-FEX-Pe.html

Lastly, I suggest checking show logging to see any errors or complaints related to the issue.
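For example (standard NX-OS logging commands; the filter strings are just examples):

```
show logging last 50
show logging logfile | include FEX
show logging logfile | include MAC
```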

BB

=====Preenayamo Vasudevam=====


There is no STP blocking, 100% sure.

There are no errors on the device, even with logging level 7.

There are many virtual machines connected to ESXi hosts, and about 5 hosts are connected to each FEX.

This requires more investigation and collecting all the information at every level.

 

BB


Mark Elsen
Hall of Fame

 

 - @imhessam    Check the software version running on the Nexus pair; consider upgrading to the latest advisory release, especially if you are currently running an older one.

  M.



-- Let everything happen to you  
       Beauty and terror
      Just keep going    
       No feeling is final
Rainer Maria Rilke (1899)

As I said, I have checked everything related to the software version; it's up to date.

pieterh
VIP

Have you considered MAC aging time as a cause of the missing entries? This could explain the intermittent behavior, as entries disappear when no traffic for that MAC address is passing through the parent Nexus.
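A quick way to check (the 3600-second value below is just an example; the NX-OS default aging time is 1800 seconds):

```
! Show the current MAC aging timer
show mac address-table aging-time

! Increase it if entries are aging out between traffic bursts
configure terminal
 mac address-table aging-time 3600
```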

sscobee
Level 1

@imhessam Did you ever find a resolution to this issue? I have been working on a similar deployment, and we are running into the exact same problem.

Before we ran into these issues, everything was running smoothly with our Nexus 5k switches. But after swapping them out for Nexus 9k switches, the problems started. We think the root cause is a combination of device limitations, the deprecation of FEX technology, and the incompatibility between FEX and the 9k. With the large number of virtual machines and their heavy load, we decided to balance things out by adding more Nexus 9k switches and connecting fewer FEXes to each one. Now, with two FEXes connected to each 9k, everything’s working fine. In fact, this issue has really sped up our network modernization plans, pushing us to move away from FEXes and transition to a new spine-and-leaf design. All in all, we believe the problem boils down to the limitations of the Nexus 9k switches.