10-19-2025 06:44 AM
Hi all — I have a design question + troubleshooting help request.
Topology / environment
2 x Nexus 93180 switches in a vPC pair (acting as distribution).
6 x Nexus 2000 FEX units connected to the 9Ks.
Servers are connected to FEX host ports (no link aggregation on the server-facing ports). The traffic path is: server → FEX → Nexus 9K → core/router → outside.
FEX host-facing (server-facing) ports are configured as trunk ports. (No vPC on the affected ports.)
Problem
Some servers’ MAC addresses are not being learned on the parent Nexus 9Ks, and those servers’ networks are effectively down (no L2 reachability). This is intermittent — some servers on the same FEX learn normally, others do not. No obvious physical cabling change.
Questions
Is there a known limitation on:
number of FEX units supported per Nexus 93180,
number of MAC addresses learned per FEX or per parent 9K,
number of VLANs allowed per FEX host port or trunk,
or anything else that could cause selective MAC learning failure?
What are the most likely causes and the recommended troubleshooting steps?
I have already checked everything I could think of that might cause the problem: device compatibility, software versions, physical connections, server configurations, and more.
What I suspect / possible causes
hardware/resource limits on the 9K (MAC table exhaustion, CPU)
spanning-tree blocking on certain host VLANs
storm-control or ACLs filtering traffic
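For each suspect, here is the minimal set of NX-OS checks I ran (FEX ID 101, port 1, VLAN 100, and the MAC address below are placeholders for the real values):

```
! Is the MAC table anywhere near its limit, and is the server's MAC present at all?
show mac address-table count
show mac address-table address aaaa.bbbb.cccc

! STP state for the affected VLAN and FEX host port
show spanning-tree vlan 100
show spanning-tree interface ethernet 101/1/1

! Storm-control / ACL configuration on the affected host port
show running-config interface ethernet 101/1/1
```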
Here is my ICAM scale output, in case it helps:
sho icam scale l2-switching
Retrieving data. This may take some time ...
==================================================
Info Threshold = 80 percent (default) |
Warning Threshold = 90 percent (default) |
Critical Threshold = 100 percent (default) |
All timestamps are in UTC |
==================================================
------------------------------------------------------------------------------------------------
Scale Limits for L2 Switching
------------------------------------------------------------------------------------------------
Feature Verified Scale Config Scale Cur Scale Cur Util Threshold Exceeded Polled Timestamp
----------------------------------------------------------------------------------------------------------
MAC Add - - - - - -
(Mod:1,FE:0) 92000 92000 1811 1.96 None 2025-10-19
MST Instances 64 64 1 1.56 None 2025-10-19
MST vPorts 48000 48000 18431 38.39 None 2025-10-19
RPVST vPorts 48000 48000 0 0.00 None 2025-10-19
RPVST VLANs 3967 3967 0 0.00 None 2025-10-19
VLANs 3967 3967 509 12.83 None 2025-10-19
Isolated Port*Vlan 190000 190000 0 0.00 None 2025-10-19
RPVST lPorts 22000 22000 0 0.00 None 2025-10-19
10-19-2025 09:20 AM
You can check the number of MACs learned from a FEX:
show sprom fex xx all
What I suspect / possible causes
hardware/resource limits on the 9K (MAC table exhaustion, CPU)
spanning-tree blocking on certain host VLANs
storm-control or ACLs filtering traffic
You can also verify whether any ports are in an STP blocking state.
The 9K has a large MAC table capacity - how many servers do you have here? You can also look at the MAC address table and count the entries.
Check the FEX troubleshooting guide; you may find information there to fix the issue.
Lastly, I suggest checking show logging for any errors or complaints related to the issue.
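For example, the checks above can be run as (adjust to your setup):

```
! Total learned MACs, to compare against table capacity
show mac address-table count

! Any blocked ports or unexpected topology changes
show spanning-tree summary

! Recent log entries that may mention FEX or L2 learning problems
show logging logfile | last 100
```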
=====Preenayamo Vasudevam=====
***** Rate All Helpful Responses *****
11-26-2025 11:18 PM
There is no STP blocking, 100% sure.
There are no errors on the device, even with logging level 7.
There are many virtual machines connected to the ESXi hosts, and about 5 hosts are connected to each FEX switch.
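To quantify the load, I counted the MACs learned per FEX host port like this (FEX ID 101 and the port number are placeholders):

```
! FEX status and per-FEX detail
show fex
show fex 101 detail

! MACs learned on one ESXi-facing host port
show mac address-table interface ethernet 101/1/1
```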
11-27-2025 01:08 AM
This requires more investigation and collecting information at every level.
=====Preenayamo Vasudevam=====
***** Rate All Helpful Responses *****
10-19-2025 09:33 AM
@imhessam Check the software version running on the Nexus pair and consider upgrading to the latest recommended release, especially if you are currently on an older one.
M.
10-19-2025 03:03 PM - edited 11-26-2025 10:56 PM
As I said, I have checked everything related to the software version; it is up to date.
10-21-2025 05:19 AM
Have you considered MAC aging time as the cause of the missing entries?
That could explain the intermittent behavior: entries disappear when no traffic for that MAC address passes through the parent Nexus.
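You can check the current aging timer and, if needed, raise it (the 3600 seconds below is just an example value, not a recommendation):

```
! Current MAC aging timer (NX-OS default is 1800 seconds)
show mac address-table aging-time

! Example: raise the aging timer to one hour
configure terminal
  mac address-table aging-time 3600
```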
11-18-2025 05:31 PM
@imhessam Did you ever find a resolution to this issue? I have been working on a similar deployment and we are running into the exact same problem.
11-26-2025 11:20 PM
Before we ran into these issues, everything was running smoothly with our Nexus 5k switches. But after swapping them out for Nexus 9k switches, the problems started. We think the root cause is a combination of device limitations, the deprecation of FEX technology, and the incompatibility between FEX and the 9k. With the large number of virtual machines and their heavy load, we decided to balance things out by adding more Nexus 9k switches and connecting fewer FEXes to each one. Now, with two FEXes connected to each 9k, everything’s working fine. In fact, this issue has really sped up our network modernization plans, pushing us to move away from FEXes and transition to a new spine-and-leaf design. All in all, we believe the problem boils down to the limitations of the Nexus 9k switches.