Nexus 5548/2248 communication issues after reboot

TYLER WEST
Level 1

I have two Nexus 5548 switches and four 2248 FEXs, all interconnected. The 5548s are connected to each other with four 10GE ports and configured for vPC. The 5548s also have the L3 daughtercard. The 2248s are redundantly connected to both 5548s, with two of the four FEX uplink ports going to each 5548. They were running 5.2.1.N1.5 and were recently upgraded to 5.2.1.N1.7.

I have noticed a communication problem if all of the switches boot simultaneously. This has happened in the lab environment and twice at the production location. It is new construction, so there have been at least a couple of planned power outages after which everything came up simultaneously. There seem to be two problems, and I would honestly appreciate somebody leading me down a troubleshooting path. NX-OS is still fairly new to me, but I have almost 20 years of experience with IOS.

Problem 1: Communication across the layer 3 gateway seems to slow down dramatically. I don't know whether this is due to packet loss and retransmissions or whether traffic is just delayed. I believe it may be isolated to traffic that crosses layer 3 and is also sourced from or destined to one of the FEX ports.

Problem 2: Some communication within the same VLAN from one FEX port to another FEX port fails while other communication works. For example, 10.225.3.6 on FEX 101 can ping 10.225.3.7 on FEX 103, but 10.225.3.7 cannot ping 10.225.3.6. I have also had 10.225.3.20 on FEX 101 that could not ping 10.225.3.6 on the same FEX, or vice versa.

During reboot there are no noticeable errors and all FEXs show as online. With my limited knowledge of NX-OS, I can find nothing on the surface that shows a problem exists. Once the switches are in this state, if I power cycle all of the FEXs and leave the 5548s up, everything starts working normally. Likewise, if I power up only the 5548s, let them boot completely, and then power up the FEXs, everything works normally.

Has anyone seen this problem or anything similar?  Could it be related to some nuance of vPC or config sync that I'm overlooking?  Could it be related to all of the limitations imposed by using the L3 daughtercard?

1 Reply

Marvin Rhoads
Hall of Fame

Are you using the "ip arp synchronize", "peer-gateway" and "peer-switch" features? Those are all configured under the vPC domain, and they help mitigate aspects of traffic flow that could be related to what you're seeing.
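
As a rough sketch of where those three commands sit, something like the following could be applied on both 5548s. The domain ID of 10 is only a placeholder and is not taken from the original post; use whatever vPC domain is already configured. Note that peer-switch generally expects both peers to carry the same spanning-tree priority for the vPC VLANs, so check that before enabling it.

```
! Apply on both vPC peers.
! Domain ID 10 is a placeholder (assumption), not from the original post.
vpc domain 10
  peer-switch
  peer-gateway
  ip arp synchronize
```

To see what is currently configured and whether the peers and FEXs agree after a simultaneous boot, these show commands may be a reasonable starting point:

```
show running-config vpc
show vpc
show vpc consistency-parameters global
show fex
```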

 
