N7K 10Gig Port Channel vPC to host bringing down network
Hello - We are running two 7010s set up together as vPC peers. We have at least 8 or 9 switches connected via vPC with LACP running just fine. I have three Unix hosts with 10Gig connections. One of them has been connected in a vPC port channel (NOT LACP) for at least four or five months with no problems.
Last week, I went to connect the other two in the same fashion (adding the 2nd 10Gig link). About 14 hours after I did that, our network became rather crippled, as some hosts could not be pinged (hosts that have nothing to do with these Unix boxes). I finally shut down the 2nd 10Gig ports on these two other boxes. The network came back to life over the next 10-20 minutes.
Cisco TAC suggested that I (1) check the configuration on the Unix boxes for differences and (2) rebuild the port channel by letting the channel-group statement create it. I did both and reconnected one of the Unix boxes with no issue. The next day, I added back the 2nd Unix box (2nd 10Gig). About 20 minutes later, our network went crazy. I did not get to look at spanning tree, but it seems like it could be that type of trouble, or something else creating a broadcast storm.
TAC's suggestion is to change these connections to LACP, as it has more negotiation options. I will be doing that in a couple of days. I can no longer connect the second port on this last Unix box during working hours, as it impacts production. Of course, the first time it took 14 hours to manifest itself, while it took 20 minutes the second time.
Does this issue sound familiar to anybody else? I am willing to bring both port channels to the same 7010 for troubleshooting to see if it is vPC related. I would do that after I rebuild the port-channel with my Unix vendor to run LACP.
The symptoms you describe here could happen with any port-channel, vPC or not. It sounds like the Unix host may be bridging packets from one NIC to the other, which could be causing this. I've not personally set up port-channels on Unix hosts much, so I don't really have an idea of what to tell you to check; however, I agree with TAC: use LACP.
Problems like the one you describe are the top reason I always suggest LACP. It can also help catch mis-cabling.
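For reference, here is a rough sketch of what the LACP version of one host-facing vPC member might look like on the 7K side. The port-channel/vPC number (20), the member interface (Ethernet1/1), and the access-mode switchport are placeholders I made up, not values from your setup; the same configuration would need to go on both vPC peers, and the Unix host's bond has to be set to LACP (802.3ad) as well.

```
! Hypothetical numbering: port-channel 20, vPC 20, Ethernet1/1
feature lacp

interface port-channel20
  switchport
  switchport mode access
  vpc 20

interface Ethernet1/1
  switchport
  switchport mode access
  ! "mode active" negotiates LACP; the existing static bundle would be "mode on"
  channel-group 20 mode active
  no shutdown
```

After the change, `show port-channel summary`, `show lacp neighbor`, and `show vpc consistency-parameters interface port-channel 20` should show whether the host is actually negotiating and whether both peers agree.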
You could set up SPAN on your 7K to see what the Unix box is sending back, which could help as well, but before that I'd start by enabling LACP on each side of the link.
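If you do go the SPAN route, something along these lines is one way to do it on NX-OS. This is only a sketch: the session number, the source port-channel (20), and the destination port (Ethernet1/10, with a capture box attached) are assumed for illustration.

```
! Hypothetical: capture box plugged into Ethernet1/10
interface Ethernet1/10
  switchport
  switchport monitor

monitor session 1
  ! rx = frames the switch receives from the host on this port-channel
  source interface port-channel 20 rx
  destination interface ethernet 1/10
  no shut
```

`show monitor session 1` should confirm the session is up. Capturing only the rx direction keeps the focus on what the host itself is sending, so frames that went out one NIC and came back in on the other would stand out.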