08-26-2011 03:15 PM - edited 03-07-2019 01:55 AM
With Hatim Badr and Iqbal Syed
Welcome to the Cisco Support Community Ask the Expert conversation. This is an opportunity to learn about design, configuration and troubleshooting of vPC with Cisco experts Hatim Badr and Iqbal Syed.
Iqbal is a product manager and technical marketing engineer for the Cisco Nexus 7000 Series of switches. He is responsible for product road-mapping and marketing the Nexus 7000 line of products with a focus on virtual port channel design and training. Syed has been with Cisco for more than 8 years, which includes experience in Cisco Advanced Services and the Cisco Technical Assistance Center. His experience ranges from reactive technical support to proactive engineering, design, and optimization. He holds CCIE (Routing & Switching), CCDP, Cisco Data Center, and TOGAF (v9) certifications.
Hatim is a network consulting engineer for Cisco Advanced Services in Toronto, where he supports Cisco customers across Canada as a specialist in data center architecture, design, and optimization projects. He has more than 10 years of experience in the networking industry. He holds CCIE certification #14847 in Routing and Switching and also holds TOGAF 9, VCPv4, and PMP certifications.
Remember to use the rating system to let Hatim and Iqbal know if you have received an adequate response.
Hatim and Iqbal might not be able to answer each question due to the volume expected during this event. Remember that you can continue the conversation on the discussion forum shortly after the event. This event lasts through September 9, 2011. Visit this forum often to view responses to your questions and the questions of other community members.
08-29-2011 07:30 AM
Hello Hatim and others,
I'm sorry to have to report my bad experiences with Nexus here but I think it is important to know if the problem is fixed.
Short history:
While building a new datacenter last year with two Nexus 7018 used as L2 only with vPC, we faced a serious issue.
We had 20 C4948-10GE switches connected downstream to the Nexus pair.
One of my colleagues, trying to add a VLAN to the vPC, made an error:
instead of using
switchport trunk allowed vlan add x,y,z
he typed by mistake
switchport trunk allowed vlan x,y,z
on the second Nexus only.
The vPC consistency check was triggered and all of this network block was isolated from the rest of the network.
We had to power off the second Nexus to restore connectivity.
It took one hour to recover.
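To make the difference between the two commands concrete, here is a minimal Python sketch. It only models the allowed-VLAN list as a set; the function names and VLAN numbers are illustrative assumptions (x, y, z are replaced by 40, 41, 42), and it is not NX-OS code.

```python
# Simplified model of a trunk's allowed-VLAN list (illustrative only, not NX-OS code).

def allowed_vlan_add(current, new):
    """Model of 'switchport trunk allowed vlan add ...': extends the existing list."""
    return set(current) | set(new)

def allowed_vlan_replace(current, new):
    """Model of 'switchport trunk allowed vlan ...': overwrites the existing list."""
    return set(new)

existing = {10, 20, 30}    # hypothetical VLANs already carried on the vPC
new_vlans = {40, 41, 42}   # stand-ins for x, y, z

peer1 = allowed_vlan_add(existing, new_vlans)      # the intended command
peer2 = allowed_vlan_replace(existing, new_vlans)  # the command typed by mistake

print(sorted(peer1))   # allowed list on the first Nexus
print(sorted(peer2))   # allowed list on the second Nexus
print(peer1 == peer2)  # the two peers no longer agree
```

Without the add keyword the second peer silently drops every VLAN that was already on the trunk, which is the mismatch that the consistency check then reacts to.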
In my humble opinion this is not acceptable on devices like this, with a full price in excess of $100,000 each.
The Nexus CLI should be changed to have commit logic like IOS XR.
Forgive me if this post looks aggressive, I'm usually more friendly here in CSC.
I hope no one will be hurt.
The NX-OS version was something like 4.0.
Best Regards
Giuseppe Larosa
CCIE SP#14802
08-29-2011 10:23 AM
Hi Giuseppe,
Thanks for reporting your issue to us. We are constantly working to improve the user experience with NX-OS.
NX-OS release 5.2 has a new feature that can help in scenarios like this. It is called 'Per-VLAN Consistency Check'. It is enabled by default in 5.2 (and cannot be disabled), and it is not applicable to Spanning Tree MST mode.
08-29-2011 11:09 AM
Hello Iqbal,
I agree that this is a smarter way to perform the vPC consistency check. However, it would have been of limited use in my case.
In our case peer 1 had a VLAN list made of
{existing VLANs on vPC} U {x,y,z}
while peer 2 had only (because of the missing add keyword)
{x,y,z}
As a result, only the new VLANs {x,y,z} would be in service, and the downstream C4948-10GE switches would still be inaccessible in-band if an access-class is applied and none of the few in-service VLANs is the management VLAN.
Both Nexus switches would also be inaccessible on the management VLAN, so console access would be needed to recover from this.
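The set reasoning above can be sketched in a few lines of Python. This is only a model of the argument in this post, assuming that a per-VLAN check keeps in service exactly the VLANs allowed on both peers; the VLAN numbers, the management VLAN, and the function name are hypothetical.

```python
# Model of the argument above: with a per-VLAN check, only VLANs allowed on
# BOTH peers stay in service (an assumption taken from this post, not NX-OS code).

def vlans_in_service(peer1_allowed, peer2_allowed):
    """Return the VLANs present in both peers' allowed lists."""
    return set(peer1_allowed) & set(peer2_allowed)

existing = {10, 20, 30}    # hypothetical VLANs already on the vPC
mgmt_vlan = 10             # hypothetical management VLAN, part of the existing set
new_vlans = {40, 41, 42}   # stand-ins for x, y, z

peer1 = existing | new_vlans   # {existing VLANs on vPC} U {x,y,z}
peer2 = set(new_vlans)         # {x,y,z} only, because "add" was missing

in_service = vlans_in_service(peer1, peer2)
suspended = (peer1 | peer2) - in_service

print(sorted(in_service))        # only the new VLANs survive
print(sorted(suspended))         # every pre-existing VLAN is taken down
print(mgmt_vlan in in_service)   # whether in-band management still works
```

In this model the management VLAN ends up in the suspended set, which is why in-band access to the downstream switches would still be lost.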
Notice that in environments with strict control on the VLANs allowed on trunks for STP scalability, an error like this is not so rare.
Thanks for your attention. I appreciate that the behaviour has been improved.
By the way I think this thread is really high quality with useful and interesting information for us that are spread around the world.
Best Regards
Giuseppe
08-29-2011 12:07 PM
Hi Giuseppe,
Hmm, I see. Yes, in your particular case that makes sense, as none of the in-service VLANs x,y,z is the management VLAN.
Anyway, thanks for bringing this scenario to our attention.
Regards,
Iqbal
08-29-2011 02:12 PM
Mismatched configurations can cause errors or misconfigurations that can result in service disruptions. The configuration synchronization (config-sync) feature in Cisco NX-OS Release 5.0(2)N1(1) allows you to configure one switch profile and have the configuration automatically synchronized to the peer switch.
I'm not sure if this is a planned feature for future N7K releases, but I think it would overcome issues like the one Giuseppe mentioned.
Thanks
Marwan
08-29-2011 02:46 PM
Hi Marwan,
Agreed, config-sync will help too!
At this time, this feature is on the roadmap for the N7K.
Regards,
Iqbal
08-29-2011 04:07 PM
"By the way I think this thread is really high quality with useful and interesting information for us that are spread around the world.
Best Regards
Giuseppe"
100% agree with you, Giuseppe.
10-21-2019 08:19 AM
Hi, would you have an updated link, as the one posted here does not work anymore?
I'm trying to build a port channel between two data centers that are less than 10 km apart, between a Cisco Nexus 3000 pair (vPC) and a Cisco Nexus 7009. However, the port channel will not come up. What would I have to ask my carriers? There are two of them, each terminating directly on one of the Nexus 3000 switches, and on the other side they terminate as follows:
Carrier 1 in a 3750
Carrier 2 in a Mikrotik cloud router
and from there to the Nexus 7K.
Thanks very much in advance.
08-29-2011 06:01 AM
Hi Hatim/Iqbal
Firstly, I don't have much knowledge of Nexus switches, so some of my questions may not make a lot of sense.
From a previous answer in this thread -
1) From the access layer, if we have Nexus, for example the 5000 series, can we have one vPC identifier connect to the Nexus 7000 distribution/core vPC pair?
I assume by vPC identifier you mean vPC domain. If yes, then just like Marwan mentioned, you will need two vPC domains.
Can you elaborate on this? Does this mean that if you have 2 x 5000 and 2 x 7000 and you want to run a vPC from each 5000 to the pair of 7000 switches, then you need to use 2 vPC domains?
I'm not sure I understand what is meant by domain in this context.
2) In a recent thread Marwan posted that if you have 2 x 2000 FEX using vPC to a pair of 5000 switches then you cannot run a vPC from your server to the 2000 FEXs, i.e.
FEX1 has a vPC to both 5000s
FEX2 also has a vPC to both 5000s
Is this a specific server limitation or does it apply to all Nexus products, i.e. if you have -
5000_1 vPC to 7000_1 and 7000_2
5000_2 vPC to 7000_1 and 7000_2
then you cannot run vPCs from the 2000 FEXs to the 5000s?
3) What is the best practice design for vPCs, i.e.
is it recommended to always run vPCs upwards towards the distro/core layers? What I mean is, does it ever make sense to have a vPC pointing the other way, in that each core switch has a vPC that terminates across a pair of access layer switches?
Can you actually have a 2-way vPC, i.e. at either end the vPC is terminated across a pair of switches?
Hope some of the above makes sense!
Jon
08-29-2011 10:00 AM
Hi Jon,
## I had to delete my previous response and replace it with this one since the diagrams were not added properly ##
You are always welcome; this discussion is for everyone interested in vPC.
My answers are inline.
Q1) Can you elaborate on this? Does this mean that if you have 2 x 5000 and 2 x 7000 and you want to run a vPC from each 5000 to the pair of 7000 switches, then you need to use 2 vPC domains?
I'm not sure I understand what is meant by domain in this context.
A1) There are two scenarios here:
1- Connecting each N5K to the N7K vPC domain, as shown below, does not need any vPC configuration on the N5K. All you need on the N5K is a regular port channel configuration, similar to any other switch.
2- Creating a vPC between the two N5Ks and connecting them to N7K vPC domain 1, as illustrated below.
In this scenario you need a different vPC domain ID for the N5Ks, as used in the above diagram (the vPC domain ID for the N7Ks is 1 and for the N5Ks is 2).
Q2) In a recent thread Marwan posted that if you have 2 x 2000 FEX using vPC to a pair of 5000 switches then you cannot run a vPC from your server to the 2000 FEXs, i.e.
FEX1 has a vPC to both 5000s
FEX2 also has a vPC to both 5000s
Is this a specific server limitation or does it apply to all Nexus products, i.e. if you have -
5000_1 vPC to 7000_1 and 7000_2
5000_2 vPC to 7000_1 and 7000_2
then you cannot run vPCs from the 2000 FEXs to the 5000s?
A2) You can run vPC either
1- between N5K and N2K, or
2- extended to the server with N5K or N7K, which is called "host vPC".
But you cannot run vPC between N5K --> N2K and then also from N2K --> server (host vPC).
Q3) What is the best practice design for vPCs, i.e.
is it recommended to always run vPCs upwards towards the distro/core layers? What I mean is, does it ever make sense to have a vPC pointing the other way, in that each core switch has a vPC that terminates across a pair of access layer switches?
Can you actually have a 2-way vPC, i.e. at either end the vPC is terminated across a pair of switches?
A3) Yes, you can do that. It is called double-sided vPC, as shown in the diagram above. If your design has vPC on the access N5Ks and on the distribution N7Ks, then it is recommended to run double-sided vPC.
Thanks
Hatim Badr
08-29-2011 10:11 AM
Hatim
Many thanks for the reply, a lot clearer now.
Jon
08-30-2011 03:24 PM
I have an interesting one that hasn't been discussed thus far...
What does a vPC failure look like? Take into consideration a software failure or hardware port and link failures. What is the WORST thing that can happen if vPC fails?
I'm asking this because we just had an interesting discussion at work. There was a comparison drawn between vPC and stacking in terms of failures, not that they are deployed for the same reason. As an FYI, the two technologies were being considered to achieve one particular goal for the particular situation we were confronted with at work. The goal was to have a design that would allow for full cross-sectional BW between the access layer (ToR) and the EoR/Agg layer in a non-blocking architecture. So having the aggregation layer in a vPC domain or stacking two switches would both achieve that.
One of the distinctions made against stacking is that the technology is notoriously buggy (on ALL vendor platforms) and can represent a single point of failure with a huge failure domain, depending on where it is deployed. For example, if a single stack is deployed at the EoR/aggregation layer and it requires a reboot to fix a problem or some other bug causes the whole stack to go down, the results can be devastating. I have seen that on Cisco StackWise and Juniper VC.
Hence, the question: what is the worst conceivable result from a vPC failure? And which is the safer bet when being deployed in a data center agg/core layer? By the way, PLEASE, let us NOT have a Cisco marketing discussion about how great vPC and stacking are and how they never fail. Let's assume they will fail for the purposes of this discussion.
For what it's worth, my answer is that if vPC fails, some traffic can be blackholed or a link can "fall" out of the vPC domain, creating a redundant path/bridging loop, which will be addressed by STP. So, it won't be as potentially devastating as a stack failure.
08-31-2011 07:42 AM
Hi ex-engineer,
Thank you for your question about vPC failure scenarios. Yes, vPC can fail, as documented in the vPC design document at
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/design_guide_c07-625857.pdf
Please look at the "vPC Failure Scenarios" section, which provides several failure scenarios and highlights the impact and how vPC behaves in each one. Also, as you mentioned, Spanning Tree is running in the background as a backup to prevent any loop in the network.
There are also enhancements/features added in new releases to help reduce failure impact. Here are the major enhancements for vPC:
Peer switch
Beginning with Cisco NX-OS Release 5.0.2a, the peer-switch feature lets both switches appear as a single STP bridge. Because both send BPDUs with the same bridge ID, the downstream device does not detect a spanning-tree misconfiguration.
Auto Recovery
Beginning with Cisco NX-OS Release 5.2(1), you can configure the Cisco Nexus 7000 Series device to restore vPC services when its peer fails to come online. If both switches fail and only one switch comes back online, vPC will start working with that one switch after a configured delay (default 240 seconds). In previous versions this functionality was achieved using the reload restore feature.
Delay restore
Beginning with Cisco NX-OS Release 4.2(1), delay restore delays vPC bring-up after a vPC device reload (SVI bring-up timing is unchanged) to avoid blackholing of routed traffic from access to core until Layer 3 connectivity is reestablished.
vPC config-sync
Config-sync provides a mechanism to synchronize configuration between a pair of switches in a network to reduce misconfiguration problems, as discussed earlier in this thread. Config-sync is available on the Nexus 5000 beginning with NX-OS 5.x and is on the roadmap for the Nexus 7000.
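As a rough illustration of the auto-recovery behaviour described above, here is a minimal Python sketch of the decision a lone switch makes after a reload. It is a simplified model under stated assumptions, not the NX-OS implementation: the function name and its parameters are hypothetical, and only the 240-second default comes from the description above.

```python
# Simplified model of the auto-recovery decision (illustrative, not NX-OS logic).

DEFAULT_RELOAD_DELAY_S = 240  # default delay quoted in the description above

def should_auto_recover(seconds_since_boot, peer_seen, delay_s=DEFAULT_RELOAD_DELAY_S):
    """Return True when a lone switch should bring up its vPCs without its peer."""
    if peer_seen:
        return False  # the peer is back, so normal vPC bring-up applies
    return seconds_since_boot >= delay_s  # peer still missing after the delay

print(should_auto_recover(100, peer_seen=False))  # still inside the waiting period
print(should_auto_recover(240, peer_seen=False))  # delay expired, peer never appeared
print(should_auto_recover(300, peer_seen=True))   # peer came back, nothing to recover
```

The point of the delay is to give the peer a chance to come back first, so that a single surviving switch only takes over once it is reasonably clear it is alone.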
Thanks
Hatim Badr
08-30-2011 08:22 PM
I would like to pose a question regarding a specific vPC failure scenario:
When you have a keep-alive and peer-link failure, the vPC domain is completely broken and Spanning Tree Protocol is required to prevent loops and place ports in blocking mode.
What is the mechanism by which STP gets to block some paths in the dual-active vPC scenario, after the failure?
If everything was working perfectly fine and someone (theoretically) cuts the peer-link and keep-alive link between the peers, HOW does STP kick in? Or would you potentially have a scenario where both switches are forwarding, which could cause weird paths, duplicate packets, etc.?
You would assume a change in the MAC ID, but that would require an LACP renegotiation (a change of system ID), and there is none (correct?):
5k-A thinks it is alone and tries to be non-disruptive, so it still uses the old system MAC derived from the vPC domain.
5k-B does exactly the same. They both forward up and down...
For 7k A and B (upstream) nothing has changed: they still talk to a switch with the 5k MAC address (vPC derived). The 7Ks still think they are talking to the same device...
STP could only kick in if 5k (A or B) changed its system MAC ("I'm no longer in vPC, I'll use my real MAC, let's do LACP renegotiation").
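The reasoning in this question can be written down as a small Python model. It only encodes the assumptions made in the question (both peers keep advertising the shared, vPC-derived system ID after the split); the class, the ID format, and the MAC addresses are hypothetical and say nothing about what NX-OS actually does.

```python
# Model of the dual-active reasoning above (assumptions from the question, not NX-OS code).
from dataclasses import dataclass

def vpc_system_id(domain_id):
    """Hypothetical stand-in for the LACP system ID derived from the vPC domain."""
    return f"vpc-domain-{domain_id}"

@dataclass
class Peer:
    name: str
    real_mac: str
    domain_id: int
    keep_vpc_id: bool = True  # after a split, the peer keeps the vPC-derived ID

    def advertised_system_id(self):
        return vpc_system_id(self.domain_id) if self.keep_vpc_id else self.real_mac

def upstream_sees_change(before, after):
    """The upstream switch only has a reason to renegotiate if the partner ID differs."""
    return before != after

a = Peer("5k-A", "00:00:00:00:00:0a", domain_id=2)
b = Peer("5k-B", "00:00:00:00:00:0b", domain_id=2)
before = vpc_system_id(2)  # what the 7Ks saw while the vPC was healthy

# Dual-active case: both peers still advertise the same shared ID.
print(upstream_sees_change(before, a.advertised_system_id()))
print(upstream_sees_change(before, b.advertised_system_id()))

# Counterfactual from the question: 5k-B falls back to its real MAC.
b.keep_vpc_id = False
print(upstream_sees_change(before, b.advertised_system_id()))
```

In this model the upstream side only notices something once one peer stops using the shared ID, which is exactly the gap the question is asking about.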
Would love to hear your thoughts.
Many thanks
Adriaan Steyn
08-31-2011 06:54 AM
Hey, where did all the experts go? Calling Hatim, Iqbal, Jerry, Mohammad and Marwan....?