Re: ASK THE EXPERT - CISCO NEXUS 5000 AND 2000 SERIES

ciscomoderator · ‎10-07-2010

Welcome to the Cisco Networking Professionals Ask the Expert conversation. This is an opportunity to get information on Data Center Switching the Cisco Nexus 5000 Series Switches and Nexus 2000 Series Fabric Extenders with Lucien Avramov. Lucien is a Customer Support Engineer at the Cisco Technical Assistance Center. He currently works in the data center switching team supporting customers on the Cisco Nexus 5000 and 2000. He was previously a technical leader within the network management team. Lucien holds a bachelor's degree in general engineering and a master's degree in computer science from Ecole des Mines d'Ales. He also holds the following certifications: CCIE #19945 in Routing and Switching, CCDP, DCNIS, and VCP #66183.

Remember to use the rating system to let Lucien know if you have received an adequate response.

Lucien might not be able to answer each question due to the volume expected during this event. Our moderators will post many of the unanswered questions in other discussion forums shortly after the event. This event lasts through October 22, 2010. Visit this forum often to view responses to your questions and the questions of other community members.

GIULIO FAINI · ‎10-08-2010

I have 5 questions I hope you can enlighten me about:

If I do multi-layer vPCs (or maybe its called back-to-back vPCs), does the vPC IDs that you associate to the interfaces must be the same on all facing-swicthes or they just have local important to the switch?
LACP between NX-OS and a 6708 card of a 6500 running 12.2(33)SXI3 is not working (I tested it and channel-group works only as 'on' on both sides and not 'active' as it is reccomended on CCO). So, basically, how do you run LACP between NX-OS and IOS 6500?
if both vPC peers are DOWN becuase of a network failure and then they all restart again, how long does it take vPC to converge and relay traffic? i think we need to configure the 'reload restore' under "vpc domain" config. Is that all or more commands are needed? Why this command is not enabled by default?
If i connect a L3 device (like a router) to both vPC peers, best practices say that you need to use L3 links to both vPC peers. Can I run the L-3 escape-link (in case one of the 2 L3 links are down) over the vPC peer link? Where does it say it on CCO? I just need to add the VLAN where i want the escape-link traffic pass to.
Over the vPC peer link, should i run bridge assurance, udld, loopguard, mtu 9216? what is the best practice reccomendation for configuring a vPC peer link?

Thanks,

Giulio.

Lucien Avramov · ‎10-09-2010

1. The vPC domain id needs to be local to the pair of Nexus switches you are pairing together. For example if you have a pair of N7K and a pair of N5K, your 5K needs to be both in the same domain id number, and the pair of 7K in their own domain id number.

2. We need to troubleshoot here lacp to find out where the problem is. It should work with Catalyst 6500 or any other switch LACP capable, so we need to look further at the specific code you are running here to figure out the LACP issue.

3. I'm not sure about the reload restore command you are refering too. Are you sure this is on a Nexus 5000? If so, let me know what code you are running.

4. Configure vpc on the two physical connections from your 5ks to the router. Then for that one vPC, use switchport mode access, and place both in the same vlan. You won't need an SVI with an IP address for that one vlan except it you want to run a ping test for example. This vlan used for the router, will have of course to be allowed in the vlan list on the peer link. In general all the vlans you have for vPC need to be allowed on the vpc peer link.

5. Over the peer-link, run spanning tree port type network, this will enable bridge assurance. As far as the MTU, you don't need to make any specific change: if your mtu is globally enabled as 9216, then it will on the peer-link as well.

GIULIO FAINI · ‎10-11-2010

1.- But can you confirm that in a multi-layer vPC the vPC domain IDs must be unique on the 2 different vPC domains? If 2 N7K have "vpc domain 1" and they are connected to 2 N5K, the 2 N5K cannot have as well "vPC domain 1", right?

2.- I opened a case.

3.- Yes its in the N7K running 5.03. Anyways, In the vPC protocol (I guess its the same with N5K), what happens if both peers are down and we restart the 2 switches?? Please see:

http://www.cisco.com/en/US/docs/switches/datacenter/sw/5_x/nx-os/interfaces/configuration/guide/if_vPC.html#wp1643030

Why is not enabled by default both on the N7K and N5K.

4.- yes we use 2 VLANS with SVI (can we do without SVI??) to connect the 2 Nexus to the router... i am not clear the config. if I have for example:

Nexus1----1/29 ----------------------------------

1/31 ||1/32 | 1/2

|| Router

1/31 ||1/32 | 1/1

Nexus2----1/29----------------------------------

Can you tell me the config you would use in Nexus1, Nexus2 interfaces 1/31-32 and 1/29 and VLANs?

5.- Do you reccomend to use UDLD on the peer link?

Lucien Avramov · ‎10-11-2010

1. You can use the same vpc domain for different nexus pairs. It will just confuse you more than anything else, but it will work. So I don't recommend you on such configuration.

3. This feature is not yet on N5K, but will be in the 5.0 release coming out soon. As far as why it's not enabled by default, I don't know. I will enquire.

4. Why 2 vlans here? That makes you use 2 networks on your router.

You could have a layer 3 port-channel on the router and bundle both router interfaces for the same network. Then on both nexus switches, you can have an etherchannel with an access vlan.

5. No, lacp is sufficient on the peer link for failure detection. UDLD would be useful if you are using no lacp over the peer-link. Overral it's a better solution to use lacp.

GIULIO FAINI · ‎10-12-2010

Thanks for your valuable answers.

About point 4), its very interesting what you say but I am not sure that it will work using a L-3 etherchannel.

If you see the 2 animation slides, I am fraid it may create a Black-Hole of traffic when u use a Etherchannel with a L3 device.

What do you think ?

Lucien Avramov · ‎10-12-2010

I dont see this as an issue, your document seems out dated. There is now the peer-gateway feature on 7K to prevent this from happening.

The vPC peer-gateway capability allows a vPC switch to act as the active gateway for packets that are addressed to the router MAC address of the vPC peer. This feature enables local forwarding of such packets without the need to cross the vPC peer-link. In this scenario, the feature optimizes use of the peer-link and avoids potential traffic loss.

aamercado · ‎10-11-2010

2 questions:

1. With n5k active-active topology under "system qos" - anyway to setup qos without it restarting the 2148 fex? I had qos setup and then change it back to defaullt base on cco instrutions which restarted all the fex associated with this n5k pair but cco doc didn't mention it would do this. Got the global vpc inconsistency log message when I change qos back to default

2. N5k on 4.2(1)N1(1) and N7k on 5.0(2) [build 5.0(0.66)] has packet loss (see SR 615676537)

A pair of N5k on active-active mode with Vlan 211 and 207 up to the N7k redundant cores. I checked the following for N7k Core-1:

ospf DR master

hsrp primary

vpc primary

stp root

Below is an example of a vlan config:

interface Vlan207
no shutdown
no ip redirects
ip address 10.100.207.2/24
ip ospf passive-interface
ip router ospf 1 area 0.0.0.200
ip pim sparse-mode
ip igmp version 2
hsrp 207
    preempt delay minimum 180
    priority 90
    timers 1 3
    ip 10.100.207.1
ip dhcp relay address 10.100.211.71

Both N7k and N5k has the same config below:

interface port-channel13
description TO-N5K-DC-EDGE3and4 PORTS 3/9 and 3/10****
switchport
switchport mode trunk
vpc 13
switchport trunk allowed vlan 203-204,207-208,211,223-224
logging event port link-status
logging event port trunk-status

On the N5k, the switchport is basic:

interface Ethernet141/1/11
switchport access vlan 207
spanning-tree port type edge

When server (2003 SP3 or 2008) on vlan 211 copies files from 207, it takes a long time. Wireshark trace shows dropped packets meaning I get a lot of "tcp previous segment loss" For example, a 1.5G files which normally take less than a minute, takes 15 minute to copy.

I tried different servers and it doesn't appear to be server related but network. I tried to turn off "checksum offload" and "large send offload" on the NIC but still a problem...there is no nic-teaming - just one nic on each server

From 10.100.211.X server, I tried a "ping 10.100.207.X -t -l 1514" and there was only a few packet drops as oppose to the ethereal trace so I am not sure if I am actually getting packet loss from the network or if it is still a server issues. I tried copy from servers on diff OS (ie 2003 versus 2008) and same problem

I also looked at perfmon during a file transfer and it looks fine.

On non-active/actvie setups, like single switch user-IDF (ie 4500, 3750., 6500) or a single N5k, which has a vpc up to the N7k cores, file transfers are fine...so it just seems to be related to my redundant N5k (active-active) topology. I tried with turning Jumbo on and off as well as setting up QoS on and off but no work. Currently Jumbo and QoS is set back to default. I also tried enabling "peer-gateway" on the N7k VPC cores but same problem.

I also tried file transfer btwn n5k pairs meaning N5k(active-active) pair #1 to/from N5k(active-active) pair #2 and slow file transfers....so issue seems to related only to n5k active-active topology:

sh platform and sh hardware shows negligible discards/drops. Although not sure on the mtu and crc stomps???

Gatos 0 interrupt statistics:
Interrupt name                                 |Count   |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw2_INT_ig_pkt_err_eth_crc_stomp           |a5a3    |0       |3       |0
gat_mm2_INT_rlp_rx_pkt_crc_stomped             |a5a3    |0       |3       |0
Done.

Gatos 1 interrupt statistics:
Interrupt name                                 |Count   |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw0_INT_eg_pkt_err_eth_crc_stomp           |14bd    |0       |1       |0
gat_fw1_INT_eg_pkt_err_eth_crc_stomp           |3a04    |0       |4       |0
gat_fw3_INT_eg_pkt_err_eth_crc_stomp           |6759    |0       |1       |0

Any ideas

Lucien Avramov · ‎10-11-2010

1. Can you be more specific and show me what configuration you applied?

2. You have the exact same configuration for 141/1/1 on both N5Ks? Do you mean the transfer is also slow, when you connect the server directly to the N5K? Is it a dual homed server or single homed (this is key)? Where is the vlan 211 located, is it on the same N5K pair? Are you using enough peer-links between the two 5ks? How many physical 10 GE links do you have between them? What protocol is used for the file transfer? CIFS?

aamercado · ‎10-11-2010

1. Spoke to TAC, apparently on a multi-tier VPC (ie my N7k cores are vpc and my downstream n5k dualhome are vpc), putting a qos change will create a vpc inconsistency that will take down my network until the configs are manually sync. Darn - wish that wasn't the case as I want to turn on Jumbo frames on the N7k and N5k without disrupting the network.

2a. Yes, exact same configuration for 141/1/1 on both N5Ks

2b. No, never tried connecting servers directly to N5k

2c. All windows server is single-homed while our non-window servers (ie Isilon) are static 2-port-channel (non-LACP). When I transfer from single to single, same problem or single to dual-home servers. Whether single or dual-home doesn't matter as when I do the same file transfer from a non-pair N5k (ie single N5k, 45XX, 65XX, 3750 which has a VPC to N7k Core), transfer is fast...so only seems isolated to servers hanging off N5k pair.

2d. Yes, vlan 211 and 207 located are on the same N5K pair

2e. Yes, 20Gig btwn the N5k with 7 qty 2148Fex on the N5k pair.

2f. Yes, CIFS for file transfer - ethereal traces show SMB setting up the connection and file transfer

Lucien Avramov · ‎10-12-2010

1. The config-sync feature coming in the next code release of NX-OS for 5K will allow you to make those changes more smoothly accross your N5K pair.

2. In your design for the file transfer , is it 1GB server ---- Nexus ---- 1 GB server ?

Let's look at the counters for drops / discards and at the queuing:

Assuming you have a 2148, look at the following outputs:

N5K#attach fex 100

Fex-100#show platform software redwood sts -> look at the asic and HI port concerned

Then show platform software redwood drops

Look if this increments overtime

aamercado · ‎10-12-2010

The FEX upstream to the N5k has no increments but the host ports do slowly increment on:

red_hix_cnt_tx_lb_drop

Lucien Avramov · ‎10-12-2010

Increment of red_hix_cnt_tx_lb_drop is not an issue.

This counts for frames received and not send out from the interface they were received, its normal to see an increment there.

TAC would need to troubleshoot this further. From what you are saying the FEX is not dropping traffic.

aamercado · ‎10-12-2010

Any chance you have a method to confirm this on the N5k?

or even on the N7k?

Lucien Avramov · ‎10-13-2010

A couple of other things for the 5K:

-Do you see CRC or other input errors to increment on the interfaces? (show interface)

-On the 5k, you can identify drops with show platform fwm info pif ethernet 1/1

-Also look at the queuing, show queuing interface e1/1 and see if those increment