Cisco Nexus 1000v stops inheriting

Guys,

I have an issue with the Nexus 1000v: the trunk ports on the ESXi hosts stop inheriting from the main DATA-UP uplink port profile, which means that not all VLANs get presented down the affected trunk port. It's as if it gets completely out of sync somehow. An example is below.

THIS IS A PC CONFIG THAT'S NOT WORKING CORRECTLY

show int trunk

Po9        100,400-401,405-406,412,430,434,438-439,446,449-450,591,850

sh run int po9

interface port-channel9

  inherit port-profile DATA-UP

  switchport trunk allowed vlan add 438-439,446,449-450,591,850 (the system has added this, not a user)

THIS IS A PC CONFIG THAT IS WORKING CORRECTLY

show int trunk

Po2        100,292,300,313,400-401,405-406,412,429-430,434,438-439,446,449-450,582,591,850

sh run int po2

interface port-channel2

    inherit port-profile DATA-UP

I have no idea why this keeps happening. When I remove the manual static trunk configuration on po9, everything is fine; a few days later, it happens again. It's not just po9: at least 3 port-channels are affected.

My DATA-UP uplink port-profile configuration looks like this. All port-channels should reflect its allowed VLANs, but some are way out.

port-profile type ethernet DATA-UP

  vmware port-group

  switchport mode trunk

  switchport trunk allowed vlan 100,292,300,313,400-401,405-406,412,429-430,434,438-439,446,449-450,582,591,850

  channel-group auto mode on sub-group cdp

  no shutdown

  state enabled
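
Comparing these VLAN lists by eye is error-prone, so here is a minimal Python sketch that expands the two lists quoted above and shows which VLANs Po9 has lost relative to DATA-UP. The helper name expand_vlans is made up for illustration; the VLAN strings are copied from the "show int trunk" output and the port-profile in this post.

```python
def expand_vlans(spec: str) -> set[int]:
    """Expand a VLAN list such as '100,400-401' into a set of VLAN IDs."""
    vlans: set[int] = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


# Allowed VLANs on the DATA-UP port-profile (from the post).
profile = ("100,292,300,313,400-401,405-406,412,429-430,434,"
           "438-439,446,449-450,582,591,850")
# Allowed VLANs actually trunked on Po9 (from "show int trunk").
po9 = "100,400-401,405-406,412,430,434,438-439,446,449-450,591,850"

missing = sorted(expand_vlans(profile) - expand_vlans(po9))
print(missing)  # [292, 300, 313, 429, 582]
```

So Po9 is missing VLANs 292, 300, 313, 429 and 582, while Po2 carries the full list.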

The upstream switches match the same VLANs allowed and the VLAN database is a mirror image between Nexus and Upstream switches.

The Cisco Nexus version is 4.2.1

Anyone seen this problem?

Cheers


Robert Burns
Cisco Employee

Chris,

1. What is the exact full version of 1000v - 1.4 or 1.4a?

2. Is this the only Uplink Port Profile you're seeing this issue with?  Is your system uplink ever affected like this?

3. Are the upstream switchports configured with Port Fast?

If/when this happens next can you please run a "vem-support all" from the host CLI.  We'll need this to see if there's any flapping or programming events happening.  Also if you could include your VSM config that would help also.

Regards,

Robert

Hi Robert,

The answers to your questions are below.

1 - Version number is

Software

  loader:    version unavailable [last: loader version not available]

  kickstart: version 4.2(1)SV1(4)

  system:    version 4.2(1)SV1(4)

  kickstart image file is: bootflash:/nexus-1000v-kickstart-mz.4.2.1.SV1.4.bin

  kickstart compile time:  1/27/2011 14:00:00 [01/27/2011 22:26:45]

  system image file is:    bootflash:/nexus-1000v-mz.4.2.1.SV1.4.bin

  system compile time:     1/27/2011 14:00:00 [01/28/2011 00:56:08]

2 - Yes, the only profile affected is DATA-UP (the data traffic); the system uplink is unaffected. Also, only certain ESXi hosts are affected.

3 - Upstream ports aren't configured with portfast; the exact config is below.

interface gi1/4

description wak-esxihost-04

switchport

switchport trunk encapsulation dot1q

switchport trunk allowed vlan 100,292,300,304,308,313,335,399-401,405,406,412

switchport trunk allowed vlan add 429,430,434,438,439,446,449,450,582,591,850

switchport mode trunk

I can fix the issue quite quickly by running "no switchport trunk allowed vlan" on the VSM under the port-channel, but I want to know why it keeps happening. Also, this is not something I would normally check unless vMotion fails from host to host, which happens because the VLANs are not correct due to this issue. The attached logs are from a host that I have not fixed yet, in the hope of getting an answer from Cisco.

Thanks Robert

I'll review the logs, but something you said interests me. 

What are the similarities of the hosts seeing this problem?  You mentioned "only certain hosts" - what is the hardware, and topology (connected switching devices), of the affected ESX hosts?

Robert

Hi Robert,

All hosts are exact replicas of each other. Host hardware is HP DL385 G6; in terms of networking, all run the Cisco Nexus 1000v as the DVS switch.

When I said certain hosts, I meant it only happens on maybe two at a time, but it's been seen across all hosts in the environment at different times. The environment consists of 5 ESXi hosts.

They all have 12 NICs: 6 for DATA, 4 for SYSTEM, and 2 for DMZ.

The DATA side (the only port profile affected; SYS and DMZ are fine) is cabled back to two Cisco 4948 switches, arranged for resilience: if one dies, the other can forward traffic into the core network. The 6 DATA connections are split into 2 port-channels on the 4948 side, both for data: 3 go to 4948A and 3 go to 4948B.

Cheers

Thanks Chris - exactly the info I needed.

One more thing: can you advise the NIC model being used and the driver version?  I'm mainly concerned about the ones connected to the Data Uplink PP.  (I've seen some strange issues in the past with NetXen cards - common in HP boxes.)

I'll review the rest of the logs and get back to you.

Regards,

Robert

Hi Robert,

The NIC model across all servers is the Intel Corporation NC364T PCI Express Quad Port Gigabit Server Adapter.

The driver version is as follows:

ethtool -i vmnic8

driver: e1000e

version: 1.1.2-NAPI

firmware-version: 5.12-2

bus-info: 0000:0e:00.0

The driver version is the same across all vmnics. Not sure if you need any more information; if so, please ask.

cheers

Chris,

Found your bug - CSCto51780.

This is fixed in SV1.4a - a quick upgrade will resolve your issues permanently.

Bug Detail:

Symptom:
VMs on some ESX hosts have connectivity issues and may be fine on other ESX hosts, despite the vmnic uplinks inheriting the same ethernet port-profile.
This occurs due to inconsistent allowed trunk VLANs on different port-channels.

Conditions:
This occurs when the port-profile's switchport trunk allowed VLANs have leaked into the child configuration of the affected ESX host's port-channel, like so:

Good port-channel example
interface port-channel1
  inherit port-profile system-uplink

Affected port-channel example
interface port-channel2
  inherit port-profile system-uplink
  switchport trunk allowed vlan add 1,2,3,4,5

Workaround:
Identify the affected port-channels and reset their allowed VLAN list to the default, like so:

Nexus1000v# conf t
Nexus1000v(config)# interface port-channel 
Nexus1000v(config-if)# default switchport trunk allowed vlan 

Repeat for all affected port-channels/vems/esx hosts.
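
With more than a handful of port-channels, spotting the affected ones can be scripted. The following is a minimal Python sketch, assuming the VSM running-config has been saved as text with the usual indentation (interface lines at column 0, sub-commands indented). The function names find_leaked_port_channels and workaround_commands are hypothetical; the only CLI used is the workaround command quoted in the bug detail above.

```python
import re


def find_leaked_port_channels(running_config: str) -> list[str]:
    """Return port-channels that inherit a port-profile AND carry their
    own 'switchport trunk allowed vlan' line (the leaked config)."""
    stanzas: dict[str, list[str]] = {}
    current = None
    for raw in running_config.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        if not raw[0].isspace():
            # A top-level line starts a new stanza.
            m = re.match(r"interface\s+port-channel\s*(\d+)\s*$", raw, re.IGNORECASE)
            current = f"port-channel{m.group(1)}" if m else None
            if current:
                stanzas[current] = []
        elif current:
            stanzas[current].append(raw.strip())

    affected = []
    for name, lines in stanzas.items():
        inherits = any(l.startswith("inherit port-profile") for l in lines)
        leaked = any(l.startswith("switchport trunk allowed vlan") for l in lines)
        if inherits and leaked:
            affected.append(name)
    return affected


def workaround_commands(port_channels: list[str]) -> list[str]:
    """Build the workaround command sequence for the given port-channels."""
    commands = ["conf t"]
    for name in port_channels:
        commands += [f"interface {name}", "default switchport trunk allowed vlan"]
    return commands


# The good and affected examples from the bug detail.
sample = """\
interface port-channel1
  inherit port-profile system-uplink
interface port-channel2
  inherit port-profile system-uplink
  switchport trunk allowed vlan add 1,2,3,4,5
"""

affected = find_leaked_port_channels(sample)
print(affected)  # ['port-channel2']
print("\n".join(workaround_commands(affected)))
```

The script only prints the commands; review them before pasting anything into the VSM.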

Regards,

Robert

Hi Robert,

That's great. I did find this last night, but thank you.

What is the best way of doing an upgrade to minimise downtime within our environment?

thanks

Chris

Using vMotion you can perform the entire upgrade with no disruption to your virtual infrastructure. 

If this is your first upgrade, I highly recommend you go through the upgrade guides in detail.

There are two main guides.  One details the VSM and overall process, the other covers the VEM (ESX) side of the upgrade.  They're not very long guides, and should be easy to follow.

1000v Upgrade Guide:

http://www.cisco.com/en/US/docs/switches/datacenter/nexus1000/sw/4_2_1_s_v_1_4_a/upgrade/software/guide/n1000v_upgrade_software.html


VEM Upgrade Guides:

http://www.cisco.com/en/US/docs/switches/datacenter/nexus1000/sw/4_2_1_s_v_1_4_a/install/vem/guide/n1000v_vem_install.html

In a nutshell the procedure looks like this:

-Backup of VSM Config

-Run pre-upgrade check script (which will identify any config issues & ensures validation of new version with old config)

-Upgrade standby VSM

-Perform switchover

-Upgrade image on old active (current standby)

-Upgrade VEM modules

One decision you'll need to make is whether or not to use Update Manager for the VEM upgrades.  If you don't have many hosts, the manual method is a nice way to maintain control over exactly what's being upgraded & when.  It lets you migrate VMs off a host, upgrade it, and then continue in this manner for all remaining hosts.  The alternative is Update Manager, which can be a little sticky if it runs into issues.  This method will automatically put hosts in Maintenance Mode, migrate VMs off, and then upgrade each VEM one by one.  This is a non-stop process, so there's a little less control from that perspective.   My own preference: for any environment with 10 or fewer hosts I use manual; for more than that, let VUM do the work.

Let me know if you have any other questions.

Regards,

Robert

Hi Robert,

Thank you for the detailed information, although I cannot access the links you attached; I get an error saying "Forbidden File or Application".

cheers

We performed an upgrade last weekend and we had one issue that slowed us down for a while.

The VSM upgrade went well, but the VEM upgrade (we did use VUM) kept failing when we tried to add the host in Inventory -> Networking.

Ultimately, we found the issue: we had HA enabled.

Now the other twist is that we were migrating from the vSwitch to the vDS on these hosts. We were on an old version of ESX 4.0 and were upgrading to ESXi 4.1.  To do that step earlier, we migrated 14 of our 16 hosts back to the vSwitch.  We left two blades, one on each chassis, with the 1000v so we could upgrade it last week.  This was just bringing the 1000v back to all of the blades.

YMMV

/alan

Sorry Chris - fixed the links.  By default, when I access links there, there's a slightly different URL.

Yeah, HA can cause issues - which is why I normally prefer the manual method.  16 hosts isn't too bad for a manual upgrade, but you're welcome to attempt either.

With ESX 4.1 and later, and 1000v 1.3 and later, we've removed the dependency of the VEM .vib on the ESX version.  You'll find that 4.1 or later uses the same .vib file regardless of the underlying ESX build.

A simple upgrade from 1.4 -> 1.4a should be very straightforward.  There are no feature changes, just bug fixes.

Let me know if you run into any challenges - always appreciate any & all feedback.

Best of luck,

Robert

Hi Robert,

Sorry to bother you again but those links still don't work, same error.

thanks

Third time is a charm.  My edit didn't "stick" last time.

These docs are found on CCO also.

Rob
