
VEM tanked... interesting error.

ryan.lambert
Level 1

As part of our testing, I published two new port-profiles for one of our vSphere hosts to use. The first profile carries our vMotion, SC/Mgmt, Control, Packet, and Guest traffic. The second carries only the iSCSI network. The objective was to split them up temporarily to do some iSCSI performance testing.

We removed the first VMNIC from the uplink port-profile that previously was applied, and popped it onto the new one.

We then put the second VMNIC on the new iSCSI port-profile, and the VEM dropped off the face of the earth and never came back. These are the errors I saw (the most interesting ones are towards the bottom), and I'm wondering if anyone has any idea what is going on. Did we do something horribly stupid by breaking the port-channel and putting the NICs on separate uplink port-profiles?

2010 Mar 12 15:19:31 N1KV-VSM1 %VIM-5-IF_DETACHED: Interface Ethernet3/5 is detached
2010 Mar 12 15:19:31 N1KV-VSM1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel1: Ethernet3/5 is down
2010 Mar 12 15:19:31 N1KV-VSM1 %ETHPORT-5-IF_DOWN_MODULE_REMOVED: Interface Ethernet3/5 is down (module removed)
2010 Mar 12 15:19:32 N1KV-VSM1 %ETHPORT-5-IF_DOWN_INTERFACE_REMOVED: Interface Ethernet3/5 is down (Interface removed)
2010 Mar 12 15:19:32 N1KV-VSM1 %VIM-5-IF_DETACHED: Interface Ethernet3/8 is detached
2010 Mar 12 15:19:32 N1KV-VSM1 %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel1: Ethernet3/8 is down
2010 Mar 12 15:19:32 N1KV-VSM1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel1: first operational port changed from Ethernet3/8 to none
2010 Mar 12 15:19:32 N1KV-VSM1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel1 is down (No operational members)
2010 Mar 12 15:19:32 N1KV-VSM1 %ETHPORT-5-IF_DOWN_MODULE_REMOVED: Interface Ethernet3/8 is down (module removed)
2010 Mar 12 15:19:32 N1KV-VSM1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel1 is down (No operational members)
2010 Mar 12 15:19:33 N1KV-VSM1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel1 is down (No operational members)
2010 Mar 12 15:19:33 N1KV-VSM1 %ETHPORT-5-IF_DOWN_INTERFACE_REMOVED: Interface Ethernet3/8 is down (Interface removed)
2010 Mar 12 15:19:33 N1KV-VSM1 %VIM-5-IF_ATTACHED: Interface Ethernet3/5 is attached to vmnic4 on module 3
2010 Mar 12 15:19:33 N1KV-VSM1 %VIM-5-IF_ATTACHED: Interface Ethernet3/8 is attached to vmnic7 on module 3
2010 Mar 12 15:19:38 N1KV-VSM1 %PLATFORM-2-PFM_VEM_REMOVE_NO_HB: Removing VEM 3 (heartbeats lost)
2010 Mar 12 15:19:38 N1KV-VSM1 %PLATFORM-2-MOD_REMOVE: Module 3 removed (Serial number )

2010 Mar 12 15:19:47 N1KV-VSM1 %ETHPORT-5-IF_SEQ_ERROR: Error (0x6e) while communicating with component MTS_SAP_PORT_CLIENT opcode:MTS_OPC_LC_PORT_CLIENT_CONFIG (for:RID_MODULE: 2)
2010 Mar 12 15:19:47 N1KV-VSM1 %PORTPROFILE-3-PORT_PROFILE_CHANGE_VERIFY_REQ_FAILURE: Process (SAP=175) has returned failure while processing update for port-profile iSCSI-Uplink
2010 Mar 12 15:19:56 N1KV-VSM1 %ETHPORT-5-IF_SEQ_ERROR: Error (0x6e) while communicating with component MTS_SAP_PORT_CLIENT opcode:MTS_OPC_LC_PORT_CLIENT_CONFIG (for:RID_MODULE: 2)
2010 Mar 12 15:19:56 N1KV-VSM1 %PORTPROFILE-3-PORT_PROFILE_CHANGE_VERIFY_REQ_FAILURE: Process (SAP=175) has returned failure while processing update for port-profile SLIVS01-Uplink.

-

N1KV-VSM1# sh svs neighbors

Active Domain ID: 91

AIPC Interface MAC: 0050-56ba-2b03
Inband Interface MAC: 0050-56ba-7f80

Src MAC           Type   Domain-id    Node-id     Last learnt (Sec. ago)
------------------------------------------------------------------------

0002-3d40-5b02     VEM        91         0302     31606.30
0002-3d40-5b03     VEM        91         0402     196707.16
0002-3d40-5b04     VEM        91         0502     196707.16

Here are the actual port-profiles (they match what is configured on the physical switch).

Note: VLAN 91 is VSM management/Service Console. VLANs 95 and 96 are control/packet.

port-profile type ethernet iSCSI-Uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 93
  no shutdown
  system vlan 93
  state enabled

port-profile type ethernet VS01-Uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 91-92,94-96
  no shutdown
  system vlan 91-92,95-96
  state enabled

Thanks,

Ryan

8 Replies

ryan.lambert
Level 1

Resolution to this:

We created vSwitch1, added vmnic0 to it (we were using vmnic4 and vmnic7 for the 1KV), and changed the Service Console IP to something else. We then jumped into vCenter, since we were locked out on the old IP (the SC was on the 1KV), and moved vmnic4/7 back to the original port-profiles. We rebooted, deleted vSwitch1, rebooted again, and it was fixed.

We seem to be able to replicate this at will...

Anyone know if there is an actual bug filed for this, or is this a procedural error on our part? Really all I am trying to do is break the port-channel by using different port-profiles, and use the uplinks for separate purposes to do some testing.

If I ever needed to juggle the uplinks for something (rare, but possible I suppose) outside of doing manual subgroup pinning, I'd cripple one of my hosts attempting this.

Hi Ryan,

Your port-profile configuration is missing the port-channeling option.

Are your upstream switches clustered or not?

Anyway I am not sure which code release you are using, but you should use SV1(2) and then configure the mac-pinning option under the uplink port-profile.

Right now you don't have any kind of HA between your uplinks, so if you remove a port, the chances that the VEM never comes back up are really high.

So under both port-profiles, please add the config channel-group auto mode on mac-pinning.
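
For illustration, here is a sketch of what that suggestion might look like when applied to the VS01-Uplink profile from the original post. Only the channel-group line is new; everything else is copied from Ryan's config, and where the line sits within the profile is an assumption.

```
port-profile type ethernet VS01-Uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 91-92,94-96
  channel-group auto mode on mac-pinning
  no shutdown
  system vlan 91-92,95-96
  state enabled
```

The same line would go under iSCSI-Uplink. Mac-pinning is intended for upstream switches that are not clustered, so it would not need a port-channel configured on the physical switch side.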

Please let me know if it helps.

Cheers

Hi there,

Hopefully I can clarify.

Under normal circumstances I run a port-channel with mac pinning configured. That works OK.

What I was trying to do was "break" that port-channel, using each uplink for a separate purpose (no redundancy temporarily). When I did that, despite having all of my appropriate VLANs (packet, control, mgmt) across one of the two uplinks, I could not communicate to the VEM and saw those errors.

The port-profiles I posted are not supposed to be part of a port-channel, which is why I left the channel-group commands out.

Backend switches are a pair of Nexus 5020s without vPC configured (just trunks into the physical NIC uplinks). I am on SV1(2).

Hopefully Jason doesn't mind that I link to his blog in this thread, but it seems mighty similar to the 2nd of the two bugs here:

http://jasonnash.wordpress.com/2010/03/10/two-annoying-bugs-in-the-cisco-nexus-1000v/

Ryan,

Did you configure your upstream N5K ports with "Portfast"?

Robert

Hi Robert,

I do have "spanning-tree port type edge trunk" on both of the switch ports that physically connect to the VM Host, in addition to the VLAN that is in the port-profile.

estine
Level 1

Hey Ryan -

This is a known bug.  You should be able to find the information on it here:  http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtc18601

Basically, this bug is hit when the port-group is changed in a single step, meaning the vmnic is removed from the current port-group and added to the new port-group in one operation (without clicking "OK" in between). It can be avoided by breaking the port-group change into multiple steps: from vCenter, go to Manage Physical Adapters, remove the vmnic from the current port-group, and click OK; then go back to Manage Physical Adapters and add the vmnic to the new port-group.
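
As a rough sketch, these are VSM-side checks that could be run after the remove step and again after the add step, to confirm the VEM is still attached before moving on. The hostname and profile name are taken from the original post; the choice of checks is an assumption and is not part of the bug write-up.

```
N1KV-VSM1# show module
N1KV-VSM1# show svs neighbors
N1KV-VSM1# show port-profile name iSCSI-Uplink
N1KV-VSM1# show port-channel summary
```

If module 3 is still listed in show module and its "Last learnt" value in show svs neighbors stays small, the VEM is still exchanging heartbeats with the VSM.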

This bug will be fixed in the next release of the Nexus 1000V.

Thanks,

Liz

Okay, thank you!

Wanted to make sure this was in fact what I was running into. It didn't seem like I was doing anything "wrong", per se, although it appears the 1000v didn't appreciate me trying to consolidate steps.

Thanks again for the follow-up.
