
Losing 4-port ESXi on Nexus with QoS

mahbvh
Level 1

Hi,

This is a long and boring post, but please bear with me.

I have a vSphere environment with around 20 ESXi hosts. All hosts have 2 Intel 10Gb NICs connected to a Nexus 1000V, except 2 servers that have 4 NICs connected to the 1000V instead of 2. A few days ago I implemented QoS (1.4, CBWFQ) on the uplink, got this strange message on those 2 servers, and lost them (NB: their VMkernel is held by the 1000V, but the VSM and VC are on another ESXi host with a VSS):

2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-VEM_MGR_DETECTED: Host FRCP00ESX0030 detected as module 30

2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/6 is attached to vmnic5 on module 30

2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/7 is attached to vmnic6 on module 30

2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/8 is attached to vmnic7 on module 30

2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-MOD_ONLINE: Module 30 is online

2011 Mar 10 16:02:45 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported 

2011 Mar 10 16:02:45 N1K-VDI %ETH_PORT_CHANNEL-5-CREATED: port-channel3 created

2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported 

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/7 is down (port-profile inherit error)

2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported 

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/6 is down (port-profile inherit error)

And at the same time in the accounting log :

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 (SUCCESS)

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport mode trunk (SUCCESS)

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport trunk allowed vlan 1-3967, 4048-4093 (SUCCESS)

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; service-policy type queuing output policy-queuing (FAILURE)

All servers with 2 NICs seem to behave well, though.

I eventually recovered those servers, removed them from the N1K, rebooted both, and reinserted them. Since then I have had connectivity problems with them: they become unreachable a few times a day. The status right now is:

1 was reachable but not visible in the N1K (neither module nor interface)

1 was unreachable permanently until I shut all ports on the upstream Nexus 2K and unshut them.

Each time I lose one of them, I have just this kind of message in the N1K event log :

2011 Mar 16 10:30:34 N1K-VDI %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 30 (heartbeats lost)

2011 Mar 16 10:30:34 N1K-VDI %ETHPORT-5-IF_DOWN_VEM_UNLICENSED: Interface Vethernet82 is down (VEM unlicensed)

2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/5 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/6 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/7 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/8 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Vethernet407 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Vethernet82 is detached (module removed)
(Needless to say the VEM is NOT unlicensed)

Here is the uplink configuration :

port-profile type ethernet uplink

  vmware port-group

  switchport mode trunk

  switchport trunk allowed vlan 1-3967,4048-4093

  service-policy type queuing output policy-queuing

  channel-group auto mode on

  no shutdown

  system vlan 998-999

  state enabled

And on both 5K/2K :
interface Ethernet103/1/1,Ethernet103/1/3
  description Member of Po30
  switchport mode trunk
  switchport trunk allowed vlan all
  spanning-tree port type edge trunk
  channel-group 30

interface port-channel30
  switchport mode trunk
  vpc 30
  switchport trunk allowed vlan all
  spanning-tree port type edge trunk
  speed 10000

NB: Meanwhile nothing shows in the 5K logs...

Does this kind of problem ring a bell for any of you?

Any help will be greatly appreciated,

Cheers,

Vincent.

Accepted Solution

clattin
Level 1

What worked for me: I removed the QoS policy from the port-profile, removed the problem NIC, re-added the NIC (which brought up both 1000V interfaces), then put the QoS policy back.

CSCtl70759 Bug Details

Addition of new NIC to port-channel fails if profile has queueing policy

Symptom:
Addition of NIC(s) to port-channel fails if the VEM is added first time to VSM or NIC(s) are added first time to VSM one at a time. Error similar to the following can be seen:

%ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet1/2 is down (port-profile inherit error)

Conditions:
Failure is seen only when the NIC which was previously not added to VSM by any other port-profile and is added to port-channel port-profile for the first time or multiple NICs are added the first time to VSM one at a time.

Workaround:
Remove the queuing policy from the port-profile and add the NIC to the port-channel. Once port joins the bundle, put back queuing policy. Subsequent NIC removal/addition to profile will work fine. Or, bring the NIC up using any other port-profile and then move it to intended port-profile having queuing policy.
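
For reference, the first workaround could look roughly like this on the VSM. This is only a sketch: the profile name (uplink) and policy name (policy-queuing) are taken from the configuration posted in the question, and the NIC itself is assumed to be added from the vCenter side between the two configuration steps.

```
! Sketch only: names come from the original post, adapt to your setup.
configure terminal
port-profile type ethernet uplink
  ! Step 1: temporarily remove the queuing policy from the uplink profile
  no service-policy type queuing output policy-queuing
end

! Step 2: add (or re-add) the NIC to the uplink port group in vCenter,
! then check that the module is online and the port joined the bundle
show module
show port-channel summary

configure terminal
port-profile type ethernet uplink
  ! Step 3: put the queuing policy back once the port is in the port-channel
  service-policy type queuing output policy-queuing
end
```

Note that step 1 removes queuing from every host using the profile until step 3 is done, so it may not be acceptable on a live uplink.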


This is indeed the answer. I tried the second workaround myself, as I couldn't tamper with the uplink configuration in a live environment. The only scary thing is that the 1000V complained that there were 2 uplink port profiles with overlapping VLANs; however, this didn't cause any trouble.
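
For anyone wanting to do the same, a rough sketch of the second workaround follows. The temporary profile name (uplink-temp) is hypothetical, and its body is assumed to be a copy of the uplink profile posted above with the service-policy line left out; moving the NICs between the two port groups is done in vCenter.

```
! Sketch only: "uplink-temp" is a hypothetical name, not from the bug notes.
configure terminal
port-profile type ethernet uplink-temp
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-3967,4048-4093
  ! no queuing service-policy here, on purpose
  channel-group auto mode on
  no shutdown
  system vlan 998-999
  state enabled
end

! 1. In vCenter, attach the new NICs to the uplink-temp port group.
! 2. Check that the interfaces come up:
show module
show port-channel summary
! 3. In vCenter, move the NICs from uplink-temp to the real uplink port group.
```

Because both profiles carry the same VLAN range, expect the overlapping VLAN warning mentioned above while the temporary profile exists.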

Sorry I forgot to post here when I found the bug ID, and thank you clattin for mentioning it.

Cheers.