03-16-2011 11:16 AM
Hi,
This is a long and boring post, but please bear with me.
I have a vSphere environment with around 20 ESXi hosts. Each host has 2 Intel 10Gb NICs connected to a Nexus 1000V, except 2 servers that have 4 NICs connected to the 1000V instead of 2. A few days ago I implemented QoS (1.4, CBWFQ) on the uplink. On those 2 servers I got the following strange messages and lost them. (NB: their VMkernel interfaces are on the 1K, but the VSM and vCenter run on another ESXi host with a VSS.)
2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-VEM_MGR_DETECTED: Host FRCP00ESX0030 detected as module 30
2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/6 is attached to vmnic5 on module 30
2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/7 is attached to vmnic6 on module 30
2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/8 is attached to vmnic7 on module 30
2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-MOD_ONLINE: Module 30 is online
2011 Mar 10 16:02:45 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported
2011 Mar 10 16:02:45 N1K-VDI %ETH_PORT_CHANNEL-5-CREATED: port-channel3 created
2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/7 is down (port-profile inherit error)
2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/6 is down (port-profile inherit error)
At the same time, the accounting log showed:
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 (SUCCESS)
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport mode trunk (SUCCESS)
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport trunk allowed vlan 1-3967, 4048-4093 (SUCCESS)
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; service-policy type queuing output policy-queuing (FAILURE)
All servers with 2 NICs seem to behave well, though.
I eventually recovered those servers, removed them from the N1K, rebooted both and reinserted them. Since then I have had connectivity problems with them: they become unreachable a few times a day. The current status is:
One was reachable but not visible in the N1K (neither as a module nor as interfaces).
One was permanently unreachable until I shut all its ports on the upstream Nexus 2K and brought them back up.
Each time I lose one of them, the only messages in the N1K event log are of this kind:
2011 Mar 16 10:30:34 N1K-VDI %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 30 (heartbeats lost)
2011 Mar 16 10:30:34 N1K-VDI %ETHPORT-5-IF_DOWN_VEM_UNLICENSED: Interface Vethernet82 is down (VEM unlicensed)
Here is the uplink configuration:
port-profile type ethernet uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-3967,4048-4093
  service-policy type queuing output policy-queuing
  channel-group auto mode on
  no shutdown
  system vlan 998-999
  state enabled
NB: Meanwhile, nothing shows up in the 5K logs.
Does this kind of problem ring a bell for any of you?
Any help will be greatly appreciated,
Cheers,
Vincent.
10-08-2011 12:13 PM
What worked for me: I removed the QoS policy from the port-profile, removed the problem vmnic, re-added it (which brought up both 1000V interfaces), then put the QoS policy back.
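A minimal sketch of those steps on the VSM, assuming the port-profile and policy names from the configuration posted above ("uplink" and "policy-queuing"). The vmnic removal and re-addition itself is done from vCenter, not from the VSM CLI, so it appears only as a comment:

```
! Step 1: detach the queuing policy from the uplink port-profile
configure terminal
port-profile type ethernet uplink
  no service-policy type queuing output policy-queuing
end
! Step 2 (vCenter): remove the problem vmnic from the DVS, add it back,
! then check on the VSM that the interface has joined the port-channel
show port-channel summary
! Step 3: reattach the queuing policy once the port is in the bundle
configure terminal
port-profile type ethernet uplink
  service-policy type queuing output policy-queuing
end
```

Because the service-policy is defined on the port-profile, removing and re-adding it there affects every host using that uplink profile, which is why this is best done in a maintenance window.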
Addition of new NIC to port-channel fails if profile has queueing policy
Symptom:
Addition of NIC(s) to a port-channel fails if the VEM is added to the VSM for the first time,
or if NIC(s) are added to the VSM for the first time one at a time. An error similar to the following can be seen:
%ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet1/2 is down (port-profile inherit error)
Conditions:
Failure is seen only when a NIC that was not previously added to the VSM by any other port-profile is added to a port-channel port-profile for the first time, or when multiple NICs are added to the VSM for the first time one at a time.
Workaround:
Remove the queuing policy from the port-profile and add the NIC to the port-channel.
Once the port joins the bundle, put the queuing policy back. Subsequent NIC
removal/addition to the profile will work fine. Alternatively, bring the NIC up using
any other port-profile and then move it to the intended port-profile that has the
queuing policy.
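The second workaround could look roughly like this. It is only a sketch: the profile name "uplink-noqos" is hypothetical, and it assumes the temporary profile copies the trunk settings of "uplink" above but omits the service-policy and the channel-group, with each new NIC brought up on its own before being moved:

```
! Hypothetical temporary uplink profile: same trunk settings as "uplink",
! but no queuing policy and no channel-group
configure terminal
port-profile type ethernet uplink-noqos
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-3967,4048-4093
  no shutdown
  system vlan 998-999
  state enabled
end
! In vCenter: attach the new vmnic to the uplink-noqos port group and wait
! for the interface to come up on the VSM, then move the vmnic to the
! "uplink" port group, which carries the queuing policy.
```

Since both profiles carry the same VLANs, the 1000V may warn about overlapping VLANs between the two uplink profiles while the temporary one is in use.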
10-10-2011 01:15 AM
This is indeed the answer. I tried the second workaround myself, as I couldn't tamper with the uplink configuration in a live environment. The only scary thing is that the 1000V complains that there are 2 uplink port-profiles with overlapping VLANs; however, this didn't cause any trouble.
Sorry, I forgot to post here when I found the bug ID, and thank you, clattin, for mentioning it.
Cheers.