03-16-2011 11:16 AM
Hi,
This is a long and boring post, but please bear with me.
I have a vSphere environment with around 20 ESXi hosts. All of them have 2 Intel 10Gb NICs connected to a Nexus 1000V, except 2 servers that have 4 NICs connected to the 1000V instead of 2. A few days ago I implemented QoS (1.4, CBWFQ) on the uplink, got this strange message on those 2 servers, and lost them (NB: their VMkernel interfaces are on the 1000V, but the VSM and vCenter run on another ESXi host that uses a standard vSwitch (VSS)):
2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-VEM_MGR_DETECTED: Host FRCP00ESX0030 detected as module 30
2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/6 is attached to vmnic5 on module 30
2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/7 is attached to vmnic6 on module 30
2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/8 is attached to vmnic7 on module 30
2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-MOD_ONLINE: Module 30 is online
2011 Mar 10 16:02:45 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported
2011 Mar 10 16:02:45 N1K-VDI %ETH_PORT_CHANNEL-5-CREATED: port-channel3 created
2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/7 is down (port-profile inherit error)
2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure
2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/6 is down (port-profile inherit error)
And at the same time, in the accounting log:
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 (SUCCESS)
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport mode trunk (SUCCESS)
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport trunk allowed vlan 1-3967, 4048-4093 (SUCCESS)
Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; service-policy type queuing output policy-queuing (FAILURE)
All servers with 2 NICs seem to behave well, though.
I eventually recovered those servers, removed them from the N1K, rebooted both and reinserted them. Since then I have had connectivity problems with them: they become unreachable a few times a day. The current status is:
One was reachable but not visible in the N1K (neither the module nor its interfaces).
The other was permanently unreachable until I shut all ports on the upstream Nexus 2K and unshut them.
Each time I lose one of them, I get only this kind of message in the N1K event log:
2011 Mar 16 10:30:34 N1K-VDI %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 30 (heartbeats lost)
2011 Mar 16 10:30:34 N1K-VDI %ETHPORT-5-IF_DOWN_VEM_UNLICENSED: Interface Vethernet82 is down (VEM unlicensed)
Here is the uplink configuration:
port-profile type ethernet uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-3967,4048-4093
  service-policy type queuing output policy-queuing
  channel-group auto mode on
  no shutdown
  system vlan 998-999
  state enabled
NB: Meanwhile, nothing shows up in the 5K logs.
Does this kind of problem ring a bell for any of you?
Any help will be greatly appreciated,
Cheers,
Vincent.
10-08-2011 12:13 PM
What worked for me: I removed the QoS policy from the port-profile, removed the problem NIC, re-added the NIC (which brought up both 1000V interfaces), then put the QoS policy back.
Addition of new NIC to port-channel fails if profile has queueing policy
Symptom:
Addition of NIC(s) to a port-channel fails if the VEM is added to the VSM for the first time,
or if NIC(s) are added to the VSM for the first time one at a time. An error similar to the following can be seen:
%ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet1/2 is down (port-profile inherit error)
Conditions:
The failure is seen only when a NIC that was not previously added to the VSM by any other port-profile is added to the port-channel port-profile for the first time, or when multiple NICs are added to the VSM for the first time one at a time.
Workaround:
Remove the queuing policy from the port-profile and add the NIC to the port-channel. Once the port joins the bundle, put the queuing policy back. Subsequent NIC removal/addition to the profile will work fine. Alternatively, bring the NIC up using any other port-profile and then move it to the intended port-profile that has the queuing policy.
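The first workaround could look roughly like the VSM session below. This is a sketch that assumes the profile and policy names from the uplink configuration in the original post (`uplink`, `policy-queuing`). Adding the vmnic itself is done from vCenter, not from the VSM CLI, so that step only appears as an annotation; lines starting with `!` are annotations, not commands to type.

```
! Sketch of the first workaround, typed on the VSM
configure terminal
port-profile type ethernet uplink
  ! Temporarily detach the queuing policy from the uplink profile
  no service-policy type queuing output policy-queuing
  end
! Add the new vmnic to the uplink port group from vCenter, then check
! that its Ethernet interface is listed as a member of the port-channel
show port-channel summary
configure terminal
port-profile type ethernet uplink
  ! Put the queuing policy back once the port has joined the bundle
  service-policy type queuing output policy-queuing
  end
```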
10-10-2011 01:15 AM
This is indeed the answer. I tried the second workaround myself, as I couldn't tamper with the uplink configuration in a live environment. The only scary thing is that the 1000V complains that there are 2 uplink port profiles with overlapping VLANs; however, this didn't cause any trouble.
Sorry, I forgot to post here when I found the bug ID, and thank you clattin for mentioning it.
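For the second workaround, the temporary profile could be a copy of the `uplink` profile without the `service-policy` line. This is a sketch only: the name `uplink-noqos` is hypothetical, and every other command is taken from the uplink configuration in the original post. Because it repeats the same VLAN list as `uplink`, it would plausibly explain the overlapping-VLAN warning while both profiles exist.

```
! Hypothetical temporary uplink profile ("uplink-noqos"): identical to
! "uplink" except that it carries no queuing service-policy
port-profile type ethernet uplink-noqos
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-3967,4048-4093
  channel-group auto mode on
  no shutdown
  system vlan 998-999
  state enabled
! In vCenter, attach the new vmnic to the uplink-noqos port group first,
! then move it to the "uplink" port group, which has the queuing policy
```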
Cheers.