
Losing 4-port ESXi on Nexus with QoS

mahbvh
Level 1

Hi,

This is a long and boring post, but please bear with me.

I have a vSphere environment with around 20 ESXi hosts. All hosts have 2 Intel 10Gb NICs connected to a Nexus 1000V, except 2 servers that have 4 NICs connected to the 1000V instead of 2. A few days ago I implemented QoS (1.4, CBWFQ) on the uplink, got this strange message on those 2 servers, and lost them (NB: their VMkernel is held by the 1000V, but the VSM and vCenter are on another ESXi host with a VSS):

2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-VEM_MGR_DETECTED: Host FRCP00ESX0030 detected as module 30

2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/6 is attached to vmnic5 on module 30

2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/7 is attached to vmnic6 on module 30

2011 Mar 10 16:02:45 N1K-VDI %VIM-5-IF_ATTACHED: Interface Ethernet30/8 is attached to vmnic7 on module 30

2011 Mar 10 16:02:45 N1K-VDI %VEM_MGR-2-MOD_ONLINE: Module 30 is online

2011 Mar 10 16:02:45 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported 

2011 Mar 10 16:02:45 N1K-VDI %ETH_PORT_CHANNEL-5-CREATED: port-channel3 created

2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported 

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/7 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/7 is down (port-profile inherit error)

2011 Mar 10 16:02:46 N1K-VDI %IPQOSMGR-SLOT30-3-QOSMGR_DPA_MSG: DPA returned error message - QoS Agent: Only one queuing policy instance (per VEM) is supported 

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %PORT-PROFILE-2-INTERFACE_QUARANTINED: Interface Ethernet30/6 has been quarantined due to Cmd Failure

2011 Mar 10 16:02:46 N1K-VDI %ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet30/6 is down (port-profile inherit error)

And at the same time in the accounting log:

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 (SUCCESS)

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport mode trunk (SUCCESS)

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; switchport trunk allowed vlan 1-3967, 4048-4093 (SUCCESS)

Thu Mar 10 16:02:45 2011:update:ppm.18743:admin:configure terminal ; interface Ethernet30/6-8 ; service-policy type queuing output policy-queuing (FAILURE)

All servers with 2 NICs seem to behave well, though.

I eventually recovered those servers, removed them from the N1K, rebooted both, and reinserted them. Since then I have had connectivity problems with them: they become unreachable a few times a day. The status right now is:

1 was reachable but not visible in the N1K (neither module nor interface)

1 was permanently unreachable until I shut all ports on the upstream Nexus 2K and unshut them.

Each time I lose one of them, I have just this kind of message in the N1K event log :

2011 Mar 16 10:30:34 N1K-VDI %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 30 (heartbeats lost)

2011 Mar 16 10:30:34 N1K-VDI %ETHPORT-5-IF_DOWN_VEM_UNLICENSED: Interface Vethernet82 is down (VEM unlicensed)

2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/5 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/6 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/7 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Ethernet30/8 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Vethernet407 is detached (module removed)
2011 Mar 16 10:31:27 N1K-VDI %VIM-5-IF_DETACHED_MODULE_REMOVED: Interface Vethernet82 is detached (module removed)
(Needless to say the VEM is NOT unlicensed)

Here is the uplink configuration:

port-profile type ethernet uplink

  vmware port-group

  switchport mode trunk

  switchport trunk allowed vlan 1-3967,4048-4093

  service-policy type queuing output policy-queuing

  channel-group auto mode on

  no shutdown

  system vlan 998-999

  state enabled

And on both 5K/2K:
interface Ethernet103/1/1,Ethernet103/1/3
  description Member of Po30
  switchport mode trunk
  switchport trunk allowed vlan all
  spanning-tree port type edge trunk
  channel-group 30

interface port-channel30
  switchport mode trunk
  vpc 30
  switchport trunk allowed vlan all
  spanning-tree port type edge trunk
  speed 10000

NB: Meanwhile nothing shows in the 5K logs...

Does this kind of problem ring a bell for any of you?

Any help will be greatly appreciated,

Cheers,

Vincent.

Accepted Solutions

clattin
Level 1

What worked for me: I removed the QoS policy from the port-profile, removed the problem VM NIC, re-added the NIC (which brought up both 1000V interfaces), then re-applied the QoS policy.

CSCtl70759 Bug Details

Addition of new NIC to port-channel fails if profile has queueing policy

Symptom:
Addition of NIC(s) to port-channel fails if the VEM is added first time to VSM
or NIC(s) are added first time to VSM one at a time. Error similar to the following can be seen:

%ETHPORT-5-IF_DOWN_PORT_PROFILE_INHERIT_ERR: Interface Ethernet1/2 is down (port-profile inherit error)

Conditions:
Failure is seen only when the NIC which was previously not added to VSM by any other port-profile and is added to port-channel port-profile for the first time or multiple NICs are added the first time to VSM one at a time.

Workaround:
Remove the queuing policy from the port-profile and add the NIC to the port-channel. Once the port joins the bundle, put back the queuing policy. Subsequent NIC removal/addition to the profile will work fine. Or, bring the NIC up using any other port-profile and then move it to the intended port-profile having the queuing policy.
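
As a rough sketch only, the first workaround could look like this on the VSM, reusing the port-profile name (uplink) and policy name (policy-queuing) from the configuration posted in the question. The NIC add step happens in vCenter, not on the CLI, and the show commands are just one assumed way to check that the port joined the bundle. Note that removing the policy affects every host using this uplink profile while it is absent.

```
! Step 1: temporarily remove the queuing policy from the uplink port-profile
configure terminal
port-profile type ethernet uplink
  no service-policy type queuing output policy-queuing
end

! Step 2: add the NIC(s) to the uplink port group in vCenter,
! then confirm the module is online and the port joined the port-channel
show module
show port-channel summary

! Step 3: once the port is in the bundle, put the queuing policy back
configure terminal
port-profile type ethernet uplink
  service-policy type queuing output policy-queuing
end
```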


This is indeed the answer. I tried the second workaround myself, as I couldn't tamper with the uplink configuration in a live environment. The only scary thing is that the 1000V complains that there are 2 uplink port profiles with overlapping VLANs; however, this didn't cause any trouble.
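
For illustration, the second workaround might be sketched as follows. The profile name uplink-temp is hypothetical, and copying the settings of the existing uplink profile minus the queuing policy is an assumption, not the exact configuration used here. The overlapping VLAN range is what triggers the warning mentioned above.

```
! Hypothetical temporary uplink profile: same settings as "uplink",
! but without the service-policy line
configure terminal
port-profile type ethernet uplink-temp
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-3967,4048-4093
  channel-group auto mode on
  no shutdown
  system vlan 998-999
  state enabled
end

! In vCenter: attach the new NIC(s) to the uplink-temp port group first.
! Once the interfaces are up on the VSM, move them to the "uplink"
! port group, which carries the queuing policy.
show port-profile name uplink-temp
```

Once no interface uses the temporary profile any more, it can be removed so the overlapping-VLAN warning goes away.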

Sorry I forgot to post here when I found the bug ID, and thank you clattin for mentioning it.

Cheers.
