Nexus 1000v - VEM disconnects and takes down ESX host

ajmeria_dilpesh
Level 1

Hi All,

 

We recently implemented our first Nexus 1000v (v5.0.0 00000) on ESXi 5.5 hosts and are seeing random VEM disconnects that I want to get to the bottom of. Yesterday, and on earlier occasions, an ESXi host has dropped off the network completely for a random period of time, triggering an HA event, and then reconnected. Only one host seems to be affected at a time, and there is no obvious pattern. Whenever this happens, the VSM logs show the following messages:

 

2015 Jun 29 02:23:45.216 remotevem_mgr[2324]: %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 3 (heartbeats lost)
2015 Jun 29 02:23:45.216 remotevem_mgr[2324]: %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 4 (heartbeats lost)
2015 Jun 29 02:23:45.216 remotevem_mgr[2324]: %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 5 (heartbeats lost)
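The three messages above share an identical timestamp. When sifting through a longer VSM log, it can help to group the heartbeat-loss removals by time: if several VEMs are repeatedly removed in the same instant, that could point toward something shared (the VSM itself or the path between the sites) rather than a single host. The sketch below is only an illustration. It assumes every relevant line follows exactly the syslog layout quoted above, and the 5-second grouping window and the names `group_heartbeat_losses` and `SAMPLE_LOG` are hypothetical choices, not anything provided by the Nexus 1000v.

```python
import re
from datetime import datetime

# Pattern for the VSM syslog lines quoted above, e.g.
# "2015 Jun 29 02:23:45.216 remotevem_mgr[2324]: %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 3 (heartbeats lost)"
# Assumption: every relevant line follows exactly this layout.
NO_HB_RE = re.compile(
    r"^\s*(?P<ts>\d{4} \w{3} \d{1,2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"\S+:\s+%VEM_MGR-2-VEM_MGR_REMOVE_NO_HB:\s+Removing VEM (?P<vem>\d+)"
)


def group_heartbeat_losses(lines, window_seconds=5.0):
    """Collect VEM_MGR_REMOVE_NO_HB events and group those occurring within
    window_seconds of the first event in each group (hypothetical helper)."""
    events = []
    for line in lines:
        m = NO_HB_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y %b %d %H:%M:%S.%f")
            events.append((ts, int(m.group("vem"))))
    events.sort()  # chronological order, then by VEM number

    groups = []
    for ts, vem in events:
        if groups and (ts - groups[-1]["start"]).total_seconds() <= window_seconds:
            groups[-1]["vems"].append(vem)  # same burst of removals
        else:
            groups.append({"start": ts, "vems": [vem]})  # new burst
    return groups


# The three log lines from the post, used as sample input.
SAMPLE_LOG = """\
2015 Jun 29 02:23:45.216 remotevem_mgr[2324]: %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 3 (heartbeats lost)
2015 Jun 29 02:23:45.216 remotevem_mgr[2324]: %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 4 (heartbeats lost)
2015 Jun 29 02:23:45.216 remotevem_mgr[2324]: %VEM_MGR-2-VEM_MGR_REMOVE_NO_HB: Removing VEM 5 (heartbeats lost)
"""

if __name__ == "__main__":
    for group in group_heartbeat_losses(SAMPLE_LOG.splitlines()):
        print(group["start"], "VEMs removed:", group["vems"])
```

Run against the sample, this prints a single group, `2015-06-29 02:23:45.216000 VEMs removed: [3, 4, 5]`. Feeding it a full exported VSM log instead would show whether the removals usually come in bursts like this one.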

 

Luckily, the environment is not in production yet. I'm hoping someone can help us figure out what is going on. To start, here is some background on our environment:

 

The primary and failover VSMs are hosted on two different VMware hosts in our central datacenter in Belgium and are connected to each other via a private control network. These ESXi hosts use standard VMware vSwitches, with the private network defined on them. The VSMs also have a second connection to our normal server LAN, which provides L3 connectivity to the VEMs. Communication with the VEMs goes through the mgmt0 interface.

 

The vCenter that the VSM talks to also runs as a VM in the Belgium datacenter, on the same server VLAN as the VSM's management interface.

 

The affected VEMs are on two Cisco UCS C220M4 servers running ESXi 5.5 at a remote site in Sweden. These hosts are connected to the vCenter in Belgium.

 

There are separate port profiles for ESXi management and for VSM-to-VEM connectivity, but both are on the same VLAN. ESXi management uses vmkernel port 0 and VEM-to-VSM traffic uses vmkernel port 3. Only the VEM-to-VSM port profile has the l3control capability. Both port profiles reach the outside world through the same pair of physical uplinks.

 

I'm a server guy, not a network guy (the network person I've been working with is away today), but I've managed to get hold of the configs for both the VSM-VEM port profile and the ESXi management port profile. Does anyone have any ideas as to what might be causing these issues? Any suggestions are welcome.

 

ESXi management:

 

port-profile type vethernet Intra_FN_Mgmt
  switchport mode access
  switchport access vlan 702
  no shutdown
  description ESX MGM Uplink
  system vlan 702
  state enabled
  vmware port-group

 

VEM to VSM:

port-profile type vethernet Intra_FN_Control
  switchport mode access
  switchport access vlan 702
  no shutdown
  capability l3control
  description ControlVSMVEM
  system vlan 702
  state enabled
  vmware port-group
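Because both profiles sit on VLAN 702 and share the same uplinks, it helps to confirm exactly how they differ. The short Python sketch that follows diffs the two blocks line by line. It is an illustration only: `CONFIG` and `parse_port_profiles` are hypothetical names, and the parser understands just the simple indented layout quoted in this post, not full NX-OS syntax.

```python
# The two vethernet port-profiles exactly as posted above.
CONFIG = """\
port-profile type vethernet Intra_FN_Mgmt
  switchport mode access
  switchport access vlan 702
  no shutdown
  description ESX MGM Uplink
  system vlan 702
  state enabled
  vmware port-group

port-profile type vethernet Intra_FN_Control
  switchport mode access
  switchport access vlan 702
  no shutdown
  capability l3control
  description ControlVSMVEM
  system vlan 702
  state enabled
  vmware port-group
"""


def parse_port_profiles(text):
    """Return {profile_name: set of indented config lines} for each
    'port-profile' block. Minimal sketch for the layout shown above only."""
    profiles = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("port-profile "):
            current = line.split()[-1]  # profile name is the last token
            profiles[current] = set()
        elif current is not None and line.startswith(" ") and line.strip():
            profiles[current].add(line.strip())  # a setting inside the block
        else:
            current = None  # a blank or unindented line ends the block
    return profiles


if __name__ == "__main__":
    profiles = parse_port_profiles(CONFIG)
    mgmt = profiles["Intra_FN_Mgmt"]
    ctrl = profiles["Intra_FN_Control"]
    print("Shared:", sorted(mgmt & ctrl))
    print("Only in Intra_FN_Control:", sorted(ctrl - mgmt))
    print("Only in Intra_FN_Mgmt:", sorted(mgmt - ctrl))
```

For the configs posted here, the only differences it reports are `capability l3control` and the two descriptions; the access VLAN, `system vlan 702`, and the remaining settings are identical.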

 

Physical port profile (for what it's worth):

interface port-channel2
  inherit port-profile Physical_FN_MGMT_702
  vem 5
