Strange but explainable UCS VLAN Issues

brandonlucas
Level 1

Hi all -

 

I have a very specific VLAN issue in our UCS-based ESXi environment, and I was hoping you could help me understand where this problem could possibly be.  I'm not a network guy, but I feel pretty confident in our UCS configuration, so I'm hoping for some assistance tracking this down based on our symptoms.

Here are my symptoms and environment explanation:

Basically, we noticed some intermittent storage connectivity issues recently in our VMware environment, and have tracked it down to more specific scenarios.  We originally had our iSCSI vmnics in VMware configured to use the native VLAN in UCS.  We really wanted to do VLAN tagging, and our VMware vSwitches are set up as such; the native VLAN was simply a misconfiguration in our UCS config.  When we corrected this and REMOVED the native VLAN from our vmnics handling iSCSI storage traffic, we started to notice issues.

1.  Our UCS fabric interconnects connect straight up to a Nexus chassis.  Our storage also direct connects to the same Nexus chassis.

2.  Our "business traffic" vNics all use VLAN tagging with no issues in our VMware environment.  They have their own VLANs.

3.  Our iSCSI SAN traffic, however, is displaying strange behavior with native VLAN and VLAN tagging.  iSCSI traffic has its own set of VLANs.

4.  We do use iSCSI boot, so we have the native VLAN set ONLY on the iSCSI boot interfaces.  Our vNICs that handle regular VMware operational iSCSI traffic should not have a native VLAN set, because they are using VLAN tagging in VMware.

5.  VLAN tagging is set up 100% correctly in VMware across the board.  I am certain of this.

 

Here is what we are seeing:

 

If I remove ALL Native VLAN on ALL iSCSI adapters, I can connect to our storage using VLAN B, but not VLAN A.

 

If I put Native VLAN on any adapter using VLAN A, and Native VLAN on any adapter using VLAN B, connectivity is restored for VLAN A, but connectivity is lost for VLAN B.

It is almost as if 802.1Q is not working at all in our Nexus environment, but only for these two specific VLANs.  Remember: our vNICs passing business traffic also use VLAN tagging and have no issues in the same Nexus environment.
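To make the tagged-vs-native distinction concrete: on the wire, the only difference is a 4-byte 802.1Q tag inserted after the source MAC. A minimal sketch in plain Python (MAC addresses and payload are made-up examples, not from this environment) of what the switch actually sees in each case:

```python
import struct

def ethernet_frame(dst, src, ethertype, payload, vlan=None):
    """Build a raw Ethernet frame; insert an 802.1Q tag if vlan is given."""
    header = dst + src
    if vlan is not None:
        # 802.1Q tag: TPID 0x8100, then PCP/DEI/VID packed into 16 bits
        header += struct.pack("!HH", 0x8100, vlan & 0x0FFF)
    header += struct.pack("!H", ethertype)
    return header + payload

dst = bytes.fromhex("ffffffffffff")   # hypothetical broadcast destination
src = bytes.fromhex("000000000001")   # hypothetical source MAC
payload = b"iSCSI..."

untagged = ethernet_frame(dst, src, 0x0800, payload)            # native VLAN: no tag
tagged   = ethernet_frame(dst, src, 0x0800, payload, vlan=44)   # VLAN 44 tagged

# The tagged frame is exactly 4 bytes longer and carries VID 44
print(len(tagged) - len(untagged))                 # 4
print(hex(struct.unpack("!H", tagged[12:14])[0]))  # 0x8100
```

A trunk port drops tagged frames for VLANs not in its allowed list, and strips/adds the tag only for its one native VLAN, which is why mismatched native-VLAN settings per link show up as one VLAN working and the other silently failing.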

3 Replies

Walter Dey
VIP Alumni

The concept of a native VLAN applies per link:

in your case you have a native VLAN between the UCS FI and the N5k, and possibly another native VLAN between the N5k and your storage system. For the VLAN used for iSCSI boot, however, they have to be the same.

Only one VLAN per link can be native (untagged)!

All UCS uplinks are VLAN trunks, and I assume the same applies on your storage interfaces.
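As an illustration of the per-link native VLAN Walter describes, here is what a trunk toward the FI typically looks like on an N5k (interface and VLAN numbers are hypothetical examples, not taken from this thread's environment):

```
! Hypothetical N5k uplink configuration - illustrative only
interface Ethernet1/1
  description uplink-to-UCS-FI-A
  switchport mode trunk
  switchport trunk native vlan 44       ! frames on VLAN 44 cross this link untagged
  switchport trunk allowed vlan 44-45   ! every other allowed VLAN is carried tagged
```

The `native vlan` statement is per interface, so the FI-facing trunk and the storage-facing ports can easily disagree about which VLAN is untagged.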

How is your network connectivity:

- between the UCS FI and the N5k? vPC is recommended.

- between your storage controllers and the N5k? LACP?

- is your storage active/standby, or active/active?

Here are more details.  The connections from our N5Ks to the storage are NOT trunked.  vPC is used between the FI and the N5K.  I am told the connectivity is LACP.  Our storage is active/active (VNX5500).

 

Here are some more troubleshooting details that I find interesting:

Our primary iSCSI boot device uses VLAN 44 as the native VLAN on the UCS side.  Of course, it does not do VLAN tagging.  This sets up as a VMware standard vSwitch after the host boots.

We have a separate vSwitch with 2 separate UCS vNICs that is designed to handle the datastore iSCSI traffic.  Both of these NICs are set in UCS to allow two VLANs, 44 and 45, but neither one is native.

Then within the VMware vSwitch, we have 2 separate port groups set up.  One does tagging on VLAN 44, and the other does tagging on VLAN 45.  We have found through further testing that VLAN 45 tagging seems to work; however, VLAN tagging on 44 will not.  Furthermore, if we reconfigure one of the vNICs to only use VLAN 44, set the native VLAN to 44, and remove the VLAN tagging from the vSwitch, it still cannot communicate.  At that point, that vNIC is the exact same setup as the iSCSI boot NIC!  Furthermore, this vSwitch port group does not have a route in VMware to the 44 VLAN, which explains why it will not communicate, but the iSCSI boot port group DOES have a route.  To say the least, this doesn't make a whole lot of sense to me.

 

I got to thinking that maybe a VLAN can only be on one VMware standard vSwitch, but I tested this and it doesn't make a difference.

 

Only the iSCSI boot interface can route on VLAN 44.  The vmnics, even configured exactly like the iSCSI boot NIC, will not route on 44.  Could this really all come back to the fact that our Nexus --> storage connectivity is not trunked?
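One way to narrow this down is to compare what each side believes about VLAN 44. These are standard NX-OS and ESXi commands; the VMkernel interface number and target IP below are hypothetical examples:

```
# On the N5k: is VLAN 44 active, which trunks carry it, and does the
# switch ever learn MACs on it from the FI uplinks or the storage ports?
show vlan id 44
show interface trunk
show mac address-table vlan 44

# On the ESXi host: confirm the port group VLAN IDs, then ping from the
# specific VMkernel port in question
esxcfg-vswitch -l
vmkping -I vmk1 <storage-target-ip>
```

If VLAN 44 never shows MAC entries learned from the storage-facing ports, that points at the untagged (non-trunked) storage links only carrying whichever VLAN those access ports belong to.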

 

 

Several issues:

ESXi management interface:

Have a look at

http://keepingitclassless.net/2012/05/management-vlan-best-practices-in-esxi-and-cisco-ucs/

There you see that for the ESXi management interface, you either specify a VLAN (and in the UCS service profile it is tagged), or you specify no VLAN (and in the service profile that VLAN is untagged)!

iSCSI

- I don't understand your statement; you use VLAN 44 for boot, therefore it has to be native.

- For iSCSI data you use VLANs 44 and 45 (neither one native), which contradicts the above.

- The connection from the N5k to the VNX is not trunked? How can this work if you run iSCSI boot and iSCSI data over different VLANs?

vSwitch

The uplink ports of a vSwitch support VLAN trunking. In a port group you define which VLANs are delivered to a VM; this can be one or more VLANs.
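For reference, the port group VLAN assignment Walter describes can be inspected and set from the ESXi shell. The port group names below are hypothetical examples, not names from this environment:

```
# List standard vSwitch port groups and their VLAN IDs
esxcli network vswitch standard portgroup list

# Assign a VLAN ID to a port group (0 = untagged/native)
esxcli network vswitch standard portgroup set -p iSCSI-VLAN44 -v 44
esxcli network vswitch standard portgroup set -p iSCSI-VLAN45 -v 45
```

A port group VLAN of 0 means the vSwitch sends frames untagged, i.e. they ride the UCS vNIC's native VLAN; any other value means the vSwitch tags them, so the UCS vNIC must carry that VLAN as tagged (non-native).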

iSCSI requires a special VMkernel port, similar to vMotion!
