ucs mini puzzle. two servers break networking

turtleburger
Level 1

Scenario:

One 5108 AC2 chassis with two B200 M4 blades installed, running ESXi 6. Each server has the same service profile: LAN connectivity policy for vNICs, MAC pools for each FI, etc.

Two FIs, two SFP+ ports in each configured as uplinks. Port one in each carries the management VLAN for ESXi. ESXi on each server has two vNICs for mgmt and vMotion, over VLANs configured globally at the LAN cloud level. These are solid; no problem with both service profiles running.

Port three in each carries the iSCSI VLANs: FI-A has the iscsi-a VLAN, FI-B has the iscsi-b VLAN. These are set at the FI level, not globally. Each server's ESXi has two vNICs for iSCSI, each assigned to a different VLAN for failover and multipathing.
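A sketch of the ESXi-side layout this implies (hedged: vSwitch, portgroup, vmk names and IPs are hypothetical; VLANs 3010/3020 are the iscsi-a/iscsi-b VLANs given later in the thread):

```shell
# Hypothetical ESXi-side sketch of the layout above: one portgroup per
# iSCSI fabric, each pinned to a single vmnic so the path stays on one FI.
# The iSCSI VLANs are native on the UCS vNICs, so the portgroups stay untagged.
esxcli network vswitch standard portgroup add -v vSwitch1 -p iSCSI-A
esxcli network vswitch standard portgroup add -v vSwitch1 -p iSCSI-B
esxcli network vswitch standard portgroup policy failover set -p iSCSI-A -a vmnic2
esxcli network vswitch standard portgroup policy failover set -p iSCSI-B -a vmnic3

# One vmkernel port per path (addresses are made up)
esxcli network ip interface add -p iSCSI-A -i vmk2
esxcli network ip interface ipv4 set -i vmk2 -t static -I 10.30.10.11 -N 255.255.255.0
esxcli network ip interface add -p iSCSI-B -i vmk3
esxcli network ip interface ipv4 set -i vmk3 -t static -I 10.30.20.11 -N 255.255.255.0
```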

When I boot one server, all is as expected: iSCSI connects two paths, one on each iSCSI VLAN. If I boot the second service profile, the mgmt vNICs are fine, but iSCSI will not connect; no traffic is visible on the FI port. I can flip-flop the service profiles and each works as expected, but whichever boots second never connects.

What is happening here? I don't have MAC or IP conflicts.

Extra info: using an HPE 5406/J99990 switch upstream because some dummy bought it instead of a Nexus. I don't see anything weird in the switch logs, but it deserves mentioning. VLAN config looks good.

thanks for any insight you can offer. 

20 Replies

Walter Dey
VIP Alumni

We need more information or some clarification:

- which UCS version ?

- do you boot the OS over iSCSI ? and if yes, does this work

- how many Ethernet vnics per service profile and fabric: 3 ? mgt, vmotion and iSCSI

- are the iSCSI vlans native ?

- can you ping the mgt. interface of the ESX ?

Thank you Walter Dey

We need more information or some clarification:

- which UCS version ?

3.1(1g)

- do you boot the OS over iSCSI ? and if yes, does this work

Currently booting off FlexFlash, but yes, iSCSI boot works and exhibits the same behavior: one boots, the other will not connect. There are no iSCSI vNICs in the service profile now; doing iSCSI with the ESXi vmkernel only.

- how many Ethernet vnics per service profile and fabric: 3 ? mgt, vmotion and iSCSi

Six vNICs per service profile: two mgmt, two VM traffic, two iSCSI, each pair assigned across both fabrics. Like the graphic:

- are the iSCSI vlans native ?

iSCSI VLANs are native on their respective vNICs.

- can you ping the mgt. interface of the ESX ?

yes, both management interfaces are accessible when two service profiles are running. 

Additionally, each server can ICMP the other's iSCSI interfaces; I assume this happens on the backplane. Only one can ICMP the discovery address of the Nimble array.
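A few ESXi shell checks that can separate the two paths when debugging this (the adapter name and target IPs below are hypothetical):

```shell
# List established iSCSI sessions and paths (software adapter name varies).
esxcli iscsi session list -A vmhba64
esxcli storage nmp path list

# vmkping -I forces ICMP out a specific vmkernel port, so each fabric's
# path to the Nimble discovery address can be tested independently.
vmkping -I vmk2 10.30.10.5
vmkping -I vmk3 10.30.20.5
```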

Hi

- Did you select hardware failover in the service profile, or better, is failover configured on the vSwitch ?

- Can you give us more information about the northbound configuration; e.g. do you connect the FIs to two different switches ? Are the VLANs trunked between these 2 switches ?

Walter.

Good questions Walter Dey, thank you for taking the time. 

The iSCSI vNICs are not configured for failover in the service profile. FI-A only knows about the iSCSI-A VLAN, FI-B only about the iSCSI-B VLAN. Failover is configured in the storage adapter, through multipathing.

FI-A/port3 and FI-B/port3 are in uplink trunk mode, both connected to the same switch (waiting on another SFP+ module for the HPE chassis).

At the switch end, each port is tagged with the downstream VLAN only; no other VLANs are carried on the links.

In testing a few minutes ago, I made a new service profile and LAN connectivity policy, removing the second iSCSI vNIC from each. So server A has one vNIC in the iSCSI-A VLAN, server B has one in iSCSI-B.

In this config, both come up and connect to LUN targets. So it seems the problem occurs when a server has vNICs in two VLANs, even though each FI only carries one of them.

Spanning tree is enabled on the switch. Clear as mud, yes?

Hi

You can assign multiple VLANs to a single vNIC, which is called VLAN trunking, and one and only one of these VLANs can be the native one. This of course implies that the OS (or the iSCSI storage array) understands trunking.
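In ESXi terms, whether a VLAN is native on the vNIC decides whether the portgroup tags. A sketch, reusing the illustrative iSCSI-A portgroup name (hypothetical):

```shell
# If VLAN 3010 is NATIVE on the UCS vNIC, frames reach ESXi untagged,
# so the portgroup must not tag:
esxcli network vswitch standard portgroup set -p iSCSI-A --vlan-id 0
# If 3010 is trunked (tagged) to the vNIC instead, the portgroup tags it:
esxcli network vswitch standard portgroup set -p iSCSI-A --vlan-id 3010
```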

.....So it seems the problem occurs when a server has vnics in two vlans....

you mean vmnic2 in 3010 and vmnic3 in 3020 ?

Btw. do you have 2 separate links (one in 3010, the other in 3020) to your iSCSI array, to controller A and controller B respectively ?

Could it be, that multipathing is sending packets over both links, which the storage array doesn't understand ?

Yes: server1/vmnic2 and server2/vmnic2 are in VLAN 3010; likewise vmnic3 in 3020.

Yes, the Nimble CS30 has one link per controller in 3010 and 3020. All addresses are pingable from devices in access switchports.

3010 and 3020 are the only VLANs on their respective links, and are native.

Really starting to suspect the Aruba/HPE gear. This is a Cisco-approved configuration, at least until HPE switching gets involved. Maybe I should be asking in an HPE forum...

It would be best to post a diagram; I am still confused; eg.

one link to Nimble controller A, one VLAN, native ? same for controller B ?

is the Nimble direct-connected to the UCS FI ?

see also

https://connect.nimblestorage.com/thread/1056

https://connect.nimblestorage.com/community/configuration-and-networking/blog/2013/06/13/iscsi-booting-ucs-bladerack-server-with-nimble-storage

https://supportforums.cisco.com/discussion/11768986/iscsi-storage-ucs

I cannot direct-connect the Nimble to the FI; I need to hop through the switch. See diagram:

The above is similar to the Nimble doc you linked, and does not work. The below does, however.

appreciate the time Walter Dey

On your design: I would try to remove the red untagged link to controller A, and likewise the green untagged link to controller B.

On each iSCSI path, you need an end-to-end native VLAN.

Q. on your tagged link FI-A to ProCurve-A: is VLAN 3010 native ?

Q. Are you using Ethernet End host mode on the FI ?

Yes, End Host mode (default) on both FIs. VLANs on each iSCSI path are native at the vNIC.

For fun, I eliminated the second VLAN throughout, based on some discussion here:

http://wahlnetwork.com/2015/03/09/when-to-use-multiple-subnet-iscsi-network-design/

Now, each server has one vSwitch, one vmk, and two vmnics (one on each FI).

Same behavior: one host simply does not communicate with hosts outside of the UCS chassis, though ICMP completes between blades.
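For comparison, the single-subnet design in that article is normally built with one vmk per uplink plus iSCSI port binding, rather than a single vmk over two vmnics. A hedged sketch (adapter and portgroup names are hypothetical):

```shell
# Port-binding sketch for a single-subnet design: each vmk is pinned to
# exactly one active uplink, then bound to the software iSCSI adapter.
# vmhba64 and the portgroup names are hypothetical.
esxcli network vswitch standard portgroup policy failover set -p iSCSI-1 -a vmnic2
esxcli network vswitch standard portgroup policy failover set -p iSCSI-2 -a vmnic3
esxcli iscsi networkportal add -A vmhba64 -n vmk2
esxcli iscsi networkportal add -A vmhba64 -n vmk3
```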

VLANs on each iSCSI path are native at the vNIC, yes, but you have to set the native VLAN also on the uplink from the FI to the ProCurve.
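On the ProCurve side, "native" corresponds to untagged membership. A config sketch (the port ID A1 is hypothetical):

```
# ProCurve sketch: carry VLAN 3010 untagged on the FI-A uplink port so it
# lines up with the native VLAN on the UCS side.
vlan 3010
   name "iSCSI-A"
   untagged A1
   exit
```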

Regarding iSCSI

Q. can you ping the iSCSI controller from your ESX ?

Regarding reachability without iSCSI

Q. can you reach the ESX hosts from outside UCS ?

Q. Can you reach the management interface of ESX ?

Q. can you communicate between the ESX hosts ?

Appreciate your attention, Walter. I think we are getting a bit into the weeds here; let's refocus a bit.

The design works with one service profile/server running: multipath with multiple subnets, target access, management, TCP/IP to hosts in each subnet, all working as expected. This is an approved SmartStack configuration, with the exception of hopping iSCSI through a switch instead of using the (relatively new) appliance ports for direct storage connection.

The problem occurs when two servers boot their service profiles. Even then, the management interfaces are unaffected; only the iSCSI network interfaces of whichever server boots second are unable to reach endpoints outside the fabric interconnects. One host continues to function and the other does not connect to storage. If the first is shut down and the second rebooted, the second works.

Yes, maddening and nonsensical. Hope you had a good Independence Day.

This must be a storage (or a multipathing) issue.

Are the 2 service profiles identical ? Derived from a template ? If yes:

do you have 2 different boot LUNs ? Can you please show us how the 2 boot policies look (e.g. different boot LUNs).

These servers boot from FlexFlash; UCS has no iSCSI config in the profile. ESXi handles the iSCSI connections.

Service profiles are identical; vNICs are provisioned by a LAN connectivity policy and vNIC template.

I sharked the relevant interface on the FI when server 2 boots and found this weirdness:

The interface wakes up, rattles off some ARP, then goes silent. No replies.

The ARP table in the switch only shows my laptop and the mgmt gateway. Earlier in the pcap you can see the storage replying to ARP from vNICs behind the other fabric, and there are active iSCSI sessions.
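To compare with the FI-side capture, the same flow can be captured on the ESXi host itself with pktcap-uw (the vmk and uplink names below are hypothetical):

```shell
# Capture at the vmkernel port and at the physical uplink; if ARP requests
# appear at vmk2 but never leave vmnic2 (--dir 1 = transmit), the drop is
# inside the host/fabric rather than on the upstream switch.
pktcap-uw --vmk vmk2 -o /tmp/vmk2.pcap
pktcap-uw --uplink vmnic2 --dir 1 -o /tmp/vmnic2-tx.pcap
```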
