06-29-2017 08:50 PM - edited 03-01-2019 01:13 PM
Scenario:
One 5108 AC2 chassis with two B200 M4 blades installed, running ESXi 6. Each server has the same service profile: LAN connectivity policy for vNICs, MAC pools for each FI, etc.
Two FIs, with two SFP+ ports on each configured as uplinks. Port one on each carries the management VLAN for ESXi. ESXi on each server has two vNICs for mgmt and vMotion, over VLANs configured globally at the LAN cloud level. These are solid; no problems with both service profiles running.
Port three on each carries the iSCSI VLANs: FI-A has the iscsi-a VLAN, FI-B has the iscsi-b VLAN. These are defined at the FI level, not globally. Each server's ESXi has two vNICs for iSCSI, each assigned to a different VLAN for failover and multipathing.
When I boot one server, all is as expected: iSCSI connects two paths, one on each iSCSI VLAN. If I boot the second service profile, the mgmt vNIC is fine, but iSCSI will not connect; no traffic is visible on the FI port. I can flip-flop the service profiles and each works as expected, but whichever boots second never connects.
What is happening here? I don't have MAC or IP conflicts.
Extra info: using an HPE 5406/J99990 switch upstream, because some dummy bought it instead of a Nexus. I don't see anything weird in the switch logs, but it deserves mentioning. VLAN config looks good.
Thanks for any insight you can offer.
06-29-2017 11:26 PM
We need more information or some clarification:
- which UCS version?
- do you boot the OS over iSCSI? If yes, does it work?
- how many Ethernet vNICs per service profile and fabric: 3? (mgmt, vMotion and iSCSI)
- are the iSCSI VLANs native?
- can you ping the mgmt interface of the ESXi host?
06-30-2017 07:30 AM
Thank you Walter Dey
We need more information or some clarification:
- which UCS version?
3.1(1g)
- do you boot the OS over iSCSI? If yes, does it work?
Currently booting off FlexFlash, but yes, iSCSI boot works and exhibits the same behavior: one boots, the other will not connect. There are no iSCSI vNICs in the service profile now; I'm doing iSCSI with the ESXi vmkernel only.
- how many Ethernet vNICs per service profile and fabric: 3? (mgmt, vMotion and iSCSI)
Six vNICs per service profile: two mgmt, two VM traffic, two iSCSI, with each pair split across both fabrics, as in the attached graphic:
- are the iSCSI VLANs native?
iSCSI VLANs are native on their respective vNICs.
- can you ping the mgmt interface of the ESXi host?
Yes, both management interfaces are accessible when both service profiles are running.
Additionally, each server can ping the other's iSCSI interfaces; I assume this happens over the backplane. Only one can ping the discovery address of the Nimble array.
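For anyone reproducing these per-path pings: ESXi can force ICMP out a specific vmkernel interface. The vmk numbers and IP addresses below are placeholders, not the actual ones in this setup:

```shell
# Force ICMP out a specific vmkernel port (vmk numbers and IPs are placeholders)
vmkping -I vmk2 10.10.10.50   # iSCSI-A path to the array discovery address
vmkping -I vmk3 10.10.20.50   # iSCSI-B path
```

Without -I, ESXi picks the egress vmk by routing table, which can mask a dead path.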
06-30-2017 09:35 AM
Hi
- Did you select hardware failover in the service profile, or (better) is failover configured on the vSwitch? Can you give us more information about the northbound configuration? E.g. do you connect the FIs to two different switches? Are the VLANs trunked between these 2 switches?
Walter.
06-30-2017 11:34 AM
Good questions, Walter Dey; thank you for taking the time.
The iSCSI vNICs are not configured for failover in the service profile. FI-A only knows about the iSCSI-A VLAN, FI-B only knows about the iSCSI-B VLAN. Failover is configured in the storage adapter, through multipathing.
FI-A/port3 and FI-B/port3 are in uplink trunk mode, both connected to the same switch (waiting on another SFP+ module for the HPE chassis).
At the switch end, each port is tagged with its downstream VLAN only; no other VLANs are carried on the links.
In testing a few minutes ago, I made a new service profile and LAN connectivity policy, removing the second iSCSI vNIC from each. So server A has one vNIC in the iSCSI-A VLAN, server B has one in iSCSI-B.
In this config, both come up and connect to LUN targets. So it seems the problem occurs when a server has vNICs in two VLANs, even though each FI only carries one of them.
Spanning tree is enabled on the switch. Clear as mud, yes?
06-30-2017 12:52 PM
Hi
You can assign multiple VLANs to a single vNIC, which is called VLAN trunking, and one (and only one) of these VLANs can be the native one. This of course implies that the OS (or the iSCSI storage array) understands trunking.
.....So it seems the problem occurs when a server has vnics in two vlans....
You mean vmnic2 in 3010 and vmnic3 in 3020?
Btw, do you have 2 separate links (one in 3010, the other in 3020) to your iSCSI array, to controller A resp. B?
Could it be that multipathing is sending packets over both links, which the storage array doesn't understand?
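For context on where the tag gets applied in a setup like this: with the VLAN native at the UCS vNIC, the ESXi iSCSI port groups stay untagged (VLAN 0). Only if the VLAN were tagged on the vNIC would ESXi need to tag per port group (VST). A sketch; the port group names here are placeholders:

```shell
# Hypothetical VST tagging — only needed if the VLAN is NOT native at the vNIC.
# Port group names are placeholders.
esxcli network vswitch standard portgroup set -p iSCSI-A -v 3010
esxcli network vswitch standard portgroup set -p iSCSI-B -v 3020
# VLAN 0 (the default) means the port group relies on the native/untagged VLAN:
esxcli network vswitch standard portgroup list
```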
06-30-2017 08:47 PM
Yes: server1/vmnic2 and server2/vmnic2 are in VLAN 3010; likewise vmnic3 in 3020.
Yes, the Nimble CS30 has one link per controller in 3010 and 3020. All addresses are pingable from devices on access switchports.
3010 and 3020 are the only VLANs on their respective links, and are native.
I'm really starting to suspect the Aruba/HPE gear. This is a Cisco-approved configuration, at least until HPE switching gets involved. Maybe I should be asking in an HPE forum...
06-30-2017 11:33 PM
It would be best to post a diagram; I am still confused. E.g.:
one link to Nimble controller A, one VLAN, native? Same for controller B?
Is the Nimble direct-connected to the UCS FIs?
see also
https://connect.nimblestorage.com/thread/1056
https://connect.nimblestorage.com/community/configuration-and-networking/blog/2013/06/13/iscsi-booting-ucs-bladerack-server-with-nimble-storage
https://supportforums.cisco.com/discussion/11768986/iscsi-storage-ucs
07-01-2017 05:11 PM
(network diagram attached)
07-02-2017 01:38 AM
On your design: I would try to remove the red untagged link to controller A, resp. the green untagged link to controller B.
On each iSCSI path, you need a native VLAN end to end.
Q. on your tagged link FI-A to ProCurve-A: is VLAN 3010 native?
Q. are you using Ethernet end-host mode on the FIs?
07-02-2017 10:05 AM
Yes, End Host mode (the default) on both FIs. The VLANs on each iSCSI path are native at the vNIC.
For fun, I eliminated the second VLAN throughout, based on some discussion here:
http://wahlnetwork.com/2015/03/09/when-to-use-multiple-subnet-iscsi-network-design/
Now each server has one vSwitch, one vmk, and two vmnics (one on each FI).
Same behavior. One host simply does not communicate with hosts outside of the UCS chassis, though ICMP completes between blades.
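For reference, the single-subnet, two-uplink layout described above is the classic iSCSI port-binding design. A sketch of how the vmkernel ports get bound to the software iSCSI adapter; the adapter and vmk names are placeholders:

```shell
# Bind both iSCSI vmkernel ports to the software iSCSI adapter
# (vmhba/vmk names are placeholders; each vmk needs exactly one active vmnic
#  for ESXi to allow the binding)
esxcli iscsi networkportal add -A vmhba64 -n vmk2
esxcli iscsi networkportal add -A vmhba64 -n vmk3
# Verify the bindings:
esxcli iscsi networkportal list -A vmhba64
```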
07-02-2017 12:27 PM
"VLANs on each iSCSI path are native at the vNIC" — yes, but
you also have to set the native VLAN on the uplink from the FI to the ProCurve.
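On a ProCurve, "native" is expressed as untagged VLAN membership on the port. A sketch of what that would look like on the switch side; the VLAN IDs and port numbers here are placeholders:

```shell
# ProCurve/ArubaOS-Switch sketch — VLAN IDs and port numbers are placeholders.
# "untagged" membership is how a native VLAN is expressed on this platform.
vlan 3010
   name "iSCSI-A"
   untagged A3
   exit
vlan 3020
   name "iSCSI-B"
   untagged B3
   exit
```

If the FI sends the iSCSI VLAN untagged (native at the vNIC) but the ProCurve port is a tagged member only, the frames land in the port's untagged VLAN instead and never reach the array.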
Regarding iSCSI:
Q. can you ping the iSCSI controller from your ESXi host?
Regarding reachability without iSCSI:
Q. can you reach the ESXi hosts from outside UCS?
Q. can you reach the management interface of ESXi?
Q. can you communicate between the ESXi hosts?
07-05-2017 09:48 AM
I appreciate your attention, Walter. I think we are getting a bit into the weeds here; let's refocus a bit.
The design works with one service profile/server running: multipath with multiple subnets, target access, management, TCP/IP to hosts in each subnet, all working as expected. This is an approved SmartStack configuration, with the exception of hopping iSCSI through a switch instead of utilizing the (relatively new) appliance ports for direct storage connection.
The problem occurs when two servers boot their service profiles. But in this case the management interfaces are unaffected; only the iSCSI network interfaces of whichever server boots second are unable to reach endpoints outside the fabric interconnects. In this scenario one host continues to function and the other does not connect to storage. If the first is shut down and the second rebooted, the second works.
Yes, maddening and nonsensical. Hope you had a good Independence Day.
07-05-2017 01:31 PM
This must be a storage (or multipathing) issue.
Are the 2 service profiles identical? Derived from a template? If yes:
do you have 2 different boot LUNs? Can you please show us how the 2 boot policies look (e.g. different boot LUNs)?
07-05-2017 03:08 PM
These servers boot from FlexFlash. UCS has no iSCSI config in the profile; ESXi handles the iSCSI connections.
Service profiles are identical; vNICs are provisioned by a LAN connectivity policy and vNIC templates.
I sharked the relevant interface on the FI when server 2 boots and found this weirdness:
the interface wakes up, rattles off some ARP, then goes silent. No replies.
The ARP table in the switch only shows my laptop and the mgmt gateway. Earlier in the pcap you can see the storage replying to ARP from vNICs behind the other fabric, and there are active iSCSI sessions.
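A quick way to isolate that pattern in the capture; the filenames and MAC addresses below are placeholders for the blades' iSCSI vNIC MACs:

```shell
# ARP to/from the second (failing) blade's iSCSI vNIC — MAC is a placeholder
tcpdump -n -r fi-a-port3.pcap 'arp and ether host 00:25:b5:aa:00:02'
# Compare against ARP from the working blade, which should show replies
tcpdump -n -r fi-a-port3.pcap 'arp and ether host 00:25:b5:aa:00:01'
```

Unanswered ARP requests from only the second-booting blade would point at the switch (or FI uplink pinning) dropping or mis-VLANing that MAC, rather than at multipathing.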