UCS Mini - Jumbo Frames, Appliance Ports, and Uplinks.

efenstermaker
Level 1

Hi all,

 

I'm having a hard time explaining some behavior that I'm seeing on a new UCS Mini, and was hoping one of the experts here would be willing to share some insight.

 

Sorry for no diagram, but this is a new system and we have a pretty simple topology right now:

 

  • UCS Mini, FI 6324s with 10Gb/s uplinks to a 4500x VSS (Ethernet 1/1).
  • Appliance ports (Ethernet 1/3) direct-connected to 10Gb/s I/O Module on EMC VNX 5300 (iSCSI).
  • B-Series blade running ESXi 6.0.

Everything seems to be running fine until I try to enable Jumbo MTU end-to-end:

 

  • VMware: Set MTU to 9000 on the vSwitch and the ports used for storage connectivity (see the command sketch after this list).
  • UCS: Set MTU to 9000 on "Gold" QoS class.  Assigned Gold class to a "storage" QoS policy.  Assigned QoS policy to vNIC templates used for storage connectivity.  Also set MTU to 9000 on storage vNIC templates (as required, the blade was rebooted for that). 
  • EMC SAN: Set MTU to 9000 on relevant ports on I/O module.
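
For reference, the ESXi-side changes were along these lines (a minimal sketch; vSwitch1 and vmk1 are placeholders for our storage vSwitch and VMkernel port):

esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk1 -m 9000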

Before I change these settings, if I ping from the VMware host to the SAN port with an 8000-byte packet size, I receive an error that the message size is too large (as expected).  After enabling Jumbo MTU, I no longer receive an error when pinging with the 8000-byte packet size, but I also do not receive ping replies.  However, I can still receive replies from the SAN port with the default packet size.
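
The test itself, from the ESXi shell, is roughly this (the SAN target IP and vmk1 are placeholders; -d sets the don't-fragment bit, and 8972 bytes would be the largest payload that fits a 9000-byte MTU once the IP and ICMP headers are counted):

vmkping -d -s 8000 -I vmk1 10.20.20.50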

 

Now here's what I can't explain: the 4500x upstream switch is learning the MAC addresses of the 10Gb/s EMC SAN ports via the uplink interfaces from the FIs.  This leads me to believe that I also need to configure Jumbo MTU on the uplinks from the FIs as well as on the 4500x in order for this to work(?).

 

I haven't tried this yet so I don't know for sure if this is the solution, but even if it is, why is the northbound switch learning MAC addresses from ports that are plugged directly into a separate appliance port on the FI?
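
If the uplinks really are in the data path, my assumption is that the 4500x side would need per-interface jumbo MTU on the VSS ports facing the FIs, roughly like this (interface names are only examples):

interface TenGigabitEthernet1/1/1
 mtu 9198
interface TenGigabitEthernet2/1/1
 mtu 9198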

 

Thanks for any help.

10 Replies

Rick1776
Level 5
I'm assuming that the FIs are running in end-host mode. With VSS on the 4500X, MAC addresses are synced between the two switches over the VSS link in case one fails; this is normal behavior. Now, the bigger issue could be that you are pinging from, say, SVR1, which goes to FI 1 and then to 4500X SW1, while the return path comes back through 4500X SW2, which will drop the packet. With the appliance port, I'm not sure if there is a way to make it an orphan port like in vPC, so that it knows it's only directly connected to 4500X SW1 and will return via that switch only.

Are you running HSRP on the 4500Xs?

Yes, the FIs are running in end-host mode. No HSRP on the 4500Xs.

I think my main issue here is that I don't have a grasp on the relationship between the Appliance Ports and the Uplink Ports. If I drop to nxos on the FI and check the MAC address table of the iSCSI VLAN, it learns the MAC of the Storage Processor on the SAN via the Appliance Port (Ethernet 1/3):

* 20 0060.16xx.xxxx dynamic 12250 F F Eth1/3
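
For reference, that entry came from the UCSM CLI (VLAN 20 is our iSCSI VLAN):

connect nxos
show mac address-table vlan 20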

But the vNIC that resides on the storage VLAN is getting pinned to the Uplink Port (Ethernet 1/1):

747 esx-iscsi1-a Up Active No Protection Unprotected 0/0/0 1/0/1 Ether

I kind of threw my hands up this morning and set the MTU of the "Best Effort" class in UCS to 9000 (even though I was assigning the "Gold" class to the storage vNICs), and it all started working. That's nice I guess, but I'm a bit frustrated that I don't understand why I'm getting pinned to the Uplink and why any of this traffic is headed North in the first place.

Thanks for your help. With regards to your comments, I'm not *too* concerned about the return path from the VSS. It appears to be learning the correct MAC addresses on the appropriate ports.

Did you create VLAN 20 in the appliance cloud and in the network cloud under the LAN tab in UCSM?

 

Can you provide the output of:

 

Pod2UCS4-A(nxos)# show interface priority-flow-control

 

It sounds like you're not negotiating PFC, so everything is flowing best effort. If it is off, you need to ensure your QoS settings match between UCS/EMC/4500.

 

This guide is a little dated but explains everything well as far as Appliance ports go:

 

https://www.cisco.com/c/en/us/support/docs/servers-unified-computing/ucs-5100-series-blade-server-chassis/116075-ucs-app-connectivity-tshoot-00.html

 

 


@Wes Austin wrote:

Did you create VLAN 20 in the appliance cloud and in the network cloud under the LAN tab in UCSM?

 


 Yes, VLAN 20 exists in both the LAN Cloud and the Appliance Cloud.  Unless VLAN 20 was allowed on the Uplink Port, the iSCSI vNICs would show a fault stating that there are no interfaces capable of carrying traffic for that VLAN.

 

Can you provide the output of:

 

Pod2UCS4-A(nxos)# show interface priority-flow-control

 

============================================================
Port Mode Oper(VL bmap) RxPPP TxPPP
============================================================

Ethernet1/1 Auto Off 0 0
Ethernet1/2 Auto Off 0 0
Ethernet1/3 Auto Off 0 0
Ethernet1/4 Auto Off 0 0
Ethernet1/6 Auto On (18) 0 0
Ethernet1/7 Auto Off 0 0
Ethernet1/8 Auto On (18) 0 0
Ethernet1/9 Auto Off 0 1002030
Ethernet1/10 Auto On (18) 0 0
Ethernet1/11 Auto Off 0 0
Ethernet1/12 Auto On (18) 0 0
Ethernet1/13 Auto Off 0 0
Ethernet1/14 Auto On (18) 0 0
Ethernet1/15 Auto Off 0 0
Ethernet1/16 Auto Off 0 0
Ethernet1/17 Auto Off 0 0
Ethernet1/18 Auto Off 0 0
Ethernet1/19 Auto Off 0 0
Ethernet1/20 Auto Off 0 0
Ethernet1/21 Auto Off 0 0
Ethernet1/22 Auto Off 0 0
Ethernet1/23 Auto Off 0 0
Ethernet1/24 Auto Off 0 0
Br-Ethernet1/5/1 Auto Off 0 0
Br-Ethernet1/5/2 Auto Off 0 0
Br-Ethernet1/5/3 Auto Off 0 0
Br-Ethernet1/5/4 Auto Off 0 0
Vethernet703 Auto Off 0 0
Vethernet705 Auto Off 0 0
Vethernet707 Auto Off 0 0
Vethernet709 Auto Off 0 0
Vethernet713 Auto Off 0 0
Vethernet715 Auto Off 0 0
Vethernet717 Auto Off 0 0
Vethernet719 Auto Off 0 0
Vethernet723 Auto Off 0 0
Vethernet725 Auto Off 0 0
Vethernet727 Auto Off 0 0
Vethernet729 Auto Off 0 0
Vethernet733 Auto Off 0 0
Vethernet735 Auto Off 0 0
Vethernet737 Auto Off 0 0
Vethernet739 Auto Off 0 0
Vethernet743 Auto Off 0 0
Vethernet745 Auto Off 0 0
Vethernet747 Auto Off 0 0
Vethernet749 Auto Off 0 0

 

It sounds like you're not negotiating PFC, so everything is flowing best effort. If it is off, you need to ensure your QoS settings match between UCS/EMC/4500.

 

This guide is a little dated but explains everything well as far as Appliance ports go:

 

https://www.cisco.com/c/en/us/support/docs/servers-unified-computing/ucs-5100-series-blade-server-chassis/116075-ucs-app-connectivity-tshoot-00.html

 

Thanks, I've looked over that document before - particularly the section explaining why running Appliance Port VLANs on Uplinks would be desirable.  I just don't understand why it would be pinned to the Uplink by default.  The whole point (in my mind) of the appliance port is to have the iSCSI traffic flowing directly between the FI and the Storage Processor, and only send the traffic out the Uplink Port if that connection fails.  It doesn't appear to be working out that way, otherwise I'm assuming I wouldn't have had to set Jumbo MTU on the "Best Effort" class.

 

I'll look into that PFC setting.  I've always thought that setting only applied flow control to certain classes and had no bearing on whether or not specific classes are actually in effect.  For that, I've simply been assigning a class to a QoS policy, then applying that policy to the vNIC Template.

 

Thank you.

Hi

 

The appliance port traffic will never traverse the Uplink ports. The reason the same VLAN has to be configured, and the reason it appears to be pinned to an uplink port, is how the appliance port is treated internally within UCSM (it is treated similarly to a vNIC and hence pinned to an uplink port). Traffic from the UCS server to the FI's appliance port will go directly to the storage. Note that you don't need to configure this VLAN on the upstream switch, and in your case you also don't need Jumbo MTU on the 4500.

Note that since the appliance port is pinned to an Uplink port, if that uplink port goes down the appliance port will also be brought down, even though it is not physically down. Again, this is just how a vNIC is brought down when its pinned Uplink port goes down, so it's best to have the uplink as a port-channel.

 

It looks like the Jumbo MTU config is still not applied or not correctly configured end to end; that's why the ping fails. To test this, in the UCSM CLI run "connect nxos" and then "show hardware internal carmel counters interrupt match mtu" on both FIs. Start the ping and check the values before and after it fails. The value on the port will increment if there was an MTU violation caused by a jumbo frame.

 

 


@Sandeep Singh wrote:

Hi

 

The appliance port traffic will never traverse the Uplink ports. The reason the same VLAN has to be configured, and the reason it appears to be pinned to an uplink port, is how the appliance port is treated internally within UCSM (it is treated similarly to a vNIC and hence pinned to an uplink port). Traffic from the UCS server to the FI's appliance port will go directly to the storage. Note that you don't need to configure this VLAN on the upstream switch, and in your case you also don't need Jumbo MTU on the 4500.

Actually, right now we only have one 10G link going from FI A to Storage Processor A, and another going from FI B to Storage Processor B.  In order for FI A to have a path to SP B should its Appliance Port (or SP A) become unavailable, that traffic will go out the Uplink, and Jumbo MTU will be needed on the 4500x ports.

 

The more I think about it, maybe this is the problem.  Perhaps I should spend another 10G port on each FI for appliance ports cabled to the opposite storage processor on the SAN.  That way each FI would have paths to each storage processor without ever needing to traverse the uplinks. 

 

Note that since the appliance port is pinned to an Uplink port, if that uplink port goes down the appliance port will also be brought down, even though it is not physically down. Again, this is just how a vNIC is brought down when its pinned Uplink port goes down, so it's best to have the uplink as a port-channel.

This is very informative, thank you. This is a new installation, so we've already witnessed this behavior when unplugging the uplinks on each FI to test failover.

 

It looks like the Jumbo MTU config is still not applied or not correctly configured end to end; that's why the ping fails. To test this, in the UCSM CLI run "connect nxos" and then "show hardware internal carmel counters interrupt match mtu" on both FIs. Start the ping and check the values before and after it fails. The value on the port will increment if there was an MTU violation caused by a jumbo frame.

That command isn't available when I drop to nxos on my FI.  It's a 6324, so perhaps it doesn't have the full nxos command set.

 

If the iSCSI packets are not hitting the uplinks, then I'm 95% certain I had it configured end-to-end.  The MTU was configured in VMware and also on the Storage Processor ports on the SAN.  I had the "Gold" class in UCS set to Jumbo MTU (9000) and assigned to the storage vNICs, but it wasn't until I set Jumbo MTU on the "Best Effort" class that it actually began to work.

 

"Best Effort" is the class assigned to the Uplink Ports, which is why I was convinced this traffic was hitting the 4500x.  

Did you get it working?

I added an Appliance Port to each FI and crossed them over to the opposite storage processor. At this point, neither FI should have to send traffic to the upstream switch to reach either storage processor. I did that in hopes of solving this issue and also as a best practice.

Our VMware guy is now performing storage migrations on some test VMs to generate some traffic. I'm still seeing some traffic going out one uplink and back in the other, but it may just be incidental VM traffic.

I'm having some difficulty monitoring traffic on the appliance ports over SNMP, but that's another issue entirely. Once I get some sort of proper monitoring in place, I will know for sure that the traffic is flowing as expected.

I forgot that the command is for the 6200 series FIs and you are using UCS Mini, hence it didn't work.

 

If, as you mentioned, configuring Jumbo MTU on the Best Effort class is what made it work, it means the issue is with tagging the packets correctly for the Gold class at one end or the other. It looks like you have configured Jumbo MTU correctly, but one of the ends is not tagging traffic for the Gold class, hence it ends up in the Best Effort class.

On UCS, make sure that you have configured Jumbo MTU on the vNIC as well as on the Gold class policy, and that you have assigned this policy to the vNIC.

Yes, the MTU on the Gold class is set to 9000 (even before I set it for Best Effort). I have associated this class with a storage QoS policy, that policy is assigned to the appliance ports and the storage vNICs, and the vNICs themselves have their MTU set to 9000.

This is why I was convinced that the uplinks were interfering in some way - if traffic was flowing as expected, it should have worked without modifying the Best Effort class.

Looks like I'll be dusting off Wireshark for this one. :)
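
Before going that far, I'll also try to confirm from the FI side which class the traffic is actually landing in. Assuming the 6324 supports the same queuing commands as the larger FIs, something like this against the appliance port should show the per-class MTU and drop counters:

connect nxos
show queuing interface ethernet 1/3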

Thank you, I appreciate your thoughtful replies.
