Jumbo Mumbo in OpenStack using Cisco's UCS servers and Nexus 9000

Cisco Employee

This post covers configuring jumbo frames in OpenStack in order to scale neutron and nova in a production datacenter using Cisco's UCS servers, UCS Fabric Interconnects and Nexus 9000 switches.

In computer networking, MTU (Maximum Transmission Unit) is the size (in bytes) of the largest packet that can be transferred by an interface without IP fragmentation.

Depending on the link type, the MTU is:

  • Fixed before transmitting data (Ethernet)
  • Negotiated during handshake/connect time (point-to-point serial links)
  • Dynamically determined on-the-fly while transmitting data (Path MTU Discovery)

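The default Ethernet MTU is 1500 bytes.  A jumbo MTU is anything larger than that; 9000 bytes is the value commonly used and the one configured in this post.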

Here is an IP packet.

mtu.png


IP fragmentation is a process that breaks a datagram into smaller pieces (fragments) so that it can pass through a link whose maximum transmission unit (MTU) is smaller than the original datagram size. The fragments are reassembled by the receiving host.  IP fragmentation adds processing overhead on the devices that fragment and reassemble the packets.
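
To make the cost of fragmentation concrete, here is a minimal Python sketch of how an IPv4 datagram would be split for a smaller MTU. It assumes a 20-byte IPv4 header with no options; the function name `fragment_sizes` is made up for this illustration.

```python
IPV4_HEADER = 20  # bytes; assumes an IPv4 header without options


def fragment_sizes(datagram_len, mtu, header_len=IPV4_HEADER):
    """Return the total length (header + data) of each fragment."""
    if datagram_len <= mtu:
        return [datagram_len]  # fits as-is, no fragmentation needed
    # The data carried by every fragment except the last must be a
    # multiple of 8 bytes, because the fragment offset counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    remaining = datagram_len - header_len
    sizes = []
    while remaining > 0:
        data = min(max_data, remaining)
        sizes.append(header_len + data)  # every fragment repeats the IP header
        remaining -= data
    return sizes


# A 9000-byte datagram crossing a 1500-byte link:
print(fragment_sizes(9000, 1500))
```

In this sketch one 9000-byte datagram turns into seven packets (six of 1500 bytes and one of 120 bytes), and every one of them carries its own IP header.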

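IP fragmentation is controlled by the DF (Don't Fragment) bit. When DF is set, a router that cannot forward the packet because it is too large must drop it instead of fragmenting it.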

IP Fragmentation (DF bit in IP header):


df.png

Here are the neutron configurations needed to enable jumbo MTU (9000 bytes) in OpenStack.

In neutron.conf:

    [DEFAULT]

    global_physnet_mtu = 9000

    advertise_mtu = true

In openvswitch_agent.ini:

    [ovs]

    bridge_mappings = provider1:eth1,provider2:eth2,provider3:eth3


In ml2_conf.ini:

    [ml2]

    physical_network_mtus = provider2:4000,provider3:1500

    path_mtu = 9000
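
The sketch below shows how I understand these three options combine into the MTU neutron assigns to a new network. It is only an approximation of the ML2 type drivers' behavior, not neutron code: the function `network_mtu` is hypothetical, and the overhead constants assume an IPv4 underlay (50 bytes in total for VXLAN, 42 for GRE).

```python
IP_HEADER = 20                              # IPv4 underlay assumed
ENCAP_OVERHEAD = {"vxlan": 30, "gre": 22}   # assumed encapsulation overheads in bytes


def network_mtu(network_type, physnet=None, *, global_physnet_mtu=9000,
                path_mtu=9000, physical_network_mtus=None):
    """Approximate the MTU a new network would get (hypothetical helper)."""
    physical_network_mtus = physical_network_mtus or {}
    if network_type in ("flat", "vlan"):
        # Provider networks: the smaller of the global value and any
        # per-physnet override from physical_network_mtus.
        candidates = [global_physnet_mtu]
        if physnet in physical_network_mtus:
            candidates.append(physical_network_mtus[physnet])
        return min(candidates)
    if network_type in ENCAP_OVERHEAD:
        # Tunnel networks: limited by path_mtu, minus the tunnel headers.
        candidates = [global_physnet_mtu]
        if path_mtu > 0:
            candidates.append(path_mtu)
        return min(candidates) - IP_HEADER - ENCAP_OVERHEAD[network_type]
    raise ValueError(f"unknown network type: {network_type}")


overrides = {"provider2": 4000, "provider3": 1500}
for physnet in ("provider1", "provider2", "provider3"):
    print(physnet, network_mtu("vlan", physnet, physical_network_mtus=overrides))
print("vxlan", network_mtu("vxlan"))
```

With the values above, provider1 falls back to global_physnet_mtu (9000), provider2 and provider3 use their overrides, and a VXLAN network would end up with 8950 bytes.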

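Enable the DHCP MTU option (26) in /etc/neutron/dnsmasq-neutron.conf. The number after the comma is the MTU, in bytes, that dnsmasq advertises to instances; replace the 1454 shown here with the MTU the instances are meant to use on a jumbo-frame network: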

    dhcp-option-force=26,1454

Restart dnsmasq (DHCP server) on all network nodes.

The MTU values only apply to new network resources.  network_device_mtu in nova.conf is deprecated in OpenStack Juno.

Use case to test jumbo frames in OpenStack:

  • Boot a nova instance
  • Create a cinder volume using ceph backend
  • Attach volume to instance and mount it
  • SSH into instance and write lots of data in this volume
  • Measure traffic drop on the OpenStack nodes and ceph nodes

Refer to "How to attach cinder/ceph XFS volume to a nova instance in OpenStack horizon" for more information about the steps in this use case.

Below are the steps to configure jumbo frames on the UCS Fabric Interconnects in Cisco's UCSM GUI:


ucsm1.png


ucsm2.png


Test jumbo MTU on the OpenStack nodes (controller and compute nodes):


# ip a | grep mtu

2: mx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000

3: t: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000

4: p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000

7: br-inst: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN

8: br-prov: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN

9: br-int: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN

10: phy-br-inst@int-br-inst: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000

11: int-br-inst@phy-br-inst: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000

12: phy-br-prov@int-br-prov: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000

13: int-br-prov@phy-br-prov: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
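
Checking this output by eye gets tedious across many nodes. A minimal Python sketch that scans `ip a` output and reports interfaces below the expected MTU could look like this. The helper name `interfaces_below` and the `eth9` sample line are invented for the example; in practice the text would come from running `ip a` on each node.

```python
import re

# Matches lines such as "2: mx: <BROADCAST,...> mtu 9000" and
# "10: phy-br-inst@int-br-inst: <...> mtu 9000".
LINK_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s+<[^>]*>\s+mtu\s+(?P<mtu>\d+)"
)


def interfaces_below(ip_output, expected_mtu=9000):
    """Return {interface: mtu} for interfaces whose MTU is too small."""
    too_small = {}
    for line in ip_output.splitlines():
        match = LINK_RE.match(line.strip())
        if match and int(match.group("mtu")) < expected_mtu:
            too_small[match.group("name")] = int(match.group("mtu"))
    return too_small


sample = """\
2: mx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
5: eth9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
10: phy-br-inst@int-br-inst: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
"""
print(interfaces_below(sample))  # {'eth9': 1500}
```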

Send jumbo frames with an 8972-byte payload using ping from the OpenStack nodes.  With -M do the DF bit is set, so these packets are never fragmented; the ping only succeeds if every hop can carry the full 9000-byte packet.

# ping -M do -s 8972 10.7.8.2

PING 10.7.8.2 (10.7.8.2) 8972(9000) bytes of data.

8980 bytes from 10.7.8.2: icmp_seq=1 ttl=64 time=0.118 ms

8980 bytes from 10.7.8.2: icmp_seq=2 ttl=64 time=0.066 ms

--- 10.7.8.2 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 3999ms

rtt min/avg/max/mdev = 0.062/0.082/0.118/0.022 ms

The -M option selects the Path MTU Discovery strategy. It can be:

  • do (prohibit fragmentation, even local one)
  • want (do PMTU discovery, fragment locally when packet size is large)
  • dont (do not set DF flag)
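
The 8972-byte payload in the ping above is not arbitrary: it is the 9000-byte MTU minus the IP and ICMP headers. A small sketch of the arithmetic, assuming a 20-byte IPv4 header without options and the 8-byte ICMP echo header (the function names are just for illustration):

```python
IPV4_HEADER = 20  # bytes, no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def max_ping_payload(mtu):
    """Largest `ping -s` value that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ip_packet_size(payload):
    """Size of the IP packet ping builds for a given payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


print(max_ping_payload(9000))  # 8972
print(max_ping_payload(1500))  # 1472
print(ip_packet_size(8972))    # 9000
```

This also explains the "8972(9000) bytes of data" line in the output, and why each reply is reported as 8980 bytes: that figure is the payload plus the ICMP header, without the IP header.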

Test jumbo frames on Cisco's UCS Fabric Interconnects:

UCS-FI-A# connect nxos

UCS-FI-A(nxos)# show queuing interface | grep MTU

    q-size: 360640, HW MTU: 9216 (9216 configured)

Send jumbo frames with 9000 bytes as payload using ping from the UCS Fabric Interconnect.

UCS-FI-A(local-mgmt)# ping 10.23.223.20 count 5 packet-size 9000

PING 10.23.223.20 (10.23.223.20) from 10.23.223.45 : 9000(9028) bytes of data.

9008 bytes from 10.23.223.20: icmp_seq=1 ttl=255 time=0.741 ms

9008 bytes from 10.23.223.20: icmp_seq=2 ttl=255 time=0.796 ms

9008 bytes from 10.23.223.20: icmp_seq=3 ttl=255 time=0.740 ms

9008 bytes from 10.23.223.20: icmp_seq=4 ttl=255 time=0.775 ms

9008 bytes from 10.23.223.20: icmp_seq=5 ttl=255 time=0.814 ms

--- 10.23.223.20 ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4033ms

rtt min/avg/max/mdev = 0.740/0.773/0.814/0.034 ms

Configure and test jumbo frames on Cisco's Nexus 9000:


N9k# configure terminal

N9k(config)# interface Ethernet1/3

N9k(config-if)# mtu 9216

N9k(config-if)# end

N9k# show running-config interface ethernet 1/3

interface Ethernet1/3

  mtu 9216

N9k# show interface ethernet 1/3

Ethernet1/3 is up

  MTU 9216 bytes, BW 10000000 Kbit, DLY 10 usec

Jumbo MTU can also be configured for port-channels.

Send jumbo frames with 9000 bytes as payload using ping from Nexus 9000.

N9k# ping 10.23.223.21 vrf management packet-size 9000 count 5

PING 10.23.223.21 (10.23.223.21): 9000 data bytes

9008 bytes from 10.23.223.21: icmp_seq=0 ttl=254 time=1.384 ms

9008 bytes from 10.23.223.21: icmp_seq=1 ttl=254 time=0.993 ms

9008 bytes from 10.23.223.21: icmp_seq=2 ttl=254 time=0.919 ms

9008 bytes from 10.23.223.21: icmp_seq=3 ttl=254 time=0.927 ms

9008 bytes from 10.23.223.21: icmp_seq=4 ttl=254 time=1.002 ms

--- 10.23.223.21 ping statistics ---

5 packets transmitted, 5 packets received, 0.00% packet loss

round-trip min/avg/max = 0.919/1.044/1.384 ms

Jumbo packet (with MTU 9000 bytes) captured in Wireshark packet sniffer:


jumbo_packet.jpg


Jumbo frames in OpenStack with neutron SR-IOV ports:

I observed that nova VMs attached to a neutron SR-IOV port will also pass jumbo MTU packets.  Below is how jumbo MTU is seen on the interface of a nova VM attached to a neutron SR-IOV port.

[cloud-user@sr-iov-vm ~]$ ip a | grep 9000

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000


Path MTU Discovery:

Path MTU Discovery (PMTUD) is a technique for determining the MTU of the network path between source and destination.  PMTUD works by setting the Don't Fragment (DF) flag in the IP headers of outgoing packets. Any device along the path whose MTU is smaller than the packet drops it and sends back an Internet Control Message Protocol (ICMP) Fragmentation Needed (Type 3, Code 4) message containing its MTU, allowing the source host to reduce its path MTU accordingly.  This process is repeated until the MTU is small enough to traverse the entire path without fragmentation.
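
The loop described above can be simulated in a few lines of Python. This is a toy model rather than a network implementation: the path is just a list of per-hop MTUs, and `discover_path_mtu` is a name chosen for this example.

```python
def discover_path_mtu(link_mtus, initial_mtu):
    """Simulate PMTUD over a list of per-hop MTUs.

    Returns (path_mtu, probes), where probes lists every packet size
    the source tried with the DF bit set.
    """
    mtu = initial_mtu
    probes = []
    while True:
        probes.append(mtu)
        for hop_mtu in link_mtus:
            if mtu > hop_mtu:
                # This hop drops the packet and reports its own MTU in the
                # ICMP Fragmentation Needed message; the source retries
                # with that smaller size.
                mtu = hop_mtu
                break
        else:
            # No hop objected, so the packet reached the destination.
            return mtu, probes


# Hypothetical path: jumbo-capable hops followed by smaller links.
print(discover_path_mtu([9000, 9216, 4000, 1500], 9000))
```

For this made-up path the source tries 9000, then 4000, then 1500 bytes before a packet gets through, so the discovered path MTU is the smallest MTU along the way.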

Advantages of jumbo frames:

  • Greater efficiency (less header overhead per byte of payload)
  • Fewer packet drops
  • Interfaces process fewer packets per second
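
To put rough numbers on these advantages, the sketch below compares a 1500-byte and a 9000-byte MTU on a 10 Gbit/s link. The assumptions are mine: TCP over IPv4 with no options (40 bytes of headers inside the MTU) and 38 bytes of per-frame Ethernet overhead on the wire (preamble, header, FCS and inter-frame gap), without a VLAN tag.

```python
TCP_IP_HEADERS = 40      # 20-byte IPv4 + 20-byte TCP, no options (assumed)
ETHERNET_OVERHEAD = 38   # preamble 8 + header 14 + FCS 4 + inter-frame gap 12


def payload_efficiency(mtu):
    """Fraction of the bytes on the wire that are application payload."""
    return (mtu - TCP_IP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_per_second(mtu, link_bps=10_000_000_000):
    """Full-size frames per second needed to saturate the link."""
    return link_bps / (8 * (mtu + ETHERNET_OVERHEAD))


for mtu in (1500, 9000):
    print(mtu, f"{payload_efficiency(mtu):.2%}", f"{frames_per_second(mtu):,.0f}")
```

Under these assumptions, payload efficiency rises from roughly 94.9% to 99.1%, and the number of full-size frames needed to fill the link drops from about 813,000 to about 138,000 per second.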

Disadvantages of jumbo frames:

  • Slower per-packet processing since each packet is big (9000 bytes)
  • Bigger packets take longer to put on the wire, which may increase latency for traffic queued behind them
  • A corrupted or dropped packet forces a larger retransmission, which can add to congestion
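
The latency point can be estimated with simple arithmetic: the time to serialize one packet onto the wire grows linearly with its size. A minimal sketch, ignoring Ethernet framing overhead and taking the 10 Gbit/s link speed from the Nexus interface output earlier (the function name is made up):

```python
def serialization_delay_us(packet_bytes, link_bps=10_000_000_000):
    """Microseconds needed to clock one packet onto the link."""
    return packet_bytes * 8 / link_bps * 1_000_000


for size in (1500, 9000):
    print(size, round(serialization_delay_us(size), 2), "us at 10 Gbit/s")
    print(size, round(serialization_delay_us(size, 1_000_000_000), 2), "us at 1 Gbit/s")
```

On a 10 Gbit/s link the difference is about 1.2 versus 7.2 microseconds per packet, which is small in absolute terms; on slower links the same sixfold ratio matters more.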