This blog covers configuring jumbo frames in OpenStack to scale neutron and nova in a production datacenter built on Cisco UCS servers, UCS Fabric Interconnects, and Nexus 9000 switches.
In computer networking, MTU (Maximum Transmission Unit) is the size (in bytes) of the largest packet that can be transferred by an interface without IP fragmentation.
Depending on the link type, the MTU is:
- Fixed before data is transmitted (Ethernet)
- Negotiated at handshake/connect time (point-to-point serial links)
- Determined dynamically on the fly while data is transmitted
The default Ethernet MTU is 1500 bytes; a jumbo MTU is 9000 bytes.
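For reference, the MTU of a Linux interface can be raised like this (eth1 is a placeholder name, and the change does not persist across reboots):
# ip link set dev eth1 mtu 9000
# ip link show dev eth1 | grep mtu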
Here is the structure of an IP packet:
IP fragmentation is a process that breaks datagrams into smaller pieces (fragments) so that the resulting packets can pass through a link with a smaller maximum transmission unit (MTU) than the original datagram size. The fragments are reassembled by the receiving host. IP fragmentation adds processing overhead on the interface.
IP fragmentation is controlled by the DF (Don't Fragment) bit in the IP header: when DF is set, a packet larger than the next hop's MTU is dropped instead of fragmented.
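To watch fragmentation happen, send an oversized ping with local fragmentation allowed and capture the fragments with tcpdump (the interface name and address below are placeholders):
# tcpdump -n -i eth0 '(ip[6:2] & 0x3fff) != 0'
# ping -M dont -s 2000 10.7.8.2
The tcpdump filter matches packets whose MF flag or fragment offset is nonzero, i.e. all fragments.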
Here are the neutron configurations needed to enable jumbo MTU (9000 bytes) in OpenStack.
In neutron.conf:
[DEFAULT]
global_physnet_mtu = 9000
advertise_mtu = true
In openvswitch_agent.ini:
[ovs]
bridge_mappings = provider1:eth1,provider2:eth2,provider3:eth3
In ml2_conf.ini:
[ml2]
physical_network_mtus = provider2:4000,provider3:1500
path_mtu = 9000
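With these settings, neutron derives each network's MTU from the underlying physical network: networks on provider1 inherit the 9000-byte global value, networks on provider2 and provider3 are capped at 4000 and 1500 bytes respectively, and overlay networks get path_mtu minus the encapsulation overhead (for example, 9000 - 50 = 8950 bytes for VXLAN).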
Enable the DHCP MTU option (option 26) in /etc/neutron/dnsmasq-neutron.conf so that instances learn the MTU of their network; the value should be the instance-visible MTU, i.e. the physical MTU minus any encapsulation overhead:
dhcp-option-force=26,1454
Restart dnsmasq (DHCP server) on all network nodes.
The MTU values apply only to newly created network resources; existing networks keep their old MTU. Note that network_device_mtu in nova.conf was deprecated in OpenStack Juno.
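To verify the MTU that neutron assigns to a network created after this change (the network name here is a placeholder):
# openstack network show provider-net -c mtu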
Use case to test jumbo frames in OpenStack:
- Boot a nova instance
- Create a cinder volume using the ceph backend
- Attach the volume to the instance and mount it
- SSH into the instance and write lots of data to the volume
- Measure packet drops on the OpenStack nodes and ceph nodes (see the snippet below)
Refer to "How to attach a cinder/ceph XFS volume to a nova instance in OpenStack horizon" for more information about the steps in this use case.
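A quick way to watch for drops while the data is being written, assuming the data path uses eth1 (a placeholder name):
# ip -s link show dev eth1
# ethtool -S eth1 | grep -i drop
The first command prints per-interface RX/TX statistics including dropped-packet counters; the second prints NIC-level counters (counter names vary by driver).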
On the UCS Fabric Interconnects, jumbo frames are configured in Cisco's UCSM GUI: set the MTU of the appropriate QoS System Class (LAN > LAN Cloud > QoS System Class) to 9216, and set the MTU of the vNICs in the servers' service profiles to 9000.
Test jumbo MTU on the OpenStack nodes (controller and compute nodes):
# ip a | grep mtu
2: mx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
3: t: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
4: p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
7: br-inst: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN
8: br-prov: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN
9: br-int: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN
10: phy-br-inst@int-br-inst: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
11: int-br-inst@phy-br-inst: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
12: phy-br-prov@int-br-prov: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
13: int-br-prov@phy-br-prov: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000
Send jumbo frames with an 8972-byte payload using ping from the OpenStack nodes; 8972 bytes of ICMP data plus the 8-byte ICMP header and the 20-byte IP header add up to exactly 9000 bytes. With -M do, these packets will not be fragmented.
# ping -M do -s 8972 10.7.8.2
PING 10.7.8.2 (10.7.8.2) 8972(9000) bytes of data.
8980 bytes from 10.7.8.2: icmp_seq=1 ttl=64 time=0.118 ms
8980 bytes from 10.7.8.2: icmp_seq=2 ttl=64 time=0.066 ms
--- 10.7.8.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.062/0.082/0.118/0.022 ms
The -M option selects the Path MTU Discovery strategy. It can be:
- do (prohibit fragmentation, even local fragmentation)
- want (do PMTU discovery, fragment locally when the packet is too large)
- dont (do not set the DF flag)
Test jumbo frames on Cisco's UCS Fabric Interconnects:
UCS-FI-A# connect nxos
UCS-FI-A(nxos)# show queuing interface | grep MTU
q-size: 360640, HW MTU: 9216 (9216 configured)
Send jumbo frames with 9000 bytes as payload using ping from the UCS Fabric Interconnect's local management context:
UCS-FI-A# connect local-mgmt
UCS-FI-A(local-mgmt)# ping 10.23.223.20 count 5 packet-size 9000
PING 10.23.223.20 (10.23.223.20) from 10.23.223.45 : 9000(9028) bytes of data.
9008 bytes from 10.23.223.20: icmp_seq=1 ttl=255 time=0.741 ms
9008 bytes from 10.23.223.20: icmp_seq=2 ttl=255 time=0.796 ms
9008 bytes from 10.23.223.20: icmp_seq=3 ttl=255 time=0.740 ms
9008 bytes from 10.23.223.20: icmp_seq=4 ttl=255 time=0.775 ms
9008 bytes from 10.23.223.20: icmp_seq=5 ttl=255 time=0.814 ms
--- 10.23.223.20 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4033ms
rtt min/avg/max/mdev = 0.740/0.773/0.814/0.034 ms
Configure and test jumbo frames on Cisco's Nexus 9000:
N9k# configure terminal
N9k(config)# interface Ethernet1/3
N9k(config-if)# mtu 9216
N9k(config-if)# end
N9k# show running-config interface ethernet 1/3
interface Ethernet1/3
mtu 9216
N9k# show interface ethernet 1/3
Ethernet1/3 is up
MTU 9216 bytes, BW 10000000 Kbit, DLY 10 usec
Jumbo MTU can also be configured on port-channels, as shown in the sketch below.
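A minimal sketch, assuming the uplinks are bundled in port-channel 10 (the port-channel number is a placeholder):
N9k# configure terminal
N9k(config)# interface port-channel 10
N9k(config-if)# mtu 9216
N9k(config-if)# end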
Send jumbo frames with 9000 bytes as payload using ping from Nexus 9000.
N9k# ping 10.23.223.21 vrf management packet-size 9000 count 5
PING 10.23.223.21 (10.23.223.21): 9000 data bytes
9008 bytes from 10.23.223.21: icmp_seq=0 ttl=254 time=1.384 ms
9008 bytes from 10.23.223.21: icmp_seq=1 ttl=254 time=0.993 ms
9008 bytes from 10.23.223.21: icmp_seq=2 ttl=254 time=0.919 ms
9008 bytes from 10.23.223.21: icmp_seq=3 ttl=254 time=0.927 ms
9008 bytes from 10.23.223.21: icmp_seq=4 ttl=254 time=1.002 ms
--- 10.23.223.21 ping statistics ---
5 packets transmitted, 5 packets received, 0.00% packet loss
round-trip min/avg/max = 0.919/1.044/1.384 ms
A jumbo packet (9000 bytes) can also be captured and inspected in the Wireshark packet sniffer to confirm it traverses the path unfragmented.
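If Wireshark is not handy on the node, tcpdump can confirm jumbo packets on the wire (eth1 is a placeholder interface name):
# tcpdump -n -i eth1 greater 8000
The greater primitive matches only packets whose length is at least 8000 bytes, filtering the capture down to jumbo frames.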
Jumbo frames in OpenStack with neutron SR-IOV ports:
I observed that nova VMs attached to a neutron SR-IOV port also pass jumbo frames. Below is how the jumbo MTU appears on the interface of a nova VM attached to a neutron SR-IOV port.
[cloud-user@sr-iov-vm ~]$ ip a | grep 9000
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
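For reference, a minimal sketch of creating such a port and instance with the OpenStack CLI (the network, image, and flavor names are placeholders):
# openstack port create --network provider1 --vnic-type direct sriov-port
# openstack server create --image rhel7 --flavor m1.large --nic port-id=sriov-port sriov-vm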
Path MTU Discovery:
Path MTU Discovery (PMTUD) is a technique for determining the MTU of the network path between a source and a destination. PMTUD works by setting the Don't Fragment (DF) flag bit in the IP headers of outgoing packets. Any device along the path whose MTU is smaller than the packet drops it and sends back an Internet Control Message Protocol (ICMP) Fragmentation Needed (Type 3, Code 4) message containing its MTU, allowing the source host to reduce its path MTU accordingly. The process repeats until the MTU is small enough to traverse the entire path without fragmentation.
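PMTUD can also be approximated by hand with ping -M do and a binary search over payload sizes. A minimal sketch, reusing the 10.7.8.2 address from the earlier examples:
lo=1200; hi=9000
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if ping -c 1 -W 1 -M do -s "$mid" 10.7.8.2 > /dev/null 2>&1; then
        lo=$mid    # this payload size got through unfragmented
    else
        hi=$mid    # dropped: the path MTU is smaller than this
    fi
done
echo "Path MTU is approximately $((lo + 28)) bytes"    # payload + 20 (IP) + 8 (ICMP)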
Advantages of jumbo frames:
- Greater efficiency: more payload per header, so less protocol overhead per byte transferred
- Fewer packet drops, since interfaces handle fewer packets for the same throughput
- Interfaces process fewer packets per second, lowering CPU and interrupt load
Disadvantages of jumbo frames:
- Slower per-packet processing, since each packet is big (9000 bytes)
- Bigger packets on the wire can increase serialization delay and latency
- A single corrupted packet forces a large retransmission and adds congestion