
Nexus 93180YC and Jumbo Frames

mfarrenkopf
Level 1

For better or worse, as a standard practice we've been enabling jumbo frames in our data centers on our layer 2 links.  We have some storage equipment that our server teams have set for jumbo frames in storage.

I found out last week that some of our layer 2 links (vPC links, access to distribution) have mismatched MTUs.  Distribution side (Nexus 7700s) is set to 9216.  Access side (93180YC), however, is set for 1500.  Peer link is also erroneously set to 1500.

I've set off Red Alert because of the MTU mismatches.  However, at first glance, I don't see any indication of communication problems.  Interfaces have incremented jumbo frames, but no interface indications that frames are being dropped.  Storage is designed to be layer 2 adjacent, so it's not going through routing.

Will layer 2-adjacent communication be allowed to exceed the interface MTU?  That doesn't seem right.  I know Nexus uses cut-through, so it can't be sure of the frame size prior to starting to switch it.

Regardless, the interfaces should match the expected MTU, and we should do the reconfigurations.  But maybe there's less impact than I would expect.

Output below from a representative interface.  Outputs were taken a few minutes apart, so there will be differences in the numbers.

Thank you!

 

DCA-102-8A# sh int e1/45
Ethernet1/45 is up
admin state is up, Dedicated Interface
Belongs to Po114
Hardware: 1000/10000/25000 Ethernet, address: f80b.cb1c.1374 (bia f80b.cb1c.1374)
Description: Storage-A0-102-08
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is trunk
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on, FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 10week(s) 0day(s)
Last clearing of "show interface" counters 12w5d
5 interface resets
30 seconds input rate 32 bits/sec, 0 packets/sec
30 seconds output rate 856 bits/sec, 0 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 40 bps, 0 pps; output rate 512 bps, 0 pps
RX
1565774 unicast packets 257198 multicast packets 1461 broadcast packets
1824433 input packets 467118954 bytes
71972 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
2235131585 unicast packets 6950081 multicast packets 1446 broadcast packets
2242083112 output packets 3347510474221 bytes
2176181384 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause

 

DCA-102-8A# sh int e1/45 counters det all
Ethernet1/45
64 bit counters:
0. rxHCTotalPkts = 1824442
1. txHCTotalPks = 2242083359
2. rxHCUnicastPkts = 1565775
3. rxHCMulticastPkts = 257206
4. rxHCBroadcastPkts = 1461
5. rxHCOctets = 467120052
6. txHCUnicastPkts = 2235131587
7. txHCMulticastPkts = 6950326
8. txHCBroadcastPkts = 1446
9. txHCOctets = 3347510499727
10. rxTxHCPkts64Octets = 38561
11. rxTxHCpkts65to127Octets = 12648225
12. rxTxHCpkts128to255Octets = 16018069
13. rxTxHCpkts256to511Octets = 11711672
14. rxTxHCpkts512to1023Octets = 14512306
15. rxTxHCpkts1024to1518Octets = 12725612
16. rxTxHCpkts1519to1548Octets = 2176253356
17. rxHCTrunkFrames = 0
18. txHCTrunkFrames = 0
19. rxHCDropEvents = 0
19. InLayer3Unicast = 0
20. InLayer3UnicastOctets = 0
21. InLayer3Multicast = 0
22. InLayer3MulticastOctets = 0
23. OutLayer3Unicast = 0
24. OutLayer3UnicastOctets = 0
25. OutLayer3Multicast = 0
26. OutLayer3MulticastOctets = 0
27. InLayer3Routed = 0
28. InLayer3RoutedOctets = 0
29. OutLayer3Routed = 0
30. OutLayer3RoutedOctets = 0
31. InLayer3AverageOctets = 0
32. InLayer3AveragePackets = 0
33. OutLayer3AverageOctets = 0
34. OutLayer3AveragePackets = 0

All Port Counters:
0. Rx Packets: = 1824442
1. Rx Bytes: = 467120052
2. No Buffer Errors: = 0
3. Rx Broadcast Packets: = 1461
4. Rx Multicast Packets: = 257206
5. Rx Unicast Packets: = 1565775
6. Rx Jumbo Packets: = 71972
7. Runt Errors: = 0
8. Rx Storm Suppression: = 0
9. Input Errors: = 0
10. Input CRC Errors: = 0
11. ECC Errors: = 0
12. Overrun Errors: = 0
13. Ignored Errors: = 0
14. Watchdog Errors: = 0
15. tx broadcast packets: = 1446
16. tx multicast packets: = 6950326
17. tx unicast packets: = 2235131587
18. tx jumbo packets: = 2176181384
19. Rx Pause: = 0
20. Dribble Errors: = 0
21. If Down Drop Errors: = 0
22. Bad Etype Drop Errors: = 0
23. Bad Proto Drop Errors: = 0
24. tx packets: = 2242083359
25. tx bytes: = 3347510499727
26. Underrun Errors: = 0
27. Output Errors: = 0
28. Collision Errors: = 0
29. Resets: = 0
30. Babble Errors: = 0
31. Late Collision Errors: = 0
32. Deferred Errors: = 0
33. Lost Carrier Errors: = 0
34. No Carrier Errors: = 0
35. Tx Pause: = 0
36. Single Collision Errors: = 0
37. Multi-Collision Errors: = 0
38. Excess Collision Errors: = 0
39. Jabber Errors: = 0
40. Short Frame Errors: = 0
41. Input Discard Errors: = 0
42. Bad Encapsulation Errors: = 0
43. Output CRC Errors: = 0
44. Symbol Errors: = 0
45. Output Dropped Errors: = 0
46. SQETest = 0
47. Rx Packets from 0 to 64 bytes: = 38561
48. Rx Packets from 65 to 127 bytes: = 625440
49. Rx Packets from 128 to 255 bytes: = 521779
50. Rx Packets from 256 to 511 bytes: = 520942
51. Rx Packets from 512 to 1023 bytes: = 23892
52. Rx Packets from 1024 to 1518 bytes: = 21856
53. Rx Packets from 1519 to 1548 bytes: = 71972
54. Rx Trunk Packets: = 0
55. Tx Packets from 0 to 64 bytes: = 0
56. Tx Packets from 65 to 127 bytes: = 12022785
57. Tx Packets from 128 to 255 bytes: = 15496290
58. Tx Packets from 256 to 511 bytes: = 11190730
59. Tx Packets from 512 to 1023 bytes: = 14488414
60. Tx Packets from 1024 to 1518 bytes: = 12703756
61. Tx Packets from 1519 to 1548 bytes: = 2176181384
62. Tx Trunk Packets: = 0
63. Output BPDU Lost: = 0
64. Output COS0 Lost: = 0
65. Output COS1 Lost: = 0
66. Output COS2 Lost: = 0
67. Output COS3 Lost: = 0
68. Output COS4 Lost: = 0
69. Output COS5 Lost: = 0
70. Output COS6 Lost: = 0
71. Output COS7 Lost: = 0
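A quick arithmetic check on the counters above may help when judging how large the "jumbo" frames really are. The sketch below uses values copied from the second output; the variable and function names are illustrative only. It shows that every frame counted as jumbo on this port falls in the 1519 to 1548 byte bin, and that the average frame size is below 1500 bytes. One possible reading (an assumption, not something confirmed in this thread) is that these are ordinary 1500-byte payloads carrying an 802.1Q tag on the trunk (1522 bytes on the wire) rather than 9000-byte frames.

```python
# Values copied from "sh int e1/45 counters det all" above.
tx_packets = 2242083359
tx_bytes = 3347510499727
tx_jumbo = 2176181384          # "tx jumbo packets"
tx_bin_1519_1548 = 2176181384  # "Tx Packets from 1519 to 1548 bytes"

rx_packets = 1824442
rx_bytes = 467120052
rx_jumbo = 71972               # "Rx Jumbo Packets"
rx_bin_1519_1548 = 71972       # "Rx Packets from 1519 to 1548 bytes"


def average_frame_size(total_bytes, total_packets):
    """Mean bytes per frame; a rough indicator only."""
    return total_bytes / total_packets


tx_avg = average_frame_size(tx_bytes, tx_packets)
rx_avg = average_frame_size(rx_bytes, rx_packets)

# True when every frame counted as "jumbo" sits in the 1519-1548 bin,
# i.e. none of them is larger than 1548 bytes.
jumbo_all_in_small_bin = (
    tx_jumbo == tx_bin_1519_1548 and rx_jumbo == rx_bin_1519_1548
)

print(f"TX average frame size: {tx_avg:.1f} bytes")
print(f"RX average frame size: {rx_avg:.1f} bytes")
print(f"All jumbo frames are 1519-1548 bytes: {jumbo_all_in_small_bin}")
```

If that reading holds, these counters alone would not show whether true 9000-byte frames are being forwarded or dropped on this port.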

7 Replies

f00z
Level 1

I'm not sure about that particular switch as we don't have one, but we have other Nexus 9k and 3k switches, and on those jumbo frames are part of the QoS policy (system qos, service-policy).

The MTU that show interface reports is the Layer 3 MTU. It has been confusing to me, as you'd think it would be updated.

But jumbo frames work fine as long as the qos policy is set to it.  

If you use a layer 3 interface (with ip address on it) or SVI, then the MTU is the layer3 MTU and it does need to be changed to match.

 

If you run a command like:

show queuing interface eth1/1 | inc HW
HW MTU of Ethernet1/1 : 9216 bytes

 

you will see the actual MTU is 9216 (set by my qos policy)

 

But:

show int eth1/1 | i MTU
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec

 

This still shows an MTU of 1500.
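For anyone auditing many ports, the comparison above can be scripted against captured CLI text. This is a minimal sketch that assumes the output lines look exactly like the two samples quoted in this reply; the function names are hypothetical, and the regular expressions may need adjusting for other platforms or software versions.

```python
import re


def parse_hw_mtu(queuing_output):
    """Extract the value from a 'HW MTU of <intf> : <n> bytes' line, or None."""
    match = re.search(r"HW MTU of \S+\s*:\s*(\d+) bytes", queuing_output)
    return int(match.group(1)) if match else None


def parse_interface_mtu(show_int_output):
    """Extract the value from a 'MTU <n> bytes, ...' line, or None."""
    match = re.search(r"\bMTU (\d+) bytes", show_int_output)
    return int(match.group(1)) if match else None


# Sample lines copied from the outputs quoted above.
queuing_sample = "HW MTU of Ethernet1/1 : 9216 bytes"
show_int_sample = "MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec"

hw_mtu = parse_hw_mtu(queuing_sample)
shown_mtu = parse_interface_mtu(show_int_sample)

if hw_mtu != shown_mtu:
    print(f"HW MTU {hw_mtu} differs from show interface MTU {shown_mtu}")
```

Both parsers return None when the expected line is missing, so a caller can tell "no data" apart from a genuine difference.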

When we first got Nexus 3k/9k switches many years ago, this was really confusing. I think the 7k is the same way, but we don't use any of those. They really should update show interface to display ALL of the MTU information. As long as you run a code version newer than roughly 6.2(?), it seems to work this way. Older code versions had a different method of doing it, if I recall correctly, but that was a LONG time ago :D

 

 

Hi f00z,

Thank you for the reply!

I cut my teeth originally on the 5548s and 6001.  Those defined the QoS policy as you described.  A show int eth1/1, for example, shows an MTU of 1500 on a layer 2 port, whereas the vPC global consistency parameters showed an MTU of 9216.  But the key (in my mind) is, you don't have an "mtu" statement on a layer 2 interface at all on 5548/6001, so that's why I changed the QoS MTU.

On the 93180YC (and 7000, and 7700), you can define the MTU on a layer 2 interface.  So I haven't been changing the QoS MTU.  Maybe that's a mistake on our part.  Once the "mtu" statement appeared for layer 2 interfaces, I didn't think the QoS configuration was necessary on the 93180YC.  Maybe that's wrong?  Under the global vPC consistency parameters, it shows MTUs of 1500 for all the queues.

The 93180YC are positioned in a classic access/distribution model, layer 2 to the 7Ks.  The 93180YCs have no SVIs of their own.  (We have configured them for routing, just in case we had a need for it, but so far we haven't.)  My reading of your response tells me that ports should be able to switch jumbo frames between them at layer 2, even if the MTU is not configured as such, because the MTU in this context matters at layer 3.

So the questions I need answered:  1) do I still need to set the network QoS policy MTU?  2) will a layer 2 interface still do jumbo frames even if not set on the interface or in the QoS policy?  3) if I don't set the QoS policy to 9216, but I do set the interface MTU to 9216, what is the impact during periods of congestion?

Thank you for the reply!

Matt

Don't quote me on this, but I believe the 9k differs from the 3k in that:

 

a) you can set the L2 MTU on interfaces (per port, so you have more granular control), whereas you can't on the 3k

b) "When enabling jumbo MTU, the default network QoS policy can support jumbo frames. Under the network QoS policy, the MTU is used only for buffer carving when no-drop classes are configured." In other words, the network QoS policy is adjusted automatically when you set jumbo (system jumbomtu), so all Layer 2 ports should automatically be set to whatever that value is, regardless

c) the 9k automatically sets the MTU to 9216 on the vPC peer link (or possibly to the highest MTU of any vPC interface; I need to verify this by testing some day)

 

You should be able to see it with show queuing interface eth1/1 or show system internal ethpm info interface eth1/1

 

I will test this on a 9k I have in the lab next week just to verify.  

Hi @mfarrenkopf,

Nexus 9000 switches use per-port MTU configuration, not the network QoS approach.

This means you change both the Layer2 and Layer3 MTU directly on the interface:

  • Layer2 interfaces
N9K-1(config-if)# show int e 1/1 | i i mtu
  MTU 1500 bytes, BW 25000000 Kbit, DLY 10 usec
N9K-1(config-if)# int e 1/1
N9K-1(config-if)# mtu 9216
N9K-1(config-if)# show int e 1/1 | i i mtu
  MTU 9216 bytes, BW 25000000 Kbit, DLY 10 usec
  • Layer3 interfaces:
N9K-1(config-if)# int vlan 10
N9K-1(config-if)# mtu ?
  <68-9216>  MTU size in bytes

N9K-1(config-if)# mtu 9216
N9K-1(config-if)# show int vlan 10 | i i mtu
  MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
N9K-1(config-if)# mtu 1500 
N9K-1(config-if)# show int vlan 10 | i i mtu
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,

 

On Nexus 9000 switches, there is also the concept of system jumbomtu. This setting changes the maximum MTU size that can be configured on interfaces:

N9K-1(config)# system jumbomtu 5000
N9K-1(config)# int e 1/1
N9K-1(config-if)# mtu ?
  <1500-5000>  Enter MTU

DC-Team-N9K-1(config-if)# mtu 9216
                              ^
% Invalid number, range is (1500:5000) at '^' marker.
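The range check shown in the output above can be sketched as a small pre-check to run before pushing interface MTU changes. This is only an illustration: the function name is hypothetical, and the lower bound of 1500 is taken from the help text above and may differ on other platforms or interface types.

```python
# Lower bound taken from the "<1500-5000>" help text above (an assumption
# for other platforms and for Layer 3 interfaces, which show a different range).
MIN_L2_MTU = 1500


def mtu_is_configurable(requested_mtu, system_jumbomtu):
    """Return True if the requested MTU lies within 1500..system jumbomtu."""
    return MIN_L2_MTU <= requested_mtu <= system_jumbomtu


# With "system jumbomtu 5000", a request for 9216 is rejected, as in the output above.
print(mtu_is_configurable(9216, 5000))  # False
print(mtu_is_configurable(5000, 5000))  # True
```

In practice this means the system jumbomtu value would need to be at least as large as the largest interface MTU you plan to configure.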

Here is a link on how to configure MTU on all Nexus platforms: https://www.cisco.com/c/en/us/support/docs/switches/nexus-9000-series-switches/118994-config-nexus-00.html#anc10 

 

Best regards,

Sergiu

 

 

It's a little more complex than that. Like some of the 40g /100g ports will ignore MTU regardless of what you set it at, it defaults to 16383 or some number around there and you can't change it even setting it on the port (like the original 93xx higig ports did this).

I bet the 93180 also has ports that ignore MTU and that is why you are seeing them still forward jumbo packets.  

Anyways, it's always best to set the mtu everywhere statically so there's less chance of a bug happening.  At least so far for me, on the 9k setting mtu has been seamless, as opposed to old IOS devices where they bounce the ports.

Seems like every time we get a new model of the 9k or 3k there's different configuration and/or different results from same config. 

Hello @f00z 


@f00z wrote:

It's a little more complex than that. Like some of the 40g /100g ports will ignore MTU regardless of what you set it at, it defaults to 16383 or some number around there and you can't change it even setting it on the port (like the original 93xx higig ports did this).


To avoid confusing whoever is reading this thread, allow me to make a remark:

What you mentioned here is only accurate for the first generation of Nexus switches, where the hardware architecture was composed of two different ASICs: NFE (merchant silicon containing the 10G ports) and ALE/ALE-2 (Cisco ASIC containing the 40G ports; there is also a GEM with 100G ports, but that is a special one). These two ASICs were interconnected by internal ports called HiGig. All packets crossing the HiGig ports carried additional internal headers (for forwarding purposes), so to accommodate these internal headers, the MTU limit (and with it MTU changes) was disabled on those ports.

Internal structure of first gen switches:

ale.jpg

 

Starting with the second generation, there is only one ASIC type (Cisco ASIC), and all ports are contained in this single ASIC, distributed in slices. The MTU is fully configurable on all ports, including 40G/100G.

Architecture of all newer generations of Nexus:

slice.png

 

 

In the context of this thread, on the Nexus 93180YC, the MTU on 40G/100G ports is configurable at the port level.

 

Cheers,

Sergiu