Diagnosing a 400Mbps cap on Network Speed With Cisco and Watchguard Devices

Ian Stephens · ‎12-21-2018

We are noticing that we max our WAN port out at 400 Mbps. We have a 1Gbps connection with our provider delivered over pure Ethernet (in a datacenter).

Here is an example of the max-out using crude MRTG:

We connect directly to our provider via Ethernet at 1Gbps. This comes in to our Cisco 2901 router, which then connects directly to our Watchguard Firebox at 1Gbps Ethernet (in drop-in mode). All devices are reporting 1Gbps line speed with full duplex.

The Firebox then connects to our switch at 1Gbps. We are running a gigabit switch which connects directly to our servers (also at 1Gbps to each server).

We can't seem to achieve anything over 400Mbps through the setup. The Firebox X1250e we are running is rated at 1.5Gbps throughput for raw packet forwarding (which we are doing - no proxying or fixup is being performed on the data).

We have even fired up a command line Speedtest (Ookla) on one of the servers and it hits the 400Mbps cap.

I know people are going to say the Cisco 2901 is the issue, but we are running full 1500-byte packets, and even at 400Mbps over an extended period this is an example of our CPU usage:

sh proc cpu
CPU utilization for five seconds: 18%/17%; one minute: 17%; five minutes: 17%
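For anyone reading that line: in `show processes cpu` output the first percentage is total CPU and the second is the share spent at interrupt level, which is where CEF-switched traffic is accounted. A rough sketch in Python that parses the line and extrapolates to line rate. The linear scaling model and the function names are assumptions for illustration, not router behaviour:

```python
import re

def parse_cpu_line(line):
    """Return (total_pct, interrupt_pct) from the five-second field."""
    m = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if m is None:
        raise ValueError("not a 'show processes cpu' summary line")
    return int(m.group(1)), int(m.group(2))

def extrapolate_cpu(pct_now, mbps_now, mbps_target):
    # Assumption: CPU load grows linearly with traffic at a fixed packet size.
    return pct_now * mbps_target / mbps_now

line = "CPU utilization for five seconds: 18%/17%; one minute: 17%; five minutes: 17%"
total, interrupt = parse_cpu_line(line)
projected = extrapolate_cpu(total, 400, 1000)
print(f"total={total}% interrupt={interrupt}% projected at 1 Gbps ~{projected:.0f}%")
```

If the linear assumption roughly holds, the CPU would still have headroom at 1Gbps, which suggests the router CPU is not the obvious bottleneck.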

Also worth noting, we are not running any QoS on the Firebox - all QoS is disabled (the whole module unloaded).

The Cisco 2901 has CEF enabled.

We have the following configuration:

Does anybody know what may be causing this "cap"?

Also, we have been monitoring the interfaces:

5 minute output rate 399034000 bits/sec, 33444 packets/sec

This shows ~33,000 pps (33 kpps). We already know the Cisco 2901 can handle 330 kpps @ 64 bytes.
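As a sanity check on those figures, dividing the bit rate by the packet rate gives the average packet size. This is a small Python sketch using only the numbers quoted above; the helper name is made up for illustration:

```python
def avg_packet_bytes(bits_per_sec, packets_per_sec):
    """Average packet size in bytes implied by an interface rate line."""
    return bits_per_sec / packets_per_sec / 8

size = avg_packet_bytes(399_034_000, 33_444)
# Fraction of the 330 kpps (64-byte) figure quoted in the post.
pps_fraction = 33_444 / 330_000

print(f"average packet size ~{size:.0f} bytes")
print(f"packet rate is ~{pps_fraction:.0%} of the 330 kpps figure")
```

The average comes out close to the 1500-byte MTU, so the traffic is dominated by full-size packets and the packet rate is only a small fraction of the quoted forwarding figure.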

We would like some tips, advice and suggestions as to how we can diagnose this remotely (we can't easily go to the datacenter to perform tests).

Any help and advice are greatly appreciated; thank you in advance.

Ian Stephens · ‎12-22-2018

Also, just following on...

If the router is to blame, how can we confirm that the router is at max capacity? The CPU figures and PPS data don't seem to show max load.

Thank you for your help.

Georg Pauwen · ‎12-22-2018

Hello,

Are we talking about a Layer 2 or a Layer 3 GigabitEthernet link on the 2901? Can you post the output of 'show interfaces GigabitEthernetx' from the router interfaces connected to the ISP and the WatchGuard?

Ian Stephens · ‎12-22-2018

Thank you for your reply. Here are the outputs for both interfaces.

Connection to Watchguard Firebox

GigabitEthernet0/0 is up, line protocol is up
  Hardware is CN Gigabit Ethernet, address is x.x.x (bia x.x.x)
  Description: connected to Watchguard
  Internet address is x.x.x.x/26
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 80/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full Duplex, 1Gbps, media type is RJ45
  output flow-control is XON, input flow-control is XON
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:18, output 00:00:00, output hang never
  Last clearing of "show interface" counters 15:45:22
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 448
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 317600000 bits/sec, 26722 packets/sec
  5 minute output rate 4504000 bits/sec, 9569 packets/sec
     1342335516 packets input, 1360008449 bytes, 448 no buffer
     Received 223 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 8662 pause input
     510961103 packets output, 325652526 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

Connection to ISP

GigabitEthernet0/1 is up, line protocol is up
  Hardware is CN Gigabit Ethernet, address is x.x.x (bia x.x.x)
  Description: connected to ISP
  Internet address is x.x.x.x/30
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 80/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full Duplex, 1Gbps, media type is RJ45
  output flow-control is unsupported, input flow-control is XON
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters 15:45:22
  Input queue: 1/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 4902000 bits/sec, 9569 packets/sec
  5 minute output rate 317599000 bits/sec, 26723 packets/sec
     510902327 packets input, 2594822340 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     1342353598 packets output, 1354131150 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out
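For reference, the txload/rxload fields in the outputs above express utilisation as a fraction of 255 relative to the configured BW. A short Python sketch relating them to the 5 minute rates. The function names are hypothetical, and the truncating rounding is an assumption about how IOS derives the value:

```python
def load_to_mbps(load, bw_kbit):
    """Convert an IOS load value (n/255) into Mbps for a given BW in Kbit/sec."""
    return load / 255 * bw_kbit / 1000

def rate_to_load(bits_per_sec, bw_kbit):
    # Assumption: the load value is the utilisation scaled to 255 and truncated.
    return bits_per_sec * 255 // (bw_kbit * 1000)

bw_kbit = 1_000_000          # "BW 1000000 Kbit/sec" from the output
tx_rate = 317_599_000        # Gi0/1 5 minute output rate

print(f"txload 80/255 ~ {load_to_mbps(80, bw_kbit):.1f} Mbps")
print(f"load implied by the 5 minute rate: {rate_to_load(tx_rate, bw_kbit)}/255")
```

Both views agree that the ISP-facing link was running at roughly a third of its nominal capacity when the output was captured.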

Georg Pauwen · ‎12-22-2018

Hello,

Layer 3 throughput for the 2901 is indeed rated at about 3Gbps (I assume you have seen the same data as in the attached white paper).

This might sound too obvious, but I would check with the ISP whether they actually provide the full 1Gbps. If they say they do, and if you can afford the downtime, I would take out the Cisco and the WatchGuard in turn to see whether removing either device from the topology increases the speed.

Interfaces look clean by the way, no drops. I assume you have CEF enabled? If possible, post the config of the 2901; we might be able to spot something.

https://community.cisco.com/t5/switching/throughput-of-2901-router/td-p/2174748?attachment-id=51462

Ian Stephens · ‎12-23-2018

Hi Georg,

Thank you for your response (and for also confirming we should be able to achieve greater throughput than 400Mbps with a 2901 @ 1500 bytes). This was the main reason for opening this thread - to get confirmation that the router wasn't maxed out.

I will open a support ticket with our provider now and confirm everything is correct on their end (no rate limiting or policing policies).

I can indeed disconnect everything for 5 minutes and plug a machine directly into the Ethernet cable provided by our ISP. I will wait until after the holidays and probably do it during a quiet period (2-3am). I will also wait to see what our provider says before I arrange this. I will set up the network configuration on a laptop before I head up to the datacenter so I can quickly unplug/test/reconnect everything.

Yes, interfaces look good.

Yes, CEF is enabled.

We are running a very simple config.

Thank you for your help.

Here is our simplified conf (only the important parts with certain elements removed with "x".)

hostname x.x.x.net
!
logging buffered 51200 warnings
!
no aaa new-model
ethernet lmi ce
!
ip name-server 8.8.8.8
ip name-server 8.8.4.4
!
ip cef
no ipv6 cef
!
multilink bundle-name authenticated
redundancy      
buffers tune automatic
!     
interface Embedded-Service-Engine0/0
 no ip address
 shutdown 
!         
interface GigabitEthernet0/0
 description connected to xxx
 ip address 195.xxx.xxx.xxx 255.255.255.192
 duplex auto
 speed auto
!         
interface GigabitEthernet0/1
 description connected to xxx
 ip address 195.xxx.xxx.xxx 255.255.255.252
 duplex auto
 speed auto
!         
ip forward-protocol nd
!         
no ip http server
ip http access-class 23
ip http authentication local
ip http secure-server
ip http timeout-policy idle 60 life 86400 requests 10000
!         
ip route 0.0.0.0 0.0.0.0 195.xxx.xxx.xxx
!
snmp-server community public RO 23
snmp-server community intsnmp RO 96
snmp-server location London, United Kingdom
snmp-server contact xxx xxxx, +44.xxxxxxxxxxx, xxx@xxx.com
access-list 23 permit 148.xxx.xxx.xxx
access-list 23 permit 195.xxx.xxx.xxx 0.0.0.63
access-list 96 permit xxx.xxx.xxx.xxx
access-list 96 permit xxx.xxx.xxx.xxx
access-list 96 permit xxx.xxx.xxx.xxx
access-list 96 permit xxx.xxx.xxx.xxx

Ian Stephens · ‎12-23-2018

For what it's worth:

#sh ver
Cisco IOS Software, C2900 Software (C2900-UNIVERSALK9-M), Version 15.5(3)M, RELEASE SOFTWARE (fc1)

paul driver · ‎12-24-2018

Hello

It may have nothing to do with any of your line rates and more to do with the BDP/RWIN (bandwidth-delay product versus TCP receive window) between your site and the end host you are testing against.

What RTT are you seeing?

Use the following calculator to check the RWIN you need for your BDP and RTT:

Bdp/rwin
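The calculation behind such a calculator is short enough to sketch in Python. The 20 ms RTT below is a placeholder, since no RTT has been posted in this thread, and the function names are made up for illustration:

```python
def required_window_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return bandwidth_bps * rtt_ms / 1000 / 8

def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on a single TCP flow limited by its window."""
    return window_bytes * 8 * 1000 / rtt_ms

rtt_ms = 20  # hypothetical RTT, replace with a measured value
window = required_window_bytes(1_000_000_000, rtt_ms)
unscaled = max_throughput_bps(65_535, rtt_ms)

print(f"window needed for 1 Gbps at {rtt_ms} ms: {window / 1e6:.1f} MB")
print(f"ceiling with an unscaled 64 KB window: {unscaled / 1e6:.1f} Mbps")
```

Note that this bound applies per TCP connection, so it matters most for single-flow tests.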

Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

Ian Stephens · ‎12-24-2018

Thank you for your reply, Paul.

It's not a single thread or host we are testing with.

We have two servers providing downloadable files over HTTPS which can have over 50-100 active connections at the same time (the files are large).

We notice the ~400Mbps limit when these servers are busy (Sunday eve etc).

But also, even with a single thread/host test such as CLI speedtest.net we notice the exact same limit.

Many thanks.

paul driver · ‎12-24-2018

Hello

I would definitely check whether window scaling is set up correctly on those servers and see if your connection improves. If it isn't, what you are experiencing does suggest that this is a possibility.

Kind Regards
Paul

Ian Stephens · ‎12-26-2018

Hi Paul,

I know window scaling is enabled on our servers:

admin@rackarray1:/$ sysctl net.ipv4.tcp_window_scaling

net.ipv4.tcp_window_scaling = 1

What else should I be looking at?

Thank you for your help.

Ian Stephens · ‎12-26-2018

We have modified the window sizing settings on the servers to the following:

net.core.rmem_max = 67108864 
net.core.wmem_max = 67108864 
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432

However, the ~400Mbps limit persists.
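One way to see why raising these limits may make no difference: the interface counters earlier in the thread show the bulk of the traffic leaving the servers, so the send buffer (tcp_wmem) is the relevant one. The Python sketch below parses the settings above and computes the largest RTT at which a single flow could still fill 1Gbps. It is an upper bound only, since Linux reserves part of each buffer for overhead, and the helper names are hypothetical:

```python
SETTINGS = """\
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
"""

def parse_sysctl(text):
    """Map each sysctl key to its list of integer values."""
    out = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        out[key.strip()] = [int(v) for v in value.split()]
    return out

def max_rtt_ms(buffer_bytes, bandwidth_bps):
    """Largest RTT (ms) at which this many bytes in flight fills the link."""
    return buffer_bytes * 8 * 1000 / bandwidth_bps

settings = parse_sysctl(SETTINGS)
wmem_max = settings["net.ipv4.tcp_wmem"][2]   # min, default, max
limit = max_rtt_ms(wmem_max, 1_000_000_000)
print(f"tcp_wmem max {wmem_max} bytes covers 1 Gbps up to ~{limit:.0f} ms RTT")
```

With a ceiling in the hundreds of milliseconds, the configured buffers are unlikely to be what holds aggregate throughput at ~400Mbps.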