High CPU on 2811/IP Input, wrong switching path on GRE tunnel packets?

Hello,

I recently installed a 2811 running 15.1(4)M10. Its primary function is to provide static IPv4 addresses to dynamic endpoints, and I'm using a stock DMVPN config without crypto for that purpose. With approx. 5 Mbit/s outgoing over the DMVPN tunnel interface, the CPU sits at 98%, and IP Input is the top CPU hog. As you can see, only 34% of the CPU is spent on interrupt-switching packets.

show proc cpu sorted 5min
CPU utilization for five seconds: 98%/34%; one minute: 98%; five minutes: 97%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process 
 138  1562991564   404043710       3868 55.71% 55.40% 55.63%   0 IP Input         
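
In the header line of that output, `98%/34%` packs two figures together: total CPU utilization and the part of it spent at interrupt level. As a rough illustration, here is a minimal Python sketch that splits the two and derives the process-level share. The helper name `split_cpu_line` is hypothetical and not part of any Cisco tool.

```python
import re

def split_cpu_line(line):
    """Return (total, interrupt, process) percentages from a 'show proc cpu' header line.

    Hypothetical helper for illustration; assumes the 'five seconds: X%/Y%' format shown above.
    """
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("no 'five seconds: X%/Y%' field found")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Whatever is not spent at interrupt level is spent in processes such as IP Input.
    return total, interrupt, total - interrupt

header = "CPU utilization for five seconds: 98%/34%; one minute: 98%; five minutes: 97%"
total, interrupt, process = split_cpu_line(header)
print(f"total={total}% interrupt={interrupt}% process={process}%")
```

With the figures above, the process-level share works out to 64%, most of which is the 55.71% attributed to IP Input.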

I tried to debug the cause but I'm running out of ideas.

show interface stats
FastEthernet0/0
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor   82854343 2835637802   41595604 3625150655
             Route cache  108673117 4258033954  115701696  306401071
                   Total  191527460 2798704460  157297300 3931551726
Tunnel6
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor   35423466 1297452269     164814   21452440
             Route cache   26199025 3188431356   58412283 2399557371
                   Total   61622491  190916329   58577097 2421009811

Fa0/0 is the main interface through which all traffic enters and leaves. As you can see, many of the input packets are process switched.

Tunnel6 is the DMVPN hub interface. Note the very high percentage of process-switched Pkts In compared to Pkts Out.
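
To put numbers on that, here is a small Python sketch that computes the process-switched share from the packet counters in the `show interface stats` output above. The function name `process_share` is made up for this illustration. Only the packet counts are used, because several Chars totals are smaller than their own parts and so appear to have wrapped.

```python
def process_share(process_pkts, route_cache_pkts):
    """Fraction of packets that took the process-switched path (hypothetical helper)."""
    return process_pkts / (process_pkts + route_cache_pkts)

# Packet counters copied from the 'show interface stats' output above.
tunnel6_in = process_share(35423466, 26199025)
tunnel6_out = process_share(164814, 58412283)
fa00_in = process_share(82854343, 108673117)

print(f"Tunnel6 Pkts In process switched:  {tunnel6_in:.1%}")
print(f"Tunnel6 Pkts Out process switched: {tunnel6_out:.1%}")
print(f"Fa0/0 Pkts In process switched:    {fa00_in:.1%}")
```

More than half of the packets arriving on Tunnel6 are process switched, while well under 1% of the packets leaving it are.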

show cef int

FastEthernet0/0 is up (if_number 3)
  Corresponding hwidb fast_if_number 3
  Corresponding hwidb firstsw->if_number 3
  Internet address is <cut>
  Secondary address <cut>
  ICMP redirects are always sent
  Per packet load-sharing is disabled
  IP unicast RPF check is disabled
  Input features: Stateful Inspection, Virtual Fragment Reassembly, Virtual Fragment Reassembly After IPSec Decryption, NAT Outside
  Output features: Post-routing NAT Outside, Stateful Inspection
  IP policy routing is disabled
  BGP based policy accounting on input is disabled
  BGP based policy accounting on output is disabled
  IPv6 CEF switching enabled
  Hardware idb is FastEthernet0/0
  Fast switching type 1, interface type 18
  IP CEF switching enabled
  IP CEF switching turbo vector
  IP prefix lookup IPv4 mtrie 8-8-8-8 optimized
  Input fast flags 0x400040, Output fast flags 0x10100
  ifindex 3(3)
  Slot  Slot unit 0 VC -1
  IP MTU 1500

Tunnel6 is up (if_number 8)
  Corresponding hwidb fast_if_number 8
  Corresponding hwidb firstsw->if_number 8
  Internet address is <cut>
  ICMP redirects are never sent
  Per packet load-sharing is disabled
  IP unicast RPF check is disabled
  IP policy routing is disabled
  BGP based policy accounting on input is disabled
  BGP based policy accounting on output is disabled
  Interface is marked as tunnel interface
  Hardware idb is Tunnel6
  Fast switching type 14, interface type 0
  IP CEF switching enabled
  IP CEF switching turbo vector
  IP Null turbo vector
  IP prefix lookup IPv4 mtrie 8-8-8-8 optimized
  Input fast flags 0x0, Output fast flags 0x0
  ifindex 8(8)
  Slot  Slot unit -1 VC -1
  IP MTU 1476

The only difference between these two is the fast flags, but I don't know if they are relevant.

Since the following stats are reset only by a reload, note that the uptime is 22 weeks, 2 days, 11 hours, 19 minutes.

show int switching

FastEthernet0/0 
          Throttle count       5692
                   Drops         RP      26171         SP          0
             SPD Flushes       Fast          0        SSE          0
             SPD Aggress       Fast          0
            SPD Priority     Inputs   78081963      Drops          0

    Protocol  IP                  
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process 1679494616 4243398782  818616164 4214848478
            Cache misses          0          -          -          -
                    Fast 3414455563  435028419 3669552799 3141577463
               Auton/SSE          0          0          0          0

    Protocol  DEC MOP             
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process          0          0      22461    1729497
            Cache misses          0          -          -          -
                    Fast          0          0          0          0
               Auton/SSE          0          0          0          0

    Protocol  ARP                 
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process   62261029 3735661740     119167    7150020
            Cache misses          0          -          -          -
                    Fast          0          0          0          0
               Auton/SSE          0          0          0          0

    Protocol  CDP                 
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process     224812  106560888     250320  110391120
            Cache misses          0          -          -          -
                    Fast          0          0          0          0
               Auton/SSE          0          0          0          0

    Protocol  IPv6                
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process    7814309 1563957300   65623186 1184106846
            Cache misses          0          -          -          -
                    Fast   71908574 3189895435   30888012 2718155351
               Auton/SSE          0          0          0          0

    Protocol  Other               
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process      10899     653940    1350462   81027720
            Cache misses          0          -          -          -
                    Fast          0          0          0          0
               Auton/SSE          0          0          0          0

Tunnel6 Tunnel-Wolke 2
          Throttle count          0
                   Drops         RP    6251973         SP          0
             SPD Flushes       Fast          0        SSE          0
             SPD Aggress       Fast          0
            SPD Priority     Inputs          0      Drops          0

    Protocol  IP                  
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process  755542282  996645000    2921849  374627008
            Cache misses          0          -          -          -
                    Fast  582966433 3881995597 1430454894 3257835338
               Auton/SSE          0          0          0          0

    Protocol  Other               
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
                 Process          0          0    7019105  920205716
            Cache misses          0          -          -          -
                    Fast          0          0          0          0
               Auton/SSE          0          0          0          0

Again, a high share of Pkts In is not fast switched. Debugging at the packet level, as Cisco documents recommend for troubleshooting high CPU load, isn't an option: the CPU load is already very high, and when I tried it I had a very hard time stopping the debug, waiting 10 minutes for EXEC to get access to the CPU.

So, the basic question: why are so many input packets process switched rather than fast switched on a GRE tunnel hub interface? (At least that's what I can dig out of the runtime guts.) On a 2901 running 15.5(2)T, I see similar counts on the hub tunnel. On a 2951 running 15.0(1)M9, the fast-switched packet counters on the tunnel interfaces are one or two orders of magnitude higher than the process-switched ones.

Thanks for any hint regarding this topic!


Peter Paluch
Cisco Employee

Hello,

I would personally suggest finding out whether it is the tunneled traffic that is driving the IP Input process's CPU usage up. I am not sure whether you can actually shut down the Tunnel interface for a couple of minutes (it will obviously cause a network outage), but if you can, observing the CPU load while it is down would be extremely helpful.

If the IP Input load is related to the traffic passing through the tunnel, then the most probable cause is an MTU problem and the resulting need to fragment packets. According to the outputs you have posted, the Tunnel interface is already using an MTU of 1476, which is just right for plain GRE tunnels (20 extra bytes for a new IP header, 4 bytes for the GRE header, 20+4+1476=1500). However, if the packets passing through this tunnel are over 1476 bytes long, then the router will have to fragment them, hence the high IP Input load.

You have not posted the configuration of the tunnel, but please make sure that you are using the ip tcp adjust-mss 1436 (1476-40) command on that interface. It makes all TCP sessions carried by the tunnel send segments with no more than 1436 bytes of payload, which is just right for the tunnel's MTU (TCP header is 20 bytes, IP header is 20 bytes, 20+20+1436=1476).
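
The arithmetic behind those two numbers can be written out as a short Python sketch. The constant and function names are chosen only for this illustration, and the header sizes assume no IP or TCP options and a base GRE header without the optional key or sequence fields.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
GRE_HEADER = 4    # bytes, assuming a base GRE header without key/sequence fields
TCP_HEADER = 20   # bytes, assuming no TCP options

def gre_tunnel_mtu(physical_mtu):
    """Largest inner IP packet that fits once the outer IP and GRE headers are added."""
    return physical_mtu - IPV4_HEADER - GRE_HEADER

def tcp_mss(mtu):
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

tunnel_mtu = gre_tunnel_mtu(1500)
mss = tcp_mss(tunnel_mtu)
print(f"tunnel MTU = {tunnel_mtu}, TCP MSS = {mss}")
```

If the tunnel adds further encapsulation, the overhead constants would have to grow and both results would shrink accordingly.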

This command unfortunately cannot affect UDP streams. If the tunnel carries a lot of intensive UDP traffic (say, video streams or NFS mounts), the UDP datagrams may be oversized, and a router cannot influence that. If it is determined that your high CPU is indeed caused by an excessive need for IP fragmentation, and the oversized IP packets carry UDP datagrams, you will need to tweak the applications that are sending them (if that's possible at all).

Best regards,
Peter

Hello Peter,

thanks for your suggestions. I didn't shut down the tunnel itself but the other side: the tunnel of the peer that is generating the traffic. The load immediately dropped to nearly zero, as I expected.

I also thought about MTU issues. Thanks for your maths on MTU and MSS sizes; I always calculate and set these values carefully. I have no proof of whether the high CPU comes from fragmentation, but we're running two 2821s as PPPoE endpoints with a sustained rate of up to 80 Mbit/s from upstream into the PPPoE sessions. These boxes don't suffer from high CPU even though they regularly need to fragment packets. For that reason I don't believe fragmentation is the root cause.

I know about ip tcp adjust-mss, and I try to omit it in favour of PMTUD. The reason: I read somewhere that adding this command always forces packets onto the process-switching path. Unfortunately, I didn't save the source of this claim. :-( Anyway, I added the command, and the CPU usage history over the last few hours shows no change.

Regarding UDP traffic: I configured ip flow top-talkers, which shows that mostly TCP traffic is flowing over the tunnel.

Please also see my comment about punt adjacencies, posted as a reply to my initial post.
