olpeleri, Cisco Employee

 

 

Systems under test

 

This document covers ISR G2 Cisco 3945 and Cisco 3945e routers with the on-board crypto engine.

Both devices are connected back to back via GigabitEthernet0/1.

An Agilent packet injector is connected to GigabitEthernet0/2 on both routers.

 

The IOS version tested in this lab is 15.1(4)M5.

 

The traffic pattern sent to the routers is bursty [bursts of 1000 packets sent at 1 Gbit/s].

This flow goes through several features: complex [parent/child] QoS, GRE encapsulation, and encryption/decryption.
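For reference, here is a minimal sketch of that feature path. The exact lab configuration is not shown in this document, so the class names, rates, addresses and crypto parameters below are hypothetical placeholders:

! Hypothetical reconstruction of the feature path under test:
! hierarchical [parent/child] QoS + GRE encapsulation + IPsec.
!
! Child policy: per-class treatment inside the shaped rate
class-map match-all VOICE
 match dscp ef
policy-map CHILD
 class VOICE
  priority percent 20
 class class-default
  fair-queue
!
! Parent policy: shape everything, then hand off to the child policy
policy-map PARENT
 class class-default
  shape average 500000000
  service-policy CHILD
!
! IPsec protection for the GRE tunnel
crypto ipsec transform-set TSET esp-aes esp-sha-hmac
crypto ipsec profile VPNPROF
 set transform-set TSET
!
interface Tunnel0
 ip address 10.0.0.1 255.255.255.252
 tunnel source GigabitEthernet0/1
 tunnel destination 172.146.1.1
 tunnel protection ipsec profile VPNPROF
!
interface GigabitEthernet0/1
 service-policy output PARENT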

 

Crypto engine command output may differ on a 1900 or 2900, but the same logic applies on other platforms.

 

1. 3945 - what to check

 

  • show proc cpu

 

 

bsns-3945-2#sh proc cpu | e 0.00%

CPU utilization for five seconds: 64%/63%; one minute: 62%; five minutes: 72%

PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process

  16     1384592      494466       2800  0.31%  0.06%  0.02%   0 Environmental mo

  69     4263712      528295       8070  0.07%  0.09%  0.07%   0 Per-Second Jobs 

  87      624592     1977539        315  0.15%  0.13%  0.11%   0 Netclock Backgro

105      327624    60431937          5  0.07%  0.09%  0.09%   0 SEC BATCH       

109      155144    62150729          2  0.07%  0.11%  0.13%   0 Ethernet Msec Ti

209      359008   120354323          2  0.15%  0.13%  0.14%   0 HQF Shaper Backg

291      127524    24465492          5  0.07%  0.05%  0.06%   0 MFIB_mrib_write 

295        2200         810       2716  0.15%  0.19%  0.08% 644 Virtual Exec    

326      326172    11161929         29  0.07%  0.04%  0.06%   0 PIM Process 

 

Here the CPU reports an average usage of 64%, with 63% spent under interrupt [aka packets forwarded by CEF].

 

We will see later that the router is in fact oversubscribed.

 

Key learning: CPU usage is measured as an average calculated over 5 seconds.

This is something to always remember, since the real-time CPU utilization might reach 100% for a few milliseconds without this being visible in any show command.

 

Because of this averaging, you should never take the CPU usage at face value; you have to understand the traffic pattern in order to get the right picture.
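One way to look past a single 5-second snapshot is the CPU history graph, which plots per-second and per-minute samples and can reveal short-lived peaks that a single reading misses [sub-second bursts will still be averaged out]:

3945-2#show processes cpu history
! Prints ASCII graphs of CPU usage over the last 60 seconds,
! 60 minutes and 72 hours; a bursty flow shows up as isolated
! per-second peaks even when the 5-second average looks moderate.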

 

  • show platform interrupt

 

 

By design, interrupts [aka packet forwarding under CEF] have the highest priority from a scheduling standpoint.

 

Under load, there is a safeguard mechanism that gives other IOS processes a few milliseconds of CPU time. This is what interrupt throttling is for.

 

3945-2#sh platform interrupt  | section Interrupt Throttling

Interrupt Throttling:

  Throttle Count    =    31065   Timer Count            =    31060

  Netint(usec)      =    20000   Netint Mask(usec)      =     1000

  Real Netint(usec) =     4000   Real Netint Mask(usec) =      200

  Active            =        0   Configured             =        1

  Longest IRQ(usec) =     8856

When 4000 usec [4 msec] of continuous traffic under interrupt has been reached, Net I/O interrupts are throttled for 200 usec, giving some CPU time to processes.
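These thresholds relate to the scheduler allocate configuration [shown here as a sketch; the 20000/1000 pair matches the Netint values reported above, but defaults and behavior vary per platform, so verify before changing anything]:

! Allow up to 20000 usec of network interrupt time, then reserve
! 1000 usec for IOS processes [both values in microseconds]:
3945-2(config)#scheduler allocate 20000 1000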

 

If we take a few consecutive outputs, we can see the throttle count increasing:

 

 

3945-2#terminal exec prompt timestamp

3945-2#show plat interrupt | i Time | Throttle Count

Time source is hardware calendar, 08:04:14.637 UTC Mon Oct 15 2012

  Throttle Count    =    37313   Timer Count            =    37308

bsns-3945-2#show plat interrupt | i Time | Throttle Count

Time source is hardware calendar, 08:04:17.513 UTC Mon Oct 15 2012

  Throttle Count    =    37359   Timer Count            =    37354

When we throttle [aka ignore] an interrupt, packets are queued on the interface and most likely dropped as overruns [aka not polled off the interface quickly enough].

 

  • show platform error

 

bsns-3945-2#show plat error

Load for five secs: 64%/63%; one minute: 64%; five minutes: 64%

Time source is hardware calendar, 08:11:10.301 UTC Mon Oct 15 2012

 

Interface GigabitEthernet0/1:

----------------------------

Input Errors            : 17529

Input Overrun           : 17529

 

Interface GigabitEthernet0/2:

----------------------------

Input Errors            : 489481

Input Overrun           : 489481



 

Indeed, we can see traffic getting dropped on GigabitEthernet0/1 due to the burstiness of the flow. We can cross-check with the interface statistics:

 

 

3945-2#sh int gig0/1

Load for five secs: 64%/63%; one minute: 60%; five minutes: 60%

Time source is hardware calendar, 08:14:53.261 UTC Mon Oct 15 2012

 

GigabitEthernet0/1 is up, line protocol is up

  Hardware is PQ3_TSEC, address is 0007.7dfb.6e01 (bia 0007.7dfb.6e01)

  Description: Connected to bsns-2960-1

  Internet address is 172.146.1.2/30

  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,

     reliability 169/255, txload 4/255, rxload 1/255

  Encapsulation ARPA, loopback not set

  Keepalive set (10 sec)

  Full Duplex, 1Gbps, media type is RJ45

  output flow-control is unsupported, input flow-control is unsupported

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input 00:00:00, output 00:00:00, output hang never

  Last clearing of "show interface" counters 00:03:45

  Input queue: 0/75/54785/0 (size/max/drops/flushes); Total output drops: 0

  Queueing strategy: Class-based queueing

  Output queue: 0/1000/0 (size/max total/drops)

  30 second input rate 2612000 bits/sec, 2177 packets/sec

  30 second output rate 17442000 bits/sec, 14535 packets/sec

     463075 packets input, 69454186 bytes, 0 no buffer

     Received 32 broadcasts (0 IP multicasts)

     0 runts, 0 giants, 392 throttles

     1082575 input errors, 0 CRC, 0 frame, 1082575 overrun, 0 ignored

     0 watchdog, 12 multicast, 0 pause input

     2407704 packets output, 361152652 bytes, 0 underruns

     0 output errors, 0 collisions, 0 interface resets

     8 unknown protocol drops

     0 babbles, 0 late collision, 0 deferred

     0 lost carrier, 0 no carrier, 0 pause output

     0 output buffer failures, 0 output buffers swapped out

 

We can see the overruns increasing and that we are throttling this interface. [We see similar output on Gig0/2.]
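To confirm the drops are ongoing rather than historical, the counters can be cleared and then sampled a few times [reusing the timestamp technique shown earlier]:

3945-2#terminal exec prompt timestamp
3945-2#clear counters GigabitEthernet0/1
3945-2#show interfaces GigabitEthernet0/1 | include overrun
! Repeat the last command a few times: an overrun counter that keeps
! growing under steady offered load confirms the interface is not
! being polled quickly enough.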

  • Crypto engine statistics

3945-2#sh crypto engine accelerator statistic | i Error

Errors:

Pad Error                    = 0

Data Error                   = 0

Packet Error                 = 0

Null IP Error                = 0

Hardware Error               = 0

Link Down Error              = 0

Queue Full Error             = 0

API Request Error            = 0

Invalid Flow Error           = 0

QOS Queue Full Error         = 0

Packet too Big Error         = 0

Too Many Particles Error     = 0

Input Queue Full Error       = 0

Output Queue Full Error      = 0

Pre-batch Queue Full Error   = 0

Post-batch Queue Full Error  = 0

 

In this test scenario, the crypto engine is still in a healthy state since no drops are reported. Depending on the traffic pattern, it is still possible to fill up the queue [e.g. with a burst of packets to decrypt]:

 

3945-2#sh crypto engine accelerator ring pool | i used

The no entries currently being used   = 996

 

On the 3945, the number of entries in use on the crypto engine encryption/decryption ring is 1024 [queue length] - 996 [remaining free queue entries] = 28.
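If you suspect the ring is filling up, sample it a few times in a row [same timestamp technique as before; remember that on this platform the counter reports free entries]:

3945-2#terminal exec prompt timestamp
3945-2#sh crypto engine accelerator ring pool | i used
! A value trending towards 0 means free entries are being exhausted,
! i.e. the crypto engine is falling behind the offered load.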

 

2. What to check on the 3945e

 

On a 3945e, the show crypto engine accelerator statistics command gives slightly different output.

 

 

  • show proc cpu

 

 

3945e-1#sh proc cpu | e 0.00%

CPU utilization for five seconds: 19%/17%; one minute: 18%; five minutes: 32%

PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process

  16     1392544      500615       2781  0.63%  0.12%  0.06%   0 Environmental mo

  69     4264372      534582       7977  0.07%  0.10%  0.08%   0 Per-Second Jobs 

  87      626064     2002190        312  0.23%  0.19%  0.18%   0 Netclock Backgro

109      156864    62926488          2  0.07%  0.07%  0.07%   0 Ethernet Msec Ti

209      360772   121869657          2  0.07%  0.08%  0.09%   0 HQF Shaper Backg

291      127736    24772141          5  0.07%  0.03%  0.04%   0 MFIB_mrib_write 

326      327060    11302145         28  0.07%  0.04%  0.05%   0 PIM Process     

 

CPU usage looks much better. Still, since the traffic is bursty, we need to check whether there is any interrupt throttling:

 

 

  • show platform interrupt

 

bsns-3945e-1#sh platform interrupt  | section Throttl                         

Interrupt Throttling Statistics:

Throttle Count   = 00002186   Timer Count = 00002186

A few interrupts have been throttled, but it never reached the point of dropping packets [they were simply queued and then processed].

There is no show platform error command on the 3945e platform. Still, we can gather the information from the traditional show interface command.

 

  • show interface

 

3945e-1#sh interfaces  | i Gigabit|over

GigabitEthernet0/0 is up, line protocol is up

     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

GigabitEthernet0/1 is up, line protocol is up

     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

GigabitEthernet0/2 is up, line protocol is up

     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored

Show interface does not reveal any issues.

 

 

    • Crypto engine statistics

 

3945e-1#sh crypto engine accelerator statistic  | section ipsec

 

        Received pkts(ipsec)          : 930037

        Reinjected pkts(ipsec)        : 930036

        Dropped pkts(ipsec)           : 0

        Drop over capacity(ipsec)     : 0

        Drop no particle(ipsec)       : 0

        Drop no session(ipsec)        : 0

        Errors(ipsec)                 : 0

        Pending pkts(ipsec)           : 0

 

On the 3945e, no issues are seen. On this platform, the encryption/decryption queue is empty most of the time.

 

 

3945e-1#sh crypto engine accelerator ring pool

 

 

Device:   Onboard VPN

Location: Onboard: 0

The Crypto Packet IPSEC Queue Information

==========================================

The Queuesize is                          :2048

The no of entries currently being used    : 0

The Read Index is                         :1809

The Write Index is                        :1809



Conclusion:

 

Even if the CPU usage on a 5-second average is not at 90+%, the nature of the traffic can create a burst at port speed that lasts a few milliseconds.

 

During that time, a router may have issues polling each interface [ which leads to overruns].

 

Always size your router according to the nature of the traffic, or shape the flow on the next-hop device on the LAN side [in order to queue packets there instead of dropping them as overruns on the IPsec gateway].
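As an illustration, here is a minimal shaping sketch for the LAN-side next hop [the policy name and the 100 Mbps rate are hypothetical; size the rate to what the IPsec gateway can actually absorb]:

! Applied on the device facing the IPsec gateway: smooths the bursts
! so that excess packets are queued here instead of arriving at port
! speed and overrunning the gateway's interface ring.
policy-map SMOOTH-TO-GW
 class class-default
  shape average 100000000
!
interface GigabitEthernet0/2
 service-policy output SMOOTH-TO-GW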

Comments

Great contribution Olivier, thanks!!
