This document covers ISR G2 Cisco 3945 and Cisco 3945e routers with an on-board crypto engine.
Both devices are connected back to back via GigabitEthernet0/1.
An Agilent packet injector is connected to GigabitEthernet0/2 on both routers.
The IOS version tested in this lab is 15.1(4)M5.
The traffic pattern sent to the router is bursty [bursts of 1000 packets sent at 1 Gbps].
This flow goes through features such as complex [parent/child] QoS, GRE encapsulation, and encryption/decryption.
The crypto engine command output may differ on a 1900 or 2900, but the same logic applies on other platforms.
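To give an idea of the feature combination the test flow traverses, here is a minimal configuration sketch of parent/child QoS with GRE protected by IPsec. All names, keys, rates, and addresses below are illustrative assumptions, not the actual lab configuration:
! Child policy: LLQ for latency-sensitive traffic
class-map match-all CM-PRIORITY
 match dscp ef
policy-map PM-CHILD
 class CM-PRIORITY
  priority percent 20
 class class-default
  fair-queue
! Parent policy: shape the aggregate, then run the child inside the shaper
policy-map PM-PARENT
 class class-default
  shape average 500000000
  service-policy PM-CHILD
! IPsec protection for the GRE tunnel
crypto isakmp policy 10
 encryption aes
 authentication pre-share
 group 2
crypto isakmp key MYKEY address 172.146.1.1
crypto ipsec transform-set TS esp-aes esp-sha-hmac
crypto ipsec profile IPSEC-PROF
 set transform-set TS
! GRE tunnel; 172.146.1.1 is the assumed peer on the Gi0/1 /30
interface Tunnel0
 ip address 10.0.0.2 255.255.255.252
 tunnel source GigabitEthernet0/1
 tunnel destination 172.146.1.1
 tunnel protection ipsec profile IPSEC-PROF
interface GigabitEthernet0/1
 service-policy output PM-PARENT
First, let us look at the CPU on the 3945 while the bursty traffic is running: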
bsns-3945-2#sh proc cpu | e 0.00%
CPU utilization for five seconds: 64%/63%; one minute: 62%; five minutes: 72%
PID  Runtime(ms)  Invoked    uSecs  5Sec   1Min   5Min  TTY  Process
 16      1384592     494466   2800  0.31%  0.06%  0.02%    0  Environmental mo
 69      4263712     528295   8070  0.07%  0.09%  0.07%    0  Per-Second Jobs
 87       624592    1977539    315  0.15%  0.13%  0.11%    0  Netclock Backgro
105       327624   60431937      5  0.07%  0.09%  0.09%    0  SEC BATCH
109       155144   62150729      2  0.07%  0.11%  0.13%    0  Ethernet Msec Ti
209       359008  120354323      2  0.15%  0.13%  0.14%    0  HQF Shaper Backg
291       127524   24465492      5  0.07%  0.05%  0.06%    0  MFIB_mrib_write
295         2200        810   2716  0.15%  0.19%  0.08%  644  Virtual Exec
326       326172   11161929     29  0.07%  0.04%  0.06%    0  PIM Process
Here the CPU reports an average usage of 64%, with 63% spent at interrupt level [i.e., packets switched by CEF].
We will see later that the router is in fact oversubscribed.
Key learning: CPU usage is reported as an average calculated over 5 seconds.
Always keep this in mind: the real-time CPU utilization might reach 100% for a few msec, and this will not be visible in any show command.
Because of this averaging, you should never take the CPU usage at face value; you have to understand the traffic pattern to get the right picture.
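One way to get more visibility than a single 5-second snapshot is the built-in CPU history graph, optionally combined with an SNMP trap when a threshold is crossed. This is a minimal sketch; the 80% / 5-second threshold values are illustrative assumptions:
3945-2#show processes cpu history
! plots per-second CPU for the last 60 seconds, plus 60-minute and 72-hour graphs
3945-2(config)#snmp-server enable traps cpu threshold
3945-2(config)#process cpu threshold type total rising 80 interval 5
! sends a trap whenever total CPU rises above 80% measured over a 5-second interval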
By design, interrupts [i.e., packet forwarding under CEF] have the highest priority from a scheduling standpoint.
Under load, a safeguard mechanism gives other IOS processes a few msec of CPU time. This is what interrupt throttling is for.
3945-2#sh platform interrupt | section Interrupt Throttling
Interrupt Throttling:
Throttle Count = 31065 Timer Count = 31060
Netint(usec) = 20000 Netint Mask(usec) = 1000
Real Netint(usec) = 4000 Real Netint Mask(usec) = 200
Active = 0 Configured = 1
Longest IRQ(usec) = 8856
When 4000 usec [4 msec] of continuous interrupt-level traffic is reached, network I/O interrupts are throttled for 200 usec, giving some CPU time to processes.
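These timers appear to correspond to the scheduler allocation [the Netint and Netint Mask values above]. On IOS they can be tuned with the scheduler allocate command [interrupt time and process time, both in microseconds]; the values below simply restate the configured defaults shown above, and they should normally be left alone unless TAC advises otherwise:
3945-2(config)#scheduler allocate 20000 1000
! allow up to 20000 usec at interrupt level, then guarantee 1000 usec to processes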
If we take a few consecutive outputs, we can see the throttle count increasing:
3945-2#terminal exec prompt timestamp
3945-2#show plat interrupt | i Time | Throttle Count
Time source is hardware calendar, 08:04:14.637 UTC Mon Oct 15 2012
Throttle Count = 37313 Timer Count = 37308
bsns-3945-2#show plat interrupt | i Time | Throttle Count
Time source is hardware calendar, 08:04:17.513 UTC Mon Oct 15 2012
Throttle Count = 37359 Timer Count = 37354
When we throttle [i.e., ignore] an interrupt, packets are queued on the interface and are most likely dropped as overruns [i.e., not polled quickly enough off the interface]:
bsns-3945-2#show plat error
Load for five secs: 64%/63%; one minute: 64%; five minutes: 64%
Time source is hardware calendar, 08:11:10.301 UTC Mon Oct 15 2012
Interface GigabitEthernet0/1:
----------------------------
Input Errors : 17529
Input Overrun : 17529
Interface GigabitEthernet0/2:
----------------------------
Input Errors : 489481
Input Overrun : 489481
Indeed, we can see traffic getting dropped on GigabitEthernet0/1 due to the burstiness of the flow. We can cross-check with the interface statistics:
3945-2#sh int gig0/1
Load for five secs: 64%/63%; one minute: 60%; five minutes: 60%
Time source is hardware calendar, 08:14:53.261 UTC Mon Oct 15 2012
GigabitEthernet0/1 is up, line protocol is up
Hardware is PQ3_TSEC, address is 0007.7dfb.6e01 (bia 0007.7dfb.6e01)
Description: Connected to bsns-2960-1
Internet address is 172.146.1.2/30
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 169/255, txload 4/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full Duplex, 1Gbps, media type is RJ45
output flow-control is unsupported, input flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 00:03:45
Input queue: 0/75/54785/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: Class-based queueing
Output queue: 0/1000/0 (size/max total/drops)
30 second input rate 2612000 bits/sec, 2177 packets/sec
30 second output rate 17442000 bits/sec, 14535 packets/sec
463075 packets input, 69454186 bytes, 0 no buffer
Received 32 broadcasts (0 IP multicasts)
0 runts, 0 giants, 392 throttles
1082575 input errors, 0 CRC, 0 frame, 1082575 overrun, 0 ignored
0 watchdog, 12 multicast, 0 pause input
2407704 packets output, 361152652 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
8 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out
We can see the overruns increasing and the interface being throttled. [We see similar output on Gig0/2.]
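To track how fast the overruns grow over time, a simple EEM applet can snapshot the counters periodically. This is a minimal sketch; the applet name and the 60-second interval are illustrative assumptions:
event manager applet TRACK-OVERRUNS
 event timer watchdog time 60
 action 1.0 cli command "enable"
 action 2.0 cli command "show platform error | include Overrun"
 action 3.0 syslog msg "Overrun snapshot: $_cli_result"
Next, we check whether the crypto engine itself is dropping anything: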
3945-2#sh crypto engine accelerator statistic | i Error
Errors:
Pad Error = 0
Data Error = 0
Packet Error = 0
Null IP Error = 0
Hardware Error = 0
Link Down Error = 0
Queue Full Error = 0
API Request Error = 0
Invalid Flow Error = 0
QOS Queue Full Error = 0
Packet too Big Error = 0
Too Many Particles Error = 0
Input Queue Full Error = 0
Output Queue Full Error = 0
Pre-batch Queue Full Error = 0
Post-batch Queue Full Error = 0
In this test scenario, the crypto engine is still in a healthy state since no drops are reported. Depending on the traffic pattern, it is still possible to fill up the queue [e.g., with a burst of packets to decrypt]:
3945-2#sh crypto engine accelerator ring pool | i used
The no entries currently being used = 996
On the 3945, the number of entries used on the crypto engine encryption/decryption ring is 1024 [queue length] - 996 [remaining free queue entries] = 28.
On a 3945e, the show crypto engine accelerator statistics command gives slightly different output:
3945e-1#sh proc cpu | e 0.00%
CPU utilization for five seconds: 19%/17%; one minute: 18%; five minutes: 32%
PID  Runtime(ms)  Invoked    uSecs  5Sec   1Min   5Min  TTY  Process
 16      1392544     500615   2781  0.63%  0.12%  0.06%    0  Environmental mo
 69      4264372     534582   7977  0.07%  0.10%  0.08%    0  Per-Second Jobs
 87       626064    2002190    312  0.23%  0.19%  0.18%    0  Netclock Backgro
109       156864   62926488      2  0.07%  0.07%  0.07%    0  Ethernet Msec Ti
209       360772  121869657      2  0.07%  0.08%  0.09%    0  HQF Shaper Backg
291       127736   24772141      5  0.07%  0.03%  0.04%    0  MFIB_mrib_write
326       327060   11302145     28  0.07%  0.04%  0.05%    0  PIM Process
CPU usage looks much better. Still, since the traffic is bursty, we need to check whether there is any interrupt throttling:
bsns-3945e-1#sh platform interrupt | section Throttl
Interrupt Throttling Statistics:
Throttle Count = 00002186 Timer Count = 00002186
A few interrupts have been throttled, but it never reached the point of drops [packets were simply queued and then processed].
There is no show platform error command on the 3945e platform. Still, we can gather the information from the traditional show interfaces command:
3945e-1#sh interfaces | i Gigabit|over
GigabitEthernet0/0 is up, line protocol is up
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
GigabitEthernet0/1 is up, line protocol is up
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
GigabitEthernet0/2 is up, line protocol is up
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
The show interfaces output does not reveal any issues.
3945e-1#sh crypto engine accelerator statistic | section ipsec
Received pkts(ipsec) : 930037
Reinjected pkts(ipsec) : 930036
Dropped pkts(ipsec) : 0
Drop over capacity(ipsec) : 0
Drop no particle(ipsec) : 0
Drop no session(ipsec) : 0
Errors(ipsec) : 0
Pending pkts(ipsec) : 0
On the 3945e, no issues are seen. On this platform, the encryption/decryption queue is empty most of the time [note that its ring holds 2048 entries, double the 3945's 1024]:
3945e-1#sh crypto engine accelerator ring pool
Device: Onboard VPN
Location: Onboard: 0
The Crypto Packet IPSEC Queue Information
==========================================
The Queuesize is :2048
The no of entries currently being used : 0
The Read Index is :1809
The Write Index is :1809
Even if the 5-second average CPU usage is not at 90+%, the nature of the traffic can create a port-speed burst that lasts a few msec.
During that time, a router may have trouble polling every interface [which leads to overruns].
Always size your router according to the nature of the traffic, or shape the flow on the next-hop device on the LAN side [in order to queue packets instead of dropping them as overruns on the IPsec gateway]. A sketch of such a shaper follows.
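As a hedged example, here is what such a shaper could look like on an MQC-capable LAN-side device feeding the IPsec gateway. The 400 Mbps rate and the policy name are illustrative assumptions; pick a rate below what the gateway can sustain with this feature combination:
policy-map PM-SHAPE-TO-GW
 class class-default
  shape average 400000000
interface GigabitEthernet0/2
 service-policy output PM-SHAPE-TO-GW
Shaping buffers the excess of each burst in the shaper queue instead of letting it arrive at port speed, so the gateway's receive rings are no longer overrun.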