High CPU utilization on ISR4431/K9 freezing

jbaros
Level 1

Hello Team,

 

I am experiencing service instability on my network with an ISR4431/K9. My monitoring tool shows CPU utilization going to 99%, my logs disappear, and then BGP goes down and comes back up. The router is not rebooting. This also happens at night, when traffic is low. I have been looking for a solution but have not found one. Please help me understand this issue if you can. I am wondering why the CPU is so over-utilized for no apparent reason. See the logs below:

Log Buffer (8192 bytes):
 neighbor 10.239.136.253 UP on interface GigabitEthernet0/0/1.11
May 15 02:54:07.182: %PIM-5-DRCHG: DR change from neighbor 10.239.136.252 to 10.239.136.253 on interface GigabitEthernet0/0/1.11
May 15 02:57:29.361: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004614114896131963 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 02:57:38.524: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 90% exceeds the setting threshold(80%).

May 15 02:58:00.108: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004614147790613622 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 02:58:13.522: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 4% recovered.

May 15 03:01:26.623: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004614352155813192 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 03:01:38.534: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 03:01:56.334: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:003 TS:00004614384013810070 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 03:02:28.526: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 47% recovered.

May 15 03:05:23.486: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:003 TS:00004614589016807413 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 03:05:33.551: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 03:05:48.528: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 2% recovered.

May 15 03:10:17.051: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004614883652583807 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 03:10:28.541: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 03:10:49.199: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004614913652583024 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 03:11:03.533: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 3% recovered.

May 15 04:54:56.535: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004621162002817197 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 04:54:57.078: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:003 TS:00004621161472208466 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 04:55:08.620: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 04:55:23.922: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004621191536975587 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 04:55:43.599: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 17% recovered.

May 15 04:57:41.474: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004621328013493533 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 04:57:53.604: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 93% exceeds the setting threshold(80%).

May 15 04:58:13.622: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004621358013498223 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 04:58:40.401: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004621388013500932 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 04:58:58.602: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 19% recovered.

May 15 04:59:03.596: %PIM-5-NBRCHG: neighbor 10.239.247.218 DOWN on interface GigabitEthernet0/0/2 DR
May 15 04:59:03.596: %PIM-5-DRCHG: DR change from neighbor 10.239.247.218 to 10.239.247.217 on interface GigabitEthernet0/0/2
May 15 04:59:16.306: %PIM-5-NBRCHG: neighbor 10.239.247.218 UP on interface GigabitEthernet0/0/2
May 15 04:59:16.307: %PIM-5-DRCHG: DR change from neighbor 10.239.247.217 to 10.239.247.218 on interface GigabitEthernet0/0/2
May 15 05:03:14.556: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004621658944841809 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 05:03:23.616: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 05:03:41.335: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004621688944843027 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:03:58.607: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 4% recovered.

May 15 05:06:49.889: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004621874275435011 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 05:06:58.609: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 05:07:16.668: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004621904275434168 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:07:48.816: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004621934275441919 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:07:58.609: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 25% recovered.

May 15 05:08:10.097: %PIM-5-NBRCHG: neighbor 10.239.136.253 DOWN on interface GigabitEthernet0/0/1.11 DR
May 15 05:08:10.097: %PIM-5-DRCHG: DR change from neighbor 10.239.136.253 to 10.239.136.252 on interface GigabitEthernet0/0/1.11
May 15 05:08:22.558: %PIM-5-NBRCHG: neighbor 10.239.136.253 UP on interface GigabitEthernet0/0/1.11
May 15 05:08:22.559: %PIM-5-DRCHG: DR change from neighbor 10.239.136.252 to 10.239.136.253 on interface GigabitEthernet0/0/1.11
May 15 05:11:51.895: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004622179500385182 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:11:58.635: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 95% exceeds the setting threshold(80%).

May 15 05:12:22.969: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004622209500388706 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:12:23.987: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004622209444225532 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:12:43.612: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 30% recovered.

May 15 05:15:56.123: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004622422652163284 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 05:16:03.643: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 05:16:28.271: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004622452652166506 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:16:55.050: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004622482652167941 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:17:03.618: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 3% recovered.

May 15 05:17:12.596: %PIM-5-NBRCHG: neighbor 10.239.247.218 DOWN on interface GigabitEthernet0/0/2 DR
May 15 05:17:12.596: %PIM-5-DRCHG: DR change from neighbor 10.239.247.218 to 10.239.247.217 on interface GigabitEthernet0/0/2
May 15 05:17:25.111: %PIM-5-NBRCHG: neighbor 10.239.247.218 UP on interface GigabitEthernet0/0/2
May 15 05:17:25.112: %PIM-5-DRCHG: DR change from neighbor 10.239.247.217 to 10.239.247.218 on interface GigabitEthernet0/0/2
May 15 05:20:37.909: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004622704434775203 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 05:20:37.913: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:001 TS:00004622703365597544 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 11
May 15 05:20:43.643: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 86% exceeds the setting threshold(80%).

May 15 05:20:58.620: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 2% recovered.

May 15 05:25:18.883: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:002 TS:00004622984332090637 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:25:28.642: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 99% exceeds the setting threshold(80%).

May 15 05:25:49.301: %IOSXE-5-PLATFORM:cpp_cp: QFP:0.0 Thread:003 TS:00004623016898073908 %PUNT_INJECT-5-DROP_PUNT_CAUSE: punt cause policer drop packet casue 55
May 15 05:26:13.623: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 59% recovered.

Please help me find a solution. Is this a bug?

 

Thanks a lot

Jozef

1 Accepted Solution


Hello Giuseppe,

 

Thanks a lot for your help in this matter.

This is solved now. There was a spanning-tree issue in my topology: some optical cables were faulty, with very low Rx power. Packets were being lost, which triggered spanning-tree topology recalculation. The router then became too busy handling punted packets, which caused the high CPU utilization and the freezing. After fixing the cabling and the spanning-tree issues, CPU utilization is back to normal.

 

Thanks and have good day

Jozef
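
For anyone who wants to confirm that a fix like this really ended the load spikes, here is a minimal Python sketch that pairs each "exceeds" QFP alert with the next "recovered" alert and reports how long each episode lasted. It only assumes the log format shown in the original post; the function names and the placeholder year (the log timestamps carry no year) are my own choices, not anything provided by Cisco.

```python
import re
from datetime import datetime

# Matches lines such as:
# May 15 02:57:38.524: MCPRP-QFP-ALERT: Slot: 0, QFP:0, Load 90% exceeds the setting threshold(80%).
ALERT = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d\.\d+): MCPRP-QFP-ALERT: .*Load (\d+)% (exceeds|recovered)"
)

def parse_timestamp(stamp, year=2000):
    # The log has no year, so a placeholder year is prepended (assumption).
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")

def high_load_episodes(log_text):
    """Return a list of (load_at_alert_percent, duration_seconds) tuples."""
    episodes = []
    open_alert = None  # (timestamp, load) of the last unmatched "exceeds" alert
    for line in log_text.splitlines():
        match = ALERT.match(line.strip())
        if not match:
            continue
        stamp = parse_timestamp(match.group(1))
        load = int(match.group(2))
        if match.group(3) == "exceeds":
            open_alert = (stamp, load)
        elif open_alert is not None:
            duration = (stamp - open_alert[0]).total_seconds()
            episodes.append((open_alert[1], duration))
            open_alert = None
    return episodes
```

Feeding it the output of the log buffer before and after the cabling fix should show the list of episodes shrinking to empty if the problem is really gone.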


Giuseppe Larosa
Hall of Fame

Hello Jozef,

It looks like a lot of packets cannot be processed by CEF in hardware, so they are punted, that is, sent to the main processor.

see

https://www.cisco.com/c/en/us/support/docs/content-networking/adaptive-session-redundancy-asr/211428-ASR1000-Punt-Policer-Logging-and-Monitor.html

 

Use the following command to understand the reason for the drops. The last parameter is the cause number from the log message:

show platform hardware qfp active infrastructure punt config cause 11

 

 

You can probably benefit from limiting the punted packets sent to the CPU on each interface, using the following commands taken from the first document. The idea is to drop packets on the interface if the rate of punted packets exceeds a given rate in pps.

Router(config)#platform punt-intf rate <packet per second>

Router(config)#interface gigabitEthernet 0/0/0
Router(config-if)#punt-control enable <packet per second>

This configuration enables punt-policing monitoring per interface. For example, if you configure a punt-control rate of 1000 globally as well as on a particular interface, the device keeps track of the punt drops for that interface over a 30-second interval. After the 30-second interval, the router generates a log message to alert the admin that a punt violation event has occurred.
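
To make the idea of a per-interface pps limit with a 30-second reporting interval more concrete, here is a toy Python model. It is only an illustration under my own assumptions (a simple fixed one-second counter per interface, hypothetical function and interface names); it is not how the router implements its policer.

```python
from collections import defaultdict

def punt_drop_report(packets, rate_pps, window_s=30):
    """Toy model of per-interface punt policing.

    packets  -- iterable of (time_in_seconds, interface_name) for punted packets
    rate_pps -- packets allowed per interface in each one-second slot
    window_s -- length of the reporting interval in seconds

    Returns {(window_index, interface_name): dropped_packets} for every
    reporting window in which at least one packet was dropped.
    """
    seen_per_second = defaultdict(int)
    drops = defaultdict(int)
    for timestamp, interface in packets:
        second = int(timestamp)
        seen_per_second[(second, interface)] += 1
        # Anything beyond the allowed rate within this second counts as a drop.
        if seen_per_second[(second, interface)] > rate_pps:
            drops[(second // window_s, interface)] += 1
    return dict(drops)
```

An interface that stays under the limit never appears in the result, while a noisy interface shows up once per 30-second window with its drop count, which is roughly the kind of information the per-interface log message is meant to give you.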

 

see the following document for further troubleshooting

 

https://www.cisco.com/c/en/us/support/docs/content-networking/adaptive-session-redundancy-asr/211428-ASR1000-Punt-Policer-Logging-and-Monitor.html

 

 

You may also find the following document useful for troubleshooting punt drops on the ASR1000 (it may be too platform-specific):

 

https://www.cisco.com/c/en/us/support/docs/routers/asr-1000-series-aggregation-services-routers/110531-asr-packet-drop.html

 

I would enable the punt policer per interface on each interface to see if the behaviour changes.

There is also a global per platform command.

 

Edit:

You ask whether it could be a SW bug.

I have seen some SW bugs with a similar description but with different cause numbers: 27 (related to L2TP) or 60 (related to SIP memory usage on the ASR 1000).

From your logs, the most frequent causes are 11 and 55. Find out what they mean with the first show command I proposed above. Then, if you enable per-interface punt policing, you can minimize the effects, provided it is not a SW bug.

 

Hope to help

Giuseppe
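
If you want to check which cause numbers dominate in a longer log, a minimal Python sketch like the one below can tally them. It assumes only the message format shown in the original post (the cause number is the last field of the %PUNT_INJECT-5-DROP_PUNT_CAUSE message); the function name is hypothetical.

```python
import re
from collections import Counter

# The cause number is the last whitespace-separated field of the message.
PUNT_DROP = re.compile(r"%PUNT_INJECT-5-DROP_PUNT_CAUSE:.*\s(\d+)\s*$")

def count_punt_causes(log_text):
    """Count punt policer drop messages per cause number."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PUNT_DROP.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Calling count_punt_causes(log_text).most_common() lists the cause numbers from most to least frequent, and each number can then be looked up with the show command above.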

 

 

Hello Jozef,

I am happy you have solved it.

It is hard to imagine that physical-layer issues could cause high CPU usage, as happened in your case.

It was kind of you to provide feedback on this issue, because it may happen to someone else.

 

Best Regards

Giuseppe

 

Hello, the problem is resolved. In my case it was a switch bug.
I just reduced the traffic through the switch (by disconnecting some
cables), and then the error message was gone.
Thanks 🙏

Hello Khaldi,

Can you report the bug ID, the device model, and the IOS version running on it? That information may be useful for other colleagues.

 

Thanks for your feedback

 

Best Regards

Giuseppe

 

khaldi.tasbih
Level 1

Hello, I have the same punt cause policer drop problem as you (cause 60 in my case) on my Cisco 4300 router. I read your post and I just want to know whether the configuration that enables punt-policing monitoring per interface, that is, the following commands, resolves the problem or not. If you have the solution, please help me.

Router(config)#interface gigabitEthernet 0/0/0
Router(config-if)#punt-control enable <packet per second>

 

need your help

 

Hello Khaldi,

Is the command supported on your router?

If it is, you can try it to see whether it provides any benefit.

 

Hope to help

Giuseppe

 

ccbg
Level 1

My gut feeling is that there was a broadcast storm caused by a routing loop!
