Solved: NTP/GSP high CPU util

bmco · 08-10-2012

Hi,

We have two ASR9k routers (v4.2.0) doing "light" xconnect work, with no problems so far.

Lately we have noticed ntpd and gsp using an abnormally high amount of CPU. Is this normal behavior on the ASR9k?

RP/0/RSP0/CPU0:asr-1#show processes cpu | utility head -n 2

Fri Aug 10 10:52:52.285 CET

CPU utilization for one minute: 57%; five minutes: 57%; fifteen minutes: 57%

RP/0/RSP0/CPU0:asr-1#show processes cpu | utility egrep -e ntpd -e gsp

Fri Aug 10 10:52:59.346 CET

233559 23% 23% 23% gsp

266447 27% 27% 27% ntpd

RP/0/RSP0/CPU0:asr-2#show processes cpu | utility head -n 2

Fri Aug 10 10:53:39.162 CET

CPU utilization for one minute: 56%; five minutes: 56%; fifteen minutes: 56%

RP/0/RSP0/CPU0:asr-2#show processes cpu | utility egrep -e ntpd -e gsp

Fri Aug 10 10:54:11.032 CET

233559 22% 22% 22% gsp

266447 28% 28% 28% ntpd

/Bjorn

mdebraba · 08-10-2012

CSCtw87827 is a known issue in 4.2.0 causing high CPU in ntpd/gsp.

The workaround is to enable IPv6 on the source interface used by NTP, but I'd recommend moving to 4.2.1, which is much more stable (unless you have already done extended validation testing on 4.2.0, of course).

Here is the full release note:

Issue:
======
The NTPD process uses around 27% CPU after upgrading to 4.2.0 30I. We did not see this issue before, and we do not see it on another ASR9K running 20L.

Root cause:
-----------
ens_read_nb is called regardless of whether a producer exists for the requested interface, so the retry repeats indefinitely if the requested interface is not configured with IPv6.

Fix:
====
When ENS_CMD_NB_STATUS is received, we need to check the 'status' flag in 'grp_ens_nb_status_msg_st', which is passed to the consumer callback as part of the 'value' parameter. If status is 'ETIME', a producer is available but the request timed out for some other reason. If status is 'ESRCH', no producer exists for the requested interface. These are the only two values for 'status' as of now.

Workaround:
===========
Configure either 'ipv6 enable' or an IPv6 address under the requested interface. This stops the consumer library from retrying indefinitely.
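
A minimal sketch of that workaround, assuming NTP is sourced from Loopback0 (a hypothetical interface name used only for illustration; 'show running-config ntp' should show the source interface actually configured):

```
! Assumption: Loopback0 is the NTP source interface; substitute your own.
configure
interface Loopback0
 ipv6 enable
commit
end
```

After the commit, re-running 'show processes cpu | utility egrep -e ntpd -e gsp' is a quick way to check whether ntpd and gsp CPU usage has come back down.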

Condition that triggers this issue:
===============================
Any LAS client registers for a link-local address on an interface that does not have IPv6 enabled. This can happen as a result of configure/unconfigure/rollback operations.

bmco · 08-10-2012

Thank you. I had searched the bugkit for ASR9k-related issues but didn't think to look under "Cisco Carrier Routing System" :-)

We will upgrade to 4.2.1 ASAP.

/Bjorn