NTP/GSP high CPU util

bmco
Level 1

Hi,

We have 2 ASR9k (v4.2.0) routers doing "light" xconnect work, no problems so far.

Lately, we have noticed ntpd and gsp using an abnormally high amount of CPU. Is this normal behavior on the ASR9k?

RP/0/RSP0/CPU0:asr-1#show processes cpu | utility head -n 2
Fri Aug 10 10:52:52.285 CET
CPU utilization for one minute: 57%; five minutes: 57%; fifteen minutes: 57%

RP/0/RSP0/CPU0:asr-1#show processes cpu | utility egrep -e ntpd -e gsp
Fri Aug 10 10:52:59.346 CET
233559  23%     23%      23% gsp
266447  27%     27%      27% ntpd

RP/0/RSP0/CPU0:asr-2#show processes cpu | utility head -n 2
Fri Aug 10 10:53:39.162 CET
CPU utilization for one minute: 56%; five minutes: 56%; fifteen minutes: 56%

RP/0/RSP0/CPU0:asr-2#show processes cpu | utility egrep -e ntpd -e gsp
Fri Aug 10 10:54:11.032 CET
233559  22%     22%      22% gsp
266447  28%     28%      28% ntpd

/Bjorn

Accepted Solutions

mdebraba
Cisco Employee

CSCtw87827 is a known issue in 4.2.0 causing high CPU in ntpd/gsp.

The workaround is to enable IPv6 on the source interface used by NTP, but I'd recommend moving to 4.2.1, which is much more stable (unless you have already done extended validation testing on 4.2.0, of course).

Here is the full release note:

Issue:
======
The NTPD process takes around 27% CPU after upgrade to 4.2.0 30I. We did not see this issue before, and we do not see it on another ASR9K running on 20L either.

Root cause:
-----------
ens_read_nb is called regardless of whether a producer exists for the requested interface, so the retry repeats indefinitely if the requested interface is not configured with IPv6.

Fix:
====
When ENS_CMD_NB_STATUS is received, we need to check the 'status' flag, which is part of 'grp_ens_nb_status_msg_st' and is passed in the 'value' parameter to the consumer callback. If it is 'ETIME', the producer is available but a timeout happened for some other reason. If it is 'ESRCH', no producer exists for the requested interface. These are the only two values for 'status' as of now.
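
As a rough illustration of the check described above, here is a minimal C sketch. It is not Cisco's code: `nb_status_msg` is a hypothetical stand-in for 'grp_ens_nb_status_msg_st' (whose real layout is not given in the release note), and `should_retry_read` is an invented helper name. Only the ETIME/ESRCH distinction comes from the release note; treating any other value as "do not retry" is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for 'grp_ens_nb_status_msg_st'.
 * The real structure is internal to IOS XR and not shown in the release note. */
typedef struct {
    int status; /* ETIME or ESRCH, per the release note */
} nb_status_msg;

/* Hypothetical helper: decide whether the consumer should issue
 * another non-blocking read after receiving a status message. */
bool should_retry_read(const nb_status_msg *msg)
{
    assert(msg != NULL);

    switch (msg->status) {
    case ETIME:
        /* Producer exists but the read timed out: retrying makes sense. */
        return true;
    case ESRCH:
        /* No producer for this interface: stop, or the retry never ends. */
        return false;
    default:
        /* Assumption: unknown status values are treated as "do not retry". */
        return false;
    }
}
```

With this check in place, a consumer callback would only call the read again when `should_retry_read` returns true, which is what breaks the endless loop on interfaces without IPv6.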

Workaround:
===========
Configure either 'ipv6 enable' or an IPv6 address under the requested interface. This avoids the infinite retry from the consumer library.

Condition that triggers this issue:
===============================
Any LAS client registers for a link-local address on an interface that does not have IPv6 enabled, which can happen after configure, unconfigure, or rollback operations.
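
For reference, the workaround might look roughly like the following in IOS XR configuration mode. This is only a sketch: Loopback0 is an assumed interface name, so substitute whichever interface your NTP configuration actually uses as its source.

```
! Assumption: Loopback0 is the NTP source interface; replace with your own.
configure
 interface Loopback0
  ipv6 enable
 !
commit
```

After committing, re-running `show processes cpu | utility egrep -e ntpd -e gsp` should show whether ntpd and gsp CPU usage has dropped.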

Thank you, I had searched the bugkit for ASR9k related issues, didn't think to look in "Cisco Carrier Routing System" :-)

We will upgrade to 4.2.1 ASAP.

/Bjorn