10-04-2022 02:49 PM - edited 10-04-2022 04:00 PM
We have 6 pairs of 9800-80s in HA, and we noticed that on all pairs a process (SAMsgThread) runs every 15 minutes and drives up the 9800 controller CPU. SAMsgThread is the process responsible for Smart Licensing operations. Depending on the time of day, the CPU hits 100%, which may affect client transactions depending on the number of APs hosted on the controller.
The controller is now running 17.6.4; a couple of weeks ago it was on 17.6.2.
I have a ticket open, but they are not helping much. Has anybody experienced this issue?
The command "show license eventlog" displays the following every 15 minutes:
2022-09-30 04:19:53.428 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 04:34:53.428 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 04:49:53.560 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 05:04:53.486 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 05:19:53.554 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 05:34:53.561 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 05:49:53.464 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 06:04:53.214 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 06:19:53.462 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
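One way to confirm the cadence is to compute the gaps between consecutive event timestamps. The following is a minimal Python sketch, assuming the eventlog lines have been pasted into a string in the format shown above. The function name `event_intervals` and the `sample` variable are illustrative only, not part of any Cisco tooling.

```python
from datetime import datetime

def event_intervals(log_text, message_type="SmartAgentHaMsgTSFileChange"):
    """Return the gaps, in seconds, between consecutive matching events."""
    stamps = []
    for line in log_text.splitlines():
        # Tolerate a stray leading quote from copy and paste.
        line = line.strip().lstrip('"')
        if message_type not in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        # Fields 1 and 2 are the date and time; the time zone field is ignored.
        stamps.append(
            datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S.%f")
        )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Three lines copied from the eventlog output in this post.
sample = '''2022-09-30 04:19:53.428 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 04:34:53.428 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"
2022-09-30 04:49:53.560 MST SAEVT_HA_MESSAGE messageType="SmartAgentHaMsgTSFileChange"'''

gaps = event_intervals(sample)
print([round(g) for g in gaps])  # each gap is about 900 seconds, i.e. 15 minutes
```

Gaps that stay close to 900 seconds suggest a fixed timer rather than something triggered by load.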
10-04-2022 07:47 PM
Post the complete output of the following commands:
10-05-2022 07:50 AM
sh platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource            Usage          Max           Warning  Critical  State
----------------------------------------------------------------------------------------------------
RP0 (ok, active)                                                    H
Control Processor   9.10%          100%          80%      90%       H
DRAM                9898MB(15%)    62892MB       88%      93%       H
harddisk            0MB(0%)        0MB           80%      85%       H
ESP0(ok, active)                                                    H
QFP                                                                 H
TCAM                100cells(0%)   1048576cells  65%      85%       H
DRAM                679250KB(16%)  4194304KB     85%      95%       H
IRAM                14764KB(11%)   131072KB      85%      95%       H
CPU Utilization     2.00%          100%          90%      95%       H
sh platform software status control-processor brief
Load Average
Slot   Status   1-Min  5-Min  15-Min
1-RP0  Healthy  1.88   1.77   1.98
2-RP0  Healthy  1.17   1.00   1.06
Memory (kB)
Slot   Status   Total     Used (Pct)      Free (Pct)      Committed (Pct)
1-RP0  Healthy  64402204  10121344 (16%)  54280860 (84%)  18299220 (28%)
2-RP0  Healthy  64402204  7306124 (11%)   57096080 (89%)  16134092 (25%)
CPU Utilization
Slot   CPU  User   System  Nice  Idle   IRQ   SIRQ  IOwait
1-RP0  0    3.00   1.10    0.00  95.90  0.00  0.00  0.00
       1    5.39   1.19    0.00  93.30  0.00  0.09  0.00
       2    6.50   1.20    0.00  92.30  0.00  0.00  0.00
       3    4.50   1.50    0.00  93.99  0.00  0.00  0.00
       4    19.91  1.30    0.00  78.77  0.00  0.00  0.00
       5    21.50  2.80    0.00  75.70  0.00  0.00  0.00
       6    7.20   2.10    0.00  90.60  0.00  0.10  0.00
       7    6.29   1.49    0.00  92.20  0.00  0.00  0.00
       8    9.50   2.10    0.00  88.40  0.00  0.00  0.00
       9    6.20   1.40    0.00  92.39  0.00  0.00  0.00
       10   8.30   2.40    0.00  89.30  0.00  0.00  0.00
       11   6.30   1.50    0.00  92.20  0.00  0.00  0.00
       12   1.40   1.10    0.00  93.19  0.00  4.30  0.00
       13   2.50   1.20    0.00  95.79  0.00  0.50  0.00
       14   8.59   2.39    0.00  88.91  0.00  0.09  0.00
       15   4.30   1.20    0.00  94.40  0.00  0.10  0.00
       16   6.50   1.70    0.00  91.80  0.00  0.00  0.00
       17   3.49   1.59    0.00  94.90  0.00  0.00  0.00
       18   7.80   3.30    0.00  88.78  0.00  0.10  0.00
       19   4.70   1.30    0.00  94.00  0.00  0.00  0.00
       20   5.30   1.90    0.00  92.40  0.00  0.40  0.00
       21   7.70   1.90    0.00  90.39  0.00  0.00  0.00
       22   8.50   1.70    0.00  88.70  0.00  1.10  0.00
       23   10.38  3.39    0.00  86.21  0.00  0.00  0.00
2-RP0  0    0.30   0.20    0.00  99.50  0.00  0.00  0.00
       1    12.01  6.20    0.00  81.78  0.00  0.00  0.00
       2    0.30   0.20    0.00  99.49  0.00  0.00  0.00
       3    0.70   0.40    0.00  98.90  0.00  0.00  0.00
       4    2.79   0.89    0.00  96.30  0.00  0.00  0.00
       5    6.90   2.80    0.00  90.30  0.00  0.00  0.00
       6    1.20   0.30    0.00  98.50  0.00  0.00  0.00
       7    2.90   0.80    0.00  96.30  0.00  0.00  0.00
       8    3.20   0.60    0.00  96.20  0.00  0.00  0.00
       9    5.80   1.10    0.00  93.09  0.00  0.00  0.00
       10   1.00   0.20    0.00  98.79  0.00  0.00  0.00
       11   2.09   0.39    0.00  97.40  0.00  0.09  0.00
       12   1.60   0.40    0.00  98.00  0.00  0.00  0.00
       13   0.19   0.39    0.00  99.40  0.00  0.00  0.00
       14   3.40   0.80    0.00  95.80  0.00  0.00  0.00
       15   1.39   1.99    0.00  96.60  0.00  0.00  0.00
       16   1.00   0.30    0.00  98.70  0.00  0.00  0.00
       17   0.89   0.19    0.00  98.80  0.00  0.09  0.00
       18   1.20   0.30    0.00  98.50  0.00  0.00  0.00
       19   4.60   0.90    0.00  94.50  0.00  0.00  0.00
       20   0.30   0.20    0.00  99.50  0.00  0.00  0.00
       21   6.70   2.80    0.00  88.10  0.00  2.40  0.00
       22   2.20   1.10    0.00  96.50  0.00  0.20  0.00
       23   0.89   0.69    0.00  98.40  0.00  0.00  0.00
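With 24 cores per slot, it is easy to miss the few that are busy. The following is a rough Python sketch, assuming the "CPU Utilization" section above has been saved as text with whitespace-separated columns in the order shown (CPU, User, System, Nice, Idle, IRQ, SIRQ, IOwait). It ranks cores by 100 minus the Idle value. The function name `busiest_cores` is hypothetical, and the sample rows are a subset copied from the output above.

```python
def busiest_cores(cpu_section, top=3):
    """Return (slot, core, busy_pct) tuples, busiest first."""
    rows = []
    slot = None
    for line in cpu_section.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A row starting with a label such as "1-RP0" opens a new slot.
        if "-" in fields[0]:
            slot, fields = fields[0], fields[1:]
        # Data rows hold a core index followed by seven percentages.
        if slot is None or len(fields) != 8 or not fields[0].isdigit():
            continue
        idle = float(fields[4])
        rows.append((slot, int(fields[0]), round(100.0 - idle, 2)))
    return sorted(rows, key=lambda r: r[2], reverse=True)[:top]

# A few rows copied from the output above.
cpu_sample = """Slot CPU User System Nice Idle IRQ SIRQ IOwait
1-RP0 0 3.00 1.10 0.00 95.90 0.00 0.00 0.00
4 19.91 1.30 0.00 78.77 0.00 0.00 0.00
5 21.50 2.80 0.00 75.70 0.00 0.00 0.00
2-RP0 0 0.30 0.20 0.00 99.50 0.00 0.00 0.00
1 12.01 6.20 0.00 81.78 0.00 0.00 0.00"""

for slot, core, busy in busiest_cores(cpu_sample):
    print(slot, core, busy)
```

In this snapshot no core is anywhere near saturation, which fits the capture having been taken between spikes rather than during one.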
10-05-2022 03:15 PM
Raise a TAC Case.
Memory utilization (>16%) is abnormally high.
10-06-2022 08:21 AM - edited 06-27-2023 07:21 AM
I'll be interested to know the outcome, as we're due to upgrade our production WLCs from 17.6.2 to 17.6.4 in the next 2 weeks.
I have a 9800-80 HA SSO pair in the lab that is not showing any of those CPU spikes or any of those events in the logs. The lab WLC has very few APs and clients, though, so that might be the difference.
How do you have smart licensing configured? Ours has the call-home service disabled and reports directly to CSSM using smart transport.
10-06-2022 03:35 PM
I have about 2900 APs on the controller. You won't see the CPU issue unless you have 2,000 or more APs on the controller; my test controller does not have the issue. By the way, I had the issue on 17.6.2 too.
10-06-2022 03:37 PM
Yeah, I tried with licensing set to direct and with it disabled. Neither made a difference to the CPU spike.
06-26-2023 11:44 AM
I have a pair of 9800-80s in HA and am also seeing the CPU spike every 15 minutes. How were you able to find which process was causing the issue?
06-26-2023 04:39 PM
Start with the complete output of the following commands:
06-27-2023 05:25 AM
Hey @Leo Laohoo,
Please see attached.
06-27-2023 06:07 AM
I am not seeing anything wrong with the output.
I can, however, see the CPU spike every 15 minutes. Please rerun the following command when the CPUs spike, because I want a snapshot of which CPU is actually spinning hot.
So at 15, 30 and 45 minutes past the hour, and at the top of the hour, rerun the command several times.
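To avoid missing the window, the capture times can be worked out ahead of time. The following is a small Python sketch, assuming the spike lines up with the quarter-hour marks as described above. The function names and the `lead`, `span` and `step` values are hypothetical. The eventlog earlier in this thread shows events at :04, :19, :34 and :49 on a different controller, so the offset may need adjusting per device.

```python
from datetime import datetime, timedelta

def next_quarter_hour(now):
    """Return the next :00, :15, :30 or :45 boundary strictly after `now`."""
    base = now.replace(minute=(now.minute // 15) * 15, second=0, microsecond=0)
    return base + timedelta(minutes=15)

def sampling_times(now, lead=30, span=120, step=10):
    """List capture times from `lead` seconds before the next boundary,
    spaced `step` seconds apart, covering `span` seconds in total.
    Times may be in the past if `now` is within `lead` seconds of the boundary."""
    start = next_quarter_hour(now) - timedelta(seconds=lead)
    return [start + timedelta(seconds=s) for s in range(0, span + 1, step)]

# Example with a fixed clock value so the result is repeatable.
for t in sampling_times(datetime(2023, 6, 27, 6, 7, 0)):
    print(t.strftime("%H:%M:%S"))
```

Sampling from shortly before the boundary until a couple of minutes after it gives both a baseline and the spike in the same capture.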
06-27-2023 06:44 AM
Hey @Leo Laohoo
Does this capture what you are looking for? In the last command that I ran it looks like the linux_iosd-imag process spikes for a bit.
06-27-2023 04:17 PM
@Beazle wrote:
I ran it looks like the linux_iosd-imag process spikes for a bit.
linux_iosd-imag is a process related or attributed to telemetry. Is there an SNMP server (and if so, how many) and/or DNAC?
06-28-2023 05:12 AM
We have an SNMP server that polls bandwidth and traffic every 5 minutes. We also have Cisco Prime and DNAC set up for telemetry. Do you think that could be too many devices polling the controller and causing a CPU spike?
06-28-2023 05:55 AM
@Beazle wrote:
Do you think that could be too many devices polling the controller and causing a CPU spike?
No, DNAC is.
Try it. Remove DNAC from polling the stack for 48 hours and compare the results.