01-10-2026 07:48 AM
Dear all
We are currently running 7x C9300-48P, all on IOS-XE 17.12.XX.
We are seeing a strange issue on 5 of the devices: high latency when pinging the SVI (switched virtual interface) used for CLI access, but only on roughly every 5th to 7th ping.
It is also noticeable when connected to the CLI: when the latency goes up, there is a small lag while entering commands, for example.
All seven devices are set up identically and serve the same role (access switches).
The devices were previously running 17.12.04 and I have since upgraded them to 17.12.06 but the behaviour is still the same.
When checking "sh proc cpu sorted | ex 0.00" I can see that the TPS IPC Process is running at around 60-70%, which coincides with the lag/latency spikes:
Switch#sh proc cpu sorted | ex 0.00
CPU utilization for five seconds: 63%/0%; one minute: 38%; five minutes: 37%
PID   Runtime(ms)  Invoked   uSecs   5Sec    1Min    5Min   TTY  Process
363        419605      824  509229  62.00%  36.05%  34.72%    0  TPS IPC Process
490           197     1082     182   0.23%   0.06%   0.01%    1  SSH Process
192          2142     1697    1262   0.15%   0.16%   0.17%    0  CDP Protocol
168          2476     8257     299   0.15%   0.17%   0.17%    0  FED IPC process
480          2142    11130     192   0.07%   0.16%   0.16%    0  SISF Switcher Th
 78          2624    14832     176   0.07%   0.07%   0.07%    0  IOSD ipc task
 70           763     1750     436   0.07%   0.04%   0.05%    0  Net Background
155          1459      709    2057   0.07%   0.09%   0.09%    0  NGWC DOT1X Proce
397           956      570    1677   0.07%   0.02%   0.01%    0  Syslog Traps
117           679    21843      31   0.07%   0.04%   0.04%    0  IOSXE-RP Punt Se
212           710     8244      86   0.07%   0.07%   0.07%    0  UDLD
100          1675     3976     421   0.07%   0.11%   0.11%    0  Crimson flush tr
In addition, the log contains the following entries:
Switch#sh logg | s HOG
Jan 10 16:37:11.463: %SYS-3-CPUHOG: Task is running for (2597)msecs, more than (2000)msecs (3/3),process = TPS IPC Process.
Jan 10 16:37:19.753: %SYS-3-CPUHOG: Task is running for (2538)msecs, more than (2000)msecs (1/1),process = TPS IPC Process.
Jan 10 16:37:28.049: %SYS-3-CPUHOG: Task is running for (2491)msecs, more than (2000)msecs (1/1),process = TPS IPC Process.
Jan 10 16:37:36.316: %SYS-3-CPUHOG: Task is running for (2435)msecs, more than (2000)msecs (2/2),process = TPS IPC Process.
Jan 10 16:37:44.583: %SYS-3-CPUHOG: Task is running for (2365)msecs, more than (2000)msecs (3/3),process = TPS IPC Process.
Jan 10 16:37:52.879: %SYS-3-CPUHOG: Task is running for (2320)msecs, more than (2000)msecs (0/0),process = TPS IPC Process.
Jan 10 16:38:01.176: %SYS-3-CPUHOG: Task is running for (2209)msecs, more than (2000)msecs (2/2),process = TPS IPC Process.
Jan 10 16:38:09.438: %SYS-3-CPUHOG: Task is running for (2143)msecs, more than (2000)msecs (1/1),process = TPS IPC Process.
Jan 10 16:38:17.701: %SYS-3-CPUHOG: Task is running for (2097)msecs, more than (2000)msecs (1/1),process = TPS IPC Process.
Jan 10 16:38:25.944: %SYS-3-CPUHOG: Task is running for (2065)msecs, more than (2000)msecs (1/1),process = TPS IPC Process.
Jan 10 16:38:34.219: %SYS-3-CPUHOG: Task is running for (2033)msecs, more than (2000)msecs (1/1),process = TPS IPC Process.
My understanding is that the control plane is busy and therefore starts dropping packets, but I have not been able to confirm this, as I am not very well versed in CPU/control-plane troubleshooting.
I have followed most of the troubleshooting guides I've found but didn't reach a proper finding/conclusion, hence this post.
As mentioned, 2 of the 7 devices are not experiencing this issue even though they have the same config.
I've also tried to find differences in their configs, but without success.
Has somebody encountered this already, or does anyone have other ideas?
Thank you and best!
01-13-2026 03:05 AM
Dear all
We found the issue when comparing configs again in detail.
The group our company belongs to is using DNAC (Cisco DNA Center) and has pushed 32 "telemetry ietf subscription" entries (model-driven telemetry) to the 5 affected devices.
If we remove that config the problem is gone completely.
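For reference, the subscriptions can also be checked and removed by hand from the CLI. This is only a sketch: the subscription ID 101 below is a placeholder (use the IDs from the show output), and note that DNAC may push the configuration back for as long as it still manages the device.

```
! List the model-driven telemetry subscriptions configured on the box
Switch# show telemetry ietf subscription all

! Remove one subscription (101 is a placeholder ID)
Switch# configure terminal
Switch(config)# no telemetry ietf subscription 101
Switch(config)# end
```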
Thank you all for your ideas and your time.
Best!
01-10-2026 08:02 AM
@mar0n Consider using the latest advisory software: https://software.cisco.com/download/home/286313983/type/282046477/release/IOSXE-17.15.4
and check whether that helps.
M.
01-10-2026 08:34 AM
Hi,
There might be a bug; however, since you've already upgraded to what appears to be an MD (maintenance deployment) suggested release, this could be something related to the control plane. First, collect and post the complete output of the following commands:
show policy-map control-plane
show platform hardware fed active qos queue stats internal cpu policer
show platform software fed switch active punt cpuq all
show platform software fed switch active punt cause summary
show platform software fed switch active punt rates interfaces
If there is control-plane overloading, we'll use this document as a guide to get to the root cause:
Thanks,
Cristian.
01-10-2026 02:14 PM - edited 01-10-2026 02:14 PM
Thanks Cristian. Unfortunately I didn't reply to your post directly; see the outputs and answers below in the thread.
01-10-2026 10:06 AM - edited 01-10-2026 10:07 AM
Hi both
Thank you for taking the time to respond.
I had already followed that document previously, but as I said, I am not well versed in this kind of troubleshooting.
I've captured the output of the commands you listed and attached it below.
As far as I can tell, we are using the default CPP (control-plane policing) policy-map.
Looking at the CPU policer output, I can see that the switch has "Forus traffic" drops:
============================================================================================
                                                 (default)  (set)   Queue        Queue
QId  PlcIdx  Queue Name              Enabled     Rate       Rate    Drop(Bytes)  Drop(Frames)
--------------------------------------------------------------------------------------------
  2      14  Forus traffic           Yes         4000       4000    169370       285
However, these values are not increasing when I run the command a few times, and since the issue (lag/latency) is there permanently, I would expect the counter to increase...
Looking forward to hearing what you can interpret from these files.
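One way to rule the policer in or out (a general suggestion, not something from the attached files) is to filter the output down to that one queue and re-run it at intervals; if the Forus drop counters stay flat while the latency spikes continue, the recorded drops are likely historical rather than the cause:

```
Switch# show platform hardware fed active qos queue stats internal cpu policer | include Forus
```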
Thank you very much again for your support!
Best
mar0n.
01-10-2026 01:54 PM
The CLI lag and ping latency spikes are probably perfectly "normal" given the corresponding CPU spikes.
What is probably not normal is why those CPU spikes are happening.
From a quick review: %SYS-3-CPUHOG messages during "normal" operation are generally considered abnormal. They may be indicative of a software defect, or of something unusual happening in your network.
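One quick, low-risk check (a general suggestion) is the ASCII CPU history graph, which makes it easy to see whether the spikes are periodic, constant, or tied to specific times:

```
Switch# show processes cpu history
```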
If you have TAC support under contract, this might be worth a TAC case.
Besides the information already requested by @Cristian Matei (which it appears you've provided), might you also be able to provide a "sanitized" config?
01-10-2026 02:03 PM
Hi Joseph
The devices are under contract, yes, but I am not able to open a TAC case directly and will have to go through our supplier.
Since I would need to explain the problem to their technician and then again to TAC, I thought I'd try my luck here first.
But sanitizing the config is probably about the same amount of effort as explaining the problem twice for a TAC case, so I'd rather go with the TAC option than risk leaking something by not sanitizing my config file properly. I hope you understand.
Thank you for replying to my post!
Best