CISCO 3750 Switch latency

ZAHIDHASEEB
Level 1

My Cisco 3750 core switch latency sometimes increases to 200+ ms and returns to normal (below 10 ms) after a few minutes or hours. Sometimes, however, the latency stays at 200+ ms and we have to reboot the switch, which resolves the issue. But this is not a permanent fix, and we end up rebooting the switch roughly every 20 hours.

13 Replies

balaji.bandi
Hall of Fame
My Cisco 3750 core switch latency sometimes increases to 200+ ms

Latency to the switch IP address or to the Internet? From what device, and where is that device connected, while you are pinging? (Post the source and destination IPs where you are seeing the 200 ms latency.)

 

Latency increases due to link utilisation or congestion, or slow end-device response.

 

So you need to troubleshoot the issue.

 

What was the load on the switch at that time? What was the interface utilisation at that time? And so on; you need to gather information (rebooting is not the solution every time). Finding and fixing the problem is the long-term goal.
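A minimal sketch of where to start gathering that information on the switch, assuming a Catalyst 3750 and a hypothetical uplink GigabitEthernet1/0/1 (substitute your real interfaces):

! Current CPU load sorted by process, plus the 60-second/60-minute/72-hour history graph
show processes cpu sorted
show processes cpu history

! Load, errors and drops on a suspect interface (the interface name is only an example)
show interfaces GigabitEthernet1/0/1 | include rate|errors|drops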

 

Is this happening only recently? Since when? What was changed in the network?

 

 

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

1- The latency issue happens on both the LAN and the internet. We are connected directly to a Layer 2 switch, which in turn is connected to a Layer 3 core switch. The Layer 3 switch also has many other Layer 2 switches connected to it.

2- Our complete infrastructure is on a 1 Gig network, and we see that the Layer 2 and Layer 3 switch interfaces are under 1G utilization, not more than 300/400 Mb.

3- We notice that sometimes the CPU load reaches 70% to 80% but the latency stays below 10 ms, and at that time we don't feel any latency issue. Currently the CPU load is 48% but the latency shows 0 ms on the NPM tool. I am assuming we should not relate the CPU load to the latency?

4- We have been facing this issue for three weeks.

 

Then you need to do granular troubleshooting:

 

4- We have been facing this issue for three weeks.

Then start looking at what was changed in the last few weeks: any network config changes, new devices added, any firmware upgrades?

 

1- The latency issue happens on both the LAN and the internet. We are connected directly to a Layer 2 switch, which in turn is connected to a Layer 3 core switch. The Layer 3 switch also has many other Layer 2 switches connected to it.

How about the devices connected at Layer 3, do you see the same issue? Is the issue on all Layer 2 connected devices, or only on part of the network?

 

3- We notice that sometimes the CPU load reaches 70% to 80% but the latency stays below 10 ms, and at that time we don't feel any latency issue. Currently the CPU load is 48% but the latency shows 0 ms on the NPM tool. I am assuming we should not relate the CPU load to the latency?

If this is Layer 2, most of the Layer 3 functions are performed on the Layer 3 device. What was the status before these 3 weeks?

 

1- The latency issue happens on both the LAN and the internet. We are connected directly to a Layer 2 switch, which in turn is connected to a Layer 3 core switch. The Layer 3 switch also has many other Layer 2 switches connected to it.

This needs to be addressed and monitored correctly. You need to configure your NMS to ping the different Layer 3 gateway IPs and check the ping from various devices, and at the same time to the Internet.
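As a complement to the NMS, a minimal sketch of letting the core switch itself track latency with IP SLA, assuming a hypothetical gateway SVI Vlan10 and a hypothetical external target 8.8.8.8 (both are placeholders):

! ICMP probe sourced from one of the Layer 3 gateway SVIs, repeated every 60 seconds
ip sla 10
 icmp-echo 8.8.8.8 source-interface Vlan10
 frequency 60
ip sla schedule 10 life forever start-time now

! Round-trip times collected by the probe
show ip sla statistics 10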

 

By the way, what are the models of the devices in use here, and what IOS code? Do you have any STP loops? What IGP protocol is in use, or is it static routes all over?

 

Where is the NPM tool located? (Is this latency being measured using NPM?)

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

1- We did not change anything on the network.

2- Below are our environment details. Please note, routing is performed only by the Cisco 3750 Layer 3 switch.

 

30 Cisco switches (model 2960 and 2970, Layer 2) installed at the access layer on different floors. Currently we have 06 floors.

01 Cisco switch (model 3750, Layer 3) installed at the core layer, on top of the access layer switches.

30 Wi-Fi access points connected to the Cisco access layer switches.

Currently we have around 500 users/laptops/virtual machines/physical servers connected behind the access layer switches.

 

3- Until three weeks ago everything was working fine.

4- Yes, the NPM tool is present.

5- How do we catch an STP loop?

6- Yes, NPM is reporting the latency and the CPU load.

 

Can you suggest how we can find out what is using up the CPU?

     

CPU.png

 

What IOS code is running on each device? (Where is NPM connected, at Layer 3 or Layer 2?) What is the uptime of the devices?
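A quick, hedged way to pull the version and uptime from each switch in one go (the filter is case-sensitive):

! Running IOS version and device uptime
show version | include Version|uptime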

 

When NPM shows you the latency, what is the outcome of a manual ping? Do you get the same results in different parts of the network? What are the results from the 3750 core device? Do the users complain about speed, both local and Internet? All 500 users, or only part of the network?

 

What devices do you reboot to resolve this issue?

 

 

Can you give an example of the latency, from where to where? Source IP and destination IP, on the LAN and to the Internet?

 

5- How do we catch an STP loop?

Draw a network diagram so you will know where any loops are.

 

show spanning-tree blockedports (shows you any blocked/looping ports on a Layer 2 switch).
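A minimal sketch of the loop-related checks on each Layer 2 switch (standard Catalyst show commands; how to interpret them depends on your topology):

! Ports that spanning tree is currently blocking - the redundant links should appear here, anything unexpected is suspicious
show spanning-tree blockedports

! Topology-change counters and where the last change came from; rapidly climbing counters suggest instability
show spanning-tree detail | include ieee|occurr|from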

 

Do you have a syslog server on which you collect the logs? You need to collect the logs and correlate them with the problem.
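A minimal logging sketch, assuming a hypothetical syslog server at 192.0.2.10 (adjust the address and level to your environment):

! Timestamp log messages to the millisecond and ship them to an external syslog server
configure terminal
 service timestamps log datetime msec localtime
 logging host 192.0.2.10
 logging trap informational
end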

 

Note: this is your network, and our suggestions are based only on your input. You need to break the troubleshooting down level by level to get to the bottom of the problem; the community can only advise on how to do that. 500 users is a small/medium size network.

 

 

 

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

ZAHIDHASEEB
Level 1

Could buffer misses be an issue?

CISCO-1.png

There are many possible reasons; at this stage we do not know what is causing it. The latest stable IOS is 12.2(55)SE12.

 

I am sure your interfaces have discards and errors. Can you post 'show interface gi x/x' for all the connected interfaces on the core so we can look at the drops?

Not from the NMS, from the device itself.
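Two hedged ways to pull that straight from the device (the interface name below is only an example):

! Error counter summary across all interfaces
show interfaces counters errors

! Just the error and drop lines of a single interface
show interfaces GigabitEthernet1/0/1 | include errors|drops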

 

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

Joseph W. Doherty
Hall of Fame

What model 3750, and what version of IOS is it running?

Although you later post ". . . switch interfaces are under 1G utilization, not more than 300/400mb", depending on the time period being averaged, a 40% load can be "busy".  What do your drop stats look like?  What do your TCAM stats look like?  What do your memory stats look like?
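A minimal sketch of the commands that produce those stats on a 3750 (all standard show commands):

! Drop and error counters per interface
show interfaces counters errors

! TCAM usage and the SDM template currently allocating it
show platform tcam utilization
show sdm prefer

! Processor and I/O memory headroom
show memory statistics
show processes memory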

1- Please find the 'sh version' result from the core switch.

 

Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
* 3 30 WS-C3750E-24TD 12.2(55)SE1 C3750E-UNIVERSALK9-M

 

2- Drop stats

CISCO-2.png

 

E#show platform tcam utilization

CAM Utilization for ASIC# 0                         Max            Used
                                                Masks/Values   Masks/values

 Unicast mac addresses:                          6364/6364      575/575
 IPv4 IGMP groups + multicast routes:            1120/1120      1/1
 IPv4 unicast directly-connected routes:         6144/6144      6144/6144
 IPv4 unicast indirectly-connected routes:       2048/2048      149/149
 IPv4 policy based routing aces:                 442/442        12/12
 IPv4 qos aces:                                  512/512        21/21
 IPv4 security aces:                             954/954        35/35

Note: Allocation of TCAM entries per feature uses
a complex algorithm. The above information is meant
to provide an abstract view of the current TCAM utilization

 

For starters, if possible, I would recommend upgrading your SE1 to SE12, as everything between SE1 and SE12 is bug fixes.  I recall (55) got very stable starting around SE10.
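A minimal sketch of that upgrade over TFTP, assuming a hypothetical server at 192.0.2.10; the tar filename is illustrative only, so verify it against the actual SE12 image you download from Cisco:

! Fetch and extract the new image, update the boot variable, then reload onto it
archive download-sw /overwrite /reload tftp://192.0.2.10/c3750e-universalk9-tar.122-55.SE12.tar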

Don't know your overall interface usage, but your discard rates, on some of the interfaces, might also be causing "latency".  Is QoS enabled?  If so, running the default or a custom QoS configuration?

Input errors on your one interface should, ideally, be remediated.

"Pv4 unicast directly-connected routes: 6144/6144 6144/6144"  100% utilization?  Switch being used just for L2 or L2 and L3?  What TCAM template being used?

When CPU "spikes", do you have stats for the busy processes?

#show mls qos
QoS is disabled
QoS ip packet dscp rewrite is enabled

 

 

sh proc cpu sort | ex 0.00
CPU utilization for five seconds: 59%/29%; one minute: 58%; five minutes: 66%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
156 466814 28662 16286 12.61% 11.08% 11.15% 0 HL3U bkgrd proce
4 157690 6756 23340 2.39% 1.45% 1.28% 0 Check heaps
94 311024 280539 1108 1.91% 1.67% 3.84% 0 HLFM address lea
10 190771 380294 501 1.91% 1.82% 1.56% 0 ARP Input
216 170561 77230 2208 1.75% 1.33% 1.29% 0 Spanning Tree
332 117920 15389 7662 0.95% 1.06% 1.05% 0 CEF: IPv4 proces
114 63810 10108 6312 0.63% 0.59% 0.58% 0 hpm counter proc
327 54860 42001 1306 0.63% 0.48% 0.43% 0 SNMP ENGINE
199 71132 194827 365 0.47% 0.74% 0.80% 0 IP Input
203 68447 206178 331 0.47% 0.44% 0.49% 0 ADJ resolve proc
342 1792 1145 1565 0.31% 0.90% 0.31% 2 SSH Process
78 34584 4074 8488 0.31% 0.25% 0.26% 0 Adjust Regions
166 20096 2081 9656 0.15% 0.13% 0.15% 0 HQM Stack Proces
155 14289 284448 50 0.15% 0.22% 0.18% 0 Hulc LED Process
229 7958 10109 787 0.15% 0.16% 0.15% 0 PI MATM Aging Pr
325 32085 83116 386 0.15% 0.27% 0.25% 0 IP SNMP
258 7216 25745 280 0.15% 0.11% 0.08% 0 DHCPD Receive
167 4086 4144 986 0.15% 0.05% 0.03% 0 HRPC qos request

 

 

#show platform tcam utilization

CAM Utilization for ASIC# 0                         Max            Used
                                                Masks/Values   Masks/values

 Unicast mac addresses:                          6364/6364      857/857
 IPv4 IGMP groups + multicast routes:            1120/1120      1/1
 IPv4 unicast directly-connected routes:         6144/6144      6144/6144
 IPv4 unicast indirectly-connected routes:       2048/2048      140/140
 IPv4 policy based routing aces:                 442/442        12/12
 IPv4 qos aces:                                  512/512        21/21
 IPv4 security aces:                             954/954        35/35

Note: Allocation of TCAM entries per feature uses
a complex algorithm. The above information is meant
to provide an abstract view of the current TCAM utilization

 

 

#sh sdm prefer
The current template is "desktop default" template.
The selected template optimizes the resources in
the switch to support this level of features for
8 routed interfaces and 1024 VLANs.

number of unicast mac addresses: 6K
number of IPv4 IGMP groups + multicast routes: 1K
number of IPv4 unicast routes: 8K
number of directly-connected IPv4 hosts: 6K
number of indirect IPv4 routes: 2K
number of IPv4 policy based routing aces: 0
number of IPv4/MAC qos aces: 0.5K
number of IPv4/MAC security aces: 0.875k

 

 

CISCO-3.png

I assume that the CPU spikes by themselves do not have any bad impact. But I assume that when the latency of the core switch increases, then we are in trouble. This time the CPU went to 80% but the latency was only 2 to 3 ms.

 

 

"I assume that the CPU hike sometime does not put any bad impact."

Correct.  What the CPU is doing during a spike is what matters.
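A hedged way to see that on the switch itself, even shortly after a spike has passed:

! ASCII graph of CPU load over the last 60 seconds, 60 minutes and 72 hours
show processes cpu history

! Top consumers if you catch a spike live
show processes cpu sorted | exclude 0.00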

"The current template is "desktop default" template."

Ah, one of the other templates might be better for your usage.
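Changing the SDM template only takes effect after a reload; a minimal sketch, where "routing" is just one candidate and the right template depends on how this switch is actually used:

! Check the current template, pick a new one in config mode, then reload during a maintenance window
show sdm prefer
configure terminal
 sdm prefer routing
end
reload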

We still have the possible issue of packet discards, per your earlier posted stats.  Again, packets being dropped can create "latency".

Sorry, my bad. It was a typo in the version. Kindly find the correct details:


Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
* 3 30 WS-C3750E-24TD 12.2(55)SE11 C3750E-UNIVERSALK9-M