Different latency between traceroute and ping

Alek5942 · ‎08-27-2024

Hello community,

Recently, I observed a very strange situation. Users were complaining about general slowness of the internet connection. I ran ping and traceroute from the Core switch to 8.8.8.8: ping latency was very good, whereas traceroute showed high latency inside the ISP's network. We opened a ticket with the ISP, and after some time the issue was resolved. But my question is: how is it possible that ping latency was fine? How can there be such a difference in latency between ping and traceroute? Below is an example of the output, with modified IP addresses:

CoreSwitch#ping 8.8.8.8
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 5/5/7 ms

CoreSwitch#traceroute 8.8.8.8 numeric
Type escape sequence to abort.
Tracing the route to 8.8.8.8
VRF info: (vrf in name/id, vrf out name/id)
1 192.21.17.225 0 msec 0 msec 0 msec
2 7.19.29.217 1 msec 1 msec 1 msec
3 3.72.12.251 1 msec 1 msec 1 msec
4 * * *
5 12.14.11.18 2993 msec
    12.14.11.20 2411 msec
    12.14.11.18 2680 msec
6 *
    17.29.3.75 2986 msec
    18.72.6.162 2560 msec
7 *
    17.29.3.1 332 msec
    17.29.3.4 980 msec
8 17.29.4.21 1483 msec * *
9 8.8.8.8 1633 msec 1794 msec *
CoreSwitch#

Flavio Miranda · ‎08-27-2024

@Alek5942

"Traceroute involves sending UDP packets to each node along the way, and waiting for its timeout response (then moving on to the next node), whereas a ping is just forwarded. What you're seeing is the time it takes for each node to respond to the request instead of just forwarding a small packet."

https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/12778-ping-traceroute.html

Meaning, the process involved in traceroute is totally different from the process in ping. A ping is just a packet from the source to the destination that the nodes in between only need to route forward, whereas traceroute requires a UDP packet to be sent to each node and then waits for that node's time-out (TTL expiry) response in order to measure the time.
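To make the difference concrete, here is a minimal Python sketch of that idea. It is only a toy model, not a measurement tool: the `Hop` class, the `forward_ms` and `reply_ms` fields, and all the numbers are assumptions made up for illustration. It shows that a traceroute row includes the time the expiring node needs to build its own reply, while a ping only pays the forwarding delay at transit nodes.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    forward_ms: float  # one-way delay when the node just forwards a packet (assumed value)
    reply_ms: float    # extra delay when the node itself must build a reply (assumed value)


def traceroute_rtts(path):
    """RTT reported for each TTL: the probe expires at hop `ttl`, which must reply itself."""
    rtts = []
    for ttl in range(1, len(path) + 1):
        crossed = path[:ttl]  # nodes the probe and its reply pass through
        rtts.append(2 * sum(h.forward_ms for h in crossed) + path[ttl - 1].reply_ms)
    return rtts


def ping_rtt(path):
    """Echo request and reply are only forwarded by transit nodes; the destination replies."""
    return 2 * sum(h.forward_ms for h in path) + path[-1].reply_ms


path = [
    Hop("hop1", forward_ms=0.5, reply_ms=0.5),
    Hop("hop2", forward_ms=1.0, reply_ms=200.0),  # slow to generate its own replies
    Hop("destination", forward_ms=1.0, reply_ms=0.5),
]

print(traceroute_rtts(path))  # [1.5, 203.0, 5.5]
print(ping_rtt(path))         # 5.5
```

In this model the slow reply generation at `hop2` shows up only in the `hop2` row of the traceroute and not in the ping result.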

Alek5942 · ‎08-27-2024

@Flavio Miranda I agree, but if such huge latency appeared for only one device, it would be understandable: that device might have a CPU issue, for example. But in my traceroute output, starting from hop 5, all devices show high latency. If only one device had an issue, then only one hop would have high latency.

Flavio Miranda · ‎08-27-2024

The explanation I provided above was meant to make the point that ping and traceroute latencies do not line up because they use very different processes to get their results. Ping packets are simply routed at Layer 3 by the transit nodes, whereas traceroute uses UDP, which is Layer 4, and this results in different response times.

That being said, for your scenario I would say the test was not conclusive. For example, since it was UDP traffic, the ISP may have some additional security policy that filters the traffic to look for DDoS, because this traffic is sent to the device CPU instead of passing through the device data plane. This could explain why we see such high response times.

A test like that, using traceroute towards the internet, is a good starting point, but it is not conclusive enough. Still, every latency listed below is extremely high to me:

5 12.14.11.18 2993 msec
    12.14.11.20 2411 msec
    12.14.11.18 2680 msec
6 *
    17.29.3.75 2986 msec
    18.72.6.162 2560 msec
7 *
    17.29.3.1 332 msec
    17.29.3.4 980 msec
8 17.29.4.21 1483 msec * *
9 8.8.8.8 1633 msec 1794 msec *

MHM Cisco World · ‎08-28-2024

ISPs use policies to protect their router CPUs from high rates of specific ICMP message types.
This can slow the response, and hence you get high latency.
Don't depend on traceroute a lot on the internet. It is an excellent tool for internal networks, but on the internet many ISPs have policies for TTL-exceeded ICMP.
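As a rough illustration of such a policy, here is a small Python sketch of a token-bucket limiter applied to the replies a router generates itself. This is an assumption about how a limiter of this kind could behave, not any ISP's or vendor's actual implementation; the `TokenBucket` class and the rate and burst values are hypothetical. Note that a pure policer like this one drops excess replies, which traceroute shows as `*`, rather than delaying them.

```python
class TokenBucket:
    """Toy rate limiter for router-generated ICMP replies (hypothetical parameters)."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s      # tokens added per second
        self.burst = burst          # maximum tokens that can accumulate
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous probe, in seconds

    def allow(self, now):
        """Return True if a reply may be sent at time `now`, else False (probe shows '*')."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = TokenBucket(rate_per_s=1.0, burst=1)
probe_times = [0.0, 0.1, 0.2, 1.5]  # seconds at which TTL-expired probes arrive
results = [limiter.allow(t) for t in probe_times]
print(results)  # [True, False, False, True]
```

The first probe gets a reply, the next two arrive before the bucket refills and go unanswered, and the last one is answered again.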

MHM

Giuseppe Larosa · ‎08-28-2024

Hello @Alek5942 ,

traceroute is not intended to be used to measure latency or RTT; it is a tool to discover the L3 router hops between the source and the destination.

Because of the way it works, as explained by @Flavio Miranda, the UDP probe packets with increasing TTL values expire at different nodes, and each node then answers back to the source with an ICMP TTL-exceeded (time exceeded) message.

This reply is process switched in software by the nodes. An ICMP ping packet, by contrast, is forwarded in hardware by the transit nodes, because to them it is ordinary user transit traffic.

Hope to help

Giuseppe

Joseph W. Doherty · ‎08-27-2024

"But my question is, how is possible that latency of ping was fine?"

Well, one difference, your pings and traceroute could have taken different paths. Additionally, transit devices might treat ping and traceroute packets differently.

What's also interesting in your traceroute results: you mention that traceroute latency jumps at hop 5, but hop 4 returned no results. Also, starting with hop 5, traceroute shows multiple paths being used. Possibly some of those hops have more than 3 paths, which might only be seen if you increase the traceroute probe count above the default of 3 and/or record the IPs seen for the same hop across multiple invocations.
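Recording the IPs per hop across several runs is easy to script. Below is a minimal Python sketch that parses output laid out like the traceroute posted above and collects the distinct responders for each hop. The function name `responders_per_hop` is just an illustrative choice, and the parser assumes this exact layout (hop number first, continuation lines starting with an IP); other platforms format their output differently.

```python
import re
from collections import defaultdict

IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def responders_per_hop(output, seen=None):
    """Collect the distinct responder IPs per hop; pass `seen` back in to merge several runs."""
    seen = defaultdict(set) if seen is None else seen
    hop = None
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].isdigit():      # a new hop line starts with the hop number
            hop = int(tokens[0])
            tokens = tokens[1:]
        if hop is None:              # header lines before the first hop
            continue
        for tok in tokens:
            if IP_RE.match(tok):     # skip '*', latencies and 'msec'
                seen[hop].add(tok)
    return seen


sample = """\
4 * * *
5 12.14.11.18 2993 msec
12.14.11.20 2411 msec
12.14.11.18 2680 msec
6 *
17.29.3.75 2986 msec
18.72.6.162 2560 msec
"""

seen = responders_per_hop(sample)
for hop_number in sorted(seen):
    print(hop_number, sorted(seen[hop_number]))
```

Feeding the output of each new run through the same `seen` dictionary gradually builds up the full set of responders for each hop.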

BTW, when MPLS is being used in the Internet, and it often is, there can be even more hops than traceroute shows.

Alek5942 · ‎08-28-2024

@Joseph W. Doherty Thank you for the reply. Some other important facts: users were complaining about Teams call quality and browsing. I ran a traceroute from the Edge Router, and that traceroute was normal. The Edge Router has a different public IP than the Core switch; traffic from the Core switch goes via the Firewall, and the Firewall then does NAT to its public IP. Both public IPs, on the Edge Router and on the Firewall, are from the same IP range. After the ISP fixed their issue, traceroute became normal, so there really were some issues inside the ISP that caused such high traceroute latency.

This is what I understand from your reply: maybe the ISP core network has some load sharing / load balancing based on protocol and source IP, and that's why traceroute from the Router and ping from the Core switch were OK, while traceroute from the Core switch had very high latency and the users behind the Core switch, who were using UDP and TCP, were affected.

So we can put it this way: traffic sourced from the Firewall's public IP using TCP / UDP took a different path than traffic sourced from the Firewall's public IP using ICMP. Am I right? Is this something ISPs usually do in their networks?
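To illustrate what I mean by load balancing based on protocol and source IP, here is a small Python sketch of per-flow path selection. It is purely hypothetical: real routers use their own vendor-specific hash functions and inputs, SHA-256 is only a stand-in, and the addresses, ports and path names are invented for the example.

```python
import hashlib


def pick_path(src_ip, dst_ip, protocol, src_port, dst_port, paths):
    """Choose one of several equal-cost paths from a hash of the flow fields (toy model)."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


paths = ["path-A", "path-B", "path-C"]  # hypothetical equal-cost paths inside the ISP

# ICMP has no ports, so the ping flow is hashed with 0/0 in this model.
ping_path = pick_path("203.0.113.10", "8.8.8.8", "icmp", 0, 0, paths)

# Classic traceroute raises the UDP destination port on every probe (starting at 33434),
# so each probe is a different flow and may be hashed onto a different path.
probe_paths = [
    pick_path("203.0.113.10", "8.8.8.8", "udp", 49152, 33434 + n, paths)
    for n in range(3)
]

print("ping:", ping_path)
print("traceroute probes:", probe_paths)
```

The same flow always gets the same path, but a change in any hashed field (protocol, source IP, port) may move the traffic to another path. Under this assumption, ping and traceroute from the same source could cross different ISP devices.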

Joseph W. Doherty · ‎08-28-2024

It's a possibility, but would be unusual.

Normally, ISPs treat all Internet traffic alike, except for traffic that might pose a threat to the ISP's infrastructure or to how it routes.

As described by others, a ping would be just another transit packet to most routers, but traceroute may want to interact with an ISP device.

However, even with ping, a nice option I've used over the years, for some path analysis, is source routing. Yet, that's considered a threat to natural routing, so it's often not accepted. Recording hops, in the ping, isn't too acceptable either.

If your ISP "fixed" the issue, it would appear they had some misconfiguration, i.e. something not really intended. Don't count on them ever admitting what it was.

Perhaps they had some kind of shaper to throttle ICMP responses which also inadvertently shaped traffic that it shouldn't have.

Alek5942 · ‎08-29-2024

@Joseph W. Doherty I agree; I also doubt that they would do such complex PBR in their network, as it would slow the network down. But this is the only explanation I have for why traceroute from the Edge Router was fine, though I ran only one traceroute from the Router, and maybe another one would also have shown high latency. Anyway, I still can't explain to myself why ping from the Core switch was fine while traceroute from the same switch had high latency. And since there really were issues, traceroute showed correct results. Do you think they could have put in some shaper to throttle ICMP responses and accidentally shaped normal traffic as well? But then why wasn't ping affected?

MHM Cisco World · ‎08-29-2024

Friend

ICMP echo passes through the data plane, so it does not affect the CPU of the ISP router.

Traceroute must be processed by the CPU in order to send the ICMP TTL-exceeded back to you. The ISP wants to prevent this punt to the CPU, and surely uses some policy to slow down or totally block traceroute.

If you have issues with voice, use IP SLA UDP jitter.
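For a feel of what a jitter figure expresses, here is a short Python sketch of the interarrival jitter estimator defined in RFC 3550 (the RTP specification). This is not how IP SLA computes its statistics; it is only a generic example, and the timestamps are invented.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Smoothed jitter per RFC 3550: J = J + (|D| - J) / 16 for each consecutive packet pair."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # D = change in transit time between packet i-1 and packet i
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (send_times_ms[i] - send_times_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


send = [0.0, 20.0, 40.0, 60.0]            # packets sent every 20 ms (made-up values)
steady = [5.0, 25.0, 45.0, 65.0]          # constant 5 ms delay
uneven = [5.0, 25.0, 55.0, 65.0]          # third packet arrives 10 ms late

print(interarrival_jitter(send, steady))  # 0.0
print(interarrival_jitter(send, uneven))
```

A constant delay, even a large one, gives zero jitter; only variation in delay between consecutive packets raises the value, and that variation is what hurts voice quality.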

MHM

Joseph W. Doherty · ‎08-29-2024

Unfortunately, without the ISP telling you what their actual resolution was, you're left with suppositions.

What's important is that you identified a real issue that your ISP corrected.

Alek5942 · ‎08-28-2024

@Giuseppe Larosa @Flavio Miranda Thank you for your replies. But in my case, traceroute really did indicate an issue inside the ISP. Users were complaining about Teams voice issues and general slowness. We opened a ticket with the ISP, and after some time the issue was resolved and traceroute became normal. I agree that traceroute can show higher latency than ping, but it should not be that high. Below are the traceroute results after the issue inside the ISP was resolved (IPs are changed):

CoreSwitch#traceroute 8.8.8.8 numeric
Type escape sequence to abort.
Tracing the route to 8.8.8.8
VRF info: (vrf in name/id, vrf out name/id)
1 192.21.17.225 0 msec 0 msec 0 msec
2 7.19.29.217 0 msec 1 msec 1 msec
3 3.72.12.251 1 msec 1 msec 1 msec
4 * * *
5 12.14.11.18 4 msec 5 msec
    12.14.11.20 5 msec
6 17.29.3.75 11 msec 8 msec
    18.72.6.162 6 msec
7 17.29.3.1 37 msec 11 msec
    17.29.3.4 9 msec
8 * *
    17.29.4.21 9 msec
9 8.8.8.8 7 msec 8 msec 13 msec

So, latency is normal now. This proves that there was an issue. My question is: why was ping normal while there were real issues and traceroute had very high latency? Why didn't ping show the latency as well?

Also, it's very important to mention that when I ran traceroute from the Edge Router, the traceroute was normal. The Edge Router has a different public IP than the Core switch; traffic from the Core switch goes via the Firewall, and the Firewall then does NAT to its public IP. Both public IPs, on the Edge Router and on the Firewall, are from the same IP range.

I think maybe the ISP core network has some load sharing / load balancing based on protocol and source IP, and that's why traceroute from the Router and ping from the Core switch were OK, while traceroute from the Core switch had high latency and the users behind the Core switch, who were using UDP and TCP, were affected. I don't have any other explanation. What do you think?

MHM Cisco World · ‎08-28-2024

Traceroute became normal; what about voice?

MHM

Flavio Miranda · ‎08-28-2024

I believe you are right.