I'm not a networking guy - I work on the system engineering/administration side of the house. Sorry if this seems like a 'noob' question.
I'm on the east coast & I bounced a machine on the west coast, leaving a ping running to let me know when it came back up. The box came up fine, but I forgot to close the ping, which ran for another 20-30 minutes. When I checked the ping window, I noticed what seemed to be a significant amount of packet loss.
I closed that ping & spun up a new one, letting it run for just a minute or two and when I stopped it, it showed 13% packet loss.
I tried again, this time targeting a different host in the same site, & I saw 24% packet loss.
Note: Both of these hosts are robust servers (16 cores/32GB of RAM) and were nowhere near taxed during the ping tests.
The “official” word from our networking team is that ICMP traffic is given a lower priority than other traffic by the SilverPeaks, and ICMP packets are occasionally dropped. When I asked our provider to test the line, they showed me multiple logs of 5000+ pings using our coast-to-coast line with zero drops. This suggests that the drops are being caused by our equipment, somewhere. Right?
Without having access to any of the switches, routers, cores etc, is there some sort of test I could perform to evaluate the condition of the line that doesn't rely on/use ICMP?
You can ask your team for the off hours when there is minimal usage or traffic. ICMP pings should not be dropped then. Of course, if you have access to your routers, you can check the queue depth and drops, if any.
Thanks for the response Durga Prasad!
The issue seems to happen round the clock and there isn't that much traffic. I'll collect more information before going to our networking guys to see if they have time to check the queue depth as you suggested.
Well, your problem here is that you have multiple hops involved. If you are doing end-to-end pings, you are going through lots of different devices. The first thing you would need to do is isolate where the drops are occurring.
To do this, start with a traceroute and document all of the hops along the way. Then run extended ping tests to each hop in turn and observe at which point you start to see the packet loss. The problem will be located somewhere between the last hop with no loss and the first hop with loss.
Unfortunately, from what you say, this point is very likely outside of your control. So you will need to rely on others to troubleshoot from there. It may at least get you pointed at the correct people. If it were within your control you could check for line errors between those two points using internal error counters.
Thanks Gregory Snipes - excellent advice!
I'm fairly certain all of our offices are part of the MPLS cloud.
We went through a similar exercise a while ago so I ran through the tests again today as you suggested. As before, the traceroute shows 7 hops.
tracert host

Tracing route to host.f.q.d.n [172.16.103.16]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  172.16.38.2
  2    <1 ms    <1 ms    <1 ms  172.16.251.1
  3    <1 ms    <1 ms    <1 ms  172.16.202.250
  4    74 ms    77 ms    88 ms  172.31.202.250
  5    78 ms    77 ms    73 ms  172.31.202.1
  6    93 ms    79 ms    74 ms  172.31.251.2
  7    88 ms    83 ms    88 ms  host.f.q.d.n [172.16.103.16]

Trace complete.

Then I did 100 pings to each hop, less the ultimate destination, and I see losses start at the 4th hop, the West Coast SilverPeak.
>for %i in (172.16.38.2 172.16.251.1 172.16.202.250 172.31.202.250 172.31.202.1 172.31.251.2) do ( ping %i -n 100 | find /i "loss" | find /v "Lost = 0" )

>(ping 172.16.38.2 -n 100 | find /i "loss" | find /v "Lost = 0" )

>(ping 172.16.251.1 -n 100 | find /i "loss" | find /v "Lost = 0" )

>(ping 172.16.202.250 -n 100 | find /i "loss" | find /v "Lost = 0" )

>(ping 172.31.202.250 -n 100 | find /i "loss" | find /v "Lost = 0" )
    Packets: Sent = 100, Received = 98, Lost = 2 (2% loss),

>(ping 172.31.202.1 -n 100 | find /i "loss" | find /v "Lost = 0" )
    Packets: Sent = 100, Received = 94, Lost = 6 (6% loss),

>(ping 172.31.251.2 -n 100 | find /i "loss" | find /v "Lost = 0" )
    Packets: Sent = 100, Received = 95, Lost = 5 (5% loss),
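If this per-hop test gets repeated often, the "where does loss start" step could be scripted. Here is a rough Python sketch, not anything official: it parses the summary lines that Windows ping prints and reports the first hop in traceroute order that shows loss. The function names and the HOPS list are made up for illustration; the addresses and numbers are the ones from the run above, and a hop with None stands for a line that the find /v "Lost = 0" filter suppressed.

```python
import re

# Matches the Windows ping summary line, for example:
# "Packets: Sent = 100, Received = 98, Lost = 2 (2% loss),"
SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def loss_percent(summary_line):
    """Return the loss percentage for one ping summary line.

    None means the line was filtered out upstream (no loss reported).
    """
    if summary_line is None:
        return 0.0
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % summary_line)
    sent, _received, lost = (int(group) for group in match.groups())
    return 100.0 * lost / sent if sent else 0.0


def first_lossy_hop(hops):
    """Return (hop_number, address) of the first hop with loss, or None."""
    for number, (address, summary_line) in enumerate(hops, start=1):
        if loss_percent(summary_line) > 0:
            return number, address
    return None


# Hypothetical container holding the results of the run shown above.
HOPS = [
    ("172.16.38.2", None),
    ("172.16.251.1", None),
    ("172.16.202.250", None),
    ("172.31.202.250", "Packets: Sent = 100, Received = 98, Lost = 2 (2% loss),"),
    ("172.31.202.1", "Packets: Sent = 100, Received = 94, Lost = 6 (6% loss),"),
    ("172.31.251.2", "Packets: Sent = 100, Received = 95, Lost = 5 (5% loss),"),
]

print(first_lossy_hop(HOPS))  # (4, '172.31.202.250')
```
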
Just so it's clear:
But still, ICMP is the root protocol here, right? The last time we presented similar information to them they immediately responded with "icmp traffic is given a lower priority than other traffic by the SilverPeaks, and they are occasionally dropped". Leaving us without much more to go on.
I was hoping to leverage some other tool or method that's not using or otherwise relying on ICMP so they can't come back with that canned answer.
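One way to test without ICMP is to time TCP connection attempts to a port on the far-end host. Below is a minimal Python sketch of that idea, using only the standard library. It is an illustration under assumptions, not a vetted tool: tcp_probe is a hypothetical helper name, and the port has to be one that is actually reachable on the target. A refused connection is counted as answered because the refusal itself had to cross the link in both directions. The timeout is kept short on purpose, since TCP retransmits a lost SYN on its own and a long timeout could hide a single dropped packet.

```python
import socket
import time


def tcp_probe(host, port, count=100, timeout=1.0, interval=0.0):
    """Try `count` TCP connections to host:port.

    Returns (answered, failed). "answered" means the far end replied,
    either by accepting or by refusing the connection. "failed" covers
    timeouts and other errors such as "no route to host".
    """
    answered = 0
    failed = 0
    for _ in range(count):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                answered += 1
        except ConnectionRefusedError:
            # The port is closed, but the reply still made the round trip.
            answered += 1
        except OSError:
            # Includes socket.timeout: nothing came back within `timeout`.
            failed += 1
        if interval:
            time.sleep(interval)
    return answered, failed
```

For example, tcp_probe("172.16.103.16", 445, count=100) would make 100 attempts against the destination host from the traceroute; port 445 is only a guess at something that might be listening on a Windows server, so substitute a port you know is open. Comparing the failed count against the ICMP loss figures for the same host and time window should show whether the loss is specific to ICMP.
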
Thanks Gregory Snipes - Cryping is the kind of utility I'm looking for in this situation! If time permits I'll try setting up smokeping and custos - any other utility I should consider exploring?