Monitoring Jitter, Packet Loss, and RTT in a Traditional Network (MPLS

Deepak Kumar · ‎02-24-2025

We are operating a traditional network across multiple locations, connected via MPLS and IPsec VPN. There is no SD-WAN under our direct management, making it challenging to access real-time monitoring data such as jitter, packet loss, delay, and RTT.

Currently, we use PRTG for network monitoring and are considering implementing IPSLA or BFD to measure network health between our location’s core switch and the HQ core switch. Additionally, we are exploring the possibility of monitoring specific services using IPSLA but have yet to finalize the exact metrics to track.

I would like to understand:

How reliable is IPSLA or BFD for monitoring packet loss, jitter, and RTT in such a setup?
How can we effectively use this data to assess the health of our MPLS and Internet links?
Are there any practical limitations or challenges in using these methods?
Have you implemented similar monitoring techniques, and what has been your experience?

Any technical insights or best practices would be greatly appreciated.

Regards,
Deepak Kumar

Ramblin Tech · ‎02-25-2025

BFD is not a performance-monitoring / observability tool; BFD monitors “liveness” between two IP end-points for the purpose of quickly notifying clients (typically routing protocols) that a topology change has occurred, so that they can withdraw routes based on the old topology and reconverge on the new. Some NPU-based platforms supporting BFD hardware offload can support interval timers down to single-digit milliseconds. BFD does not measure delay/jitter/loss; it only detects when a keepalive packet has not been received before the detection timer expires.
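To illustrate why BFD yields only an up/down signal, here is a minimal Python sketch of the asynchronous-mode detection time as described in RFC 5880: the detect multiplier received from the peer times the larger of the local Required Min RX Interval and the peer's Desired Min TX Interval. The function and parameter names are made up for illustration; this is not any vendor's API.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Return the local detection time in milliseconds (RFC 5880, async mode).

    All names here are hypothetical; real platforms expose these timers
    through their own configuration and show commands.
    """
    # The peer may not send faster than we are willing to receive.
    negotiated_interval_ms = max(local_required_min_rx_ms,
                                 remote_desired_min_tx_ms)
    # The session is declared down after this long without a control packet.
    return remote_detect_mult * negotiated_interval_ms


# Example: 50 ms timers with a multiplier of 3
detection_ms = bfd_detection_time_ms(50, 50, 3)
```

The only output of the protocol is whether a packet arrived inside that window, so it says nothing about how late or how variable the packets that did arrive were.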

Cisco supports a number of observability tools such as IPSLA, as you mention, and also others such as TWAMP, ThousandEyes, and Accedian. TWAMP is essentially the open standard version of the Cisco-proprietary IPSLA, and my personal preference is to use open standards unless there is a compelling reason to use a proprietary protocol. In the case of TWAMP, some NPUs support timestamping in hardware, which greatly increases accuracy.
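As a rough illustration of why timestamp accuracy matters, the sketch below computes one common jitter estimator, the RTP interarrival jitter from RFC 3550, from per-packet send and receive timestamps. This is an assumption chosen for illustration; it is not necessarily the formula that IPSLA or a given TWAMP implementation reports, and the function name is hypothetical.

```python
def rfc3550_jitter(send_ts_ms, recv_ts_ms):
    """Smoothed interarrival jitter (RFC 3550 style) in milliseconds.

    send_ts_ms / recv_ts_ms are per-packet timestamps in the same order.
    A constant clock offset between sender and receiver cancels out,
    because only differences between consecutive packets are used.
    """
    if len(send_ts_ms) != len(recv_ts_ms):
        raise ValueError("timestamp lists must have the same length")
    jitter = 0.0
    for i in range(1, len(send_ts_ms)):
        # Change in transit time between packet i-1 and packet i.
        d = (recv_ts_ms[i] - recv_ts_ms[i - 1]) - (send_ts_ms[i] - send_ts_ms[i - 1])
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the second one arrives 8 ms later than expected.
example_jitter = rfc3550_jitter([0, 20], [5, 33])
```

Any error in the timestamps feeds straight into `d`, which is why timestamps taken in hardware give a cleaner result than timestamps taken by a busy control-plane CPU.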

Disclaimers: I am long in CSCO. Bad answers are my own fault as they are not AI generated.

Joseph W. Doherty · ‎02-26-2025

Like Jim, I wondered about the BFD reference, unless you have something else in mind besides Bidirectional Forwarding Detection.

Yes, IPSLA might be used. Unless you use some product that uses it under the covers, setting up useful IPSLA monitoring can be rather tedious, especially if done beyond a small scale. Further, you would need to consider how you will process its captured data.
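As a sketch of what "processing the captured data" might look like once probe results have been exported (for example by polling from a monitoring system), the Python below reduces a list of round-trip samples to loss, average RTT, and a simple jitter figure. The input format, with `None` marking a lost probe, and the function name are assumptions for illustration; the jitter here is just the mean absolute difference between consecutive received RTTs, not the one-way jitter a udp-jitter operation reports.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize one polling interval of probe results.

    rtts_ms: list of RTTs in milliseconds, with None for a lost probe
    (a hypothetical export format, not a real IPSLA or PRTG schema).
    """
    sent = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    avg_rtt = mean(received) if received else None
    # Differences are taken between consecutive *received* samples,
    # so a lost probe in between is simply skipped over.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"sent": sent, "loss_pct": loss_pct,
            "avg_rtt_ms": avg_rtt, "jitter_ms": jitter}


summary = summarize_probes([10.0, 12.0, None, 11.0])
```

Doing this per site pair and per interval is where the tedium at scale comes from: the arithmetic is trivial, but collecting, storing, and trending the results is not.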

"Any technical insights or best practices would be greatly appreciated."

Depends on your end goals and realistic understanding of the technology.

For about a decade, my primary network engineering focus was to maximize network performance across the WAN, rightsized for need. Over that decade, we went from p2p leased lines to frame relay, ATM, and MPLS VPN. Often we also concurrently had site-to-site VPN across the Internet.

For the most part, the level of monitoring you seek, which we often had, was mostly useless except for documenting service provider SLA failures. If that's your goal, such monitoring is great, but, unfortunately (?), my only major interaction with our network monitoring group was when I "broke" monitoring by rolling out a (then) new Cisco technology (OER/PfR) that rerouted around WAN performance issues so quickly that our monitoring (and our users) often no longer saw issues. (We did work out methods so that our monitoring did see WAN issues, while they continued to be, mostly, non-events to our network traffic. Also, BTW, OER/PfR analyzed, in near real time, interface loading and its own NetFlow and IPSLA stats across multiple routers. Probably the precursor to SD-WAN.)

If you want to get into a discussion of how to get the best out of a network like yours, I might be able to help.

If you just want good stats for service provider, or your, SLA failures, IPSLA might be fine.