Strange Call Stats (very high jitter)

edwardforgacs · ‎04-08-2024

Are there any known interoperability issues or more-specific causes of very high Jitter numbers in the call stats, such as the attached? Should the attached example be just considered invalid, or indicative of severe network issues?

In this particular example call, the stats were from an SCCP 7975 phone, with a call via SIP trunk on CME. The call connected normally but with some audio issues, and suddenly the Jitter numbers went through the roof. When the far-end attempted to transfer the call, it was lost. The call appeared to stay connected, but a separate Wireshark trace which was running in-between CME and the ITSP indicated that no RTP packets were being sent by the ITSP to the SIP trunk.

Similar very high and seemingly invalid Jitter numbers were observable in the Wireshark trace in-between CME and the ITSP in this example, although I have seen other example calls where only the phone displayed those invalid numbers.

I suspect issues at the ITSP end or interoperability issues with the SIP trunk, but I am struggling to isolate the issue further. Although we have used the same ITSP for a couple of years, we started having jitter and packet loss issues a few months ago, and the audio quality has continued to deteriorate. There are now more frequent random call failures and, apart from the above, there has been occasional one-way audio.

b.winter · ‎04-08-2024

I think you have to be careful how you interpret this single high number. It's just a "maximum" value.
It can be caused by a single packet.
The average jitter looks pretty good at 3ms. You shouldn't have more than 30ms.
If you had problems in your network, there would be a lot of jitter on every packet, and therefore the average would go up as well.
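To illustrate the max-versus-average point, here is a minimal sketch using the interarrival jitter definition from RFC 3550. The packet timings are made up for illustration, and it is an assumption that the phone's "max" and "avg" counters behave roughly like the per-packet maximum and mean computed here; the actual firmware calculation is not documented in this thread.

```python
def interarrival_deviations(send_ms, recv_ms):
    """Per-packet |D| as in RFC 3550: change in transit time between consecutive packets."""
    transit = [r - s for s, r in zip(send_ms, recv_ms)]
    return [abs(b - a) for a, b in zip(transit, transit[1:])]


def smoothed_jitter(deviations):
    """RFC 3550 running estimate: J = J + (|D| - J) / 16."""
    j = 0.0
    for d in deviations:
        j += (d - j) / 16.0
    return j


# Hypothetical stream: 1000 packets sent every 20 ms with a constant 40 ms
# network delay, except one packet (index 500) that arrives 400 ms late.
send = [20.0 * i for i in range(1000)]
recv = [s + 40.0 for s in send]
recv[500] += 400.0

devs = interarrival_deviations(send, recv)
max_dev = max(devs)                # driven entirely by the one late packet
mean_dev = sum(devs) / len(devs)   # barely moves

print(f"max deviation:  {max_dev:.1f} ms")
print(f"mean deviation: {mean_dev:.2f} ms")
print(f"smoothed jitter at end of call: {smoothed_jitter(devs):.6f} ms")
```

One late packet out of 1000 is enough to push the maximum to 400 ms while the mean stays below 1 ms, which is why the maximum alone says little about overall network health.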

"When the far-end attempted to transfer the call, it was lost." --> This indicates more of a signalling problem. Sure, the root cause could also be a bad network, but it could just as well be something different (e.g. a codec negotiation issue). You should check the signalling logs to get more insight.

"The call appeared to stay connected, but a separate Wireshark trace which was running in-between CME and the ITSP indicated that no RTP packets were being sent by the ITSP to the SIP trunk." --> Adding to the above, this also looks like a signalling issue. One side thinks the call has ended (and is no longer sending packets), while the other side thinks it's still active.

You shouldn't mix up different kinds of problems (signalling and voice quality issues), even though they might have the same root cause in the end. But without more "data" (logs, debugs), you never know.
For signalling, you can check logs.
For audio quality, you have to check the network. What's your connection towards the ITSP? Is it the internet? If yes, then it's just best effort: if it works, it works; if not, it doesn't (that's the internet).
If you have an MPLS, or private connection, does it have QoS enabled? Are all relevant network devices marking / forwarding the voice packets in the correct QoS-queue? Is CME marking the packets correctly?
Are there any port negotiation errors (mismatch between speed and / or duplex mode)?

edwardforgacs · ‎04-09-2024

Thanks for your response. I agree with most of the comments.

While it may seem like it is conflating signalling and network issues, I guess what I am asking is whether the very high, invalid jitter numbers could be indicative of a particular type of signalling problem? (Even if it is not the same problem causing the other issues).

As a further note, it appears the very high jitter stats issue only comes up on calls which connect as G.711u. I have not seen a G.711a call (which is the default in Australia) with these invalid stats. The ITSP randomly seems to switch between G.711u and G.711a, while we accept G.711a, G.711u and G.722, which makes me wonder if there is a transcoding issue at the ITSP end.
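For anyone wanting to confirm which codec a given call actually used in a capture, the RTP payload type field is enough for these three codecs, since they have static assignments in RFC 3551 (0 = PCMU, 8 = PCMA, 9 = G722). The sketch below is only an illustration: `rtp_codec` and `make_header` are hypothetical helper names, and it assumes you already have the raw RTP bytes (UDP payload) extracted from the trace.

```python
import struct

# Static payload types from the RTP audio/video profile (RFC 3551).
STATIC_PAYLOAD_TYPES = {0: "PCMU (G.711u)", 8: "PCMA (G.711a)", 9: "G722"}


def rtp_codec(packet: bytes) -> str:
    """Return a codec name based on the payload type in the RTP fixed header."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    payload_type = packet[1] & 0x7F  # low 7 bits; the top bit is the marker bit
    return STATIC_PAYLOAD_TYPES.get(payload_type, f"other/dynamic ({payload_type})")


def make_header(payload_type, seq, timestamp, ssrc):
    """Build a minimal 12-byte RTP header for testing (hypothetical helper)."""
    return struct.pack("!BBHII", 0x80, payload_type, seq, timestamp, ssrc)


print(rtp_codec(make_header(0, 1, 160, 0x1234)))
print(rtp_codec(make_header(8, 1, 160, 0x1234)))
```

Grouping the suspicious calls by this value would show whether the implausible stats really correlate with G.711u, as suspected above.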

b.winter · ‎04-09-2024

What more do you want to hear? There are no definitive "yes" or "no" answers.
Again, ONE SINGLE HIGH JITTER packet drives the "max. value" up. That's the nature of a max. value.
But you can see for yourself that the average jitter is just 3ms. If you had a high average, you should be worried, but not with 3ms.
High jitter in the RTP stream could be an indicator of signalling issues, because network congestion would obviously also affect signalling traffic. But again, when you press hold / transfer and the call gets disconnected, I would start by looking at the signalling itself and not at network issues. E.g. if you have a codec issue, the call will drop. And this clearly has nothing to do with the network.

Without you doing the troubleshooting and delivering logs, debugs, traces, ... everything we discuss here is just theoretical. Just guessing, assuming, philosophizing about what could be or could not be.

edwardforgacs · ‎04-09-2024

Appreciate your response. So I think the answer to the question about a known interoperability issue causing invalid jitter stats is "no".

It's important to distinguish between invalid and high: the 134177911 number in the example is over 37 hours, which is simply not possible, as it is (much) longer than the example call itself. It is interesting because it's reasonable to suspect it's indicative of other issues, even though, as you say, the 3ms average jitter is not a concern.
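The arithmetic behind the "over 37 hours" figure, as a quick sketch. It assumes the counter is in milliseconds (the same unit as the 3ms average); the 600-second call duration is a made-up value for illustration, not taken from the actual call.

```python
reported_jitter_ms = 134177911  # max jitter from the call stats, assumed to be in ms

hours = reported_jitter_ms / 1000 / 3600
print(f"{hours:.2f} hours")


def is_plausible(jitter_ms, call_duration_s):
    """A jitter value cannot exceed the duration of the call it was measured on."""
    return jitter_ms / 1000 <= call_duration_s


# Hypothetical 10-minute call
print(is_plausible(reported_jitter_ms, call_duration_s=600))
```

Any value that fails this check is better treated as a corrupted or miscalculated statistic than as a real measurement of network delay variation.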

Just to clarify in relation to the call drop issue, it is not caused by hold or transfer at the CME/CUBE end. Sometimes when the far-end party tries to transfer the call, the call is lost. At this point, the traces we have show no SIP signalling traffic at all. RTP packets just stop from the ITSP, then it drops a couple of minutes later. So hopefully you can understand that it is hard to know how to troubleshoot this, other than pointing the finger at the ITSP.
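One way to pin down the moment the RTP stops is to look for gaps in the packet arrival times of the inbound stream. This is a minimal sketch with hypothetical names (`find_rtp_gaps`, `capture_end_s`); it assumes the arrival timestamps for one direction have already been exported from the trace as seconds, and the 1-second threshold is an arbitrary choice.

```python
def find_rtp_gaps(arrival_times_s, threshold_s=1.0, capture_end_s=None):
    """Return (start_time, gap_length) pairs where no RTP arrived for > threshold_s.

    If capture_end_s is given, silence between the last packet and the end
    of the capture is reported as well.
    """
    times = list(arrival_times_s)
    if capture_end_s is not None:
        times.append(capture_end_s)
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > threshold_s:
            gaps.append((prev, cur - prev))
    return gaps


# Hypothetical stream: one packet every 20 ms for about 10 s, then nothing
# until the capture ends at 130 s.
arrivals = [0.02 * i for i in range(500)]
for start, length in find_rtp_gaps(arrivals, capture_end_s=130.0):
    print(f"RTP silent for {length:.1f} s starting at t={start:.2f} s")
```

Lining up the start of such a gap with the SIP messages in the same trace (or their absence) gives the ITSP a concrete timestamp to investigate.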

b.winter · ‎04-10-2024

If the far end presses hold or transfer, and you don't see any signalling on your end, then the ITSP is probably consuming the signalling. Which is a valid option.

Maybe some explanation of "SIP message consumption" helps, as I don't know how familiar you are with SIP:
From your point of view (or the CME's), your "opposite" is the ITSP, and not the far end (the other call party), and you send / receive the media stream to / from the ITSP (any equipment in the ITSP network). Likewise for the other party, but you will never have a direct connection between both endpoints.
So, when the other party initiates a transfer or hold, your part of the audio connection doesn't change, so there is no need to update the SIP session.
Only the part between the ITSP and the far end is updated.
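As a toy model of the above: the ITSP behaves like a back-to-back user agent with two independent call legs, so signalling on one leg need not appear on the other. The class and message names below are purely illustrative and do not describe what any particular ITSP actually sends.

```python
from dataclasses import dataclass, field


@dataclass
class Leg:
    """One SIP dialog between the ITSP and a single peer."""
    peer: str
    messages: list = field(default_factory=list)


class B2BUA:
    """Toy back-to-back user agent: each side of the call is its own dialog."""

    def __init__(self, near_peer, far_peer):
        self.near = Leg(near_peer)  # e.g. the leg towards CME
        self.far = Leg(far_peer)    # the leg towards the other call party

    def far_end_transfer(self, new_target):
        # The transfer is handled entirely on the far leg (illustrative message name).
        self.far.messages.append(f"REFER to {new_target}")
        self.far.peer = new_target
        # Nothing is sent on the near leg: its session is unchanged.


call = B2BUA("CME", "far-end party")
call.far_end_transfer("transfer target")
print("near leg saw:", call.near.messages)
print("far leg saw: ", call.far.messages)
```

In this model the near leg sees no messages at all during the transfer, which matches a trace with no SIP traffic on the CME side.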

Edit: And yes, if the audio drops during such scenarios, and you don't see any signalling, then you should talk to your ITSP.

edwardforgacs · ‎04-11-2024

In relation to the invalid jitter numbers, I can answer my own question. The ip udp checksum command on a dial peer appears to cause problems, not just causing the implausible numbers, but actually materially reducing the call quality.

I was able to replicate the issue on an internal connection, although the effect it had on the link to the ITSP was significantly worse.

@b.winter wrote:
From your point of view (or the CME's), your "opposite" is the ITSP, and not the far end (the other call party), and you send / receive the media stream to / from the ITSP (any equipment in the ITSP network). Likewise for the other party, but you will never have a direct connection between both endpoints.

Edit: And yes, if the audio drops during such scenarios, and you don't see any signalling, then you should talk to your ITSP.

Yes, this is absolutely 100% understood. Since this post, we have discovered another related interop issue to the same far-end party, which we are discussing with the ITSP. We believe the far end (other call party) is using MS Teams Calling, and although it may seem unbelievable, it is suspected that the ITSP has issues with their own routing/uplink to this network, and is not demarcating the signalling properly.

If you're interested, the reason is that the workaround for the related issue was to send Require: 100rel. This affects the behaviour at the ITSP end, even though the ITSP does not support 100rel and does not send PRACK. In other regions I would discount this sort of ridiculous theory, but unfortunately most SIP trunk providers in Australia are a complete mess, and this sort of unreliable SIP trunk routing is something we've experienced several times before, with more than one provider.
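For anyone checking their own traces, this is a small sketch for confirming whether an outgoing INVITE actually carries the 100rel option tag (defined in RFC 3262) in its Require or Supported header. The INVITE below is a made-up example using documentation addresses, and `option_tags` is a hypothetical helper; it does not handle compact header forms or folded header lines.

```python
def option_tags(sip_message: str, header: str) -> set:
    """Collect the option tags listed in every occurrence of the given SIP header."""
    tags = set()
    for line in sip_message.split("\r\n"):
        if not line:
            break  # blank line marks the end of the headers
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == header.lower():
            tags.update(t.strip().lower() for t in value.split(",") if t.strip())
    return tags


# Hypothetical INVITE, trimmed to the headers relevant here.
invite = "\r\n".join([
    "INVITE sip:+61255500000@itsp.example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds",
    "Require: 100rel",
    "Supported: timer, replaces",
    "Content-Length: 0",
]) + "\r\n\r\n"

print("Require:  ", option_tags(invite, "Require"))
print("Supported:", option_tags(invite, "Supported"))
```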