On a customer site, we have a telepresence infrastructure:
- VCS-C 7.2.2
- MCU 4501
- C20 and EX90 endpoints on the LAN and at remote sites (~30 endpoints), all running firmware TC6.2.0
The endpoints are registered to the VCS-C using SIP only, without encryption.
We are experiencing presentation problems under conditions we have not yet identified. It looks random, although it probably isn't: a real cause exists, we just don't know it yet.
In a typical scenario we can reproduce the problem, but not with the same timing and not every time.
A local C20 connects to the MCU, and remote participants (C20 or EX90) join the same MCU conference.
Then the local C20 (same behavior with a Movi client) starts sharing a presentation. The conference runs for approximately 2 to 3 hours.
After a certain amount of time, the presentation is lost on the remote endpoints. Last time, I was connected to the admin web page of the remote endpoints (2x EX90 this time) and was able to monitor the call statistics, with packet counters for the audio, video and presentation streams. When the problem occurred, the presentation statistics disappeared from the web page, along with the thumbnail. Watching the different endpoints, I saw them lose the presentation one by one, 2 to 4 minutes apart.
With a network capture tool, I confirmed that the C20 was still sending the presentation stream, that the MCU was receiving it, and that the MCU kept forwarding it to the remote endpoints. The endpoints simply did not display it.
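Whether a stream is "still received" can be checked more precisely from the capture by looking for gaps in the RTP sequence numbers of each stream. A minimal sketch, assuming the 16-bit sequence numbers of one stream (one SSRC) have already been exported from the capture in arrival order; the function name `count_rtp_losses` is hypothetical.

```python
def count_rtp_losses(seq_numbers: list) -> int:
    """Estimate lost packets from RTP sequence numbers (16-bit, wrapping at 65535).

    Only forward jumps are counted. A late (reordered) packet is ignored rather
    than subtracted, so heavy reordering makes this an upper bound on loss.
    """
    lost = 0
    highest = None  # highest sequence number seen so far (modulo 2**16)
    for seq in seq_numbers:
        if highest is None:
            highest = seq
            continue
        # Modular difference, so 65535 -> 0 counts as a step of 1, not a huge jump.
        delta = (seq - highest) & 0xFFFF
        if 0 < delta < 0x8000:  # packet is newer than anything seen before
            lost += delta - 1   # every skipped number is a missing packet
            highest = seq
        # delta == 0 (duplicate) or delta >= 0x8000 (late packet): nothing to count
    return lost


print(count_rtp_losses([65534, 65535, 0, 1]))  # → 0 (wraps cleanly)
print(count_rtp_losses([10, 11, 14, 15]))      # → 2 (12 and 13 are missing)
```

A presentation stream that keeps arriving with few or no gaps, as observed here, points away from a network problem and towards how the endpoint handles the packets.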
I stopped and restarted the presentation sharing from the C20, but this didn't help: the presentation never came back, although the RTP streams were still active from a network point of view.
The only way to get the presentation back was to disconnect the remote endpoint from the MCU conference and then rejoin it.
I collected the local logs from the TC endpoints, as well as some traces on the VCS and MCU.
In the logs of the remote endpoints, I see an interesting message when the presentation stopped:
MainEvents I: PresentationStopped() remote p=2 cause=[remoteVideoLoss]
and at the exact same time:
RTP W: RtpRxMedia_processRtpPacket(sid=3, strm=3) unable to find matching payload info, pt= 96, 1st consequtive packet
This last message repeats until the end of the conference, which shows that the MCU was still sending the RTP stream. To me, it means that the endpoint was receiving the RTP packets but was not able to process them.
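The `pt= 96` in that warning is a dynamic RTP payload type (the range 96 to 127), which only has a meaning through the SDP negotiation of the call (`a=rtpmap`). The sketch below is a generic illustration of why a packet can be received at the network level and still be discarded: the receiver looks up the payload type in the table negotiated for that stream, and if the entry is gone there is nothing to decode it with. It is not TC firmware code, and it is only an assumption that the endpoint's "unable to find matching payload info" check works this way; `parse_rtp_header`, `lookup_payload` and `pt_map` are hypothetical names.

```python
import struct
from typing import Optional


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,         # should be 2
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,  # 96..127 are dynamic, defined by SDP
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }


def lookup_payload(pt_map: dict, packet: bytes) -> Optional[str]:
    """Return the negotiated codec for this packet, or None if its pt is unknown.

    pt_map is a hypothetical per-stream table built from the SDP a=rtpmap lines,
    e.g. {96: "H264/90000"}. None corresponds to a packet that arrives on the
    wire but cannot be decoded.
    """
    return pt_map.get(parse_rtp_header(packet)["payload_type"])


# Build a minimal RTP packet: V=2, PT=96, seq=7, ts=0, SSRC=0x1234.
pkt = bytes([0x80, 96]) + struct.pack("!HII", 7, 0, 0x1234)
print(lookup_payload({96: "H264/90000"}, pkt))  # → H264/90000
print(lookup_payload({}, pkt))                   # → None
```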
My first hypothesis was a network disconnection, either global or affecting only the presentation stream. But the network statistics on the remote endpoints showed that the RTP stream was still being received.
Still, I had to validate how the EX90 behaves when it stops receiving the packets.
I recreated the scenario with only one remote endpoint and one presenter, both connected to the MCU.
When I blocked the RTP stream from the MCU to the remote endpoint with an ACL, the presentation disappeared as expected, with the same message in the logs (remoteVideoLoss). But when I removed the ACL 1 minute later, the presentation came back automatically.
My conclusion is that the endpoints behave correctly when the stream is temporarily lost, and recover properly.
So I strongly suspect a root cause other than a network impairment, especially because the WAN links are of good quality and the monitoring does not report any performance problems.
Have you already experienced this kind of problem?
Thank you for your help!
Check whether any ALG or other device tampers with the SIP signaling, the media, or IP timeouts.
Using SIP over TLS might also be worth trying.
If you have the chance to set up the endpoints, VCS and MCU in the same subnet, it might give you a hint whether this is a network or an application issue.
I would recommend upgrading at least the endpoints and the MCU to the latest versions first.
A Wireshark capture on the endpoint and in front of the MCU (via a mirror port) might also be handy.
After that, upgrading the VCS might be worth a try, as well as registering the MCU with H.323 and letting the VCS do the interworking.
What are the MCU settings regarding presentation?
At which bandwidth and resolution are you sending the presentation, and how is it received?
Any issues with point-to-point presentations, ...?
Is there any difference if you use a different resolution or a different source to send the signal?
You could also try using H.323 to see if
Thanks for the ideas!
On Monday we spent a full day trying to reproduce the issue and, hopefully, we found the root cause of the problem.
In short, it seems that in TC6.2+ and TC7.0 the implementation of BFCP (Binary Floor Control Protocol) is buggy.
I won't describe all our troubleshooting steps, but this is what we figured out:
- in TC6 and TC7, BFCP runs over UDP, not TCP
- over an unreliable transport, recovery mechanisms such as retransmission of status messages are mandatory
- during content sharing, the viewer sends "Hello" messages every 10 seconds
- the presenter endpoint replies with "HelloAck" messages
- if the HelloAck is not received, the Hello is sent again, up to 10 times
- after 11 Hellos without a HelloAck, the viewer stops displaying the presentation
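To verify the Hello/HelloAck exchange in a capture, one could decode the BFCP common header of each UDP payload and match Hellos to HelloAcks by Transaction ID, since a response carries the transaction ID of the request it answers. A minimal sketch, assuming the 12-byte common header layout of RFC 4582 (RFC 8855, BFCP over UDP, keeps the same positions for these fields) and the primitive values Hello = 11 and HelloAck = 12. Whether the TC firmware's BFCP messages follow this layout exactly, and whether retransmitted Hellos reuse their transaction ID, are assumptions; the function names are hypothetical.

```python
import struct
from typing import Iterable

HELLO, HELLO_ACK = 11, 12  # BFCP primitive values (RFC 4582)


def parse_bfcp_header(payload: bytes) -> dict:
    """Decode the 12-byte BFCP common header."""
    if len(payload) < 12:
        raise ValueError("payload too short for a BFCP common header")
    b0, primitive, length, conf_id, trans_id, user_id = struct.unpack(
        "!BBHIHH", payload[:12]
    )
    return {
        "version": b0 >> 5,        # top 3 bits
        "primitive": primitive,
        "payload_length": length,  # in 4-octet units, common header excluded
        "conference_id": conf_id,
        "transaction_id": trans_id,
        "user_id": user_id,
    }


def unanswered_hellos(payloads: Iterable[bytes]) -> set:
    """Transaction IDs of Hellos for which no HelloAck was seen.

    Assumes a retransmitted Hello reuses its transaction ID, so a Hello
    answered on a retry is not reported.
    """
    hellos, acks = set(), set()
    for payload in payloads:
        header = parse_bfcp_header(payload)
        if header["primitive"] == HELLO:
            hellos.add(header["transaction_id"])
        elif header["primitive"] == HELLO_ACK:
            acks.add(header["transaction_id"])
    return hellos - acks


def bfcp(primitive: int, transaction_id: int) -> bytes:
    # Hypothetical helper: header-only message, version 1, conference 1, user 2.
    return struct.pack("!BBHIHH", 1 << 5, primitive, 0, 1, transaction_id, 2)


capture = [bfcp(HELLO, 100), bfcp(HELLO_ACK, 100), bfcp(HELLO, 101)]
print(unanswered_hellos(capture))  # → {101}
```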
The problem: it seems the retry counter is not reset to 0 after a HelloAck is successfully received.
Say you have a 4-hour session over a WAN link with random packet loss. If you are unlucky and lose 11 random Hello or HelloAck messages over the whole session (not 11 in a row), the presentation stops and never comes back until you disconnect and reconnect the endpoint. From what we observed, the retry counter never goes back to 0, even when Hello exchanges succeed between the lost packets.
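The difference between the expected counter (consecutive misses only) and the behavior described above (misses accumulating over the whole session) can be modelled in a few lines. This is a simplified sketch, not the TC implementation: each Hello exchange is reduced to "answered or not", the threshold of 11 unanswered Hellos is taken from the observations above, and loss is assumed independent per message. The function names and the 1% loss figure in the example are hypothetical.

```python
def presentation_survives(acks: list, reset_on_ack: bool, stop_after: int = 11) -> bool:
    """Replay a sequence of Hello exchanges; True if the presentation is still shown.

    acks[i] is True when the i-th Hello got its HelloAck.
    reset_on_ack=True  -> counter of consecutive misses (expected behavior)
    reset_on_ack=False -> counter never reset (behavior suspected here)
    """
    misses = 0
    for ack in acks:
        if ack:
            if reset_on_ack:
                misses = 0
        else:
            misses += 1
            if misses >= stop_after:
                return False  # viewer stops displaying the presentation
    return True


def expected_failed_exchanges(hours: float, interval_s: float, loss: float) -> float:
    """Expected failed exchanges: an exchange fails if the Hello or the HelloAck is lost."""
    exchanges = hours * 3600 / interval_s
    return exchanges * (1 - (1 - loss) ** 2)


# 4 hours of Hellos every 10 s = 1440 exchanges, one isolated loss every 100.
acks = [i % 100 != 99 for i in range(1440)]
print(presentation_survives(acks, reset_on_ack=True))    # → True
print(presentation_survives(acks, reset_on_ack=False))   # → False
print(round(expected_failed_exchanges(4, 10, 0.01), 1))  # → 28.7
```

With the same 14 isolated losses, a consecutive counter never gets past 1, while a counter that is never reset reaches 11 and stops the presentation. At the hypothetical 1% per-message loss, a 4-hour session would see about 28.7 failed exchanges on average, well above the threshold.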
I have a TAC case open for this and we are waiting for feedback from the developers.