08-13-2014 01:31 PM - edited 03-21-2019 10:18 AM
We have 2 SPA-112 units (running the latest 1.3.3(015) firmware) which are experiencing loss of registration on a regular basis. The web interface shows that the line is off-hook, and that it is registered. However, there is no dial tone on the line, and the SIP registrar (a Mitel 3300) shows that the line is out of service. The telltale sign of this problem is that the "Next Registration In:" field in the Voice tab is blank. Attached is the SIP debug log of an event - sometime during that window, the SPA lost registration.
Solved! Go to Solution.
09-04-2014 01:04 PM
I had a very long back-and-forth with Cisco small business support. They recommended the firmware which was released just a few days ago (1.3.5). I upgraded the firmware, and also applied a patch to the 3300 that Mitel released just for us. We have not experienced the problem since.
08-13-2014 06:36 PM
The attached debug log seems not to confirm your claiming. The important lines of them are:
--------------------------------------------
--------------------------------------------
According the log, there has been outgoing fax call from 11:14:50 to 11:15:55 then second outgoing fax call from 11:20:54 to 11:21:52. Both calls has been connected to caller. It mean that from 11:14:50 to 11:21:52 (e.g. almost complete debug window) the line has NOT been out of service. It has been in use.
OK, the attached log can't help us so much with the issue analysis.
Please catch the SIP packets between your ATA and Mitel. We need to see registration requests send and related responses from Mitel. I wish it will help to solve the problem.
08-14-2014 09:44 AM
08-20-2014 10:54 AM
08-20-2014 04:39 PM
There is no sucessfull registration catched in the file. SO I can't decide if registration has been lost prematurely or it has just expired.
Also, all files you supplied looks simmilar - they cover active T.38 fax call. No other type of call, no phone in idle state. Should I assume the issue manifests only during such type of call ? You didn't mention such condition ...
08-21-2014 01:30 PM
The SIP boxes in question only handle fax calls. At the beginning of the time represented in the trace, the box was registered. At the end, it was not registered, at least from the point of view of the registrar. I will do another trace.
08-21-2014 04:32 PM
It should start at REGISTER time (including the register itself) and should not end before the phone become unregistered AND active call has ended. It's not necesarry to catch the RTP ...
08-22-2014 08:33 AM
OK, I've got another capture in progress. The problem is that from the SPA's perspective, the call is still in progress. The web interface shows "off hook".
08-22-2014 10:22 AM
If SPA claim off-hook, then fax is holding the line off-hook. The fax need to hang up the line. It can't be done by SPA.
08-26-2014 06:33 AM
08-26-2014 10:18 AM
Lets analyze. Both lines has been registered successfully (packets #13-#20) with expiration set to 3600seconds.
Registration has been renewed about 3570s later (packets #21-#28).
The registration has been successfully renewed so many times then. Successful incoming as well as outgoing calls has occurred several times in meantime.
Important part seems to start at #544. Incoming call to 774201 is diverted to 628 because of "user busy". But previous call has been disconnected so long time ago (#388-#407). So why the 774201 is busy ? Syslog&debug packets are not included in the packet dump, but I assume the fax is holding the line off-hook with no apparent reason. Similar session occurred three times.
But it doesn't affect the registration so we can ignore it.
No interesting event happened up to packet #1001. A device with IP 10.10.0.24 seems to flood the SPA by suspicious (and possibly malformed) T.38 UDPTL packets.
Not registration is still not lost, so even this can be ignored.
Mos suspicious part is starting at packet #1231. External PBX is attempting to setup incomming call, but upstream PBX decided to cancel it in ringing phase. SPA doesn't respond to such attempt. It's not so good, but it should not harm - upstream PBX should consider the dialog expired.
From packet #1245 we see attempt to setup another incoming call - such requests are not responded from SPA's side. The device seems to be dead. Although previous registration is not expired yet, the device is in wrong state and may claim anything - including it's not registered.
Well, what's now ?
I see several suspicious event that may cause device's abend.
At the first, the device's configuration related to T.38 seems to be broken. T.38 reinvite is attempted during almost all calls, but they are unsuccessful. If you are wishing for T.38 fax mode, the issue needs to be solved. Ask your VoIP provider for help as we can't know why the T.38 is rejected by them. If no T.38 is supported then it should be deconfigured on SPA's side.
At the second, there is a flood of packets from .24 node. The source and purpose of such packets needs to be identified. It may confuse the SPA.
Well, if it will not help, there is "at the third" paragraph. Call cancelled during RINGing phase of incoming call is latest event before the SPA become frozen. It should not be so hard to verify it cause the problem or not. Just call the SPA, don't allow fax to go on-hook, cancel the call during ringing phase. If you confirm it freeze the SPA you discovered the firmware bug. Downgrade may help then.
Note - don't try "third" unless "first" and "second" issue become solved.
Hope it help ...
08-27-2014 06:47 AM
Dan, this is an utterly fantastic analysis.
I'm going to go through your advice bit by bit. One note: the .24 node is the on-site SIP proxy, which is a Mitel Border Gateway.
Thank you!
08-27-2014 07:40 AM
It required about two hours of analysis (more detailed in first version, but I deleted unrelated parts then), so it SHOULD be considered useful ;-)
Start with T.38 - just try not to use it even in the case you wish to use it. You may reactivate it later.
Note the SPA112 is not state-of-the-art device even with the latest firmware. The abandoned PAP2T seems to be still more reliable and stable than SPA112. So try not to push the SPA112 to the edges.
And rate useful responses ;-)
08-27-2014 12:23 PM
For interest's sake, here's the SIP log from the device at the time of packet #1231 and #1245:
Aug 25 09:41:09 10.10.0.41 ################# CC: running... ############
Aug 25 09:41:53 10.10.0.40 ################# CC: running... ############
Aug 25 09:46:09 10.10.0.41 ################# CC: running... ############
Aug 25 09:46:52 10.10.0.40 ################# CC: running... ############
Aug 25 09:47:39 10.10.0.40 CC_eventProc(), event: CC_EV_SIG_CALL_ARRIVED(0x31), lid: 1, par: 0, par2: 0x407b9900
Aug 25 09:47:39 10.10.0.40 AUD_ccEventProc: event 49 vid 1 par 0x0 par2 0x407b9900
Aug 25 09:47:39 10.10.0.40 CC_lineBusy()
Aug 25 09:47:39 10.10.0.40 getCodecList line 0x22cc04 call 0x22cc08 clRemote: 0x0 bInbound 1
Aug 25 09:47:39 10.10.0.40 pconly: 0
Aug 25 09:47:39 10.10.0.40 ====== Local codec num 4 ======
Aug 25 09:47:39 10.10.0.40 0
Aug 25 09:47:40 10.10.0.40 130
Aug 25 09:47:40 10.10.0.40 8
Aug 25 09:47:40 10.10.0.40 18
Aug 25 09:47:40 10.10.0.40 ================================
Aug 25 09:47:40 10.10.0.40 [AUD]Get UCH node for AUD_LINE 1 0.
Aug 25 09:47:40 10.10.0.40 uchAllocateNode(), Node 0 allocated ret=0
Aug 25 09:47:41 10.10.0.40 [AUD]UCH node 0 allocated to AUD_LINE 1.
Aug 25 09:47:41 10.10.0.40 uchConnectEpToNode(), connecting EP FXS 2 to node 0
Aug 25 09:47:41 10.10.0.40 uchEnableNode(), Node 0 enbaled ret=0
Aug 25 09:47:41 10.10.0.40 CC_eventProc(), inf.strName = ONT GOVT
Aug 25 09:47:41 10.10.0.40 CC_eventProc(), inf.strPhone = 4163142695
Aug 25 09:47:41 10.10.0.40 callEventProcTable[0] is cepIdleProc
Aug 25 09:47:41 10.10.0.40 cepIdleProc(lid=1, call=0x22cc08, event=18(CC_EV_USR_ACCEPTCALL), par=0, par2=0x407b9900)
Aug 25 09:47:42 10.10.0.40 cepIdleProc(), lid=1
Aug 25 09:47:42 10.10.0.40 cepIdleProc(), line->sigProc(CC_CMD_ACCEPT)
Aug 25 09:47:42 10.10.0.40 cepIdleProc(), call->cinf.bAutoAnswer = 0
Aug 25 09:47:42 10.10.0.40 NEW_CALL_STATE(), call 0: old state = CC_CST_IDLE, new state CC_CST_RINGING
Aug 25 09:47:42 10.10.0.40 CID_initGen 8
Aug 25 09:47:42 10.10.0.40 [1]CID:CID_initGen() >>> offhook 0 delay 2200 phone 4163142695 name ONT GOVT
Aug 25 09:47:42 10.10.0.40 SLIC_startRing state 0 ts 0x25b18con 2000 off 4000 len 60000
Aug 25 09:47:43 10.10.0.40 [1]Ring cad event 0 pol 0
Aug 25 09:47:43 10.10.0.40 RTP_nextMediaPort(), port = 16476
Aug 25 09:47:43 10.10.0.40 RTP_nextMediaPort(), rc=16474
Aug 25 09:47:43 10.10.0.40 AUD_allocCallObj() call(0x277360)
Aug 25 09:47:43 10.10.0.40 [1:0]AUD ALLOC CALL (port=16474)
Aug 25 09:47:43 10.10.0.40 +++++ SIP_lineCcCmdProc CC_CMD_ACCEPT AUD_startRtpRx
Aug 25 09:47:43 10.10.0.40 [AUD]AUD_startRtpRx(0x277360) lid 1
Aug 25 09:47:44 10.10.0.40 Local loopback mode: None. Type: None.
Aug 25 09:47:44 10.10.0.40 Remote loopback mode: None. Type None.
Aug 25 09:47:44 10.10.0.40 RTP channel setup: udp_no_checksum 0, sysmmetric_rtp 0, tos 0xb8, cos 6, mlb 0.
Aug 25 09:47:44 10.10.0.40 uchConnectEpToNode(), connecting EP VoIP 0 to node 0
Aug 25 09:47:44 10.10.0.40 uchEnableEchoCan(), lid 1 EP 1 enable
Aug 25 09:47:44 10.10.0.40 uchEnableModemCall() Modem call state(0) not change
Aug 25 09:47:44 10.10.0.40 UCH sync parameter hold off time is 70
Aug 25 09:47:45 10.10.0.40 Set QoS succeed
Aug 25 09:47:45 10.10.0.40 uchSetGTD(), Disable GTD for FXS 2
Aug 25 09:47:45 10.10.0.40 uchSetGTD() GTD state not change
Aug 25 09:47:45 10.10.0.40 uchSetDTMFMute(), ENABLE
Aug 25 09:47:45 10.10.0.40 cordless_start_rtp(), chan:0 remote ip:(null) port:0 local:16474 rx:1 ipt:0 ptime:30 bInMdmPasstru:0
Aug 25 09:47:45 10.10.0.40 Starting Rx only RTP.
Aug 25 09:47:45 10.10.0.40 Socket 0 bound to port 16474.
Aug 25 09:47:46 10.10.0.40 Remote IP/port: 0.0.0.0:0
Aug 25 09:47:46 10.10.0.40 set RTP_SESSION_OPT_DTMF
Aug 25 09:47:46 10.10.0.40 VAD = 0
Aug 25 09:47:46 10.10.0.40 RTP configuration:
Aug 25 09:47:46 10.10.0.40 audio_mode RTP_MODE_REC_ONLY, media_loop_level RTP_LOOP_LEVEL_NONE, dtmf2833numEndPakcets 3, opts 0x0
Aug 25 09:47:46 10.10.0.40 Codec: duration 30, rx_pt_event 101, tx_pt_event -1, tx_pt 0
Aug 25 09:47:47 10.10.0.40 rx[0] 0 PCMU/8000, rx[1] 8 PCMA/8000, rx[2] 2 G.726/8000
Aug 25 09:47:47 10.10.0.40 rx[3] 18 G.729/8000, rx[4] 9 G.722/16000, rx[5] 0
Aug 25 09:47:47 10.10.0.40 Jib: max 180ms, min 60ms, adapt 1
Aug 25 09:47:47 10.10.0.40 RTP Channel 0 is virgin: 1.
Aug 25 09:47:47 10.10.0.40 #### rtp seq number is 5067
Aug 25 09:47:47 10.10.0.40 RTP session 0 started
Aug 25 09:47:47 10.10.0.40 [AUD]RTP Rx Up
Aug 25 09:51:09 10.10.0.41 ################# CC: running... ############
Aug 25 09:56:09 10.10.0.41 ################# CC: running... ############
Aug 25 10:01:09 10.10.0.41 ################# CC: running... ############
08-28-2014 06:14 AM
It conform with analysis results.
We can see the Call Control is aware of incoming call (CC_EV_SIG_CALL_ARRIVED) , call has entered RINGING Call STate (CC_CST_RINGING ). Also, rx-only early-audio channel has been established (Starting Rx only RTP and bellow).
There is nothing related to neither CANCEL nor INVITE ...
Unfortunately, there is nothing that can help us with identification of device freezing reason as well - so no new things nor advices will follow ;-)
Discover and save your favorite ideas. Come back to expert answers, step-by-step guides, recent topics, and more.
New here? Get started with these tips. How to use Community New member guide