
Unstable SPA50x with 7.5.4 firmware, remote reset to factory default

Dan Lukes
VIP Alumni

We have a branch with about 180 phones, mostly SPA504G and a few SPA508G. All of them have an identical configuration except for the SIP account name and password and the button label (the device's phone number).

For reasons not relevant here, I triggered a cold restart of all of them (using SIP NOTIFY). I did it in the middle of the night, when no phone was in use. Most of the phones restarted without problems, but 48 did not: they got stuck in a never-ending reboot loop. For completeness, I have attached the complete syslog from one of them at the bottom, but why it happens is not the question of the day.
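For reference, an out-of-dialog SIP NOTIFY of the kind that triggers such a restart can be sketched as follows. This is a minimal sketch, not the exact request used here: the event name `reboot_now` is an assumption taken from the Linksys cold-restart entry in Asterisk's sample `sip_notify.conf`, and the `admin` user, the addresses and the helper name `build_reboot_notify` are hypothetical. Check the event name against your firmware before relying on it.

```python
import uuid


def build_reboot_notify(phone_uri: str, local_host: str,
                        local_port: int = 5060,
                        event: str = "reboot_now") -> str:
    """Build an out-of-dialog SIP NOTIFY asking a phone to restart.

    ASSUMPTION: "reboot_now" (cold restart) is the event name used for
    Linksys/Cisco SPA phones in Asterisk's sample sip_notify.conf;
    verify it for your firmware. All names here are hypothetical.
    """
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic cookie
    tag = uuid.uuid4().hex[:8]
    lines = [
        f"NOTIFY {phone_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:admin@{local_host}>;tag={tag}",
        f"To: <{phone_uri}>",  # no To tag: request is outside any dialog
        f"Call-ID: {call_id}",
        "CSeq: 1 NOTIFY",
        f"Contact: <sip:admin@{local_host}:{local_port}>",
        f"Event: {event}",
        "Subscription-State: terminated",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and an empty line after the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

The resulting string could be sent over UDP with `socket.sendto(msg.encode(), (phone_ip, 5060))`; in practice such NOTIFYs are usually sent through the PBX instead (for example with Asterisk's `sip notify` CLI command).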

A phone can be recovered only by a reset to factory defaults. That is not a problem in itself, as we have a zero-touch environment: even a phone in its factory default configuration configures itself within minutes.

The question of the day is: how can I order a phone to reset to factory defaults remotely? The phones are reachable for a short time between reboots (they even register successfully with the exchange). I would rather not have to walk from phone to phone in person. Moreover, we have other branches in other cities and several foreign branches. I need to cold restart the phones in those branches as well, but I worry that some phones will enter the reboot loop there too.

Any advice will be appreciated.

Sep  1 23:24:15 gateway ip:     10.xx.yy.1

Sep  1 23:24:15 IDBG: LS, 270-4d8

Sep  1 23:24:15 IDBG: SOK

Sep  1 23:24:15 IDBG: st-0

Sep  1 23:24:15 Dict_D> Store pDictTftp->prev_dict_enable: 2

Sep  1 23:24:15 Dict_N> !!!OK EXIT TFTP main process

Sep  1 23:24:15 Resolving 10.xx.zz.1

Sep  1 23:24:15 [BKpic]Loading text background image

Sep  1 23:24:15 [BKpic]Loading text background image

Sep  1 23:24:15 [0]Reg Addr Change(0) 0:0->a....01:5060

Sep  1 23:24:15 [0]Reg Addr Change(0) 0:0->a....01:5060

Sep  1 23:24:15 [0]RegOK. NextReg in 600 (0)

Sep  1 23:24:15 Dict_N> DICT_loadFromFlash eng dictionary ok, paylen = 52754

Sep  1 23:24:16 Dict_N> DICT_loadFromFlash non-eng dictionary ok, paylen = 57154

Sep  1 23:24:16 Dict_N> DICT feature is enabled!

Sep  1 23:24:16 fu:0:0be63, 7.29 1

Sep  1 23:24:16 [0]SubOK. NextSub in 599 (1)

Sep  1 23:24:16 [0]SubOK. NextSub in 599 (1)

Sep  1 23:24:18 fs:042600:042752:131072

Sep  1 23:24:18 fls:fuuuuuuaff:13:3120:127248

Sep  1 23:24:18 fbr:0:3000:3000:0be12:000e:000d:7.5.4

Sep  1 23:24:18 fhs:01:0:0001:upg:app:M:7.4.3a

Sep  1 23:24:18 fhs:02:0:0002:upg:app:0:7.4.8a

Sep  1 23:24:18 fhs:03:0:0003:upg:app:1:7.4.8a

Sep  1 23:24:18 fhs:04:0:0004:upg:app:2:7.4.8a

Sep  1 23:24:18 fhs:05:0:0005:upg:app:0:7.4.9a

Sep  1 23:24:18 fhs:06:0:0006:upg:app:1:7.4.9a

Sep  1 23:24:18 fhs:07:0:0007:upg:app:2:7.4.9a

Sep  1 23:24:18 fhs:08:0:0008:upg:app:0:7.4.9c

Sep  1 23:24:18 fhs:09:0:0009:upg:app:1:7.4.9c

Sep  1 23:24:18 fhs:0a:0:000a:upg:app:2:7.4.9c

Sep  1 23:24:18 fhs:0b:0:000b:upg:app:0:7.5.2b

Sep  1 23:24:18 fhs:0c:0:000c:upg:app:1:7.5.2b

Sep  1 23:24:18 fhs:0d:0:000d:upg:app:2:7.5.2b

Sep  1 23:24:18 fhs:0e:0:000e:upg:app:0:7.5.4

Sep  1 23:24:18 fhs:0f:0:000f:upg:app:1:7.5.4

Sep  1 23:24:18 fhs:10:0:0010:upg:app:2:7.5.4

Sep  1 23:24:18 dhcp opt 66: ""

Sep  1 23:24:19 fu:0:0be7d, 5.1.1 1

Sep  1 23:24:23 resync rule:  https://karlin-provisioning.xxxxx.cz/Cisco/Provisioning.php?MAC=c89c1d6d....;PSN=504G;Product=SPA504G;Serial=CBT150805..;SW=7.5.4;HW=1.0.2(0001);CERT=Installed;IP=10.xx.yy.147;EXTIP=;PRVST=0;EMS=;MUID=EMU;GPP_O=CZ;GPP_P=20130708T113333CEST

Sep  1 23:24:23 resync rule:  https://karlin-provisioning.xxxxx.cz/Cisco/Provisioning.php?MAC=c89c1d6d....;PSN=504G;Product=SPA504G;Serial=CBT150805..;SW=7.5.4;HW=1.0.2(0001);CERT=Installed;IP=10.xx.yy.147;EXTIP=;PRVST=0;EMS=;MUID=EMU;GPP_O=CZ;GPP_P=20130708T113333CEST

Sep  1 23:24:23 ++ j=0 sip=c3....92

Sep  1 23:24:23 ++ j=0 sip=c3....92

Sep  1 23:24:23 fs:042600:042752:131072:198019807999

Sep  1 23:24:23 pbs 230912

Sep  1 23:24:23 SPA504G c8:9c:1d:6d:..:.. -- Requesting resync https://195.xx.yy.146:443/Cisco/Provisioning.php?MAC=c89c1d6d....;PSN=504G;Product=SPA504G;Serial=CBT150805..;SW=7.5.4;HW=1.0.2(0001);CERT=Installed;IP=10.xx.yy.147;EXTIP=;PRVST=0;EMS=;MUID=EMU;GPP_O=CZ;GPP_P=20130708T113333CEST

Sep  1 23:24:23 SPA504G c8:9c:1d:6d:..:.. -- Requesting resync https://195.xx.yy.146:443/Cisco/Provisioning.php?MAC=c89c1d6d....;PSN=504G;Product=SPA504G;Serial=CBT150805..;SW=7.5.4;HW=1.0.2(0001);CERT=Installed;IP=10.xx.yy.147;EXTIP=;PRVST=0;EMS=;MUID=EMU;GPP_O=CZ;GPP_P=20130708T113333CEST

Sep  1 23:24:23 FMM >>>> Requesting profile

Sep  1 23:24:23 request reboot type=4 reason=System 4(8000)

Sep  1 23:24:23 request reboot type=4 reason=System 4(8000)

Sep  1 23:24:23 [0]SubOK. NextSub in 1 (1)

Sep  1 23:24:23 [0]SubOK. NextSub in 1 (1)

Sep  1 23:24:23 [0]UnRegOK

Sep  1 23:24:24 [0]SUBS:TMO w/f NOTIFY 19

Sep  1 23:24:24 [0]SUBS:TMO w/f NOTIFY 19

Sep  1 23:24:32 fu:0:0bfda, 4.71 5.1.10 5.1.12 5.1.14 5.1.19 6.8 7.13 1

At Sep  1 23:25:20 the same sequence starts again, and it repeats ad nauseam.
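A loop like this is easy to spot in a central syslog: a phone that keeps logging `request reboot` every minute or so is stuck. The sketch below assumes the syslog server stores each line as a (host, timestamp, message) tuple; that input format, the threshold of 3 reboots within 10 minutes, the placeholder year and the function names are hypothetical choices for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

REBOOT_MARKER = "request reboot"  # substring taken from the log above


def parse_bsd_timestamp(stamp: str, year: int = 2000) -> datetime:
    # BSD syslog stamps such as "Sep  1 23:24:23" carry no year, so a
    # placeholder year is prepended (2000 is a leap year, so "Feb 29" parses).
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_reboot_loops(records, min_reboots=3, window=timedelta(minutes=10)):
    """Return the sorted hosts that logged at least `min_reboots` reboot
    requests within some sliding `window`.

    `records` is an iterable of (host, timestamp, message) tuples
    (hypothetical format for lines collected by a syslog server).
    """
    reboots = defaultdict(set)
    for host, stamp, message in records:
        if REBOOT_MARKER in message:
            # A set collapses the duplicated lines seen in the log above.
            reboots[host].add(parse_bsd_timestamp(stamp))
    looping = []
    for host, stamps in reboots.items():
        times = sorted(stamps)
        start = 0
        for end in range(len(times)):
            # Advance the left edge until times[start..end] fits in `window`.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_reboots:
                looping.append(host)
                break
    return sorted(looping)
```

A single intentional restart also produces one reboot request, which is why the sketch only flags hosts with several requests close together.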

Note that the log claims "Requesting profile", but the phone sent no packet (not even a TCP SYN), so the content of the profile can't be causing the crash. I suspect the problem is related to the TCP stack or the OpenSSL library, but that is not the topic of this thread. Right now I first need to find out how to recover those phones.
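To tabulate which devices are affected, the identity fields the phone reports in its logged resync URL (MAC, PSN, SW, HW and so on) can be extracted. As the log shows, the phone separates these fields with `;` rather than `&`, so this sketch splits the query manually; the function name `parse_resync_params` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_resync_params(url: str) -> dict[str, str]:
    """Split the query of a SPA resync URL into a dict of fields.

    The phone joins fields with ';' (e.g. "PSN=504G;SW=7.5.4"), so the
    query is split by hand instead of with the '&'-based parsers.
    """
    params = {}
    for field in urlsplit(url).query.split(";"):
        if not field:
            continue  # tolerate a trailing or doubled ';'
        key, _, value = field.partition("=")
        params[key] = value  # empty values such as "EXTIP=" stay ""
    return params
```

Grouping the parsed `SW`, `HW` and `PSN` values of looping phones against healthy ones is one way to check whether the failures correlate with a particular model or hardware revision.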

3 Replies

nseto
Level 6

There's no factory reset that can be done remotely. I asked the developers, and that reboot reason indicates a phone crash. Does the issue also happen with 7.5.5? Since 48 of the 180 phones are doing this, what's different in the network setup, if any?

There's no factory reset that can be done remotely

There used to be one, but it no longer works.

Of course, this change is not documented anywhere, nor have I received an answer to my question about it. See

Does the issue also happen with 7.5.5?

It's hard to say. I can't reproduce the problem in my lab, not even with 7.5.4. I can reproduce it only on large networks, when triggering a reload of all devices. Since some devices may become bricked and can't be recovered remotely, I can't test it extensively.

The issue I described here is new to me. More often I hit another restart-related problem: the device does not enter a never-ending loop but simply freezes. Fortunately, that can be solved by a power cycle (no problem, as we have PoE everywhere).

Within the past two weeks I restarted (via SIP NOTIFY) about 780 phones in three other locations, and about 100 failed to start and required a power cycle (but only 3 devices ended up in the never-ending loop). The "freezing" problem is known to us; we have hit it on every upgrade I remember (e.g. from 7.4.7). But it can be solved, so I consider it annoying but not severe.

It is the unrecoverable boot loop that worries me. All four branches I restarted are located in Prague, so I can solve problems easily, and they are our own branches. But now I need to restart and upgrade the branches of other companies in Poland, Germany, Italy, France, Hungary and Belgium ...

Since 48 of the 180 phones are doing this, what's different in the network setup, if any?

Good question. As far as I know, there is no difference in network setup or phone configuration across our branches. The only difference between this branch and the other three is how I restarted them: here I restarted all 180 devices at once, while in the other three branches I sent the command in smaller batches of about 30-40 devices, with a pause of a few seconds before the next batch.
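The batched approach can be sketched as a small scheduler. This is a hypothetical helper, not the script used here: `send_restart` stands for whatever actually triggers the restart of one phone (for example sending it a SIP NOTIFY), and the defaults of 35 phones per batch and a 5-second pause are assumptions that only approximate the "30-40 devices, a few seconds" figures above.

```python
import time
from typing import Callable, Sequence


def restart_in_batches(phones: Sequence[str],
                       send_restart: Callable[[str], None],
                       batch_size: int = 35,
                       pause_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send_restart` for every phone, `batch_size` phones at a time,
    pausing `pause_s` seconds between batches. Returns the batch count."""
    batches = 0
    for start in range(0, len(phones), batch_size):
        if batches:
            # Stagger batches so fewer phones boot and resync at once.
            sleep(pause_s)
        for phone in phones[start:start + batch_size]:
            send_restart(phone)
        batches += 1
    return batches
```

Passing `sleep` as a parameter keeps the helper testable without real delays; with 180 phones and the defaults it sends 6 batches and pauses 5 times.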

I assume it's some kind of race condition, so it will not be easy to debug ...
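Putting the approximate counts from this thread side by side is consistent with a load-related cause, though with counts this rough, and from different branches, it is an indication rather than proof. The figures below are the ones reported in the thread; the variable names are only for illustration.

```python
# Approximate counts reported in this thread.
all_at_once_loop = 48 / 180   # one branch, all phones restarted together
batched_loop = 3 / 780        # three branches, batches of about 30-40
batched_freeze = 100 / 780    # same batched restarts, phones that froze

print(f"reboot loop, all at once: {all_at_once_loop:.1%}")  # 26.7%
print(f"reboot loop, batched:     {batched_loop:.1%}")      # 0.4%
print(f"freeze, batched:          {batched_freeze:.1%}")    # 12.8%
print(f"loop-rate ratio:          {all_at_once_loop / batched_loop:.0f}x")  # 69x
```

The reboot-loop rate with all phones restarted at once is about 70 times the batched rate, while the freeze problem remains frequent even with batching.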

Even worse, it may be tied to a particular configuration (I assume so, as nobody else has complained about it here) ...

Thank you for your support.

Just for completeness: the CiscoIPPhoneExecute problem is caused by broken CRLF handling, and there is a workaround. Moreover, the cause of the perpetual reboot loop seems to be the one described in - so it can be solved as well.

The only remaining problem is that a phone sometimes freezes instead of upgrading, but this can be solved by a power cycle, which is easy with PoE (and all in all, it's better to initiate an upgrade with a power cycle than with the software method).

It can't be considered ideal, but every problem either has a workaround or can be solved remotely, so it can be considered acceptable ...