02-03-2015 01:23 PM
We are in the process of troubleshooting disk mirroring on our RSP440-TR's. Every time we enable, the RSP crashes.
Detailed information on the three (3) scenarios and log results are listed below.
I'm curious if disk mirroring is necessary on our ASR9k's. I've read various posts that reference that there's no purpose to disk mirroring at all along with the following bug CSCus34978 where it references the RSP going corrupt after enabling disk mirroring.
https://tools.cisco.com/bugsearch/bug/CSCus34978
Any input from Xander & team would be much appreciated
Thanks!
-ae
######
Scenario 1
######
XR 4.3.4
1) install and activate all packages and SMUs
2) mirror location 0/RSP0/CPU0 disk0:disk1:
**Result - RSP CRASH
######
Scenario 2
######
XR 4.3.4
1) install only the mini PIE with TurboBoot
2) format disk1:
3) mirror location 0/RSP0/CPU0 disk0:disk1:
**Result - RSP CRASH
######
Scenario 3
######
XR 5.2.2
1) install and activate all packages and SMUs
2) mirror location 0/RSP0/CPU0 disk0:disk1:
**Result - RSP CRASH
######
Scenario 1 - LOG
######
RP/0/RSP0/CPU0:LAB-01(config)#mirror location 0/RSP0/CPU0 disk0:disk1:
RP/0/RSP0/CPU0:LAB-01(config)#commit
Mon Feb 2 09:47:19.850 UTC
RP/0/RSP0/CPU0:Feb 2 09:47:20.023 UTC: config[65742]: %MGBL-CONFIG-3-FILESAVE_ERROR : The OS returned an error while SysDb was attempting to perform a file save operation. Any changes have been rolled back. Error: ''sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory'
% Failed to commit .. As an error (Unknown) encountered during commit operation. Changes may not have been committed:
'sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory
RP/0/RSP0/CPU0:LAB-01(config)#RP/0/RSP0/CPU0:Feb 2 09:47:21.005 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault
RP/0/RSP0/CPU0:Feb 2 09:47:21.005 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x108 at PC 0x420883b. Signal code 1 - SEGV_MAPPER. Address not mapped.
RP/0/RSP0/CPU0:Feb 2 09:47:21.005 UTC: dumper[60]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 36886 (pkg/bin/redfs_svr)
RP/0/RSP0/CPU0:Feb 2 09:47:21.006 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice: 0(harddisk:/dumper) in use
RP/0/RSP0/CPU0:Feb 2 09:47:21.006 UTC: dumper[60]: %OS-DUMPER-6-DEFAULT_PATH_TMP : Setting default path to /tmp/dumper for core size less than 20971520. Core dump size 2613248, free memory size 10122571776.: default path for core dump
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : EDI ESI EBP(fp) EXX
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : R0 00000000 10023c34 041ff200 fdfd5cd0
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : EBX EDX ECX EAX
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : R4 00000000 0802417a 041ff150 042238b0
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : EIP(pc) CS EFL ESP
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : R8 0420883b 000000f3 00011202 041ff1f4
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : SS
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO : R12 000000fb
RP/0/RSP0/CPU0:Feb 2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #0 0x420883b
RP/0/RSP0/CPU0:Feb 2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #1 0x420cee8
RP/0/RSP0/CPU0:Feb 2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #2 0x420fa42
RP/0/RSP0/CPU0:Feb 2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #3 0x82476ae
RP/0/RSP0/CPU0:Feb 2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #4 0x8247656
RP/0/RSP0/CPU0:Feb 2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #5 0x8244737
RP/0/RSP0/CPU0:Feb 2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #6 0x421000d
RP/0/RSP0/CPU0:Feb 2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #7 0x4200290
RP/0/RSP0/CPU0:Feb 2 09:47:21.011 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice: tmp(/tmp/dumper) in use
RP/0/RSP0/CPU0:Feb 2 09:47:21.012 UTC: dumper[60]: %OS-DUMPER-5-CORE_FILE_NAME : Core for process pkg/bin/redfs_svr at /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z on local_node
RP/0/RSP0/CPU0:Feb 2 09:47:21.110 UTC: redfs_config[388]: %MEDIA-REDFSAPI-4-MSG_SEND_FAILED : Unable to send message to redfs server, op 9, errno=3:No such process. : pkg/bin/redfs_config : (PID=241860) : -Traceback= b6d1780 b6d094a 42000a1 8eba427 8ebc701 82465d7 8244724 4200700 4200038
RP/0/RSP0/CPU0:Feb 2 09:47:21.134 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z has just been written. size:1132202 md5:d089e0c7bfea5c744ae2e4c63ed960c4
RP/0/RSP0/CPU0:Feb 2 09:47:21.153 UTC: dumper[60]: %OS-DUMPER-5-DUMP_SUCCESS : Core dump success
RP/0/RSP0/CPU0:Feb 2 09:47:21.174 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z before transfer. size:1132202 md5:d089e0c7bfea5c744ae2e4c63ed960c4
RP/0/RSP0/CPU0:Feb 2 09:47:21.174 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice: 4(dumper_harddisk:/dumper) in use
RP/0/RSP0/CPU0:Feb 2 09:47:23.422 UTC: dumper[60]: %OS-DUMPER-5-CORE_TRANSFER_STATUS : Transfer of Core file from /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z to harddisk:/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z on local_node suceeded.
RP/0/RSP0/CPU0:Feb 2 09:47:23.434 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : harddisk:/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z after transfer. size:1132202 md5:d089e0c7bfea5c744ae2e4c63ed960c4
######
Scenario 2 - LOG
######
RP/0/RSP0/CPU0:LAB-01(config)#mirror location 0/RSP0/CPU0 disk0:disk1:
RP/0/RSP0/CPU0:LAB-01(config)#commit
Tue Feb 3 10:17:33.043 UTC
RP/0/RSP0/CPU0:Feb 3 10:17:33.116 UTC: config[65762]: %MGBL-CONFIG-3-FILESAVE_ERROR : The OS returned an error while SysDb was attempting to perform a file save operation. Any changes have been rolled back. Error: ''sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory'
% Failed to commit .. As an error (Unknown) encountered during commit operation. Changes may not have been committed:
'sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory
RP/0/RSP0/CPU0:LAB-01(config)#File name: first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86
RP/0/RSP0/CPU0:Feb 3 10:17:34.103 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault
RP/0/RSP0/CPU0:Feb 3 10:17:34.103 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x108 at PC 0x42086b9. Signal code 1 - SEGV_MAPPER. Address not mapped.
RP/0/RSP0/CPU0:Feb 3 10:17:34.103 UTC: dumper[60]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 36885 (pkg/bin/redfs_svr)
RP/0/RSP0/CPU0:Feb 3 10:17:34.104 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice: 0(harddisk:/dumper) in use
RP/0/RSP0/CPU0:Feb 3 10:17:34.104 UTC: dumper[60]: %OS-DUMPER-6-DEFAULT_PATH_TMP : Setting default path to /tmp/dumper for core size less than 20971520. Core dump size 3104768, free memory size 11032899584.
RP/0/RSP0/CPU0:Feb 3 10:17:34.108 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice: tmp(/tmp/dumper) in use
RP/0/RSP0/CPU0:Feb 3 10:17:34.109 UTC: dumper[60]: %OS-DUMPER-5-CORE_FILE_NAME : Core for process pkg/bin/redfs_svr at /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z on local_node
RP/0/RSP0/CPU0:Feb 3 10:17:34.213 UTC: sysmgr[94]: redfs_svr(1) (jid 88) (pid 36885) (fail_count 1) abnormally terminated, restart scheduled
RP/0/RSP0/CPU0:Feb 3 10:17:34.214 UTC: redfs_config[381]: %MEDIA-REDFSAPI-4-MSG_SEND_FAILED : Unable to send message to redfs server, op 9, errno=3:No such process. : pkg/bin/redfs_config : (PID=266423) : -Traceback= af9b7a4 af9a96e 42000a1 8e7144b 8e73274 822605f 82242b2 4200700 4200038
RP/0/RSP0/CPU0:Feb 3 10:17:34.251 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z has just been written. size:1194760 md5:442035043a30db3efc69392affcdca80
RP/0/RSP0/CPU0:Feb 3 10:17:34.272 UTC: dumper[60]: %OS-DUMPER-5-DUMP_SUCCESS : Core dump success
RP/0/RSP0/CPU0:Feb 3 10:17:34.294 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z before transfer. size:1194760 md5:442035043a30db3efc69392affcdca80
RP/0/RSP0/CPU0:Feb 3 10:17:34.295 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice: 4(dumper_harddisk:/dumper) in use
RP/0/RSP0/CPU0:Feb 3 10:17:36.913 UTC: dumper[60]: %OS-DUMPER-5-CORE_TRANSFER_STATUS : Transfer of Core file from /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z to harddisk:/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z on local_node suceeded.
RP/0/RSP0/CPU0:Feb 3 10:17:36.926 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : harddisk:/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z after transfer. size:1194760 md5:442035043a30db3efc69392affcdca80
######
Scenario 3 - LOG
######
RP/0/RSP0/CPU0:LAB-01#format disk1:
Tue Feb 3 11:35:48.676 UTC
Format will destroy all data on "disk1:". Continue? [confirm]
Device partition disk1: is now formated and is available for use.
RP/0/RSP0/CPU0:LAB-01(config)#mirror location 0/RSP0/CPU0 disk0:disk1:
RP/0/RSP0/CPU0:LAB-01(config)#commit
Tue Feb 3 11:43:00.541 UTC
RP/0/RSP0/CPU0:Feb 3 11:43:00.724 : config[65755]: %MGBL-CONFIG-3-FILESAVE_ERROR : The OS returned an error while SysDb was attempting to perform a file save operation. Any changes have been rolled back. Error: ''sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory'
% Failed to commit .. As an error (Unknown) encountered during commit operation. Changes may not have been committed:
'sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory
RP/0/RSP0/CPU0:LAB-01(config)#File name: first.redfs_svr_88.20150203-114301.node0_RSP0_CPU0.x86
RP/0/RSP0/CPU0:Feb 3 11:43:01.613 : dumper[60]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault
RP/0/RSP0/CPU0:Feb 3 11:43:01.613 : dumper[60]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x108 at PC 0x42086b9. Signal code 1 - SEGV_MAPPER. Address not mapped.
RP/0/RSP0/CPU0:Feb 3 11:43:01.613 : dumper[60]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 36885 (pkg/bin/redfs_svr)
RP/0/RSP0/CPU0:Feb 3 11:43:01.725 : sysmgr[94]: redfs_svr(1) (jid 88) (pid 36885) (fail_count 1) abnormally terminated, restart scheduled
RP/0/RSP0/CPU0:Feb 3 11:43:01.726 : redfs_config[361]: %MEDIA-REDFSAPI-4-MSG_SEND_FAILED : Unable to send message to redfs server, op 9, errno=3:No such process. : pkg/bin/redfs_config : (PID=270514) : -Traceback= af367a4 af3596e 42000a1 8ebf44b 8ec1274 822605f 82242b2 4200700 4200038
02-05-2015 11:48 AM
We found that you have to use a turboboot to recover from the problem. We also found that you have to change a ROMMON variable to prevent this problem from happening again. Finally, do not use disk mirroring on the ASR9k (there is no advantage, only reduced disk space available and a chance you may corrupt your RSPs).
Here's how you turboboot on your RSP440:
Here's how to disable the incorrect variable in ROMMON that hoses disk mirroring (you have to do this on each RSP you have)
Don't do disk mirroring on your RSP440. For the love of god, don't do it.
02-05-2015 02:13 PM
Stay relaxed Benjamin, its a nasty bug and yes, you have to do TURBOBOOT to recover from the problem because the bug actually corrupts the file system, but its still only a bug. There are good reasons for disk mirroring, as I already commented. Its not a good idea to rest the stability of the whole router to a single flash disk. Cisco recognized the bug and new RSP shipments will not contain this variable in the ROMMON.
Cheers,
Florian
02-05-2015 02:58 PM
Florian - I'm not sure I follow you. We could not find any good reason to do mirroring on the 9k. Xander explains it pretty well on this other thread - https://supportforums.cisco.com/discussion/12324091/disk-mirroring-asr9000-crs
This media does not behave like a typical HDD in your PC where it can mark bad sectors and skip them. It seems that if corruption occurs on either disk0 or disk1, you are going to have issues reading data off both partitions (remember, its the same physical media). You would also have to do an RMA on the entire RSP since the media is not field replaceable.
Can you please explain why you think disk mirroring would be beneficial in this scenario? All I can find is that disk mirroring is possible, but no explanation why it's beneficial. Data is already copied between RSPs, why do we need to have 2 copies on each RSP on the same physical storage media? That's like partitioning my USB drive into 2 separate volumes and copying files between them.
In my opinion, it's not worth the risk to enable disk mirroring. Especially when the Cisco config guides for disk mirroring do not mention having to check the ROMMON variables first.
Maybe I'm missing something?
02-06-2015 02:40 AM
Hi,
I was not aware of the article from Xander you are referring too. On RSP-4G disk mirroring made much sense because there you had two flash disk slots accessible on the front panel. With RSP-440 it seems that they are now not accessible at all, I was under the impression that you might reach them by removing the RSP from the slot. So as this is not the case, your are right, disk mirroring does not make much sense.
Regards,
Florian
Discover and save your favorite ideas. Come back to expert answers, step-by-step guides, recent topics, and more.
New here? Get started with these tips. How to use Community New member guide