cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
5526
Views
20
Helpful
18
Replies

ASR9000/XR - Disk Mirroring causing RSP crash / corruption

Andy Erickson
Level 4
Level 4

We are in the process of troubleshooting disk mirroring on our RSP440-TR's.  Every time we enable, the RSP crashes.

 

Detailed information on the three (3) scenarios and log results are listed below.

 

I'm curious if disk mirroring is necessary on our ASR9k's.  I've read various posts that reference that there's no purpose to disk mirroring at all along with the following bug CSCus34978 where it references the RSP going corrupt after enabling disk mirroring.

 

https://tools.cisco.com/bugsearch/bug/CSCus34978 

 

Any input from Xander & team would be much appreciated

Thanks!

-ae

 

######
Scenario 1
######

XR 4.3.4

1) install and activate all packages and SMUs

2) mirror location 0/RSP0/CPU0 disk0:disk1:

**Result - RSP CRASH

 

######
Scenario 2
​######

XR 4.3.4

1) install only the mini PIE with TurboBoot

2) format disk1:

3) mirror location 0/RSP0/CPU0 disk0:disk1:

**Result - RSP CRASH

 

######
Scenario 3
​######

XR 5.2.2

1) install and activate all packages and SMUs

2) mirror location 0/RSP0/CPU0 disk0:disk1:

**Result - RSP CRASH

 

######
Scenario 1 - LOG
######

RP/0/RSP0/CPU0:LAB-01(config)#mirror location 0/RSP0/CPU0 disk0:disk1:

RP/0/RSP0/CPU0:LAB-01(config)#commit

Mon Feb  2 09:47:19.850 UTC

RP/0/RSP0/CPU0:Feb  2 09:47:20.023 UTC: config[65742]: %MGBL-CONFIG-3-FILESAVE_ERROR : The OS returned an error while SysDb was attempting to perform a file save operation. Any changes have been rolled back. Error: ''sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory'

 

% Failed to commit .. As an error (Unknown) encountered during commit operation. Changes may not have been committed:

'sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory

RP/0/RSP0/CPU0:LAB-01(config)#RP/0/RSP0/CPU0:Feb  2 09:47:21.005 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault

RP/0/RSP0/CPU0:Feb  2 09:47:21.005 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x108 at PC 0x420883b. Signal code 1 - SEGV_MAPPER. Address not mapped.

RP/0/RSP0/CPU0:Feb  2 09:47:21.005 UTC: dumper[60]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 36886 (pkg/bin/redfs_svr)

RP/0/RSP0/CPU0:Feb  2 09:47:21.006 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  0(harddisk:/dumper) in use

RP/0/RSP0/CPU0:Feb  2 09:47:21.006 UTC: dumper[60]: %OS-DUMPER-6-DEFAULT_PATH_TMP : Setting default path to /tmp/dumper for core size less than 20971520. Core dump size 2613248, free memory size 10122571776.: default path for core dump

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :            EDI      ESI  EBP(fp)      EXX

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :   R0   00000000  10023c34  041ff200  fdfd5cd0  

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :            EBX      EDX      ECX      EAX

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :   R4   00000000  0802417a  041ff150  042238b0  

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :        EIP(pc)       CS      EFL      ESP

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :   R8   0420883b  000000f3  00011202  041ff1f4  

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :             SS

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-REGISTERS_INFO :   R12  000000fb  

RP/0/RSP0/CPU0:Feb  2 09:47:21.007 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #0  0x420883b

RP/0/RSP0/CPU0:Feb  2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #1  0x420cee8

RP/0/RSP0/CPU0:Feb  2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #2  0x420fa42

RP/0/RSP0/CPU0:Feb  2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #3  0x82476ae

RP/0/RSP0/CPU0:Feb  2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #4  0x8247656

RP/0/RSP0/CPU0:Feb  2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #5  0x8244737

RP/0/RSP0/CPU0:Feb  2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #6  0x421000d

RP/0/RSP0/CPU0:Feb  2 09:47:21.008 UTC: dumper[60]: %OS-DUMPER-6-TRACE_BACK : #7  0x4200290

RP/0/RSP0/CPU0:Feb  2 09:47:21.011 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  tmp(/tmp/dumper) in use

RP/0/RSP0/CPU0:Feb  2 09:47:21.012 UTC: dumper[60]: %OS-DUMPER-5-CORE_FILE_NAME : Core for process pkg/bin/redfs_svr at /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z on local_node

RP/0/RSP0/CPU0:Feb  2 09:47:21.110 UTC: redfs_config[388]: %MEDIA-REDFSAPI-4-MSG_SEND_FAILED : Unable to send message to redfs server, op 9, errno=3:No such process.  : pkg/bin/redfs_config : (PID=241860) :  -Traceback= b6d1780 b6d094a 42000a1 8eba427 8ebc701 82465d7 8244724 4200700 4200038

RP/0/RSP0/CPU0:Feb  2 09:47:21.134 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z has just been written. size:1132202 md5:d089e0c7bfea5c744ae2e4c63ed960c4

RP/0/RSP0/CPU0:Feb  2 09:47:21.153 UTC: dumper[60]: %OS-DUMPER-5-DUMP_SUCCESS : Core dump success

RP/0/RSP0/CPU0:Feb  2 09:47:21.174 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z before transfer. size:1132202 md5:d089e0c7bfea5c744ae2e4c63ed960c4

RP/0/RSP0/CPU0:Feb  2 09:47:21.174 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  4(dumper_harddisk:/dumper) in use

RP/0/RSP0/CPU0:Feb  2 09:47:23.422 UTC: dumper[60]: %OS-DUMPER-5-CORE_TRANSFER_STATUS : Transfer of Core file from /tmp/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z to harddisk:/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z on local_node suceeded.

RP/0/RSP0/CPU0:Feb  2 09:47:23.434 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : harddisk:/dumper/first.redfs_svr_88.20150202-094721.node0_RSP0_CPU0.x86.Z after transfer. size:1132202 md5:d089e0c7bfea5c744ae2e4c63ed960c4

 

######
Scenario 2 - LOG
######

 

RP/0/RSP0/CPU0:LAB-01(config)#mirror location 0/RSP0/CPU0 disk0:disk1:

RP/0/RSP0/CPU0:LAB-01(config)#commit

Tue Feb  3 10:17:33.043 UTC

RP/0/RSP0/CPU0:Feb  3 10:17:33.116 UTC: config[65762]: %MGBL-CONFIG-3-FILESAVE_ERROR : The OS returned an error while SysDb was attempting to perform a file save operation. Any changes have been rolled back. Error: ''sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory'

 

% Failed to commit .. As an error (Unknown) encountered during commit operation. Changes may not have been committed:

'sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory

RP/0/RSP0/CPU0:LAB-01(config)#File name: first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86

RP/0/RSP0/CPU0:Feb  3 10:17:34.103 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault

RP/0/RSP0/CPU0:Feb  3 10:17:34.103 UTC: dumper[60]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x108 at PC 0x42086b9. Signal code 1 - SEGV_MAPPER. Address not mapped.

RP/0/RSP0/CPU0:Feb  3 10:17:34.103 UTC: dumper[60]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 36885 (pkg/bin/redfs_svr)

RP/0/RSP0/CPU0:Feb  3 10:17:34.104 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  0(harddisk:/dumper) in use

RP/0/RSP0/CPU0:Feb  3 10:17:34.104 UTC: dumper[60]: %OS-DUMPER-6-DEFAULT_PATH_TMP : Setting default path to /tmp/dumper for core size less than 20971520. Core dump size 3104768, free memory size 11032899584.

RP/0/RSP0/CPU0:Feb  3 10:17:34.108 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  tmp(/tmp/dumper) in use

RP/0/RSP0/CPU0:Feb  3 10:17:34.109 UTC: dumper[60]: %OS-DUMPER-5-CORE_FILE_NAME : Core for process pkg/bin/redfs_svr at /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z on local_node

RP/0/RSP0/CPU0:Feb  3 10:17:34.213 UTC: sysmgr[94]: redfs_svr(1) (jid 88) (pid 36885) (fail_count 1) abnormally terminated, restart scheduled

RP/0/RSP0/CPU0:Feb  3 10:17:34.214 UTC: redfs_config[381]: %MEDIA-REDFSAPI-4-MSG_SEND_FAILED : Unable to send message to redfs server, op 9, errno=3:No such process.  : pkg/bin/redfs_config : (PID=266423) :  -Traceback= af9b7a4 af9a96e 42000a1 8e7144b 8e73274 822605f 82242b2 4200700 4200038

RP/0/RSP0/CPU0:Feb  3 10:17:34.251 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z has just been written. size:1194760 md5:442035043a30db3efc69392affcdca80

RP/0/RSP0/CPU0:Feb  3 10:17:34.272 UTC: dumper[60]: %OS-DUMPER-5-DUMP_SUCCESS : Core dump success

RP/0/RSP0/CPU0:Feb  3 10:17:34.294 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z before transfer. size:1194760 md5:442035043a30db3efc69392affcdca80

RP/0/RSP0/CPU0:Feb  3 10:17:34.295 UTC: dumper[60]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  4(dumper_harddisk:/dumper) in use

RP/0/RSP0/CPU0:Feb  3 10:17:36.913 UTC: dumper[60]: %OS-DUMPER-5-CORE_TRANSFER_STATUS : Transfer of Core file from /tmp/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z to harddisk:/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z on local_node suceeded.

RP/0/RSP0/CPU0:Feb  3 10:17:36.926 UTC: dumper[60]: %OS-DUMPER-6-CORE_MD5 : harddisk:/dumper/first.redfs_svr_88.20150203-101734.node0_RSP0_CPU0.x86.Z after transfer. size:1194760 md5:442035043a30db3efc69392affcdca80

 

######
Scenario 3 - LOG
######

 

RP/0/RSP0/CPU0:LAB-01#format disk1:

Tue Feb  3 11:35:48.676 UTC

 

Format will destroy all data on "disk1:". Continue? [confirm]

 

Device partition disk1: is now formated and is available for use.

 

RP/0/RSP0/CPU0:LAB-01(config)#mirror location 0/RSP0/CPU0 disk0:disk1:

RP/0/RSP0/CPU0:LAB-01(config)#commit

Tue Feb  3 11:43:00.541 UTC

RP/0/RSP0/CPU0:Feb  3 11:43:00.724 : config[65755]: %MGBL-CONFIG-3-FILESAVE_ERROR : The OS returned an error while SysDb was attempting to perform a file save operation. Any changes have been rolled back. Error: ''sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory'

 

% Failed to commit .. As an error (Unknown) encountered during commit operation. Changes may not have been committed:

'sysdb' detected the 'warning' condition 'The OS returned an error while attempting to access a file': No such file or directory

RP/0/RSP0/CPU0:LAB-01(config)#File name: first.redfs_svr_88.20150203-114301.node0_RSP0_CPU0.x86

RP/0/RSP0/CPU0:Feb  3 11:43:01.613 : dumper[60]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault

RP/0/RSP0/CPU0:Feb  3 11:43:01.613 : dumper[60]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x108 at PC 0x42086b9. Signal code 1 - SEGV_MAPPER. Address not mapped.

RP/0/RSP0/CPU0:Feb  3 11:43:01.613 : dumper[60]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 36885 (pkg/bin/redfs_svr)

RP/0/RSP0/CPU0:Feb  3 11:43:01.725 : sysmgr[94]: redfs_svr(1) (jid 88) (pid 36885) (fail_count 1) abnormally terminated, restart scheduled

RP/0/RSP0/CPU0:Feb  3 11:43:01.726 : redfs_config[361]: %MEDIA-REDFSAPI-4-MSG_SEND_FAILED : Unable to send message to redfs server, op 9, errno=3:No such process.  : pkg/bin/redfs_config : (PID=270514) :  -Traceback= af367a4 af3596e 42000a1 8ebf44b 8ec1274 822605f 82242b2 4200700 4200038

 

18 Replies 18

We found that you have to use a turboboot to recover from the problem.  We also found that you have to change a ROMMON variable to prevent this problem from happening again.  Finally, do not use disk mirroring on the ASR9k (there is no advantage, only reduced disk space available and a chance you may corrupt your RSPs).

Here's how you turboboot on your RSP440:

  1. Load your USB drive with your asr9k-mini-px.vm-X.X.X file at root.  Insert drive into your USB port on your RSP.
  2. Eject your standby RSP
  3. Reload your primary RSP and use ctrl-C to send break and get into ROMMON
  4. Enter the following commands in ROMMON to initiate turboboot:
    1. unset BOOT
    2. TURBOBOOT=on,disk0,format
    3. sync
    4. boot usb:\asr9k-mini-px.vm-X.X.X
  5. Your router will reload and now you are back to your base load with only the 'mini' package.
  6. Add and activate remaining packages and SMUs as ususal.  Copy / paste your running config

 

Here's how to disable the incorrect variable in ROMMON that hoses disk mirroring (you have to do this on each RSP you have)

  1. Boot into ROMMON
  2. unset MIRROR_DISABLE
  3. sync
  4. reset
  5. Repeat for your second RSP

 

Don't do disk mirroring on your RSP440.  For the love of god, don't do it.

 

Stay relaxed Benjamin, its a nasty bug and yes, you have to do TURBOBOOT to recover from the problem because the bug actually corrupts the file system, but its still only a bug. There are good reasons for disk mirroring, as I already commented. Its not a good idea to rest the stability of the whole router to a single flash disk. Cisco recognized the bug and new RSP shipments will not contain this variable in the ROMMON.

 

Cheers,

Florian

Florian - I'm not sure I follow you.  We could not find any good reason to do mirroring on the 9k.  Xander explains it pretty well on this other thread - https://supportforums.cisco.com/discussion/12324091/disk-mirroring-asr9000-crs

This media does not behave like a typical HDD in your PC where it can mark bad sectors and skip them.  It seems that if corruption occurs on either disk0 or disk1, you are going to have issues reading data off both partitions (remember, its the same physical media).  You would also have to do an RMA on the entire RSP since the media is not field replaceable.

Can you please explain why you think disk mirroring would be beneficial in this scenario?  All I can find is that disk mirroring is possible, but no explanation why it's beneficial.  Data is already copied between RSPs, why do we need to have 2 copies on each RSP on the same physical storage media?  That's like partitioning my USB drive into 2 separate volumes and copying files between them.

In my opinion, it's not worth the risk to enable disk mirroring.  Especially when the Cisco config guides for disk mirroring do not mention having to check the ROMMON variables first.

Maybe I'm missing something?

Hi,

 

I was not aware of the article from Xander you are referring too. On RSP-4G disk mirroring made much sense because there you had two flash disk slots accessible on the front panel. With RSP-440 it seems that they are now not accessible at all, I was under the impression that you might reach them by removing the RSP from the slot. So as this is not the case, your are right, disk mirroring does not make much sense.

 

Regards,

Florian