CUCM Crash, ata1: translated ATA stat/err / EXT3-fs error

michellp
Level 1

Hi there!

We have two CallManager servers in our office (models below) running as publisher/subscriber (cm1/cm2).

1) Publisher

Hardware Model: 7816H3

Processors : 1

Type              : Intel(R) Celeron(R) D CPU 3.20GHz

Speed           : 3200 MHz

Memory        : 2048 MB

Software Release Version: 6.1.4.1000-10

Platform Release Version: 2.0.0.1-1

2) Subscriber

Hardware Model: 7816I4

Processors : 1

Type              : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

Speed           : 3000 MHz

Memory        : 2048 MB

Software Release Version: 6.1.4.1000-10

Platform Release Version: 2.0.0.1-1

The problem is that cm2 (the subscriber) crashes every once in a while. By crashing I mean that the server stays powered up but is no longer accessible through the console or remotely, and it is not pingable either. I have the Real-Time Monitoring Tool running; it loses cm2 and reports errors such as ServerDown, DBreplication failure and SDLLinkOutOfService. The crashes sometimes occur several times a week, but the server may also run for a few weeks before crashing. When it has crashed, the console always shows the same two error messages, repeating so rapidly that it is impossible to log in. The only option left is a hard reset using the power switch. After that the server works fine until the next crash.

The error messages:

1) ata1: translated ATA stat/err 0x61/04 to SCSI SK/ASC/ASCQ 0xb/00/00

2) EXT3-fs error (device sd(8,6)) in start_transaction: Journal has aborted
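
For readers who want to decode the hex fields in these two messages, here is a minimal Python sketch (not part of the original post). It assumes the classic ATA status/error register bit layout, the standard SCSI sense key table, and the usual Linux numbering where block major 8 covers SCSI disks with 16 minors per disk. The function and table names are illustrative only.

```python
import re

# Classic ATA status and error register bits (legacy bit names).
ATA_STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
                   0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ATA_ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
                  0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}
# Subset of SCSI sense keys; unknown keys are reported as hex.
SCSI_SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
                   0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
                   0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
                   0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND"}

ATA_LINE = re.compile(
    r"stat/err 0x([0-9a-fA-F]+)/([0-9a-fA-F]+) "
    r"to SCSI SK/ASC/ASCQ 0x([0-9a-fA-F]+)/([0-9a-fA-F]+)/([0-9a-fA-F]+)")
SD_DEVICE = re.compile(r"sd\((\d+),(\d+)\)")


def flags(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def decode_ata_line(line):
    """Decode the status, error and sense key fields of the ata message."""
    m = ATA_LINE.search(line)
    if not m:
        return None
    stat, err, sk, _asc, _ascq = (int(g, 16) for g in m.groups())
    return {"status": flags(stat, ATA_STATUS_BITS),
            "error": flags(err, ATA_ERROR_BITS),
            "sense_key": SCSI_SENSE_KEYS.get(sk, hex(sk))}


def decode_sd_device(line):
    """Map sd(major,minor) to a /dev/sdXN name (first SCSI disk major only)."""
    m = SD_DEVICE.search(line)
    if not m:
        return None
    major, minor = int(m.group(1)), int(m.group(2))
    if major != 8:
        return None  # other majors are outside this sketch
    disk = chr(ord("a") + minor // 16)
    part = minor % 16
    return f"/dev/sd{disk}{part}" if part else f"/dev/sd{disk}"


if __name__ == "__main__":
    print(decode_ata_line(
        "ata1: translated ATA stat/err 0x61/04 to SCSI SK/ASC/ASCQ 0xb/00/00"))
    print(decode_sd_device(
        "EXT3-fs error (device sd(8,6)) in start_transaction: Journal has aborted"))
```

Under those assumptions, the first message decodes to status bits DRDY, DF and ERR with error bit ABRT (the drive reported a device fault and aborted the command), and the second points at the sixth partition of the first disk. That is a hint toward the disk or its controller, not proof of a hardware failure.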

So it seems to have something to do with the disks, but is it hardware or software? What can I do about this? I have done some searching on Google, but I only get more confused with every page. Hopefully the Cisco support forum can shed some light on this and point me towards a solution.

Thanks

1 Accepted Solution


ndravid
Level 1

Hey,

Since your Publisher is unaffected, I would suggest taking a DRS backup first.
Next, you will need to run the Recovery CD on your Subscriber.
If you do not have the Recovery CD, you can download it from:

http://tools.cisco.com/squish/3De04f

Here are the steps to use the CD:

1.    Boot the troubled server from the Recovery CD.

2.    You will see the options shown in the menu below.

3.    Choose option F.

4.    Answer YES if the system asks to correct or relocate files.

5.    When the check finishes, choose option Q.

6.    Reboot the server and check whether that helps.

7.    If you still get the same error, boot from the Recovery CD again.

8.    Choose option M.

9.    Answer YES if the system asks to correct the files.

10.    When the check finishes, choose option Q.

11.    Reboot the server.

*********************************************************************

***         Welcome to Cisco CallManager Recovery Disk

***                  Version RELEASE_VER

***            Copyright - Cisco System, INC. 2006

***

***

***  Please enter one of the following options:

***

***  [S]|[s] Swap the active and inactive partitions.

***  [W]|[w] Windows pre-installation setup.

***  [F]|[f] Check and automatically correct disk file systems.

***  [M]|[m] Check and manually correct disk file systems.

***  [V]|[v] Verify the disk partitioning layout.

***  [Q]|[q] Quit this recovery disk program.

*********************************************************************

I am sure that this will solve your issue and will eliminate the error messages.

However, I would suggest that you rebuild the CallManager server in question after you recover it using the recovery disc.

If you face the same issue again after the recovery and rebuild procedure, the only way forward is to replace the HDDs in the affected server.

Hope this helps....

Regards

Nachiket

5 Replies

Tommer Catlin
VIP Alumni

I'd open a TAC case. I think these servers have an issue with the board or the controller. I seem to remember having one that did the same thing to me. If you open up the case, look for any blinking red lights. I believe mine had some memory bank issues.

I trust that you are right. Any other opinions / suggestions by anyone?

Hi there.

I had an issue like this. In my case there was a problem with the HD firmware.

Which HD firmware do you have?

Try checking Cisco-HDD-FWUpdate-3.0.1-I from Cisco. This is a firmware upgrade for CUCM.

Thanks for your response, Vladpetra. I have not yet had a chance to look at your suggestion. I have gone with the suggestion to open a TAC case for this. I'll await the outcome and post the result here. Thanks to everyone so far.
