
**Updated 16 February 2011** IBM 7816-I4 782x-I4 filesystem errors


 

Summary

Cisco Media Convergence Servers 7816-I4, 7825-I4, and 7828-I4 (and the IBM x3250-M2 equivalent) have recently been experiencing file-system errors. These servers are used by Cisco Unified Communications Manager and various other Cisco Collaboration software products.

The symptom is that the file system on the local disk drives goes into read-only mode. This can manifest as application services going down, the server becoming unresponsive via the network or the management interfaces, or, in the worst case, data corruption that requires a reinstall and restore from backup.

 

Root cause has been identified by Cisco and its suppliers as a disk drive issue stemming from interaction with system firmware. 

 

Field Notice 63374 has been published and includes more technical details regarding this  issue.  Cisco and its suppliers are committed to high quality and  apologize for any disruptions or impact caused by this issue.

 

Solution

The read-only file-system issue that has recently been affecting server models MCS-7816-I4, MCS-7825-I4, and MCS-7828-I4 (or their IBM equivalents) in the field is addressed by CSCti52867 - "IBM 7816-I4 and 782x-I4 READONLY file system".

 

The fix for CSCti52867 is now available and requires the application of two patch files.  Install both of these patch files in the order listed below.

 

1. First install ciscocm.ibm-diskex-1.0.cop.sgn 
     The Readme file ciscocm.ibm-diskex-1.0.cop.sgn includes installation instructions for this .cop.sgn.

     Make sure to only install this utility when show hardware CLI output indicates the array is in a healthy state.

     If your server has never had the file system go read-only, this step is optional.
2. Next install Cisco-HDD-FWUpdate-3.0.1-I.ISO .
     The Readme file Cisco-HDD-FWUpdate-3.0.1-I.Readme.pdf includes installation instructions for this ISO.

     This installer is completely independent of the OS installed on the server.

Note:  Installing the FWUpdate v3.0(1) or later will get you firmware with the fix for this defect.  It is always recommended that you apply the latest FWUCD available for your server.

 

Refer to the Release Note of CSCti52867 and the Readme file for each of the above mentioned patch files for more details.
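
The ordering rules above can be sketched as a small planning helper. This is only an illustration in Python: the function name `plan_fix` and its parameters are hypothetical, nothing here talks to a real server, and raising an error when the array is unhealthy is an assumption of this sketch (the document only says not to install the utility in that state). Follow the Readme files for the actual procedure.

```python
from typing import List

DISK_EXERCISER = "ciscocm.ibm-diskex-1.0.cop.sgn"
FW_UPDATE_ISO = "Cisco-HDD-FWUpdate-3.0.1-I.ISO"


def plan_fix(ever_went_readonly: bool,
             array_healthy: bool,
             run_optional_exerciser: bool = False) -> List[str]:
    """Return the patch files to apply, in the order listed above.

    ever_went_readonly: has the file system on this server ever gone read-only?
    array_healthy: does 'show hardware' report the array in a healthy state?
    run_optional_exerciser: run step 1 even though it is optional for this server.
    """
    steps: List[str] = []
    if ever_went_readonly or run_optional_exerciser:
        if not array_healthy:
            # Assumption of this sketch: stop rather than guess what to do next.
            raise ValueError("array is not healthy; do not install the disk exerciser")
        steps.append(DISK_EXERCISER)
    # Step 2 always applies: the firmware update carries the fix.
    steps.append(FW_UPDATE_ISO)
    return steps
```

For a server that has gone read-only and whose array is healthy, `plan_fix(True, True)` lists the disk exerciser first and the firmware ISO second.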

 

Symptoms

  • The file system goes READONLY. CUCM services may then go down and the server may become "unresponsive", meaning that it is not possible to SSH into the server, log in to the console, or reach the web interface, although the server may still respond to pings.
  • Traces from all services stop writing (including syslog)
  • You see the following error on the server console
 
EXT3-fs error (device sda6) in start_transaction: Journal has aborted

 

 

 

 

 

  • If you are able to log in to the server via SSH, the following output may be displayed.
 
Last login: Mon Aug  X XX:XX:XX XXXX from XXX.XXX.XXX.XXX
Command Line Interface is starting up, please wait ...
java.io.FileNotFoundException: /var/log/active/platform/log/cli.bin (Read-only file system)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    :
    :
    :
        at org.apache.log4j.Category.info(Category.java:674)
        at sdMain.main(sdMain.java:611)
log4j:ERROR No output stream or file set for the appender named [CLI_LOG].

   Welcome to the Platform Command Line Interface
    WARNING:
        The /common file system is mounted read only.  <<<<<<<<<<<<<<<<<<

        Please use Recovery Disk to check the file system using fsck.
admin:
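
If console or SSH session text has been captured to a file, the signatures shown above can be searched for automatically. The following is a minimal sketch in Python; the function name `find_symptoms` and the pattern names are hypothetical, and the patterns are derived only from the sample messages on this page, so they may not cover every variant.

```python
import re
from typing import Set

# Patterns based on the sample messages above (illustrative, not exhaustive).
SIGNATURES = {
    "journal_aborted": re.compile(r"EXT3-fs error \(device \w+\).*has aborted"),
    "read_only_fs": re.compile(r"Read-only file system|mounted read only",
                               re.IGNORECASE),
}


def find_symptoms(captured_text: str) -> Set[str]:
    """Return the names of all signatures found in the captured text."""
    return {name for name, pattern in SIGNATURES.items()
            if pattern.search(captured_text)}
```

An empty result does not prove the server is healthy; it only means none of these specific strings appeared in the captured text.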

 

 

 

How to determine the current version of firmware on the hard drive

 

 

 

  • For MCS-7825-I4 and MCS-7828-I4, running Cisco UCM 7.1 and above, you can use the CLI command 'show hardware' to verify the firmware version.

 

 



admin:show hardware

HW Platform       : 7828I4
Processors        : 1
Type              : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
CPU Speed         : 2660
Memory            : 8192 MBytes
Object ID         : 1.3.6.1.4.1.9.1.899
OS Version        : UCOS 4.0.0.0-34
Serial Number     : KQRBVVB

RAID Version      :
Raid firmware version: 1.26.81.00
Raid Bios version: 6.16.00.00

BIOS Information  :
IBM IBMBIOSVersion1.44-[M9E144AUS-1.44]- 06/11/2009

RAID Details      :
LSI Logic IR Configuration Utility 2.00.15
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS1064E
  BIOS version                            : 6.16.00.00
  Firmware version                        : 1.26.81.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 112
  Maximum physical devices                : 62
  Concurrent commands supported           : 266
  Slot                                    : 0
  Bus                                     : 1
  Device                                  : 0
  Function                                : 0
  RAID Support                            : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
  Volume ID                               : 7
  Status of volume                        : Okay (OKY)
  RAID level                              : 1
  Size (in MB)                            : 237464
  Physical hard disks (Target ID)         : 9 8
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #112
Target on ID #8
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 1
  Target ID                               : 8
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B04
  Serial No                               : WD-WCAT1D712130
  Drive Type                              : SATA
Target on ID #9
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 0
  Target ID                               : 9
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B04
  Serial No                               : WD-WCAT1D723848
  Drive Type                              : SATA
------------------------------------------------------------------------
Enclosure information
------------------------------------------------------------------------
  Enclosure#                              : 1
  Logical ID                              : 5005076b:0648afc0
  Numslots                                : 4
  StartSlot                               : 0
  Start TargetID                          : 0
  Start Bus                               : 0
------------------------------------------------------------------------



 

The fields you need are the "Model Number" and "Firmware Revision" of each physical device (highlighted in red in the original output). This output shows a server with two drives with model number WD2502ABYS on 3B04 firmware. These drives should be upgraded as soon as possible.

 

  • For MCS-7825-I4 and MCS-7828-I4 models running Cisco UCM versions earlier than 7.1, as well as any version of Cisco UCM running on an MCS-7816-I4 model server, you must download and boot from a CD burned with Cisco-HDD-FWUpdate-3.0.1-I.ISO (refer to the "Solution" section for links to download the ISO and readme). Upon successful boot of the Cisco-HDD-FWUpdate-3.0.1-I.ISO CD, you will be presented with the current HDD FW version as well as the opportunity to upgrade to HDD FW version 02.03B06.
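
The two cases above, and the check of the 'show hardware' output, can be sketched in Python as follows. This is a rough illustration, not a Cisco tool: all function and constant names are hypothetical, the parser assumes each "key : value" pair sits on a single line of the captured output, and the list of old revisions (3B04, 3B05) and the WD2502ABYS model prefix are taken only from what this page and its comments mention.

```python
import re
from typing import Dict, List, Tuple

# Revisions this page describes as needing the upgrade (illustrative list).
OLD_REVISIONS = ("3B04", "3B05")
AFFECTED_MODEL_PREFIX = "WD2502ABYS"


def firmware_check_method(server_model: str, ucm_version: Tuple[int, ...]) -> str:
    """Pick how to read the HDD firmware version, per the two cases above."""
    if server_model in ("MCS-7825-I4", "MCS-7828-I4") and ucm_version >= (7, 1):
        return "show hardware"
    return "boot Cisco-HDD-FWUpdate-3.0.1-I.ISO"


def parse_drives(show_hardware_output: str) -> List[Dict[str, str]]:
    """Collect model and firmware revision for each 'Target on ID #n' block."""
    drives: List[Dict[str, str]] = []
    for raw_line in show_hardware_output.splitlines():
        line = raw_line.strip()
        target = re.match(r"Target on ID #(\d+)", line)
        if target:
            drives.append({"target_id": target.group(1)})
            continue
        if not drives or ":" not in line:
            continue  # not inside a drive block, or not a key/value line
        key, _, value = line.partition(":")
        if key.strip() == "Model Number":
            drives[-1]["model"] = value.strip()
        elif key.strip() == "Firmware Revision":
            drives[-1]["firmware"] = value.strip()
    return drives


def needs_upgrade(drive: Dict[str, str]) -> bool:
    """True if the drive matches the model and an old revision named on this page."""
    model = drive.get("model", "")
    firmware = drive.get("firmware", "")
    return model.startswith(AFFECTED_MODEL_PREFIX) and firmware.endswith(OLD_REVISIONS)
```

Matching on the end of the revision string lets the same check accept both the short form shown by 'show hardware' (3B04) and the long form shown by the firmware CD (02.03B04).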

What should be done if filesystem issues persist after applying the patches?


As of 16 February 2011, if you encounter any further filesystem or hard drive issues after applying both the firmware update and the disk exerciser, you should replace the affected drive(s).

 

There are three ways you can replace the drive(s).

  1. If you have an MCS server with an active Cisco support contract, open a TAC service request.
  2. If you purchased the IBM x3250 M2 MCS equivalent and have an IBM support contract, contact IBM support.
  3. If you do not have any support contract for the server, you can purchase a new drive from Cisco or IBM.
    • The Cisco part number for the hard drive is HDD-7825-I4-250=.
    • Contact your IBM reseller to confirm the correct part number for the 250GB SATA simple-swap HD for the x3250 M2 server.

 

If you have any questions, you can leave a comment on this document. The ibm-fs-failure@cisco.com email address is no longer active as of 1 September 2014.

While that address was active, sending an email to it did not generate a TAC SR but allowed us to collect more information. It was an informal submission with no associated SLA; every effort was made to follow up on submissions, but a response could not be guaranteed.

 

Related Defects

 

 

 

Related Links

Comments
Cisco Employee

I upgraded brand-new IBM HDDs from 02.03B05 to 02.03B06. You can only see the long form of the firmware version with "Cisco-HDD-FWUpdate-3.0.1-I.ISO". With DSA 3.20 I could only see 02.0

Correct, on the MCS-7816-I4 servers, the DSA will not show the full HD FW version.  It will only show the first 4 digits of the FW version (ex 02.0).  But when you run "Cisco-HDD-FWUpdate-3.0.1-I.ISO" it will show the current "full" version (ex 02.03B04) and the version which will be applied during the upgrade (02.03B06).

Beginner

Shane,

Since yesterday I have rebuilt the server and am now running version 6.1.4. I still cannot run the DSA, even after doing everything you suggested. I have sent IBM a video of the server failing halfway through the DSA load, but I'm not sure what they can do about this apart from swapping the server out. I suspect it's an issue with the Linux kernel and the server's disk controller. If you had this server, which could run neither the DSA nor the automatic firmware updates, would you get it swapped out? I ask because the TAC engineer said it wasn't an issue!

Cisco Employee

networkdefence,

Please send an email to ibm-fs-failure@cisco.com indicating your TAC SR number.

Enthusiast

Hello,
This article has been totally confusing since the last updates. In the "Current status" section, we understand that we need to apply 2 patches in order to upgrade to the final version 02.03b06. Good.

But there is still a section "What to do if you have not yet encountered this problem" where the instructions are to apply an upgrade to version 02.03b05, without any mention of the 2 other patches from the first section.

What do we have to do?

We already applied the first FW upgrade 3b05 on several 7816/25 I4 servers. Does that mean that we have to reapply both new patches? Is it mandatory to upgrade to 3b06 even if we are already on 3b05?

Please clean this document up, because we are lost now.

Best regards,

Yorick Petey

Cisco Employee

Yorick,

Thanks for the feedback regarding the doc.  I will update the doc later today.


To answer your questions: yes, even though you have already applied 3b05 on your 7816/25 I4 servers, you should also run BOTH patches mentioned in the "Current Status" section in the order described.

Beginner

Hi Philip,

I have two questions regarding the mentioned procedures.

- What are we doing with the cop file, and what's in it?

- Do I have to re-apply the cop file after an upgrade to a new CUCM SU or CUPS build?

At the moment I am performing the procedure on CUCM 7.1.3b on a 7825-I4 and CUPS 7.0.8, also on a 7825-I4.

kind regards,

Bas

Cisco Employee

Bas,

You only need to apply the ciscocm.ibm-diskex-1.0.cop.sgn once per affected server before applying the HDD FW update.  You do NOT need to reapply the ciscocm.ibm-diskex-1.0.cop.sgn after any application SUs or upgrades.

Shane

Beginner

Hi Shane,

Thanks for your reply, that's clear. Do you know what the exerciser is doing? Is it checking the disk?

Further, after I applied the firmware update, both disks were out of sync. After the reboot I got a message that it was doing a re-sync. However, it did not re-sync automatically. After waiting for two hours I rebooted the server again, entered the RAID controller BIOS and noticed that the "sync level" was at 0%; then the controller started the sync automatically. Is this expected behaviour after applying a firmware update?

Kind regards,

Bas

Cisco Employee

Bas,

We can only point you to the text of the bug and the upcoming Field Notice for comments on the source of the problem and how the fix works.   On this forum all you will get is that you need to run both the exerciser utility and upgrade the firmware.

Regarding the array rebuild it is expected that the array will rebuild after the firmware update.  This rebuild can take anywhere from 3 hours to a day to complete.

Were you able to confirm before the rebuild that the resync was stuck at 0% or is it possible that it was going but not yet completed before the second reboot?

-Ryan

Beginner

Philips,

I am not sure whether the resync was stuck or not. After the firmware update and first reboot I waited two hours. After that it was still out of sync according to the show hardware command. Then I rebooted again and entered the RAID BIOS; the sync level was still at 0%. I stayed in the RAID BIOS and, without my touching anything, it started to sync. The sync took about an hour and a half per server.

kind regards,

Bas

Community Member

Hi

One of our customers has an MCS-7816-I4 V01.

The part number for this HW is 74-6298-02 A0 so, according to the field notice 63374, this HD should not be affected by this problem.

Yet, I've booted the server with the FW upgrade CD and noticed that the HD FW version is 02.03B04.

Should I execute the HD FW upgrade?

Best regards

Cisco Employee

Your server rev is actually an earlier one than noted in the field notice so you are affected by the problem.  The HD firmware needs to be 3B06 and if this server has ever been in production you need to also install the disk exerciser utility.

Enthusiast

Shane,
We have a customer that is running IP IVR 8.0(2) on an MCS 7825-I4. The HDD firmware is affected and must be upgraded (FN 63374).

However, the Drive Exerciser Utility could not be installed via OS Administration GUI. Rather than installing/running ciscocm.ibm-diskex-1.0.cop.sgn via FTP, the GUI returns "No valid upgrade options were found" and "not a signed patch file." is written next to the file.

Note that the Drive Exerciser Utility ran smoothly when installing the same .cop.sgn File on an adjacent 7825-I4 running Unity Connection 8.0

1. Can you point out any fix for a successful installation of ciscocm.ibm-diskex-1.0.cop.sgn?
2. What does this Utility exactly do and can Drive Exercising be safely omitted before running the actual HDD Firmware Upgrade? I still can't answer these legitimate questions after going through readmes, FN, Bug Notes etc.

Kind regards

/David

Cisco Employee

David,

Thanks for bringing this to our attention.  I'm going to do some digging and see what we can do about this problem for the contact center servers.

-Ryan

Beginner

We hit the bug CSCti52867.

I found a new version, 3.6.1:

FWUCD-3.6.1-I.iso

Release Date: 30/NOV/2010

Size: 334360.00 KB (342384640 bytes)

Questions:

1) Can someone tell me whether I should use this 3.6 or the old 3.0.1?

2) If I use 3.6, do I first have to apply ciscocm.ibm-diskex-1.0.cop.sgn, or is that not necessary with version 3.6?

http://www.cisco.com/cisco/software/release.html?mdfid=282152197&catid=278875240&flowid=20295&softwareid=283046743&release=3.6%281%29&rellifecycle=&relind=AVAILABLE&reltype=all

kind regards,