Phillip Ratliff
Cisco Employee

 

Summary

Cisco Media Convergence Servers 7816-I4, 7825-I4 (and the equivalent IBM x3250-M2), and 7828-I4 have recently been experiencing technical issues. These servers are used by Cisco Unified Communications Manager and various other Cisco Collaboration software products.

The symptom is that the file system on the local disk drives goes into read-only mode. This can manifest as application services going down, the server becoming unresponsive via the network or the management interfaces, or, in the worst case, data corruption that requires a reinstall and a restore from backup.

 

Root cause has been identified by Cisco and its suppliers as a disk drive issue stemming from interaction with system firmware. 

 

Field Notice 63374 has been published and includes more technical details regarding this  issue.  Cisco and its suppliers are committed to high quality and  apologize for any disruptions or impact caused by this issue.

 

Solution

The read-only file system issue that has recently been affecting server models MCS-7816-I4, MCS-7825-I4, and MCS-7828-I4 (or their IBM equivalents) in the field is addressed by CSCti52867 - "IBM 7816-I4 and 782x-I4 READONLY file system".

 

The fix for CSCti52867 is now available and requires the application of two patch files.  Install both of these patch files in the order listed below.

 

1. First install ciscocm.ibm-diskex-1.0.cop.sgn.
     The Readme file for ciscocm.ibm-diskex-1.0.cop.sgn includes installation instructions for this .cop.sgn.

     Install this utility only when the show hardware CLI output indicates the array is in a healthy state.

     If your server has never had the file system go read-only, this step is optional.
2. Next install Cisco-HDD-FWUpdate-3.0.1-I.ISO.
     The Readme file Cisco-HDD-FWUpdate-3.0.1-I.Readme.pdf includes installation instructions for this ISO.

     This installer is completely independent of the OS installed on the server.

Note:  Installing FWUpdate v3.0(1) or later provides firmware with the fix for this defect.  It is always recommended that you apply the latest FWUCD available for your server.

 

Refer to the Release Note of CSCti52867 and the Readme file for each of the above mentioned patch files for more details.
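
The ordering rules above can be summarized in a short Python sketch. It is only an illustration of the decision logic described in this section: the function name, its parameters, and the choice to raise an error when the array is not healthy are assumptions, not part of any Cisco tool. The two file names are the ones listed above.

```python
DISK_EXERCISER = "ciscocm.ibm-diskex-1.0.cop.sgn"
FIRMWARE_ISO = "Cisco-HDD-FWUpdate-3.0.1-I.ISO"


def plan_patch_steps(ever_read_only, array_healthy, install_optional_exerciser=False):
    """Return the patch files to apply, in order (hypothetical helper).

    ever_read_only: True if the file system has ever gone read-only.
    array_healthy: True if 'show hardware' reports the array as healthy.
    install_optional_exerciser: install the disk exerciser even though
        the server never went read-only (the step is optional then).
    """
    steps = []
    if ever_read_only or install_optional_exerciser:
        # The disk exerciser must only be installed on a healthy array.
        if not array_healthy:
            raise ValueError(
                "Array is not healthy; do not install the disk exerciser yet"
            )
        steps.append(DISK_EXERCISER)
    # The firmware update is applied in every case, after the exerciser.
    steps.append(FIRMWARE_ISO)
    return steps
```

For a server that has gone read-only at least once and whose array is healthy, the sketch returns both files with the disk exerciser first. For a server that never went read-only, it returns only the firmware ISO.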

 

Symptoms

  • The file system goes READONLY. CUCM services may then go down, and the server may become "unresponsive", meaning that it is not possible to SSH into the server, log in to the console, or reach the web interface, although it may still respond to pings.
  • Traces from all services stop writing (including syslog).
  • You see the following error on the server console:
 
EXT3-fs error (device sda6) in start_transaction: Journal has aborted

 

 

 

 

 

  • If you are able to log in to the server via SSH, the following output may be displayed:
 
Last login: Mon Aug  X XX:XX:XX XXXX from XXX.XXX.XXX.XXX
Command Line Interface is starting up, please wait ...
java.io.FileNotFoundException: /var/log/active/platform/log/cli.bin (Read-only file system)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    :
    :
    :
        at org.apache.log4j.Category.info(Category.java:674)
        at sdMain.main(sdMain.java:611)
log4j:ERROR No output stream or file set for the appender named [CLI_LOG].

   Welcome to the Platform Command Line Interface
    WARNING:
        The /common file system is mounted read only.  <<<<<<<<<<<<<<<<<<

        Please use Recovery Disk to check the file system using fsck.
admin:
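
If you have a saved console capture or SSH session log, the symptoms above can be checked for with a small Python sketch like the following. The three patterns are taken from the messages shown in this section. The function name and signature labels are hypothetical, and the sketch assumes the capture is available as plain text.

```python
import re

# Signature strings taken from the console and SSH output shown above.
# The labels on the left are hypothetical names used only in this sketch.
SIGNATURES = {
    "ext3_journal_abort": re.compile(r"EXT3-fs error \(device \w+\).*has aborted"),
    "read_only_exception": re.compile(r"\(Read-only file system\)"),
    "cli_read_only_warning": re.compile(r"file system is mounted read only"),
}


def find_read_only_symptoms(text):
    """Return the labels of all signatures found in the captured text."""
    return [label for label, pattern in SIGNATURES.items() if pattern.search(text)]
```

An empty list means none of the three messages were found in the capture. It does not prove the file system is healthy, because a server in this state may stop writing logs altogether.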

 

 

 

How to determine the current version of firmware on the hard drive

 

 

 

  • For MCS-7825-I4 and MCS-7828-I4, running Cisco UCM 7.1 and above, you can use the CLI command 'show hardware' to verify the firmware version.

 

 



admin:show hardware

HW Platform       : 7828I4
Processors        : 1
Type              : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
CPU Speed         : 2660
Memory            : 8192 MBytes
Object ID         : 1.3.6.1.4.1.9.1.899
OS Version        : UCOS 4.0.0.0-34
Serial Number     : KQRBVVB

RAID Version      :
Raid firmware version: 1.26.81.00
Raid Bios version: 6.16.00.00

BIOS Information  :
IBM IBMBIOSVersion1.44-[M9E144AUS-1.44]- 06/11/2009

RAID Details      :
LSI Logic IR Configuration Utility 2.00.15
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS1064E
  BIOS version                            : 6.16.00.00
  Firmware version                        : 1.26.81.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 112
  Maximum physical devices                : 62
  Concurrent commands supported           : 266
  Slot                                    : 0
  Bus                                     : 1
  Device                                  : 0
  Function                                : 0
  RAID Support                            : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
  Volume ID                               : 7
  Status of volume                        : Okay (OKY)
  RAID level                              : 1
  Size (in MB)                            : 237464
  Physical hard disks (Target ID)         : 9 8
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #112
Target on ID #8
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 1
  Target ID                               : 8
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B04
  Serial No                               : WD-WCAT1D712130
  Drive Type                              : SATA
Target on ID #9
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 0
  Target ID                               : 9
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B04
  Serial No                               : WD-WCAT1D723848
  Drive Type                              : SATA
------------------------------------------------------------------------
Enclosure information
------------------------------------------------------------------------
Enclosure#                                : 1
  Logical ID                              : 5005076b:0648afc0
  Numslots                                : 4
  StartSlot                               : 0
  Start TargetID                          : 0
  Start Bus                               : 0
------------------------------------------------------------------------

 

The Model Number and Firmware Revision fields of each drive (highlighted in red in the original output) are the information you need.  This output shows a server with two drives with model number WD2502ABYS on 3B04 firmware.  These drives should be upgraded as soon as possible.

 

  • For MCS-7825-I4 and MCS-7828-I4 models running Cisco UCM versions earlier than 7.1, as well as any version of Cisco UCM running on an MCS-7816-I4 model server, you must download Cisco-HDD-FWUpdate-3.0.1-I.ISO, burn it to a CD, and boot from that CD (refer to the "Solution" section for links to download the ISO and readme).  After the Cisco-HDD-FWUpdate-3.0.1-I.ISO CD boots successfully, you will be presented with the current HDD FW version as well as the opportunity to upgrade to HDD FW version 02.03B06.
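
The two verification paths above, and the check of the show hardware output, can be sketched in Python as follows. This is a minimal illustration, not a Cisco utility: all function names are hypothetical, the parser assumes the output has been saved as text with one "field : value" pair per line, and treating any WD2502ABYS revision that sorts below 3B06 as needing the update is an assumption based on the 3B04 example and the 02.03B06 target version mentioned in this document.

```python
AFFECTED_MODEL_PREFIX = "WD2502ABYS"
# Assumption: revisions that sort below 3B06 (the version the ISO installs)
# do not contain the fix. Plain string comparison is used for simplicity.
FIXED_REVISION = "3B06"
AFFECTED_SERVERS = ("MCS-7816-I4", "MCS-7825-I4", "MCS-7828-I4")


def firmware_check_method(model, ucm_version):
    """Pick the verification path; ucm_version is a (major, minor) tuple."""
    if model not in AFFECTED_SERVERS:
        raise ValueError("Model is not covered by this document: " + model)
    if model in ("MCS-7825-I4", "MCS-7828-I4") and ucm_version >= (7, 1):
        return "show hardware"
    return "boot Cisco-HDD-FWUpdate-3.0.1-I.ISO"


def parse_drives(show_hardware_text):
    """Collect the 'field : value' pairs listed under each 'Target on ID'."""
    drives = []
    current = None
    for raw_line in show_hardware_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Target on ID"):
            current = {}
            drives.append(current)
        elif line.startswith("---"):
            current = None  # a separator line ends the physical device section
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return drives


def needs_firmware_update(drive):
    """True for an affected drive model on a revision below FIXED_REVISION."""
    model = drive.get("Model Number", "")
    revision = drive.get("Firmware Revision", "")
    return (
        model.startswith(AFFECTED_MODEL_PREFIX)
        and bool(revision)
        and revision < FIXED_REVISION
    )
```

Applied to output like the sample in this section, parse_drives yields one dictionary per physical drive, and needs_firmware_update flags a WD2502ABYS drive on 3B04. A drive with a missing Firmware Revision field is not flagged, so incomplete output should be checked by hand.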

What should be done if filesystem issues persist after applying the patches?


As of 16 February 2011, if you encounter any further file system or hard drive issues after applying both the firmware update and the disk exerciser, you should replace the affected drive(s).

 

There are three ways you can replace the drive(s).

  1. If you have an MCS server with an active Cisco support contract, open a TAC service request.
  2. If you purchased the IBM x3250 M2 MCS equivalent and have an IBM support contract, contact IBM support.
  3. If you do not have any support contract for the server, you can purchase a new drive from Cisco or IBM.
    • The Cisco part number for the hard drive is HDD-7825-I4-250=.
    • Contact your IBM reseller to confirm the correct part number for the 250GB SATA simple-swap HD for the x3250 M2 server.

 

If you have any questions you can leave a comment on this document. The ibm-fs-failure@cisco.com email address is no longer active as of 1 September 2014.

Sending the email will not generate a TAC SR but will allow us to collect more information.  This is an informal submission with no associated SLA and we will make every effort to follow up submissions but cannot guarantee a response.

 

Related Defects

 

 

 

Related Links

Comments
garessespinosa
Community Member

Hello Phillip,

Thank you for the information. I've installed the patches successfully on one of our CUCM servers. I will patch the second one shortly.

I also have a couple of UCCX servers with the same affected hard drives. Although I don't recall either server going into read-only mode, I would like to patch them anyway.

However, I came across another bug doc that mentions that it may not be possible to install the Disk Exerciser utility on UCCX VOS systems - documented here.

What are your thoughts? Should I apply the HD FW update and skip the cop file installation as the workaround suggests?

Thank you in advance

Phillip Ratliff
Cisco Employee

If the UCCX servers have never gone RO then you can skip the disk exerciser utility and just do the hard disk firmware.

garessespinosa
Community Member

Sounds like a plan. Thank you for the quick reply, Phillip

Shawn Smith
Level 1

Hello, I have UCM Pub and SUB (7816 I4).  After inputting the CLI command "Show Hardware" I do not receive the full output, I only receive what is displayed below.  Any ideas?  Thanks!

Show Hardware.jpg

hani_altaher
Level 1

Does this affect the 7845-I2? I had CUCM 7.1.5, then upgraded to 8.5, and I am facing the same problem on CUCM and CUP servers with the same hardware.

thanks

Phillip Ratliff
Cisco Employee

The root cause of this issue does not apply to any 7835 or 7845 server.   For a 7845-I2 on 8.5 you are likely hitting CSCtq52199.  You will need to open up a TAC SR to get the cop.sgn file with the fix.

hani_altaher
Level 1

thanks Ryan

Which is the most recommended solution: taking a backup after recovery, then reinstalling a fresh 8.5 and restoring, or just installing the fix? My concern is that the system has gone into this mode 2 times, and I do not know whether the OS (/common) will work well in production without affecting performance.

Phillip Ratliff
Cisco Employee

Any unexpected filesystem issue raises the possibility of corruption and the correct recovery procedure would be backup and reinstall/restore.  If you haven't had to run the filesystem check or if you have and it hasn't found any errors then you are likely safe.  If you've had to do a filesystem check to get the server to boot then you should proceed with the reinstall/restore.

Gentry
Level 1

So does this bug also apply to 7825I5 platform?  I seem to have run into the same problem.

Command Line Interface is starting up, please wait ...
java.io.FileNotFoundException: /var/log/active/platform/log/cli.bin (Read-only file system)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at com.cisco.iptplatform.fappend.ciscoRollingFileAppender.restoreIndex(ciscoRollingFileAppender.java:100)
        at com.cisco.iptplatform.fappend.ciscoRollingFileAppender.setFile(ciscoRollingFileAppender.java:43)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.log4j.config.PropertySetter.setProperty(PropertySetter.java:196)
        at org.apache.log4j.config.PropertySetter.setProperty(PropertySetter.java:155)
        at org.apache.log4j.xml.DOMConfigurator.setParameter(DOMConfigurator.java:530)
        at org.apache.log4j.xml.DOMConfigurator.parseAppender(DOMConfigurator.java:182)
        at org.apache.log4j.xml.DOMConfigurator.findAppenderByName(DOMConfigurator.java:140)
        at org.apache.log4j.xml.DOMConfigurator.findAppenderByReference(DOMConfigurator.java:153)
        at org.apache.log4j.xml.DOMConfigurator.parseChildrenOfLoggerElement(DOMConfigurator.java:415)
        at org.apache.log4j.xml.DOMConfigurator.parseRoot(DOMConfigurator.java:384)
        at org.apache.log4j.xml.DOMConfigurator.parse(DOMConfigurator.java:783)
        at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:666)
        at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:616)
        at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:584)
        at org.apache.log4j.xml.DOMConfigurator.configure(DOMConfigurator.java:687)
        at sdMain.initialize(sdMain.java:479)
        at sdMain.main(sdMain.java:646)
java.lang.NullPointerException
        at com.cisco.iptplatform.fappend.ciscoRollingFileAppender.updateIndex(ciscoRollingFileAppender.java:117)
        at com.cisco.iptplatform.fappend.ciscoRollingFileAppender.nextFileName(ciscoRollingFileAppender.java:92)
        at com.cisco.iptplatform.fappend.ciscoRollingFileAppender.append(ciscoRollingFileAppender.java:74)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:221)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:57)
        at org.apache.log4j.Category.callAppenders(Category.java:187)
        at org.apache.log4j.Category.forcedLog(Category.java:372)
        at org.apache.log4j.Category.debug(Category.java:241)
        at com.cisco.iptplatform.cli.CliSettings.getInstance(CliSettings.java:106)
        at sdMain.initialize(sdMain.java:491)
        at sdMain.main(sdMain.java:646)
log4j:ERROR No output stream or file set for the appender named [CLI_LOG].

   Welcome to the Platform Command Line Interface

    WARNING:
        The /common file system is mounted read only.

        Please use Recovery Disk to check the file system using fsck.

admin:
admin:show hardware

HW Platform       : 7825I5
Processors        : 1
Type              : Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz
CPU Speed         : 2400
Memory            : 4096 MBytes
Object ID         : 1.3.6.1.4.1.9.1.746
OS Version        : UCOS 4.0.0.0-44
Serial Number     : RAID Version      :
Raid firmware version: 1.27.86.00
Raid Bios version: 6.1a.00.00

BIOS Information  :
IBMCorp. -[GYE135AUS-1.06]- 05/18/2010

RAID Details      :
LSI Logic IR Configuration Utility 2.00.15
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS1064E
  BIOS version                            : 6.1a.00.00
  Firmware version                        : 1.27.86.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 112
  Maximum physical devices                : 62
  Concurrent commands supported           : 277
  Slot                                    : 1
  Bus                                     : 1
  Device                                  : 0
  Function                                : 0
  RAID Support                            : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
  Volume ID                               : 1
  Status of volume                        : Okay (OKY)
  RAID level                              : 1
  Size (in MB)                            : 237464
  Physical hard disks (Target ID)         : 3 2
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #112
Target on ID #2
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 1
  Target ID                               : 2
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA    
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B07
  Serial No                               : WD-WCAT1H600816
  Drive Type                              : SATA
Target on ID #3
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 0
  Target ID                               : 3
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA    
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B07
  Serial No                               : WD-WCAT1H602165
  Drive Type                              : SATA
------------------------------------------------------------------------
Enclosure information
------------------------------------------------------------------------
Enclosure#                                : 1
  Logical ID                              : 50050760:1ec60070
  Numslots                                : 4
  StartSlot                               : 0
  Start TargetID                          : 0
  Start Bus                               : 0
------------------------------------------------------------------------
admin: utils create re
admin:utils create report h
admin:utils create report hardware ?
Syntax:
utils create report hardware
no parameters are required


             
admin:utils create report hardware

         *** WARNING ***
This process can take several minutes as the disk array, remote console,
system diagnostics and enviromental systems are probed for their current
values.

Continue (y/n)?y
Internal CLI failure
admin:

Phillip Ratliff
Cisco Employee

You have the right hard drives, but your firmware has the fix for this issue.   I'd recommend running the latest FWUCD to update everything to the latest and if it happens again gather a DSA and open a TAC SR.

Troy Hamilton wrote:

                       

So does this bug also apply to 7825I5 platform?  I seem to have run into the same problem.

   

  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B07
 

             
admin:utils create report hardware

         *** WARNING ***
This process can take several minutes as the disk array, remote console,
system diagnostics and enviromental systems are probed for their current
values.

Continue (y/n)?y
Internal CLI failure
admin:

                   

The DSA failed when you tried it because it has to write to the hard drive to generate the report, and your file system was already read-only.  You would need to reboot to recover and then run it.

Gentry
Level 1

Phillip, I'm trying to download the cd but it appears to require smartnet.  Unfortunately ours expired on our call manager and hasn't been renewed yet.  How can I download the cd?

Haytham Nassar
Level 1

Hello,

I am facing a similar issue and need to confirm whether I am hitting the same bug. It happened 3 months ago, and I fixed it using the recovery CD. Now it is happening again, and we need a permanent solution.

As a result, the customer is not able to take a backup. When he logs in to the server, he gets the attached error.

Can you confirm whether applying the patches mentioned in your article will resolve my problem?

CUCM Publisher Issue.jpg

Nishad Ismail
Level 1

We have applied the workaround for the bug. After the firmware upgrade, the RAID array status is showing "Resyncing" and one HDD is showing "out of sync".

Is there any way I can find out the progress of the resync? We have been waiting for 2 hours and it has still not completed.

Nishad Ismail
Level 1

How can we find out the percentage completed of the array "resyncing"? One of the HDDs is showing "out of sync".

Please help me ...

We have been getting this status for the past 3 hours since updating the firmware.
