09-09-2010 07:51 AM - edited 03-12-2019 09:31 AM
Cisco Media Convergence Servers 7816-I4, 7825-I4 (and IBM x3250-M2 equivalent) and 7828-I4 have recently been experiencing technical issues. These servers are used by Cisco Unified Communications Manager and various other Cisco Collaboration software products.
The symptom is that the file system on the local disk drives goes into read-only mode. This can manifest as application services going down, the server becoming unresponsive via the network or the management interfaces, or, in the worst case, data corruption necessitating a reinstall and restore from backup.
Root cause has been identified by Cisco and its suppliers as a disk drive issue stemming from interaction with system firmware.
Field Notice 63374 has been published and includes more technical details regarding this issue. Cisco and its suppliers are committed to high quality and apologize for any disruptions or impact caused by this issue.
The read-only file-system issue that has recently been affecting server models MCS-7816-I4, MCS-7825-I4, and MCS-7828-I4 (or their IBM equivalents) in the field is addressed by CSCti52867 - "IBM 7816-I4 and 782x-I4 READONLY file system".
The fix for CSCti52867 is now available and requires the application of two patch files. Install both of these patch files in the order listed below.
1. First install ciscocm.ibm-diskex-1.0.cop.sgn
The Readme file ciscocm.ibm-diskex-1.0.cop.sgn includes installation instructions for this .cop.sgn.
Make sure to only install this utility when show hardware CLI output indicates the array is in a healthy state.
If the file system on your server has never gone read-only, this step is optional.
2. Next install Cisco-HDD-FWUpdate-3.0.1-I.ISO.
The Readme file Cisco-HDD-FWUpdate-3.0.1-I.Readme.pdf includes installation instructions for this ISO.
This installer is completely independent of the OS installed on the server.
Note: Installing the FWUpdate v3.0(1) or later will get you firmware with the fix for this defect. It is always recommended that you apply the latest FWUCD available for your server.
Refer to the Release Note of CSCti52867 and the Readme file for each of the above mentioned patch files for more details.
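As a rough illustration of the "healthy array" precondition in step 1, the following Python sketch scans saved show hardware output and reports whether every volume and drive looks healthy. The function name is hypothetical, and treating "Okay (OKY)" and "Online (ONL)" as the only healthy states is an assumption based on the sample outputs in this document, not a documented Cisco rule. It also assumes the output has one "Field : value" pair per line.

```python
import re


def array_is_healthy(show_hardware_output: str) -> bool:
    """Return True only if every IR volume reports Okay (OKY) and every
    physical drive reports Online (ONL).

    Assumption: these two status codes are the only "healthy" values,
    as suggested by the sample outputs in this document.
    """
    volume_states = re.findall(
        r"Status of volume[ \t]*:[ \t]*(.+)", show_hardware_output
    )
    drive_states = re.findall(
        r"^[ \t]*State[ \t]*:[ \t]*(.+)$", show_hardware_output, re.MULTILINE
    )
    # With no volume or drive information we cannot call the array healthy.
    if not volume_states or not drive_states:
        return False
    volumes_ok = all("(OKY)" in state for state in volume_states)
    drives_ok = all("(ONL)" in state for state in drive_states)
    return volumes_ok and drives_ok
```

A volume reported as "Resyncing (RSY)" or a drive reported as "Out of Sync (OSY)" makes the function return False, which matches the advice to wait for the array sync to finish before installing the utility.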
EXT3-fs error (device sda6) in start_transaction: Journal has aborted
Last login: Mon Aug X XX:XX:XX XXXX from XXX.XXX.XXX.XXX
Command Line Interface is starting up, please wait ...
java.io.FileNotFoundException: /var/log/active/platform/log/cli.bin (Read-only file system)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        :
        :
        :
        at org.apache.log4j.Category.info(Category.java:674)
        at sdMain.main(sdMain.java:611)
log4j:ERROR No output stream or file set for the appender named [CLI_LOG].
Welcome to the Platform Command Line Interface
WARNING: The /common file system is mounted read only. <<<<<<<<<<<<<<<<<<
Please use Recovery Disk to check the file system using fsck.
admin:
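If you collect console or log text from several servers, a small script can flag the symptom lines shown above. This is only a sketch: the function name and the marker list are hypothetical, the markers are taken from the two examples in this document, and other messages may also indicate the problem.

```python
from typing import List

# Assumption: these substrings are taken from the example messages in this
# document; they are not an exhaustive or official list of symptoms.
READ_ONLY_MARKERS = (
    "EXT3-fs error",
    "Read-only file system",
    "mounted read only",
)


def find_read_only_symptoms(log_text: str) -> List[str]:
    """Return the lines of log_text that contain a known read-only marker."""
    matches = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(marker.lower() in lowered for marker in READ_ONLY_MARKERS):
            matches.append(line.strip())
    return matches
```

An empty result does not prove the server is unaffected; it only means none of these particular strings appeared in the text you supplied.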
For MCS-7825-I4 and MCS-7828-I4, running Cisco UCM 7.1 and above, you can use the CLI command 'show hardware' to verify the firmware version.
admin:show hardware
HW Platform : 7828I4
Processors : 1
Type : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
CPU Speed : 2660
Memory : 8192 MBytes
Object ID : 1.3.6.1.4.1.9.1.899
OS Version : UCOS 4.0.0.0-34
Serial Number : KQRBVVB
RAID Version :
Raid firmware version: 1.26.81.00
Raid Bios version: 6.16.00.00
BIOS Information :
IBM IBMBIOSVersion1.44-[M9E144AUS-1.44]- 06/11/2009
RAID Details :
LSI Logic IR Configuration Utility 2.00.15
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
Controller type : SAS1064E
BIOS version : 6.16.00.00
Firmware version : 1.26.81.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 112
Maximum physical devices : 62
Concurrent commands supported : 266
Slot : 0
Bus : 1
Device : 0
Function : 0
RAID Support : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
Volume ID : 7
Status of volume : Okay (OKY)
RAID level : 1
Size (in MB) : 237464
Physical hard disks (Target ID) : 9 8
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #112
Target on ID #8
Device is a Hard disk
Enclosure # : 1
Slot # : 1
Target ID : 8
State : Online (ONL)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B04
Serial No : WD-WCAT1D712130
Drive Type : SATA
Target on ID #9
Device is a Hard disk
Enclosure # : 1
Slot # : 0
Target ID : 9
State : Online (ONL)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B04
Serial No : WD-WCAT1D723848
Drive Type : SATA
------------------------------------------------------------------------
Enclosure information
------------------------------------------------------------------------
Enclosure# : 1
Logical ID : 5005076b:0648afc0
Numslots : 4
StartSlot : 0
Start TargetID : 0
Start Bus : 0
------------------------------------------------------------------------
The fields you need are the Model Number and Firmware Revision of each drive. This output shows a server with two drives with model number WD2502ABYS on 3B04 firmware. These drives should be upgraded as soon as possible.
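The check described above can be sketched in Python for anyone who saves show hardware output from many servers. The function and constant names are hypothetical. The sketch assumes one "Field : value" pair per line, that each drive section begins with "Target on ID #", and that any WD2502ABYS revision that sorts below 3B06 (the fixed revision mentioned in the comments on this document) still needs the update. That string comparison is an assumption, not a documented versioning rule.

```python
import re
from typing import List, Tuple

AFFECTED_MODEL_PREFIX = "WD2502ABYS"  # drive model named in this document
FIXED_FIRMWARE = "3B06"               # assumption: first revision with the fix


def drives_needing_update(show_hardware_output: str) -> List[Tuple[str, str, str]]:
    """Return (serial, model, firmware) for each affected drive that is
    still on a firmware revision older than FIXED_FIRMWARE."""
    flagged = []
    # Everything before the first "Target on ID #" is controller/volume data.
    sections = re.split(r"Target on ID #", show_hardware_output)[1:]
    for section in sections:
        model = re.search(r"Model Number[ \t]*:[ \t]*(\S+)", section)
        firmware = re.search(r"Firmware Revision[ \t]*:[ \t]*(\S+)", section)
        serial = re.search(r"Serial No[ \t]*:[ \t]*(\S+)", section)
        if not (model and firmware):
            continue  # incomplete section, nothing to compare
        is_affected_model = model.group(1).startswith(AFFECTED_MODEL_PREFIX)
        # Assumption: revisions such as 3B02, 3B04, 3B06 compare correctly
        # as plain strings.
        is_old_firmware = firmware.group(1) < FIXED_FIRMWARE
        if is_affected_model and is_old_firmware:
            flagged.append(
                (serial.group(1) if serial else "unknown",
                 model.group(1),
                 firmware.group(1))
            )
    return flagged
```

Run against the sample output above, this would flag both drives, since each is a WD2502ABYS-23B7A on 3B04.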
As of 16 February 2011, if you encounter any further file-system or hard drive issues after applying both the firmware update and the disk exerciser, you should proceed to replace the affected drive(s).
There are three ways you can replace the drive(s).
If you have any questions you can leave a comment on this document. The ibm-fs-failure@cisco.com email address is no longer active as of 1 September 2014.
Sending the email will not generate a TAC SR but will allow us to collect more information. This is an informal submission with no associated SLA and we will make every effort to follow up submissions but cannot guarantee a response.
The Readme for the 3.6(1) FWUCD shows that it includes hard drive firmware 3B05. While it is a good idea to keep all of the firmware on the server up to date you will still need to run the standalone 3B06 firmware update in addition to the ciscocm.ibm-diskex-1.0.cop.sgn file to get the complete fix for CSCti52867.
Since a reboot is required to apply the 3B06 firmware that presents an ideal time to apply the FWUCD as well.
-Ryan
We hit the bug CSCti52867.
I found a new version, 3.6.1:
FWUCD-3.6.1-I.iso
Release Date: 30/NOV/2010
Size: 334360.00 KB (342384640 bytes)
Questions:
1) Can someone tell me whether I should use 3.6 or the old 3.0.1?
2) If I use 3.6, do I first have to install ciscocm.ibm-diskex-1.0.cop.sgn, or is that not necessary with version 3.6?
kind regards,
Ryan,
We have sorted this out with TAC and your help. Since the R/O issue has not yet occurred on the UCCX 8.0 Server, we will just upgrade the firmware. Nonetheless, you might want to reference the defects that were opened while resolving the SR:
CSCtn17205 Drive Exerciser Utility can not be installed on UCCX VOS platform
CSCti28336 Need document using the recovery CD when file system mounted read only
/David
Ryan,
We recently had an issue with a UCCX 7.x installation, running on a 7816-I4 (and Windows, of course):
- Windows Event Logs every now and then showed "bad blocks", and the server restarted itself automatically
- In one case, the server was unresponsive and manually had to be powered off and on in order to restore its function.
- I attributed it to a bad Drive and had the HDD RMAed.
My questions:
- The info on this R/O filesystem issue with respect to Windows OS seems a bit vague. Any chance this was related to the issue we're discussing here?
- Is HDD FW Upgrade supported on MCSes running Windows OS?
- Would you even recommend it?
- How do you accomplish this (step-by-step instructions)
Thanks for your help
Thanks David I've updated the document with the bugs you cited. Thanks for your patience working through the UCCX issues.
We never saw any complaints of this issue on a Windows server so I cannot confirm or refute that the issue you saw was due to this problem or not. I would encourage anyone with one of these servers to apply the hard drive firmware update regardless of whether you have seen issues. The installer is a self contained patch utility from IBM that does not rely on any data on the hard drives. It can be run on a server with no OS at all.
If you are seeing bad blocks reported on a hard drive from Windows then I would replace the drive regardless of firmware. You can also confirm using the IBM DSA utility if the drive is showing SMART errors.
Yesterday I also hit the read-only issue when trying to upgrade CUCM to v8.5.1, even though I had already installed the B06 firmware fix in December 2010. So I contacted Cisco TAC and, as described above, requested new HDDs to replace the old ones.
In the meantime, I would like to know whether there is any possibility of getting CUCM working again until the new HDDs are delivered. I'm a little afraid of trying a simple restart of CUCM, because right now about eighty percent of our phones are working, since they were logged in with Extension Mobility when the error occurred. After a restart, perhaps none of them will work properly, because login with ExMo is not available. I would be grateful for any good advice.
Kind regards from Germany,
Andi
Most of the time simply rebooting the server will recover it. If this doesn't work for you then a filesystem check may get you up enough to proceed but you may be stuck until you get the HDD(s) replaced.
Phillip,
I followed your advice and did a reset (right at the front of the machine) of the CUCM. Now Ex-Mo and all other services are working again! So I'm very happy and have to thank you very, very much.
The only thing I'm missing now is the inactive partition in the "CUCM OS => Settings => Version" window. Normally there should be an option to switch to the displayed inactive partition with V.x.x.x- installed on it. Maybe you have some useful advice on this.
Kind regards,
Andreas
If it happened during an upgrade, then it's likely your inactive partition got wiped in preparation for it to become the new active partition.
I have the same issue with a 7828I3. Is there any HDD firmware update for the I3?
HW Platform : 7828I3
Processors : 1
Type : Family: Core 2
CPU Speed : 2130
Memory : 6144 MBytes
Object ID : 1.3.6.1.4.1.9.1.899
OS Version : UCOS 4.0.0.0-44
Serial Number : KQFZKTV
RAID Version :
RAID Firmware Version: 1.18.83.00
RAID BIOS Version: 6.0e.00.00
BIOS Information :
1.45
RAID Details :
LSI Logic IR Configuration Utility 2.00.15
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
Controller type : SAS1064E
BIOS version : 6.0e.00.00
Firmware version : 1.18.83.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 112
Maximum physical devices : 62
Concurrent commands supported : 511
Slot : 0
Bus : 5
Device : 0
Function : 0
RAID Support : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
Volume ID : 0
Status of volume : Resyncing (RSY)
RAID level : 1
Size (in MB) : 237464
Physical hard disks (Target ID) : 4 1
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #112
Target on ID #1
Device is a Hard disk
Enclosure # : 1
Slot # : 1
Target ID : 1
State : Out of Sync (OSY)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B02
Serial No : WD-WCAT14597708
Drive Type : SATA
Target on ID #4
Device is a Hard disk
Enclosure # : 1
Slot # : 0
Target ID : 4
State : Online (ONL)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B02
Serial No : WD-WCAT14631340
Drive Type : SATA
I wasn't aware the 7828I3 had those drives but yours sure does. The firmware update is specific to the hard drives, not the server so you should be able to apply it successfully to your server.
If your system went readonly then you should also install the cop.sgn file just as if you had an I4.
Thanks Philip,
I was able to install both the cop file and the HDD update on the 7828-I3:
Initiator at ID #112
Target on ID #1
Device is a Hard disk
Enclosure # : 1
Slot # : 1
Target ID : 1
State : Out of Sync (OSY)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B06
Serial No : WD-WCAT14597708
Drive Type : SATA
Target on ID #4
Device is a Hard disk
Enclosure # : 1
Slot # : 0
Target ID : 4
State : Online (ONL)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B06
Serial No : WD-WCAT14631340
Drive Type : SATA
At this point you are the first I've heard of to report this on an I3 but I'll keep an eye out for others and update the document accordingly.
I noticed your array is out of sync. This is expected after applying the firmware update, but the cop.sgn file should be run when the array sync is finished; otherwise it only runs on the single drive.
So I should keep checking the array status and, once it has synced, run the cop file again?
Phillip
Will the fix for the issue be integrated into CUCM installation disks in order to relieve customers from having to separately upgrade the server firmware?
Tim
Unfortunately we don't have the ability to update hd firmware during software or OS install so it needs to be done via the FWUCD.