09-09-2010 07:51 AM - edited 03-12-2019 09:31 AM
Cisco Media Convergence Servers 7816-I4, 7825-I4 (and IBM x3250-M2 equivalent) and 7828-I4 have recently been experiencing technical issues. These servers are used by Cisco Unified Communications Manager and various other Cisco Collaboration software products.
The symptom is that the file system on the local disk drives goes into read-only mode. This can manifest as application services going down, the server becoming unresponsive via the network or the management interfaces, or, in the worst case, data corruption necessitating a reinstall and restore from backup.
Root cause has been identified by Cisco and its suppliers as a disk drive issue stemming from interaction with system firmware.
Field Notice 63374 has been published and includes more technical details regarding this issue. Cisco and its suppliers are committed to high quality and apologize for any disruptions or impact caused by this issue.
The read-only file-system issue that has recently been affecting server models MCS-7816-I4, MCS-7825-I4, and MCS-7828-I4 (or their IBM equivalents) in the field is addressed by CSCti52867 - "IBM 7816-I4 and 782x-I4 READONLY file system".
The fix for CSCti52867 is now available and requires the application of two patch files. Install both of these patch files in the order listed below.
1. First install ciscocm.ibm-diskex-1.0.cop.sgn
The Readme file ciscocm.ibm-diskex-1.0.cop.sgn-Readme.html includes installation instructions for this .cop.sgn.
Only install this utility when the 'show hardware' CLI output indicates that the array is in a healthy state.
If your server has never had the file system go read-only, this step is optional.
2. Next install Cisco-HDD-FWUpdate-3.0.1-I.ISO .
The Readme file Cisco-HDD-FWUpdate-3.0.1-I.Readme.pdf includes installation instructions for this ISO.
This installer is completely independent of the OS installed on the server.
Note: Installing the FWUpdate v3.0(1) or later will get you firmware with the fix for this defect. It is always recommended that you apply the latest FWUCD available for your server.
Refer to the Release Note of CSCti52867 and the Readme file for each of the above mentioned patch files for more details.
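The precondition in step 1 (install the disk exerciser only when the array is healthy) is normally checked by reading the 'show hardware' output. The following is a minimal Python sketch of that check, assuming the output has been saved as text and that a healthy array shows "Okay (OKY)" for the volume and "Online (ONL)" for each drive, as in the sample output in this document. The function name array_is_healthy is hypothetical and is not part of any Cisco tool; other releases may report additional states.

```python
import re

def array_is_healthy(show_hardware_text: str) -> bool:
    """Return True if every volume reads OKY and every drive reads ONL.

    Assumption: status strings look like the sample 'show hardware'
    output in this document, e.g. 'Okay (OKY)' and 'Online (ONL)'.
    """
    # Collapse whitespace so keys wrapped across lines still match.
    flat = " ".join(show_hardware_text.split())
    volumes = re.findall(r"Status of volume\s*:\s*(\w+) \((\w+)\)", flat)
    drives = re.findall(r"\bState\s*:\s*(\w+) \((\w+)\)", flat)
    if not volumes or not drives:
        return False  # nothing recognisable: do not assume healthy
    return (all(code == "OKY" for _, code in volumes)
            and all(code == "ONL" for _, code in drives))

sample = """
Status of volume : Okay (OKY)
RAID level : 1
Target ID : 8
State : Online (ONL)
Target ID : 9
State : Online (ONL)
"""
print(array_is_healthy(sample))
```

The sketch deliberately returns False when it finds no status lines at all, so an empty or truncated capture is never mistaken for a healthy array.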
EXT3-fs error (device sda6) in start_transaction: Journal has aborted
Last login: Mon Aug X XX:XX:XX XXXX from XXX.XXX.XXX.XXX
Command Line Interface is starting up, please wait ...
java.io.FileNotFoundException: /var/log/active/platform/log/cli.bin (Read-only file system)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        :
        :
        :
        at org.apache.log4j.Category.info(Category.java:674)
        at sdMain.main(sdMain.java:611)
log4j:ERROR No output stream or file set for the appender named [CLI_LOG].
   Welcome to the Platform Command Line Interface
WARNING: The /common file system is mounted read only. <<<<<<<<<<<<<<<<<<
Please use Recovery Disk to check the file system using fsck.
admin:
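The messages above are the typical signs of the read-only condition. As a rough illustration, the Python sketch below scans saved console or log text for those strings. The patterns are taken only from the messages quoted in this document, and the function name find_read_only_symptoms is hypothetical; a real system may log other variants.

```python
import re

# Patterns taken from the console and log messages quoted in this document.
READ_ONLY_PATTERNS = [
    re.compile(r"Read-only file system"),
    re.compile(r"file system is mounted read only"),
    re.compile(r"EXT3-fs error .*has aborted"),
]

def find_read_only_symptoms(lines):
    """Return the lines that match any known read-only symptom pattern."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in READ_ONLY_PATTERNS)]

log = [
    "EXT3-fs error (device sda6) in start_transaction: Journal has aborted",
    "java.io.FileNotFoundException: /var/log/active/platform/log/cli.bin (Read-only file system)",
    "WARNING: The /common file system is mounted read only.",
    "Welcome to the Platform Command Line Interface",
]
for hit in find_read_only_symptoms(log):
    print(hit)
```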
For MCS-7825-I4 and MCS-7828-I4, running Cisco UCM 7.1 and above, you can use the CLI command 'show hardware' to verify the firmware version.
admin:show hardware
HW Platform       : 7828I4
Processors        : 1
Type              : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
CPU Speed         : 2660
Memory            : 8192 MBytes
Object ID         : 1.3.6.1.4.1.9.1.899
OS Version        : UCOS 4.0.0.0-34
Serial Number     : KQRBVVB
RAID Version      :
Raid firmware version: 1.26.81.00
Raid Bios version: 6.16.00.00
BIOS Information  :
IBM IBMBIOSVersion1.44-[M9E144AUS-1.44]- 06/11/2009
RAID Details      :
LSI Logic IR Configuration Utility 2.00.15
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS1064E
  BIOS version                            : 6.16.00.00
  Firmware version                        : 1.26.81.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 112
  Maximum physical devices                : 62
  Concurrent commands supported           : 266
  Slot                                    : 0
  Bus                                     : 1
  Device                                  : 0
  Function                                : 0
  RAID Support                            : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
  Volume ID                               : 7
  Status of volume                        : Okay (OKY)
  RAID level                              : 1
  Size (in MB)                            : 237464
  Physical hard disks (Target ID)         : 9 8
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #112
Target on ID #8
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 1
  Target ID                               : 8
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B04
  Serial No                               : WD-WCAT1D712130
  Drive Type                              : SATA
Target on ID #9
  Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 0
  Target ID                               : 9
  State                                   : Online (ONL)
  Size (in MB)/(in sectors)               : 238475/488397168
  Manufacturer                            : ATA
  Model Number                            : WD2502ABYS-23B7A
  Firmware Revision                       : 3B04
  Serial No                               : WD-WCAT1D723848
  Drive Type                              : SATA
------------------------------------------------------------------------
Enclosure information
------------------------------------------------------------------------
  Enclosure#                              : 1
  Logical ID                              : 5005076b:0648afc0
  Numslots                                : 4
  StartSlot                               : 0
  Start TargetID                          : 0
  Start Bus                               : 0
------------------------------------------------------------------------
The text highlighted in red in the original output (the Model Number and Firmware Revision of each drive) is the information you need. This output shows a server with two drives with model number WD2502ABYS on 3B04 firmware. These drives should be upgraded as soon as possible.
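Reading the model number and firmware revision out of 'show hardware' can also be scripted. The Python sketch below is one way to do it, assuming the output was saved as text in the layout shown in this document. The function names and the flagged_firmware parameter are hypothetical; the default of 3B04 is simply the revision this document says should be upgraded, and the Readme for the firmware update remains the authority on which revisions are affected.

```python
import re

AFFECTED_MODEL_PREFIX = "WD2502ABYS"  # drive model named in this document

def list_drives(show_hardware_text):
    """Return (target_id, model, firmware) for each physical drive found."""
    # Collapse whitespace so keys wrapped across lines still match.
    flat = " ".join(show_hardware_text.split())
    pattern = re.compile(
        r"Target ID\s*:\s*(\d+).*?"
        r"Model Number\s*:\s*(\S+).*?"
        r"Firmware Revision\s*:\s*(\S+)"
    )
    return pattern.findall(flat)

def drives_to_review(show_hardware_text, flagged_firmware=("3B04",)):
    """Return drives of the affected model whose firmware is flagged."""
    return [d for d in list_drives(show_hardware_text)
            if d[1].startswith(AFFECTED_MODEL_PREFIX) and d[2] in flagged_firmware]

sample = """
Target on ID #8
Target ID : 8
State : Online (ONL)
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B04
Target on ID #9
Target ID : 9
State : Online (ONL)
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B04
"""
for target, model, firmware in drives_to_review(sample):
    print(f"target {target}: {model} firmware {firmware}")
```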
As of 16 February 2011, if you encounter any further file-system or hard drive issues after applying both the firmware update and the disk exerciser, you should proceed to replace the affected drive(s).
There are three ways you can replace the drive(s).
If you have any questions you can leave a comment on this document. The ibm-fs-failure@cisco.com email address is no longer active as of 1 September 2014.
Sending the email will not generate a TAC SR but will allow us to collect more information. This is an informal submission with no associated SLA and we will make every effort to follow up submissions but cannot guarantee a response.
You are correct that the F2 option was removed. We are looking for the replacement application and I will update the document when we have a known working procedure.
The DSA utility is installed with 6.1 however a bug with the permissions prevents it from being run from the CLI. TAC can use a remote support account to run the file manually but without going through TAC the only option is to use the bootable DSA utility.
There is no way with F2 during POST to get into the drive's self diagnostic application.
We bought the MCS7816I4-K9-CMB2 with CUCM 6.1 preinstalled.
Could it be that Dynamic System Analysis (DSA) is not preinstalled on this machine?
Is using ibm_fw_dsa_3.10_anyos.iso the only option?
(I have no support contract on the server from IBM or Cisco)
There is no information about the IBM part number for the drive in the MCS7816-I4.
I read in another discussion that there is no IBM equivalent of the MCS7816-I4.
Can I treat the IBM x3250-M2 as equivalent and use IBM drive part number 39M4509?
Using the 7825I4 equivalent is exactly the right thing to do. All three of these servers use the exact same hard drive.
For the HD self diagnostic I'm getting ready to update the wiki but you need to use the standalone IBM DSA to run the diagnostic.
I have used the bootable DSA utility.
I did "1c. Run HD Self Diagnostic test" and the test passed.
I have collected the inventory:
http://rapidshare.com/files/429589229/4194PBP_KQVBBLN_20101105-125520.txt
But the HDD firmware version appears to be Revision 02.0, so it is quite different from yours...
I looked at the system logs (messages) from the day of the failure and found these messages:
Error : ata1: translated ATA stat/err 0x51/10 to SCSI SK/ASC/ASCQ 0xb/14/00
Warning : ata1: status=0x51 { DriveReady SeekComplete Error }
Warning : ata1: error=0x10 { SectorIdNotFound }
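The hexadecimal values in these messages are ATA register bitmasks, and the names in braces are the bits that are set. The Python sketch below shows how 0x51 and 0x10 decode to the names printed in the log. The bit positions follow the classic ATA status and error register layout; the labels for bits that do not appear in the quoted log are informal names chosen for this example, and newer ATA revisions mark several of them obsolete.

```python
# Classic ATA status register bits (highest bit first).
STATUS_BITS = {
    0x80: "Busy",            # BSY
    0x40: "DriveReady",      # DRDY
    0x20: "DeviceFault",     # DF
    0x10: "SeekComplete",    # DSC
    0x08: "DataRequest",     # DRQ
    0x04: "CorrectedError",  # CORR
    0x02: "Index",           # IDX
    0x01: "Error",           # ERR
}

# Classic ATA error register bits (highest bit first).
ERROR_BITS = {
    0x80: "BadCRC",                # ICRC / BBK
    0x40: "UncorrectableError",    # UNC
    0x20: "MediaChanged",          # MC
    0x10: "SectorIdNotFound",      # IDNF
    0x08: "MediaChangeRequested",  # MCR
    0x04: "DriveStatusError",      # ABRT
    0x02: "TrackZeroNotFound",     # TK0NF
    0x01: "AddrMarkNotFound",      # AMNF
}

def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in names.items() if value & bit]

print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x10, ERROR_BITS))   # ['SectorIdNotFound']
```

Since 0x51 = 0x40 + 0x10 + 0x01, the status decodes to exactly the three flags shown in the warning, and the single error bit 0x10 indicates that the drive could not find the requested sector ID.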
Please email the .gz file and the messages* file(s) from the server to ibm-fs-failure@cisco.com so we can get them logged and see if your errors match the other reports.
Marcus,
We seem to be experiencing the same issues as you, with the same make and model of server. I have also checked the system logs and we receive the same error messages. In addition, IBM has now swapped out the hard disk for the second time, and we cannot restore to the server because the restore just hangs. Because of this we are currently running on the faulty drive, which has revision 02.0 firmware.
Has anyone else seen this issue, and if so, would the fix be to install a different model of hard drive if the firmware is at fault?
I've completed step 1 and still have the same issue (it's a branded Cisco IBM 7828) and have been working with Cisco on the issue. Nothing like rebooting your production server every few days (if I don't it will crash usually within 5 to 7 days). This is ridiculous and can't believe that Cisco won't provide me with a new server that actually works. Two months of this broken hardware and I shouldn't have to baby sit and reboot our server because of faulty hardware... (I'm done ranting).
I'm now on step 1b and cannot get the USB drive to format. I've installed the application as mentioned. When I press the "Start" button to format the USB thumb drive, I get the following error message:
"The user-supplied DOS system files are not compatible with FAT32"
I've pointed the DOS system files to the ones supplied (freedos). It also doesn't give me an option of just "FAT" as described above. The only two options are FAT32 & NTFS. I've tried this on three different OS (Windows 7 64bit, Windows Vista 32 bit, and Windows XP 32 bit) and all three give the exact same error message.
Has anyone been able to get 1b to work, and if so, what am I missing?
Phillip,
We have just received a replacement server from you. Because I didn't know which HD firmware it was running, I decided to run the DSA. About halfway through loading, it tries to read via ata3, which is the SATA disk, and fails because the communication is too slow. It then retries at a lower speed, which also fails, and keeps repeating this. I then see I/B errors and SCSI errors. I have tried versions 3.02 and 3.10 of the DSA and both behave the same way.
Because of this I'm unsure whether to continue rebuilding the server; it doesn't give me much confidence if the DSA fails to load due to these errors. Note that, from what I can tell, the firmware revision of the hard disk is 02.0.
networkdefence,
Cisco posted a new Cisco-HDD-FWUpdate-3.0.1-I.ISO yesterday November 15, 2010 which includes new HDD firmware that prevents the issue caused by "CSCti52867 - IBM 7816-I4 and 782x-I4 READONLY file system".
Therefore, please upgrade your HDD FW using this new "Cisco Standalone HDD Firmware Update Version 3.0".
The read-me for the "Cisco Standalone HDD Firmware Update Version 3.0" may be downloaded here.
After updating the firmware, please post your results.
Which Cisco UC application will be installed on this server?
I did some research on the USB and FAT not showing up. In most cases, if your USB Drive is bigger than 2 gigs it will have issues (mine was 4 gigs). I purchased a Dane-Elec 2GB USB Drive from Target for $8. This allowed the FAT to show up and format the drive without any issues.
I have a 7828-I4. The table listed indicates that the HDD Product ID should be "WD2502ABYS-23B7A0".
However, when I run "utils create report hardware" it shows the following (the 0 is missing at the end):
| ModelNumber |WD2502ABYS-23B7A |
| FirmwareRevision |3B05 |
Is this just a typo in the reference chart in the readme.pdf file?
Cam,
Thanks for the update regarding the size of the USB key and the 2GB size limitation. I'm glad that you were able to proceed.
In regards to your question, WD2502ABYS-23B7A is the affected HDD Product ID and the readme file will be updated accordingly. So you can proceed with the HDD update.
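For anyone scripting this comparison, the Python sketch below parses rows in the "| Key |Value |" form quoted above and matches the model number by prefix, so that a trailing "0" shown in one place and not in another does not cause a false mismatch. Both function names are hypothetical, and only the two rows quoted in this thread are assumed about the report format.

```python
def parse_report_rows(text):
    """Parse '| Key |Value |' rows into a dict (format as quoted in this thread)."""
    fields = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0]:
            fields[cells[0]] = cells[1]
    return fields

def is_affected_model(model_number, affected_id="WD2502ABYS-23B7A"):
    """Prefix match so 'WD2502ABYS-23B7A0' and 'WD2502ABYS-23B7A' both match."""
    return model_number.startswith(affected_id)

report = """
| ModelNumber |WD2502ABYS-23B7A |
| FirmwareRevision |3B05 |
"""
fields = parse_report_rows(report)
print(fields["ModelNumber"], fields["FirmwareRevision"])
print(is_affected_model(fields["ModelNumber"]))
```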
Cam,
Thanks again for providing your 2GB USB key finding! Step 1b has been updated with this recommendation. I also sympathize with your frustration regarding this issue. Please rest assured that this issue is top-of-mind and focus for many Cisco resources and suppliers who are working diligently to deliver a resolution for this problem.
As a follow-on to the previous recommendation of applying the new HDD firmware bundled in Cisco-HDD-FWUpdate-3.0.1-I.ISO, you should ALSO apply the newly published ciscocm.ibm-diskex-1.0.cop.sgn.
The Readme file ciscocm.ibm-diskex-1.0.cop.sgn-Readme.html includes installation instructions for the .cop.sgn.
I have completed 1b successfully and have the log file.
I also applied the firmware update that was made available and it now shows that I'm running firmware revision 3B06 instead of 3B05.
The system is running and we'll see how long it runs before it crashes (hopefully it won't, but only time will tell).
If you need me to post my logs or anything please let me know. I did attach it to my TAC that I have open.
I also ran 1b after doing the firmware as well. Not sure if that would be helpful. I did not attach that to my TAC, but thought I would mention it.
Thanks for mentioning the .cop.sgn file. It is being applied right now. From what I read, it shouldn't matter that I'm applying it after the firmware update. If this isn't the case, let me know what I need to do. Thanks.