09-09-2010 07:51 AM - edited 03-12-2019 09:31 AM
Cisco Media Convergence Servers 7816-I4, 7825-I4 (and IBM x3250-M2 equivalent) and 7828-I4 have recently been experiencing technical issues. These servers are used by Cisco Unified Communications Manager and various other Cisco Collaboration software products.
The symptom is that the file system on the local disk drives goes into read-only mode. This can manifest as application services going down, the server becoming unresponsive via the network or the management interfaces, or, in the worst case, data corruption that necessitates a reinstall and restore from backup.
Root cause has been identified by Cisco and its suppliers as a disk drive issue stemming from interaction with system firmware.
Field Notice 63374 has been published and includes more technical details regarding this issue. Cisco and its suppliers are committed to high quality and apologize for any disruptions or impact caused by this issue.
The read-only file-system issue that has recently been affecting server models MCS-7816-I4, MCS-7825-I4, and MCS-7828-I4 (or their IBM equivalents) in the field is addressed by CSCti52867 - "IBM 7816-I4 and 782x-I4 READONLY file system".
The fix for CSCti52867 is now available and requires the application of two patch files. Install both of these patch files in the order listed below.
1. First install ciscocm.ibm-diskex-1.0.cop.sgn
The Readme file ciscocm.ibm-diskex-1.0.cop.sgn includes installation instructions for this .cop.sgn.
Make sure to only install this utility when show hardware CLI output indicates the array is in a healthy state.
If your server has never had the filesystem go readonly then this step is optional.
2. Next install Cisco-HDD-FWUpdate-3.0.1-I.ISO .
The Readme file Cisco-HDD-FWUpdate-3.0.1-I.Readme.pdf includes installation instructions for this ISO.
This installer is completely independent of the OS installed on the server.
Note: Installing the FWUpdate v3.0(1) or later will get you firmware with the fix for this defect. It is always recommended that you apply the latest FWUCD available for your server.
Refer to the Release Note of CSCti52867 and the Readme file for each of the above mentioned patch files for more details.
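The precondition for step 1 (install the utility only when the array is healthy) can be checked against a saved copy of the show hardware output. The following is a minimal sketch, not a Cisco tool. It assumes the output has been captured as text with one "label : value" pair per line, and it assumes that "Okay (OKY)" for the volume and "Online (ONL)" for each drive, the values shown in the sample output in this document, are what a healthy array reports. The function name is hypothetical.

```python
import re


def array_is_healthy(show_hardware_text: str) -> bool:
    """Return True only if every volume reports Okay and every drive reports Online.

    Assumption: "Okay (OKY)" and "Online (ONL)" are the healthy values, as in
    the sample 'show hardware' output in this document.
    """
    volume_states = re.findall(r"Status of volume\s*:\s*(.+)", show_hardware_text)
    drive_states = re.findall(
        r"^\s*State\s*:\s*(.+)", show_hardware_text, flags=re.MULTILINE
    )
    if not volume_states or not drive_states:
        # Nothing to judge from, so do not report the array as healthy.
        return False
    volumes_ok = all(s.strip().startswith("Okay") for s in volume_states)
    drives_ok = all(s.strip().startswith("Online") for s in drive_states)
    return volumes_ok and drives_ok
```

Returning False when no status lines are found is deliberate: an unreadable or truncated capture should not be treated as a green light to install.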
EXT3-fs error (device sda6) in start_transaction: Journal has aborted
Last login: Mon Aug X XX:XX:XX XXXX from XXX.XXX.XXX.XXX
Command Line Interface is starting up, please wait ...
java.io.FileNotFoundException: /var/log/active/platform/log/cli.bin (Read-only file system)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    : : :
    at org.apache.log4j.Category.info(Category.java:674)
    at sdMain.main(sdMain.java:611)
log4j:ERROR No output stream or file set for the appender named [CLI_LOG].
Welcome to the Platform Command Line Interface
WARNING: The /common file system is mounted read only. <<<<<<<<<<<<<<<<<<
Please use Recovery Disk to check the file system using fsck.
admin:
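If console or syslog text has been saved to a file, the symptom messages above can be searched for mechanically. This is a rough sketch under stated assumptions: the three signature strings are taken from the sample messages in this document, the list is not exhaustive, and the function name is hypothetical.

```python
from typing import List

# Substrings taken from the sample messages in this document (not exhaustive).
READ_ONLY_SIGNATURES = (
    "EXT3-fs error",
    "Read-only file system",
    "mounted read only",
)


def find_read_only_symptoms(log_text: str) -> List[str]:
    """Return every line of log_text that contains one of the known signatures."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(sig.lower() in lowered for sig in READ_ONLY_SIGNATURES):
            hits.append(line.strip())
    return hits
```

A non-empty result only indicates that the text matches the symptom; it does not confirm the root cause, which still needs the firmware check described below.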
For MCS-7825-I4 and MCS-7828-I4, running Cisco UCM 7.1 and above, you can use the CLI command 'show hardware' to verify the firmware version.
admin:show hardware
HW Platform : 7828I4
Processors : 1
Type : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
CPU Speed : 2660
Memory : 8192 MBytes
Object ID : 1.3.6.1.4.1.9.1.899
OS Version : UCOS 4.0.0.0-34
Serial Number : KQRBVVB
RAID Version :
Raid firmware version: 1.26.81.00
Raid Bios version: 6.16.00.00
BIOS Information :
IBM IBMBIOSVersion1.44-[M9E144AUS-1.44]- 06/11/2009
RAID Details :
LSI Logic IR Configuration Utility 2.00.15
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
Controller type : SAS1064E
BIOS version : 6.16.00.00
Firmware version : 1.26.81.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 112
Maximum physical devices : 62
Concurrent commands supported : 266
Slot : 0
Bus : 1
Device : 0
Function : 0
RAID Support : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
IR volume 1
Volume ID : 7
Status of volume : Okay (OKY)
RAID level : 1
Size (in MB) : 237464
Physical hard disks (Target ID) : 9 8
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #112
Target on ID #8
Device is a Hard disk
Enclosure # : 1
Slot # : 1
Target ID : 8
State : Online (ONL)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B04
Serial No : WD-WCAT1D712130
Drive Type : SATA
Target on ID #9
Device is a Hard disk
Enclosure # : 1
Slot # : 0
Target ID : 9
State : Online (ONL)
Size (in MB)/(in sectors) : 238475/488397168
Manufacturer : ATA
Model Number : WD2502ABYS-23B7A
Firmware Revision : 3B04
Serial No : WD-WCAT1D723848
Drive Type : SATA
------------------------------------------------------------------------
Enclosure information
------------------------------------------------------------------------
Enclosure# : 1
Logical ID : 5005076b:0648afc0
Numslots : 4
StartSlot : 0
Start TargetID : 0
Start Bus : 0
------------------------------------------------------------------------
The Model Number and Firmware Revision fields of each physical device (highlighted in red in the original output) are the information you need. This output shows a server with two drives with model number WD2502ABYS on 3B04 firmware. These drives should be upgraded as soon as possible.
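The same check can be scripted against a saved copy of the show hardware output. The sketch below is illustrative only. It assumes the output is available as text with one "label : value" pair per line, and it flags only the single combination named in this document (model WD2502ABYS on firmware 3B04). Field Notice 63374 remains the authoritative list of affected drives. The function and constant names are hypothetical.

```python
from typing import Dict, List

# The one model/firmware combination this document names as needing the update.
AFFECTED_MODEL_PREFIX = "WD2502ABYS"
AFFECTED_FIRMWARE = "3B04"


def parse_drives(show_hardware_text: str) -> List[Dict[str, str]]:
    """Collect the 'label : value' fields that follow each 'Target on ID' line."""
    drives: List[Dict[str, str]] = []
    current = None
    for raw in show_hardware_text.splitlines():
        line = raw.strip()
        if line.startswith("Target on ID"):
            current = {}
            drives.append(current)
            continue
        if line.startswith("---"):
            # A divider line ends the physical device section.
            current = None
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return drives


def drives_needing_update(drives: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the drives matching the affected model prefix and firmware revision."""
    return [
        d for d in drives
        if d.get("Model Number", "").startswith(AFFECTED_MODEL_PREFIX)
        and d.get("Firmware Revision", "") == AFFECTED_FIRMWARE
    ]
```

Each flagged entry carries its "Serial No" field, which helps match a drive to its physical slot before scheduling the firmware update.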
As of 16 February 2011, if you encounter any further filesystem or hard drive issues after applying both the firmware update and the disk exerciser, you should proceed to replace the affected drive(s).
There are three ways you can replace the drive(s).
If you have any questions you can leave a comment on this document. The ibm-fs-failure@cisco.com email address is no longer active as of 1 September 2014.
Sending the email will not generate a TAC SR but will allow us to collect more information. This is an informal submission with no associated SLA and we will make every effort to follow up submissions but cannot guarantee a response.
If you tried the original firmware update on a 7816 and it didn't work, a new one has been posted that will. The links in the document all point to the new one.
This document talks about Linux-based servers, but what happens with Windows-based servers like Cisco UCCX or Cisco Unity?
I have a number of these servers at my customers. I have hit this bug twice on CUCM, but I am afraid of seeing it on UCCX and Unity servers, which are Windows-based.
This is a very good question.
I don't think we've seen it yet on any Windows based system. My personal guess is because Windows doesn't respond to the underlying hardware issue or timeout in the same way Linux does. The filesystem going readonly is the OS' way of protecting itself in response to an issue with the disk subsystem. Windows may not have a similar protection mechanism.
I have the same issue, but my hard disks already shipped from the factory with firmware 3B05.
I ran the recovery disk and it found no errors.
After a restart, the system is fine for 20-24 hours and then the file system goes read-only again.
Is there anything else to try before a full reinstall?
Thanks
Please gather all of the information outlined in the document and open a TAC SR so that we can get your information over to IBM.
I already opened a TAC SR and was told to rebuild/reinstall the server and do a restore.
Any chance of this issue affecting Cisco Unity Connection as well?
This issue can hit any of these servers. We haven't seen it on Windows-based Unity but have on UC, CUP, and CUCM.
In the end I did a full rebuild of my CUCM and it is stable now.
Cisco told me a new hard disk firmware upgrade should be out very soon to fix that issue.
The latest version here with instructions :
Instructions to install :
http://www.cisco.com/web/software/283046743/32751/782x-I4_Firmware_Updatev10.pdf
Louis
Unfortunately I have the same problem on my Cluster of two 7816I4 with CUCM6.1
admin:utils create report hardware
*** WARNING ***
This process can take several minutes as the disk array, remote console,
system diagnostics and enviromental systems are probed for their current
values.
Continue (y/n)?y
Internal CLI failure
admin:utils create report hardware
*** WARNING ***
This process can take several minutes as the disk array, remote console,
system diagnostics and enviromental systems are probed for their current
values.
Continue (y/n)?y
Password:
Internal CLI failure
For the password I used the one from my admin login.
With RTMT I can see the following
At Mon Nov 01 16:05:19 CET 2010 on node 192.168.0.2; the following SyslogSeverityMatchFound events generated: SeverityMatch - Alert sudo: admin : command not allowed ; TTY=unknown ; PWD=/usr/local/platform/bin ; USER=root ; COMMAND=/opt/ibm/dsa/ibm_utl_dsa_212p_rhel3_i386.bin -b -text -d /var/log/active/platform/log
What can I do? Please Help.
For the 7816s the first thing you should do is reboot the server and hit F2 during POST. This will get you into the drive's self diagnostic application. If that report shows a failure then you need to replace the hard drive.
Edit: Looks like the F2 option was removed from these servers. We are looking for the replacement utility and the doc will be updated when it is known.
For your issue running the CLI command, you are hitting the third bug listed in the Related Defects above (CSCtg26203). You can run the bootable DSA or open a TAC SR to use a workaround.
"replace the hard drive" - So it is not useful to upgrade the firmware?
Which HDD is recommended by cisco?
If you have a faulty hard drive then upgrading firmware will not help.
The HDD replacement needs to go through Cisco TAC if you have an MCS server. If you bought the server directly from IBM then you need to contact their support to arrange a replacement drive. If you do not have a support contract on the server from IBM or Cisco then the IBM part number for the drive is listed on the IBM server solutions page at www.cisco.com/go/swonly.
There is no way with F2 during POST to get into the drive's self diagnostic application.
We bought the MCS7816I4-K9-CMB2 with CUCM6.1 preinstalled.
Can it be that the Dynamic System Analysis (DSA) is not preinstalled for this machine?
Is using ibm_fw_dsa_3.10_anyos.iso the only way?