11-07-2013 10:47 AM
Once again I had multiple hard drives fail all at once, which forced me to replace them and do a factory restore (time number 6). I got SUSE reinstalled, updated the VSMS server to 6.3.3, and restored from backup. Now the VSOM server sees the VSMS as up, but cannot see a version number or capacity, and says no repositories are available. The VSMS server has 2 local repositories defined and active.
Has anyone seen this before and have any idea how to resolve it?
Thanks for any ideas/suggestions.
Solved! Go to Solution.
12-27-2013 09:03 AM
Danny, I have VSOM on a separate server with 55 Media Servers, and yes, I have seen this countless times. The first thing you should do is UPGRADE the FW package on your servers to 11.0.1-0046 <-- this will fix a lot of your issues with multiple drive failures.
If this issue happens after the FW upgrade, follow these procedures, because more than likely your operating system is still intact and all your data is still GOOD. Believe me, I have fixed this many, many times.
If you have a hard drive fail and a second drive fails after the rebuild, go into WebBIOS (CTRL-H) and look at the drive states. Normally one drive will say OFFLINE and the other FAILED. Make the offline drive ONLINE and reboot the server. After the reboot, check the state of the hard drives in WebBIOS, which will probably say Unconfigured Spun Up.
Command: MegaCli -cfgForeign -Scan -aALL
Command: MegaCli -cfgForeign -Clear -aALL
Now make the foreign drive a hot spare:
Command: MegaCli -PDHSP -Set -Physdrv [E:S] -aALL
E = enclosure ID, S = drive slot number
Hope this helps, because I know your pain times ten!
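For anyone who wants to double-check the foreign-config and hot-spare commands above before typing them at the console, here is a minimal dry-run sketch. It only prints the three commands for a given enclosure and slot and executes nothing. The helper name foreign_to_hotspare_cmds and the MEGACLI variable are made up for illustration, and the binary name is an assumption (on some installs it is MegaCli64 rather than MegaCli).

```shell
#!/bin/sh
# Dry-run sketch: prints the MegaCli commands from the post, runs nothing.
# ASSUMPTION: the binary is reachable as "MegaCli"; override with MEGACLI=...
MEGACLI="${MEGACLI:-MegaCli}"

# foreign_to_hotspare_cmds ENCLOSURE SLOT   (hypothetical helper name)
foreign_to_hotspare_cmds() {
    enclosure="$1"
    slot="$2"
    # Accept only plain numbers for the enclosure ID and the slot.
    for v in "$enclosure" "$slot"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "usage: foreign_to_hotspare_cmds ENCLOSURE SLOT" >&2
                return 1 ;;
        esac
    done
    # Same three commands as in the post, in the same order.
    echo "$MEGACLI -cfgForeign -Scan -aALL"
    echo "$MEGACLI -cfgForeign -Clear -aALL"
    echo "$MEGACLI -PDHSP -Set -Physdrv [${enclosure}:${slot}] -aALL"
}
```

Calling foreign_to_hotspare_cmds with the enclosure ID and slot shown in WebBIOS prints the commands so they can be reviewed and then run by hand. Clearing a foreign configuration is not something to automate blindly, which is why the sketch stops at printing.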
11-18-2013 07:06 AM
Is this a co-located server, i.e., are both VSOM and VSMS running on the same server?
Best place to start for something like this is to ensure you have the proper host entries in /etc/hosts.
Usually ends up looking something like:
# special IPv6 addresses
::1 ipv6-localhost ipv6-loopback
fe00::0 ipv6-localnet
ff00::0 ipv6-mcastprefix
ff02::1 ipv6-allnodes
ff02::2 ipv6-allrouters
ff02::3 ipv6-allhosts
172.19.171.23 linux.site linux
172.19.171.23 BSIFREVS3.bulletproofsi.com BSIFREVS3
172.19.171.23 VSM7.bulletproofsi.com VSM7
BSIFREVS3:/etc #
Wouldn't hurt to punt services after this change, or just perform a full reboot.
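A quick way to sanity-check the host entries described above is sketched below. It looks up a name in a hosts file and complains if the name is missing or only maps to a loopback address. The function names are hypothetical, and the idea that the server's own name should map to its LAN address (as in the sample listing) is an assumption drawn from that listing, not a documented VSM requirement. Matching is exact and case-sensitive.

```shell
#!/bin/sh
# hosts_ip_for HOSTS_FILE NAME   (hypothetical helper)
# Prints the first address that HOSTS_FILE maps NAME to; prints nothing
# if no line lists NAME as a hostname or alias.
hosts_ip_for() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# check_host_entry HOSTS_FILE NAME   (hypothetical helper)
# Returns non-zero if NAME is missing or resolves only to loopback.
check_host_entry() {
    ip="$(hosts_ip_for "$1" "$2")"
    case "$ip" in
        '')        echo "no entry for $2 in $1"; return 1 ;;
        127.*|::1) echo "$2 maps to loopback ($ip)"; return 1 ;;
        *)         echo "$2 -> $ip" ;;
    esac
}
```

For example, check_host_entry /etc/hosts "$(hostname)" run on the VSMS box shows which address the server's own name resolves to from the hosts file.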
Hope this helps.
Cheers.
11-18-2013 07:09 AM
I forgot to update this posting. Turns out the Ethernet port on the VSMS server was failing. Swapped over to the other port and everything picked right back up.
Another issue on my lemon server.
Thanks for reading and the reply.
01-02-2014 07:10 AM
Daniel,
Can you elaborate on the FW package you mentioned (11.0.1-0046)? I'm assuming this is a firmware upgrade for the discrete RAID adapter in the server? Which RAID HBA model? What Cisco chassis model?
01-02-2014 08:26 AM
It's the LSI MegaRAID FW upgrade for the RAID controller in the CIVS-MSP-4RU media servers. The earlier FW package, 11.0.1-0036, has issues with failing hard drives that are known-good drives. Upgrading to 11.0.1-0046 will fix a lot of issues like this, and it is a simple upgrade process. If you need the firmware zip file, it can be found on the LSI MegaRAID website, or I can upload it to your FTP.
I do recommend first opening up the VSMC page and performing a Media Server backup.
Best Regards,
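To see which firmware package a chassis is on before and after the upgrade, a small sketch like the one below can help. It reads adapter info text on stdin and pulls out the "FW Package Build" value. The assumption here is that the text comes from MegaCli -AdpAllInfo -aALL and that its output carries a line in that form, so check against your own controller's output. The function names are made up for illustration.

```shell
#!/bin/sh
# fw_package_build   (hypothetical helper)
# Reads adapter info on stdin and prints the first "FW Package Build" value.
fw_package_build() {
    tr -d '\r' | sed -n '/^[[:space:]]*FW Package Build[[:space:]]*:/{
        s/^[^:]*:[[:space:]]*//
        s/[[:space:]]*$//
        p
        q
    }'
}

# fw_matches EXPECTED   (hypothetical helper)
# Exit status 0 only if stdin reports exactly the EXPECTED package build.
fw_matches() {
    [ "$(fw_package_build)" = "$1" ]
}
```

Typical use would be something like MegaCli -AdpAllInfo -aALL | fw_matches 11.0.1-0046 || echo "firmware not at 11.0.1-0046", run once per media server.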
01-02-2014 11:43 AM
Thanks for the firmware tip. I've noted several MegaRAID firmware issues in the past with TAC, but they didn't advise upgrading the firmware on the chassis. Most notably 'failure lights lit on slots that aren't in use' (FW bug), and we've had two events where the RAID adapter firmware simply 'hung' and stopped responding. (FYI: what appears to happen in this case is that the whole disk subsystem disappears and the filesystems get re-mounted READ-ONLY. Hilariously, the Cisco services still report a 'running' state, despite the complete chaos that a RO filesystem causes.)
I'll keep this in mind and make a renewed effort to get some of our deployed chassis upgraded.
Cheers!
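Since the services keep reporting 'running' after the read-only remount described above, one way to catch it is to look at the mount table directly. A minimal sketch, assuming a Linux /proc/mounts layout (device, mount point, type, options): it lists device-backed mounts whose options include ro. The function name is hypothetical, and legitimately read-only mounts (a CD-ROM, for instance) will show up too.

```shell
#!/bin/sh
# ro_mounts [MOUNTS_FILE]   (hypothetical helper)
# Prints "mountpoint device" for each mount of a /dev-style device whose
# option list contains the exact option "ro". Defaults to /proc/mounts.
ro_mounts() {
    awk '
        $1 ~ /^\// {                      # skip proc, sysfs, tmpfs and friends
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") { print $2, $1; break }
        }
    ' "${1:-/proc/mounts}"
}
```

Running ro_mounts from cron and alerting when it prints anything unexpected would flag the condition even while the services claim to be up. Options such as errors=remount-ro are not matched, because only the exact option ro counts.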
01-02-2014 12:44 PM
Cisco TAC is pretty much hit or miss depending on who owns your case. I found a great TAC case engineer, Alan Mattson, who knows just about everything concerning these servers.
Just keep in mind which firmware you download, because the CIVS-MSP-2RU and CIVS-MSP-4RU have two different upgrade procedures.
I have had so many issues with this system (Cisco VSOM 6.3.2, running a lot of third-party software to operate efficiently enough for a casino) that I recently got approval to rip and replace the entire system. I'm migrating to Surveillus (real casino software VSM) on UCS, with EMC Isilon OneFS for storage @ 2 petabytes. I can't wait to rip this Cisco system out.
Best of luck and upgrade now before you have another failure!
01-06-2014 05:38 AM
You are the man!!! I followed your steps and the OS booted back up again and allowed me to upgrade the firmware.
01-06-2014 05:49 AM
GREAT NEWS!!!!
Did you follow the steps for multiple drive failures, and then upgrade the firmware?
I'm asking because I'm curious! When I found these forums I was blown away, because I thought I was the only one in the world dealing with all these issues (this is one of many, many issues I have had in the past 5 years with this system). TAC never seemed to have the answer, and since my system is regulated by the state and subject to fines for equipment failure, I really had to find my own ways to recover these servers from failure.
01-06-2014 06:01 AM
Hey Daniel,
I noticed above you mentioned you are using WinSCP for SFTP transfers in Windows. I used to as well. Give FileZilla a try (https://filezilla-project.org/download.php); I've found it to be substantially faster than WinSCP.
Cheers!
Scott Olsen
Solutions Specialist
Bulletproof Solutions Inc.
Web: www.bulletproofsi.com
01-27-2014 10:58 AM
Well, I spoke too soon. I finally got back to messing with the server, and now it tells me that the system has exceeded the maximum limit of devices per quad.
Ever seen that one before?
01-28-2014 07:08 AM
Danny, no, I haven't seen this message before. Can you post some screenshots?
01-28-2014 07:17 AM
Finally made it work, sorta. Had to remove the second row of hard drives...