06-22-2011 10:08 AM - edited 07-03-2021 08:20 PM
My WCS quit, apparently after the server was rebooted following Microsoft updates while I was on vacation earlier this week.
WCS 7.0.172
Win Server 2003 R2 on VMware
4GB RAM
<6GB of 50GB disk space used.
After stopping WCS & rebooting:
Health Monitor is running with an error.
WCS is stopped.
Database server is stopped
Apache server is running
After attempting to restart/start WCS:
Health Monitor is already running.
This was a clean install/upgrade a few weeks ago. 7.0.172 wouldn't install on top of the previous version, so I had to uninstall, install, and restore. WCS had been solid ever since.
Are there any logs that can tell me what's broken? Any other ideas?
Thanks
06-22-2011 10:22 AM
Can you look at hm_launchout.txt and launchout.txt to see what is happening?
06-22-2011 10:29 AM
They don't tell me much...
hm_launchout.txt:
Starting Health Monitor as a primary
Checking for Port 8082 availability... OK
Key entry exists; Update Apache config from Keystore
Configuring Apache server for key
Starting Health Montior Web Server...
Health Monitor Web Server Started.
Starting Health Monitor Server...
Health Monitor Server Started.
launchout.txt:
Checking for Port 21 availability... OK
Checking for Port 8456 availability... OK
Checking for Port 8457 availability... OK
Checking for Port 1299 availability... OK
Checking for local Port 1299 availability... OK
Checking for Port 6789 availability... OK
Checking for local Port 6789 availability... OK
Checking for UDP Port 69 availability... OK
Checking for Port 8005 availability... OK
Checking for UDP Port 162 availability... OK
Checking for Port 8009 availability... OK
Starting RMI registry on port 1299
Registry started.
Key entry exists; Update Apache config from Keystore
Configuring Apache server for key
DB is already started
Starting NMS Server...
NMS Server Started.
Starting up TFTP server...
TFTP Server started.
Starting up FTP server
Started FTP
FTP Server started
Non Privileged User Flag is set to false
Apache server is already started
Shutting down Apache server ...
Shutting down Apache server ...
Stopping TFTP server...
Stopping FTP server...
Stopped TFTP server.
Stopped FTP server.
Leaving database server running.
06-23-2011 08:21 AM
Next question: without the app running, how can I get the serial number or other info to get a TAC case opened?
06-23-2011 11:51 AM
This is from nmsadmin-0-0.log:
06/22/11 09:57:17.750 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service is stopping.
06/22/11 09:57:17.750 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service has stopped.
06/22/11 09:57:34.328 ERROR [general] [Thread-4] [Wed Jun 22 09:57:34 2011] [notice] Disabled use of AcceptEx() WinSock2 API
06/22/11 09:59:44.589 ERROR [general] [main] initHealthMonitor(): can not start DB
06/22/11 09:59:44.589 ERROR [general] [main] Failed to start WCS. Check hm_launchout.txt for details.
06/22/11 09:59:44.589 ERROR [general] [main] Log files are located under webnms\logs in the WCS install directory.
06/22/11 10:46:30.988 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service is stopping.
06/22/11 10:46:31.004 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service has stopped.
06/22/11 10:56:24.472 ERROR [general] [main] Health Monitor is already running.
06/22/11 10:57:28.760 ERROR [general] [main] Health Monitor is already running.
06/22/11 11:43:06.015 ERROR [general] [main] Health Monitor is already running.
06/23/11 12:38:37.234 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service is stopping.
06/23/11 12:38:37.234 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service has stopped.
06/23/11 12:40:08.962 ERROR [general] [Thread-4] [Thu Jun 23 12:40:08 2011] [notice] Disabled use of AcceptEx() WinSock2 API
06/23/11 12:42:17.439 ERROR [general] [main] initHealthMonitor(): can not start DB
06/23/11 12:42:17.439 ERROR [general] [main] Failed to start WCS. Check hm_launchout.txt for details.
06/23/11 12:42:17.439 ERROR [general] [main] Log files are located under webnms\logs in the WCS install directory.
TAC won't take my case.
07-25-2011 02:58 PM
I am having the same error with the exact same version of WCS. Were you able to resolve it?
07-25-2011 04:21 PM
WCS really doesn't have a serial number that you would use to open a case. Most customers seem to use the serial number from a controller. Can you send the dbadmin-0-0.log and hiblog.txt to me at sschmidt@cisco.com? Did you have any power or hardware issues prior to this problem? Are you excluding the WCS install directory from backup software, virus scanners, and malware scanners? Basically, exclude anything that can lock the DB.
Stephen
08-25-2011 09:01 AM
I wanted to follow up on this in the hope that it will help someone else. I was able to get WCS running again without reinstalling. Stephen Schmidt was kind and patient enough to work through this with me and eventually opened the following bugs:
Apparently there is some kind of corruption in the DB. The fix/work-around for both is to contact TAC, as I think they need to do some magic to the DB, but at least there is a fix.
Finally, make sure you're taking regular backups! I was, but my DB had been corrupted for some time, and I believe we had to go back through almost two months of weeklies to find a good DB they could work with.
07-26-2011 09:24 AM
I "resolved" it by uninstalling WCS, rebooting the server, reinstalling WCS and restoring my latest backup/snapshot.
08-25-2011 09:46 AM
I wanted to reply to this in the hope it may help someone else. I was able to get this resolved without reinstalling and starting over. Stephen Schmidt was kind and patient enough to walk through the troubleshooting process with me. As a result, the following bugs have been opened:
The work-around/fix for both is to contact TAC as I believe they need to do some magic to the DB.
Finally, make sure to take regular backups! We had to go back through almost 2 months of weeklies to find a good DB configuration, as apparently the DB had been corrupted for a while but kept running fine until a scheduled reboot of the server.
03-26-2013 06:59 AM
This is indicative of an improper shutdown of WCS. The WCS database is not updated instantaneously; it writes updates when it has time. Until the database itself is updated, each pending update is stored in a flat file called sol####.log, where each # is a digit.
In rare situations, WCS has seen database corruption that might cause something like what you are seeing. In most of these cases, you can delete the database log files (the pending updates), and WCS will eventually start with perhaps a few minutes' worth of data lost.
The first thing to do, however, is to make sure the system has plenty of free space on the partition where WCS is installed (at least 5GB). If the WCS partition runs out of disk space, it can cause the database server to fail to start, and in some situations it can cause database file corruption.
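For the free-space check, a minimal Python sketch could look like the following (Python is an editorial choice; the thread itself contains no code). It uses only the standard library's `shutil.disk_usage`; the 5GB threshold comes from the post above, and the path you pass on the command line is assumed to be the directory WCS is installed in on your server.

```python
import shutil
import sys

# Threshold suggested in the post: at least 5GB free on the WCS partition.
MIN_FREE_BYTES = 5 * 1024 ** 3


def has_enough_free_space(path: str, min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """Return True if the partition holding `path` has at least `min_free_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), all in bytes
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # Pass the WCS install directory as the first argument; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    free_gb = shutil.disk_usage(target).free / 1024 ** 3
    print(f"{target}: {free_gb:.1f} GB free, enough: {has_enough_free_space(target)}")
```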
- Find the highest-numbered sol####.log file, where #### is a four-digit number, and delete it.
- Attempt to start WCS.
- If WCS does not start, repeat with the next sol####.log file, and so on.
- If WCS does not start after all sol####.log files are deleted, then the base database file is probably corrupted, and you will need to restore the database from a backup.
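The ordering step in this loop can be scripted. The sketch below is a hypothetical helper, not a Cisco tool: it assumes you point it at the directory that holds the sol####.log files (the thread does not name that path), and it deliberately deviates from the post in one way: it moves the highest-numbered log into a quarantine folder instead of deleting it, so each step can be undone. Starting WCS between attempts is left to you.

```python
import re
import shutil
from pathlib import Path
from typing import List, Optional

# Matches names like sol0003.log (four digits, as described in the post).
# Case-insensitive because the server runs Windows.
SOL_LOG = re.compile(r"^sol(\d{4})\.log$", re.IGNORECASE)


def sol_logs_highest_first(db_dir: Path) -> List[Path]:
    """List the sol####.log files in db_dir, highest number first (the removal order)."""
    logs = []
    for entry in db_dir.iterdir():
        match = SOL_LOG.match(entry.name)
        if match and entry.is_file():
            logs.append((int(match.group(1)), entry))
    logs.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in logs]


def move_highest_sol_log(db_dir: Path, quarantine_dir: Path) -> Optional[Path]:
    """Move the highest-numbered sol####.log into quarantine_dir instead of deleting it.

    Returns the file's new location, or None when no sol####.log files remain
    (per the post, that suggests the base database file itself is corrupted).
    """
    logs = sol_logs_highest_first(db_dir)
    if not logs:
        return None
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    target = quarantine_dir / logs[0].name
    shutil.move(str(logs[0]), str(target))
    return target
```

Call `move_highest_sol_log` once, try starting WCS, and repeat only if it still fails. A `None` return means no logs are left, and per the post a restore from backup is the remaining option.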
Again, before trying any of this, please make sure you have saved a copy of the database directory in case something goes wrong.
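Saving the database directory can be as simple as a timestamped copy. A minimal Python sketch, assuming you know where the WCS database directory lives (the thread does not say) and that it is prudent to stop WCS and its database server first so the files are not changing mid-copy:

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_directory(src: Path, backup_root: Path) -> Path:
    """Copy src (e.g. the WCS database directory) into a new timestamped folder under backup_root.

    Assumption: WCS and its database server are stopped before this runs.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"{src.name}-{stamp}"
    # copytree creates dest (and missing parents) and raises if dest already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```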
Hope this helps.