WCS 7.0.172 will not start

mscherting
Level 1

My WCS quit, apparently after the server was rebooted following Microsoft updates while I was on vacation earlier this week.

WCS 7.0.172

Win Server 2003 R2 on VMware

4GB RAM

<6GB of 50GB disk space used.

After stopping WCS & rebooting:

Health Monitor is running with an error.

WCS is stopped.

Database server is stopped.

Apache server is running.

After attempting to restart/start WCS:

Health Monitor is already running.

This was a clean install/upgrade a few weeks ago.  7.0.172 wouldn't install on top of the previous version so I had to uninstall, install and restore.  WCS had been solid ever since.

Are there any logs that can tell me what's broken? Any other ideas?

Thanks


Can you look in hm_launchout.txt and launchout.txt to see what is happening?

They don't tell me much...

hm_launchout.txt:

Starting Health Monitor as a primary

Checking for Port 8082 availability... OK

Key entry exists; Update Apache config from Keystore

Configuring Apache server for key

Starting Health Montior Web Server...

Health Monitor Web Server Started.

Starting Health Monitor Server...

Health Monitor Server Started.

launchout.txt:

Checking for Port 21 availability... OK

Checking for Port 8456 availability... OK

Checking for Port 8457 availability... OK

Checking for Port 1299 availability... OK

Checking for local Port 1299 availability... OK

Checking for Port 6789 availability... OK

Checking for local Port 6789 availability... OK

Checking for UDP Port 69 availability... OK

Checking for Port 8005 availability... OK

Checking for UDP Port 162 availability... OK

Checking for Port 8009 availability... OK

Starting RMI registry on port 1299

Registry started.

Key entry exists; Update Apache config from Keystore

Configuring Apache server for key

DB is already started

Starting NMS Server...

NMS Server Started.

Starting up TFTP server...

TFTP Server started.

Starting up FTP server

Started FTP

FTP Server started

Non Privileged User Flag is set to false

Apache server is already started

Shutting down Apache server ...

Shutting down Apache server ...

Stopping TFTP server...

Stopping FTP server...

Stopped TFTP server.

Stopped FTP server.

Leaving database server running.
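Every port check in the launchout.txt excerpt passes, so a port conflict does not look like the cause here. For anyone with a similar failure who wants to repeat those checks independently of WCS, here is a minimal Python sketch. It assumes "availability" simply means "a socket can bind to the port", which may not match exactly what WCS checks; `WCS_PORTS`, `port_available`, and `report` are hypothetical names, and the port list is copied from the log above.

```python
import socket

# Ports listed in the launchout.txt excerpt above, as (port, protocol) pairs.
# Hypothetical constant: copied from the log, not read from WCS itself.
WCS_PORTS = [
    (21, "tcp"), (8456, "tcp"), (8457, "tcp"), (1299, "tcp"), (6789, "tcp"),
    (69, "udp"), (8005, "tcp"), (162, "udp"), (8009, "tcp"),
]

def port_available(port, proto="tcp", host="0.0.0.0"):
    """Return True if a socket of the given protocol can bind to host:port.

    A failed bind can mean the port is already in use, or that this process
    lacks permission to use it (e.g. ports below 1024 on Linux without root).
    """
    kind = socket.SOCK_STREAM if proto == "tcp" else socket.SOCK_DGRAM
    with socket.socket(socket.AF_INET, kind) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True

def report(ports=WCS_PORTS, host="0.0.0.0"):
    """Print one line per port, loosely mirroring the launchout.txt wording."""
    for port, proto in ports:
        ok = port_available(port, proto, host)
        status = "OK" if ok else "IN USE or not permitted"
        print(f"Checking for {proto.upper()} Port {port} availability... {status}")

if __name__ == "__main__":
    report()
```

Run it with WCS stopped: while WCS is up, its own FTP, TFTP, and other services hold these ports, and the sketch will report them as in use.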

mscherting
Level 1

Next question:  without the app running, how can I get the serial number or other info to get a TAC case opened?

mscherting
Level 1

This is from nmsadmin-0-0.log:

06/22/11 09:57:17.750 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service is stopping.

06/22/11 09:57:17.750 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service has stopped.

06/22/11 09:57:34.328 ERROR [general] [Thread-4] [Wed Jun 22 09:57:34 2011] [notice] Disabled use of AcceptEx() WinSock2 API

06/22/11 09:59:44.589 ERROR [general] [main] initHealthMonitor(): can not start DB

06/22/11 09:59:44.589 ERROR [general] [main] Failed to start WCS.  Check hm_launchout.txt for details.

06/22/11 09:59:44.589 ERROR [general] [main] Log files are located under webnms\logs in the WCS install directory.

06/22/11 10:46:30.988 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service is stopping.

06/22/11 10:46:31.004 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service has stopped.

06/22/11 10:56:24.472 ERROR [general] [main] Health Monitor is already running.

06/22/11 10:57:28.760 ERROR [general] [main] Health Monitor is already running.

06/22/11 11:43:06.015 ERROR [general] [main] Health Monitor is already running.

06/23/11 12:38:37.234 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service is stopping.

06/23/11 12:38:37.234 ERROR [general] [Thread-6] The Nms_Apache_7_0_172_0 service has stopped.

06/23/11 12:40:08.962 ERROR [general] [Thread-4] [Thu Jun 23 12:40:08 2011] [notice] Disabled use of AcceptEx() WinSock2 API

06/23/11 12:42:17.439 ERROR [general] [main] initHealthMonitor(): can not start DB

06/23/11 12:42:17.439 ERROR [general] [main] Failed to start WCS.  Check hm_launchout.txt for details.

06/23/11 12:42:17.439 ERROR [general] [main] Log files are located under webnms\logs in the WCS install directory.
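Each start attempt in this excerpt ends with `initHealthMonitor(): can not start DB`, which suggests the database, not Apache or the Health Monitor, is what fails. To pull those entries out of a longer nmsadmin-0-0.log, a small Python filter might look like the sketch below. The line layout is inferred from the excerpt (an assumption, not a documented format), and `parse` and `db_start_failures` are hypothetical helper names.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt above (an assumption):
# MM/DD/YY HH:MM:SS.mmm LEVEL [category] [thread] message
LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<level>\w+) "
    r"\[(?P<category>[^\]]*)\] \[(?P<thread>[^\]]*)\] (?P<msg>.*)$"
)

def parse(lines):
    """Yield one dict per line that matches the layout; skip anything else."""
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            rec = m.groupdict()
            rec["ts"] = datetime.strptime(rec["ts"], "%m/%d/%y %H:%M:%S.%f")
            yield rec

def db_start_failures(lines):
    """Timestamps of every 'can not start DB' entry, one per failed start."""
    return [r["ts"] for r in parse(lines) if "can not start DB" in r["msg"]]
```

Usage: open nmsadmin-0-0.log from the webnms\logs directory under the WCS install (the location the log itself names) and pass the file object to `db_start_failures`.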

TAC won't take my case.

I am having the same error with the exact same version of WCS.  Were you able to resolve it?

WCS really doesn't have a serial number that you would use to open a case. Most customers seem to use the serial number from a controller. Can you send the dbadmin-0-0.log and hiblog.txt to me at sschmidt@cisco.com? Did you have any power or hardware issues prior to this problem? Are you excluding the WCS install directory from backup software, virus scanners, and malware scanners, basically anything that can lock the DB?

Stephen

Wanted to follow up on this, as I hope it will help someone else. I was able to get WCS running again without reinstalling. Stephen Schmidt was kind and patient enough to work through this with me and eventually opened the following bugs:

CSCtr97951

CSCtr97576

Apparently there is some kind of corruption in the DB. The fix/workaround for both is to contact TAC, as I think they need to do some magic to the DB, but at least there is a fix.

Finally, make sure you're taking regular backups! I was, but my DB had been corrupted for some time, and I believe we had to go back through almost two months of weeklies to find a good DB they could work with.

I "resolved" it by uninstalling WCS, rebooting the server, reinstalling WCS and restoring my latest backup/snapshot.

stevenpagan
Level 4

I wanted to reply to this in the hope it may help someone else. I was able to get this resolved without reinstalling and starting over. Stephen Schmidt was kind and patient enough to walk through the troubleshooting process with me. As a result, the following bugs have been opened:

CSCtr97951

CSCtr97576

The workaround/fix for both is to contact TAC, as I believe they need to do some magic to the DB.

Finally, make sure to take regular backups! We had to go back through almost two months of weeklies to find a good DB configuration, as the DB had apparently been corrupted for a while but kept running fine until a scheduled reboot of the server.

This is indicative of an improper shutdown of WCS. The WCS database is not updated instantaneously; it writes updates when it has time. Until then, each pending update is stored in a flat file called sol####.log, where #### is a number.

WCS has, in rare situations, seen database corruption that can cause symptoms like the ones you are seeing. In most of these cases, you can delete the database log files (the updates still waiting to be written), and WCS will eventually start with perhaps only a few minutes' worth of data lost.
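The behavior described above resembles write-ahead logging: pending changes sit in log files until they are merged into the main database file. The toy Python model below is not WCS's actual database engine; the dict standing in for the database file and the JSON text standing in for each sol####.log are assumptions for illustration. It shows why discarding the newest, damaged log lets recovery finish while losing only that log's updates.

```python
import json

def replay(base, logs):
    """Rebuild state from a base snapshot plus pending update logs.

    base: dict standing in for the main database file.
    logs: {filename: text}; each text is a JSON object of pending updates.
    Logs are applied oldest first; zero-padded names sort in numeric order.
    """
    state = dict(base)
    for name in sorted(logs):
        try:
            updates = json.loads(logs[name])
        except json.JSONDecodeError as exc:
            raise RuntimeError(f"cannot replay {name}: log is corrupt") from exc
        state.update(updates)
    return state

base = {"ap-lobby": "registered"}
logs = {
    "sol0001.log": '{"ap-lab": "registered"}',
    "sol0002.log": '{"ap-lobby": "down"',  # truncated, as after an unclean shutdown
}

try:
    replay(base, logs)
except RuntimeError as err:
    print(err)  # cannot replay sol0002.log: log is corrupt

del logs["sol0002.log"]  # discard the newest, damaged log
print(replay(base, logs))  # {'ap-lobby': 'registered', 'ap-lab': 'registered'}
```

Only the update carried by the discarded log is lost; everything already in the base file or in older logs survives.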

The first thing to do, however, is to make sure the system has plenty of free space (at least 5GB) on the partition where WCS is installed. If that partition runs out of disk space, the database server can fail to start, and in some situations the database files can become corrupted.
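A quick way to check that free-space guideline is Python's standard `shutil.disk_usage`. This is a minimal sketch: treating "5GB" as 5 GiB is an assumption, and `free_space_ok` and `MIN_FREE_BYTES` are hypothetical names.

```python
import shutil

# "At least 5GB" from the advice above, taken here as 5 GiB (an assumption).
MIN_FREE_BYTES = 5 * 1024**3

def free_space_ok(install_path, min_free=MIN_FREE_BYTES):
    """Return (ok, free_bytes) for the partition that holds install_path."""
    free = shutil.disk_usage(install_path).free
    return free >= min_free, free
```

Call it with your WCS install directory; the path is specific to each server, so none is assumed here.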

- Find the highest-numbered sol####.log file, where #### is a four-digit number, and delete it.

- Attempt to start WCS.

- If WCS does not start, repeat with the next sol####.log file, and so on.

- If WCS still does not start after all sol####.log files are deleted, the base database file is probably corrupted, and you will need to restore the database from a backup.

Again, before trying any of this, make sure you have saved a copy of the database directory in case something goes wrong.
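The deletion loop in the steps above can be partly scripted. The Python sketch below copies the database directory aside first, then removes one sol####.log at a time, highest number first; you still start WCS by hand between attempts. The thread never says where the database directory lives, so `db_dir` and `backup_dir` are placeholders you must supply, and all helper names are hypothetical.

```python
import re
import shutil
from pathlib import Path

# "sol" + four digits + ".log", the naming described in the steps above.
SOL_LOG = re.compile(r"^sol(\d{4})\.log$", re.IGNORECASE)

def sol_logs_newest_first(db_dir):
    """Return the sol####.log files in db_dir, highest number first."""
    found = []
    for p in Path(db_dir).iterdir():
        m = SOL_LOG.match(p.name)
        if m and p.is_file():
            found.append((int(m.group(1)), p))
    found.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in found]

def backup_db_dir(db_dir, backup_dir):
    """Copy the whole database directory aside; raises if backup_dir already exists."""
    return Path(shutil.copytree(db_dir, backup_dir))

def remove_newest_sol_log(db_dir):
    """Delete the highest-numbered sol####.log; return its path, or None if none are left."""
    logs = sol_logs_newest_first(db_dir)
    if not logs:
        return None
    logs[0].unlink()
    return logs[0]
```

Typical use: call `backup_db_dir` once, then alternate `remove_newest_sol_log` with a manual start attempt. When it returns `None`, every log is gone and, per the steps above, the remaining option is restoring the database from a backup.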

Hope this helps.
