Re: RME 4.0.5: Syslog Records - Default purge job

yjdabear · ‎01-08-2008

Does this indicate a corruption in the db? Maybe that's why my db backup has been getting much bigger after each day.

/var/adm/CSCOpx/files/rme/jobs/syslog/1394.log

[ Tue Jan 08 01:00:01 EST 2008 ],INFO ,[main],Starting purge job 1394

[ Tue Jan 08 01:00:01 EST 2008 ],INFO ,[main], Its a default purge job

[ Tue Jan 08 01:00:11 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071116' locked 8405 42W18

[ Tue Jan 08 01:00:19 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071117' locked 8405 42W18

[ Tue Jan 08 01:00:27 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071118' locked 8405 42W18

[ Tue Jan 08 01:00:34 EST 2008 ],ERROR,[main],Failed to purge syslogs

[ Tue Jan 08 01:00:36 EST 2008 ],ERROR,[main],Purge not successful
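
To see at a glance which tables the purge could not drop, the log can be scanned for the -210 lock errors. The following is only a minimal Python sketch, not part of RME: the function names are made up for illustration, and it assumes that the suffix of each SYSLOG_ table name is a YYYYMMDD date. The pattern tolerates the error message being wrapped onto a following line.

```python
import re
from datetime import datetime

# Matches the ASA -210 lock error whether or not the message wrapped onto a new line.
LOCK_RE = re.compile(
    r"Drop table failed:ASA Error -210:\s*"
    r"User '(?P<user>[^']+)' has the row in '(?P<table>[^']+)' locked"
)

def locked_tables(log_text):
    """Return (user, table) pairs in order of first appearance, without duplicates."""
    seen = []
    for m in LOCK_RE.finditer(log_text):
        pair = (m.group("user"), m.group("table"))
        if pair not in seen:
            seen.append(pair)
    return seen

def table_date(table):
    """Assumption: the part after 'SYSLOG_' is a YYYYMMDD date."""
    return datetime.strptime(table.split("_", 1)[1], "%Y%m%d").date()

sample = """\
[ Tue Jan 08 01:00:11 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210:

User 'DBA' has the row in 'SYSLOG_20071116' locked 8405 42W18
[ Tue Jan 08 01:00:19 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071117' locked 8405 42W18
"""

for user, table in locked_tables(sample):
    print(user, table, table_date(table))
```

Running this against the full job log (read with open(path).read()) would list every table that stayed locked, together with the day it appears to cover.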

David Stanford · ‎01-08-2008

This looks like it may be related to two different bugs: CSCsb66514, which was unreproducible, and the connected CSCsb87660.

In the second defect, duplicate inventory jobs may actually cause the above to happen.

Restarting the daemons often clears this up (perhaps temporarily), but clearing up the duplicate jobs is a better solution.

Find the duplicate jobs, then do the following.

The job CLI can be used to delete the unwanted jobs:

cd NMSROOT/bin

cwjava -cw NMSROOT com.cisco.nm.cmf.jrm.jobcli

delete unwanted_job_id

net stop crmdmgtd

net start crmdmgtd

Even though these are inventory jobs they can cause a problem with syslog purge.

David Stanford · ‎01-08-2008

I have also seen dbsrv9 clear this up:

Stop the daemons

cd to NMSROOT/CSCOpx/databases/rmeng

Remove/rename rmeng.log (if it exists)

Run dbsrv9 -f NMSROOT/CSCOpx/databases/rmeng/rmeng.db

Restart the daemon manager
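
For the remove/rename step, renaming is the safer choice because the old transaction log stays available. A minimal Python sketch of that step follows. The helper name and the timestamped ".bak" suffix are illustrative assumptions, not anything RME or dbsrv9 expects, and it should only be run once the daemons are stopped.

```python
import os
import time

def set_aside_log(log_path):
    """Rename the transaction log instead of deleting it.

    Returns the new path, or None when there is no log file to move.
    """
    if not os.path.exists(log_path):
        return None  # nothing to do when the file is absent
    # Hypothetical naming scheme: <log>.<YYYYMMDDHHMMSS>.bak
    backup = "%s.%s.bak" % (log_path, time.strftime("%Y%m%d%H%M%S"))
    os.rename(log_path, backup)
    return backup
```

For example, set_aside_log("/opt/CSCOpx/databases/rmeng/rmeng.log") would leave a dated copy next to the database file, assuming that is where the log lives on the server in question.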

yjdabear · ‎01-09-2008

Before I try this, could you explain what this does, particularly the "Run dbsrv9 -f NMSROOT/CSCOpx/databases/rmeng/rmeng.db" step?

Is there any way to clean out the Syslog*db, except syslogs from the last 24-48 hours, without impacting the rest of RME?

yjdabear · ‎01-09-2008

I don't see a duplicate SyslogDefaultPurge job.

Joe Clarke · ‎01-09-2008

Dave's right that this is typically caused by two SyslogDefaultPurge jobs running. It could also be caused by stale DB locks. You might try restarting dmgtd, which will force the database down, along with all processes that may be holding locks.

If that doesn't work, Dave's instructions to force all of the pending transactions to roll back and mark the database as closed may also be effective. However, before removing the transaction log, you might try:

env LD_LIBRARY_PATH=/opt/CSCOpx/lib:/opt/CSCOpx/objects/db/lib /opt/CSCOpx/objects/db/bin/dbsrv9 -a /opt/CSCOpx/databases/rmeng/rmeng.log /opt/CSCOpx/databases/rmeng/rmeng.db

The -a will tell dbsrv9 to apply the transaction log as a guide. If the log is not corrupt, this should work.
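
Because the command is long and depends on LD_LIBRARY_PATH being set, it can help to assemble it in a script rather than retype it. The following is a minimal Python sketch that only builds the argument list and environment for the two invocations mentioned in this thread (-a with a log, or -f without one). The function name is hypothetical, and the /opt/CSCOpx root is taken from the command above and may differ on other installs.

```python
import os

CSCOPX = "/opt/CSCOpx"  # install root as it appears in the command above

def dbsrv9_command(log_path=None, root=CSCOPX):
    """Return (argv, env) for dbsrv9: '-a <log> <db>' when a log is given, else '-f <db>'."""
    db = os.path.join(root, "databases/rmeng/rmeng.db")
    argv = [os.path.join(root, "objects/db/bin/dbsrv9")]
    argv += ["-a", log_path, db] if log_path else ["-f", db]
    # Copy the current environment and add the library path dbsrv9 needs.
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = "%s/lib:%s/objects/db/lib" % (root, root)
    return argv, env

argv, env = dbsrv9_command("/opt/CSCOpx/databases/rmeng/rmeng.log")
print(" ".join(argv))
```

The pair can then be handed to subprocess.call(argv, env=env) once dmgtd is stopped. Keeping the environment in the same call avoids running dbsrv9 without the library path by accident.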

yjdabear · ‎01-13-2008

After stopping dmgtd, rmeng.log disappeared, which I think happens every time dmgtd shuts down.

Not too surprisingly, I got:

env LD_LIBRARY_PATH=/opt/CSCOpx/lib:/opt/CSCOpx/objects/db/lib /opt/CSCOpx/objects/db/bin/dbsrv9 -a /opt/CSCOpx/databases/rmeng/rmeng.log /opt/CSCOpx/databases/rmeng/rmeng.db

ld.so.1: dbsrv9: fatal: libdbserv9_r.so: open failed: No such file or directory

Killed

Then:

/opt/CSCOpx/objects/db/bin/dbsrv9 -f /opt/CSCOpx/databases/rmeng/rmeng.db

ld.so.1: dbsrv9: fatal: libdbserv9_r.so: open failed: No such file or directory

Killed

Joe Clarke · ‎01-13-2008

When all is good, the log SHOULD disappear when the database engine is shut down. As for the library errors, re-read my initial instructions. You need to set LD_LIBRARY_PATH correctly to tell dbsrv9 where to find its required shared objects.

yjdabear · ‎01-13-2008

Actually, the lib error was the output with the LD_LIBRARY_PATH env set. I just updated my previous post to reflect that.

Joe Clarke · ‎01-13-2008

I just copied and pasted what you typed, and that command works. You may have had a typo on your command line. If the library were actually missing, nothing would be working.

yjdabear · ‎01-13-2008

You're right. Here's the output:

env LD_LIBRARY_PATH=/opt/CSCOpx/lib:/opt/CSCOpx/objects/db/lib /opt/CSCOpx/objects/db/bin/dbsrv9 -a /opt/CSCOpx/databases/rmeng/rmeng.log /opt/CSCOpx/databases/rmeng/rmeng.db

Adaptive Server Anywhere Network Server Version 9.0.0.1364

This software contains confidential and trade secret information of

iAnywhere Solutions, Inc.

Use, duplication or disclosure of the software and documentation

by the U.S. Government is subject to restrictions set forth in a license

agreement between the Government and iAnywhere Solutions, Inc. or

other written agreement specifying the Government's rights to use the

software and any applicable FAR provisions, for example, FAR 52.227-19.

iAnywhere Solutions, Inc., One Sybase Drive, Dublin, CA 94568, USA

Networked Seat (per-seat) model. Access to the server is limited to 100 seat(s).

This server is licensed to:

"Cisco User"

"Cisco"

Running on SunOS 5.8 Generic_117350-46

1048576K of memory used for caching

Minimum cache size: 8192K, maximum cache size: 1048576K

Using a maximum page size of 4096 bytes

Starting database "rmeng" (/opt/CSCOpx/databases/rmeng/rmeng.db) at Sun Jan 13 2008 23:20

Database recovery in progress

Last checkpoint at Sun Jan 13 2008 23:20

Checkpoint log...

Transaction log: /opt/CSCOpx/databases/rmeng/rmeng.log...

Error: Cannot open transaction log file -- No such file or directory

Cannot open transaction log file -- No such file or directory

Database server stopped at Sun Jan 13 2008 23:20

So now I should just restart dmgtd and see if the purge job goes through or not?

Joe Clarke · ‎01-13-2008

Yes.

yjdabear · ‎01-14-2008

Looks like the job is going?

tail /var/adm/CSCOpx/files/rme/jobs/syslog/1394.log

[ Sat Jan 12 01:00:38 EST 2008 ],ERROR,[main],Purge not successful

[ Sun Jan 13 01:00:01 EST 2008 ],INFO ,[main],Starting purge job 1394

[ Sun Jan 13 01:00:01 EST 2008 ],INFO ,[main], Its a default purge job

[ Sun Jan 13 01:00:11 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071116' locked 8405 42W18

[ Sun Jan 13 01:00:19 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071117' locked 8405 42W18

[ Sun Jan 13 01:00:27 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071118' locked 8405 42W18

[ Sun Jan 13 01:00:35 EST 2008 ],ERROR,[main],Failed to purge syslogs

[ Sun Jan 13 01:00:36 EST 2008 ],ERROR,[main],Purge not successful

[ Mon Jan 14 01:00:01 EST 2008 ],INFO ,[main],Starting purge job 1394

[ Mon Jan 14 01:00:01 EST 2008 ],INFO ,[main], Its a default purge job
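
Since the log above shows no explicit completion line, one way to read it is to group the lines by run and check each run for errors. The following is a rough Python sketch under that assumption. The function name and the dictionary keys are made up for illustration, and a run without "Purge not successful" is only "not known to have failed", not proven successful.

```python
import re

# One log line: "[ <timestamp> ],<LEVEL>,[main],<message>"
LINE_RE = re.compile(
    r"^\[ (?P<ts>[^\]]+?) \],(?P<level>\w+)\s*,\[main\],\s*(?P<msg>.*)$"
)

def summarize_runs(log_text):
    """Group log lines into runs, one per 'Starting purge job' line."""
    runs = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines and wrapped continuation lines
        msg = m.group("msg")
        if msg.startswith("Starting purge job"):
            runs.append({"started": m.group("ts"), "errors": 0, "failed": False})
        elif runs:  # ignore lines that precede the first start line
            if m.group("level") == "ERROR":
                runs[-1]["errors"] += 1
            if msg == "Purge not successful":
                runs[-1]["failed"] = True
    return runs

sample = """\
[ Sat Jan 12 01:00:38 EST 2008 ],ERROR,[main],Purge not successful
[ Sun Jan 13 01:00:01 EST 2008 ],INFO ,[main],Starting purge job 1394
[ Sun Jan 13 01:00:01 EST 2008 ],INFO ,[main], Its a default purge job
[ Sun Jan 13 01:00:11 EST 2008 ],ERROR,[main],Drop table failed:ASA Error -210: User 'DBA' has the row in 'SYSLOG_20071116' locked 8405 42W18
[ Sun Jan 13 01:00:35 EST 2008 ],ERROR,[main],Failed to purge syslogs
[ Sun Jan 13 01:00:36 EST 2008 ],ERROR,[main],Purge not successful
[ Mon Jan 14 01:00:01 EST 2008 ],INFO ,[main],Starting purge job 1394
[ Mon Jan 14 01:00:01 EST 2008 ],INFO ,[main], Its a default purge job
"""

for run in summarize_runs(sample):
    print(run["started"], "errors:", run["errors"], "failed:", run["failed"])
```

On this sample the Jan 13 run is reported as failed with three errors, while the Jan 14 run has no errors recorded so far.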

Joe Clarke · ‎01-14-2008

Yes, it appears to be running as of 1 AM this morning.

yjdabear · ‎01-14-2008

I thought the job was still running at the time of my previous post, since I didn't see a "job completing" note in the log. Now that I have checked the job status through the Web GUI, I see the purge supposedly finished in 27 seconds, which seems awfully swift for clearing out all the crud built up since 11/16. In the meantime, my nightly dbbackup is still gaining >300MB a day. I originally suspected the failed syslog purges were the contributing factor, but now I'm not so sure. Any idea how I could track down the growth leader(s) and curb this increase?