Solved: Re: Logger Service keeps shutting down and restarting

saharhanna · ‎05-23-2013

Hello,

I have upgraded my lab UCCE call center to v9.0

I have a simplexed environment

Ever since, the logger service keeps shutting down and restarting.

Recently, the Node Manager issued an error message and shut down the server.

I tried to check some logs but I could not pinpoint the problem; there are errors in different processes.

Attached are the logs.

Gergely Szabo · ‎05-23-2013

Hi,

if the NM tries to reload the server, it indicates a Problem (capital intended).

Indeed, there's something interesting going on:

08:29:11:822 la-rpl Trace: Starting Recovery Key for Admin table is 6717118506000.0

08:29:11:822 la-rpl Trace: The largestkey = 7201232308043.0 >= startkey = 6717118506000.0

08:29:11:823 la-rpl Trace: To correct this problem: Stop logger service. Use ICMDBA tool to sync configuration data from its partner logger database. Restart logger service.

08:29:11:823 la-rpl Fail: Assertion failed: largestkey < startkey. File: ICRDB.CPP. Line 742

08:29:11:860 la-rpl Trace: CExceptionHandlerEx::GenerateMiniDump -- A Mini Dump File is available at logfiles\replication.exe_20130523082911824.mdmp

08:29:12:074 la-rpl Unhandled Exception: Exception code: 80000003 BREAKPOINT
Fault address: 754A3219 01:00012219 C:\Windows\syswow64\KERNELBASE.dll
Registers:
EAX:00000000 EBX:00000000 ECX:00001890 EDX:E1043F00 ESI:015F8AB0 EDI:00000005
CS:EIP:0023:754A3219
SS:ESP:002B:003CE2D8 EBP:003CE2E0
DS:002B ES:002B FS:0053 GS:002B
Flags:00000246
Call stack:
Address Frame
754A3219 003CE2E0 DebugBreak+2
6EB1459C 003CE2EC EMSAbortProcess+C
6EB1ACD1 003CF7F8 EMSReportCommon+1A1
6EB1ADBB 003CF818 EMSFailMessage+2B
013BBE5A 003CF8A8 ICRDb::ICRDb+44A
013B2FE2 003CF9B8 main+582
015D96C2 003CF9FC NtCurrentTeb+174
767333AA 003CFA08 BaseThreadInitThunk+12
77449EF2 003CFA48 RtlInitializeExceptionChain+63
77449EC5 003CFA60 RtlInitializeExceptionChain+36

The short version: configuration data is corrupt.

The longer version: the "Assertion failed" line above reports the result of a sanity check. Each configuration change creates a new row in one of the tables holding the config info, and each row contains a RecoveryKey, which is usually a large number that grows with each insertion. The check expects the largest key already stored in the table to be lower than the starting key for new rows. Here the largest key (7201232308043) is higher than the start key (6717118506000), so newly inserted rows would end up with keys lower than existing ones. Naturally, this is something to consider for a lonely philosopher, but the rigid world of Cisco ICM does not allow metaphysical phenomena. Lower numbers are supposed to be lower than higher numbers.
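To make the check concrete, here is a minimal sketch in Python that pulls the two keys out of the trace line quoted above and re-applies the asserted condition (largestkey < startkey). The function names and the regular expression are my own illustration, not anything shipped with ICM; the only thing taken from the product is the wording of the log line.

```python
import re

# Matches the la-rpl trace line quoted in this thread.
# The pattern is an assumption based on that single sample line.
KEY_LINE = re.compile(
    r"largestkey\s*=\s*(?P<largest>\d+(?:\.\d+)?)\s*>=\s*"
    r"startkey\s*=\s*(?P<start>\d+(?:\.\d+)?)"
)


def parse_keys(line):
    """Return (largest_key, start_key) from a trace line, or None if not found."""
    match = KEY_LINE.search(line)
    if match is None:
        return None
    return float(match.group("largest")), float(match.group("start"))


def keys_are_sane(largest_key, start_key):
    """Same condition as the failed assertion: largestkey < startkey."""
    return largest_key < start_key


line = ("08:29:11:822 la-rpl Trace: The largestkey = 7201232308043.0 "
        ">= startkey = 6717118506000.0")
largest, start = parse_keys(line)
print(keys_are_sane(largest, start))  # False
```

Running this over a folder of replication logs would be a quick way to spot the condition before the Node Manager starts rebooting the box.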

This, of course, raises an exception and the Logger service restarts. If there are too many restarts, the Node Manager kicks in and restarts the machine; this is just a mechanism meant to keep the data corruption from spreading further.

Now, if there's a Logger on the other side, fine: as the error message suggests, you can initiate manual replication (provided the other Logger database contains valid information).

Unfortunately, as you have written, this is a side A only environment. This may mean:

- accepting the situation, stopping ICM, throwing out the logger database, recreating it and reinstalling the Logger service,

- poking around in various tables to check what may be saved - this may mean the beginning of an adventure.

G.

saharhanna · ‎05-27-2013

Hi Gergely,

Thank you for the reply.

I guess the problem occurred because the clock on my server had been set to a future date. I then corrected it to the current date, and that is what messed up the database.

I did as you suggested: threw away the logger db, uninstalled the Logger service, and reinstalled it.

This solved the problem.

Thank you,

Sahar

Gergely Szabo · ‎05-27-2013

Hi,

yes, this happened to me in my lab, too. For some strange reason the clock was running ahead (probably because it was a virtual machine), and that killed the configuration database.

Anyway, since then the first thing I usually do is install a reliable NTP client. A bit of an advertisement:
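A toy model of why a clock that runs ahead and is then corrected could trip the sanity check. This is only a guess at the mechanism: the thread does not say how ICM derives its start key, so `start_key_from_clock`, the epoch, and the one-year skew below are all hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta


def start_key_from_clock(now):
    """Hypothetical: a key that grows with wall-clock time (seconds since 2000-01-01)."""
    return (now - datetime(2000, 1, 1)).total_seconds()


real_now = datetime(2013, 5, 23, 8, 29)
skewed_now = real_now + timedelta(days=365)  # clock wrongly set a year ahead

# Rows written while the clock was ahead carry keys from the "future".
largest_key = start_key_from_clock(skewed_now)

# After the clock is corrected, the start key is computed from the real time.
start_key = start_key_from_clock(real_now)

# The sanity check wants largest_key < start_key, which no longer holds.
print(largest_key < start_key)  # False
```

Under this assumption the database only becomes consistent again once real time has caught up with the skewed keys, which fits with the rebuild being the practical fix in a simplex lab.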

http://www.meinbergglobal.com/english/sw/ntp.htm

Plus a highly customisable monitoring tool (sends alarms, draws graphs etc):

http://www.meinbergglobal.com/english/sw/ntp-server-monitor.htm

G.

david_legrand · ‎05-15-2017

Hi all,

I faced the same issue in my lab. I fixed it by synchronizing side A with itself and purging the message log through ICMDBA (UCCE 11.5).

David