07-05-2010 02:50 PM
Hi.
After a reboot of the LMS 2.6 server, the CM home page started showing the following message: "Cannot connect to ANI Server since it is down". I followed the procedure in the thread "Ciscoworks ANI server cannot connect, Joe Clark please help", but after I restarted the Daemon Manager, all services came up except the ANI database engine. After I started this service manually, it came up, but the ANI.log file reappeared and the message on the CM home page became "Please wait... ANIServer is still initializing."
The result for the pdreg command is below:
C:\>pdreg -l ANIDbEngine
Process = ANIDbEngine
Path = C:\PROGRA~1\CSCOpx\objects\db\win32\dbsrv9.exe
Flags =
Startup = Started automatically at boot.
Dependencies = Not applicable
The result for the pdshow command is attached.
I hope Joe Clark or anyone else can help me too.
Thanks in advance.
07-12-2010 01:44 PM
What about the output of the HOSTNAME command from the DOS shell?
07-13-2010 05:52 AM
It is correct:
C:\>hostname
cwsctx03rjorp
Is it possible to restore a backup only for the JRM, without losing DCR or RME information?
07-13-2010 02:27 PM
It looks like there might be a timing issue. Shut down Daemon Manager, then empty out the contents of jrm.log. Restart Daemon Manager, then when pdshow returns valid data, post that output along with the new jrm.log.
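For anyone repeating these steps, here is a minimal Python sketch of the stop, empty, restart sequence. It assumes the Daemon Manager Windows service is named crmdmgtd and that the caller supplies the path to jrm.log; the helper names archive_and_empty and restart_with_clean_log are made up for illustration and are not part of LMS. Keeping a copy of the old log is optional, but it makes a before/after comparison possible.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the CiscoWorks Daemon Manager service is registered as "crmdmgtd".
DAEMON_MANAGER_SERVICE = "crmdmgtd"


def archive_and_empty(log_path):
    """Copy the log to <name>.old<suffix>, then truncate the original."""
    log_path = Path(log_path)
    backup = log_path.with_name(log_path.stem + ".old" + log_path.suffix)
    shutil.copy2(log_path, backup)   # keep the old entries for comparison
    log_path.write_text("")          # empty the live log
    return backup


def restart_with_clean_log(log_path, run=subprocess.run):
    """Stop Daemon Manager, empty the log, then start Daemon Manager again."""
    run(["net", "stop", DAEMON_MANAGER_SERVICE], check=True)
    backup = archive_and_empty(log_path)
    run(["net", "start", DAEMON_MANAGER_SERVICE], check=True)
    return backup
```

A call such as restart_with_clean_log(r"C:\Program Files\CSCOpx\log\jrm.log") would be typical, although that log location is itself an assumption about the install layout. After the restart, run pdshow and wait until it returns process data before collecting the new log.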
07-14-2010 06:36 AM
Please find attached the jrm.log, the old jrm.log (as jrm.old.log) and the pdshow output.
After restarting the Daemon Manager, the sm_server.exe process started consuming a great amount of CPU time. This has happened before, and another restart of Daemon Manager solved it. Do you think this is somehow related to the jrm problem?
07-15-2010 07:47 PM
There is no jrm problem according to this pdshow and this log. In fact, I see no process problem at all. As for sm_server, it is expected for these processes to take a lot of CPU time on restart. These are the DFMServer processes that have to rebuild the internal topology of the network. They will also take considerable CPU time if you are sending lots of traps to LMS, or if they are currently doing polling of the devices.
DFM has a very particular profile when it comes to scalability. You need to make sure you are not managing too many interfaces/ports and that you are not sending too many traps to DFM.
07-22-2010 06:18 AM
For the record:
I restored a backup on a new LMS installation and the problem persisted. I had no more time, so I reinitialized the database and set everything up from scratch manually. Now it is working fine.
Regarding DFM scalability, I have only 1600 devices. Do you think that could be too many?
Thank you for the help.
07-22-2010 10:14 PM
Possibly. Post the output of the following command:
NMSROOT/objects/smarts/bin/sm_tpmgr -s DFM --sizes
07-23-2010 06:03 AM
This command asks for a DFM user, and none of the users I created when installing LMS worked. Is there any way I can retrieve or reset these credentials?
C:\>"c:\Program Files\CSCOpx\objects\smarts\bin\sm_tpmgr" -s DFM --sizes
Server DFM User: admin
admi's Password: XXXXXXXXXXXX
[23-Jul-2010 9:44:29 AM+836ms] t@11124
ASL-E-ERROR_INIT_BACKEND-While initializing server connection to 'DFM'
SM-EREFUSED-No process is connected to the specified location
07-23-2010 10:22 AM
The password is probably just admin. However, it can be found in the NMSROOT/objects/smarts/conf/serverConnect.conf file.
07-23-2010 12:45 PM
That is what serverConnect.conf says. But the output for the command is still the same.
07-23-2010 02:17 PM
Post the output of the pdshow command.
07-26-2010 06:23 AM
Hi.
Although I have done a fresh LMS installation, after rediscovering all devices the new installation is showing the same error on the RME home page: "JRM Service could be down. Check whether JRM services are running.".
From the pdshow results, I can see JRM has been "Waiting to initialize" since last Friday. The sm_server process is no longer running and CPU activity is low right now. The pdshow results and jrm.log are attached.
Do you think DFM is causing this problem? My main concerns are RME and CM, so I could reduce DFM functionality, or even disable it, if necessary.
07-26-2010 09:58 AM
Jrm is taking way too long to start. It is only allowed 30 seconds, but it looks like it's taking over four minutes. Assuming the server is idle, you can try doing:
pdterm jrm
pdexec jrm
To see if it starts. If it does, it could be that the server is too busy during Daemon Manager start time, and the CPU load is preventing jrm from starting correctly. What are the specs of this server?
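To check the result after pdexec without reading the whole pdshow listing by eye, a small Python sketch like the one below can help. It assumes pdshow prints each process as a block of "Key = Value" lines beginning with a "Process" line, and it looks for the "Waiting to initialize" state mentioned earlier in this thread; parse_pdshow and stuck_processes are hypothetical helper names, not LMS tools.

```python
def parse_pdshow(text):
    """Parse pdshow-style output into {process_name: {field: value}}.

    Assumption: every record starts with a "Process = <name>" line and is
    followed by further "Key = Value" lines (State, Pid, and so on).
    """
    processes = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first "=" only
        if not sep:
            continue                           # blank or free-form line
        key, value = key.strip(), value.strip()
        if key == "Process":
            current = {}
            processes[value] = current
        elif current is not None:
            current[key] = value
    return processes


def stuck_processes(processes, state="Waiting to initialize"):
    """Return the names of processes whose State contains the given text."""
    wanted = state.lower()
    return [
        name
        for name, fields in processes.items()
        if wanted in fields.get("State", "").lower()
    ]
```

Save the pdshow output to a file, read it into a string, and pass it to parse_pdshow. If jrm still appears in the list returned by stuck_processes a few minutes after pdexec, the manual restart did not help.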
07-26-2010 01:08 PM
Before your last message, I had disabled all DFM polling and SNMP trap receiving. After that, the RME home page started working normally and I was able to manage jobs once again. You were right, the server couldn't handle all the traps and the polling, but even after disabling those, the CM home page still shows the JRM down message. I tried stopping and restarting the JRM, as you suggested, but the problem persisted.
The system is Windows 2003 on VMWare ESXi. The hardware is a dual Xeon with 4GB of RAM dedicated to the LMS server. I don't have the clock speed information now, but I can provide it tomorrow.
P.S.: Is there any clean way to prevent DFM from processing SNMP traps? For test purposes, I changed the SNMP port, but that makes Windows generate lots of ICMP port unreachable packets and I don't intend to leave it this way.
07-26-2010 10:07 PM
Unfortunately, VMWare is not supported in LMS 2.6. That could be contributing to your performance issues. If you need to use VMWare, you will have to upgrade to LMS 3.x. The only way to stop DFM from processing the traps is to stop sending them to DFM. You can change your device configs to only send traps which DFM understands. See http://www.cisco.com/en/US/partner/docs/net_mgmt/ciscoworks_device_fault_manager/2.0_IDU_2.0.6/user/guide/TrapFwd.html for the list.