02-12-2009 12:51 AM
Hello,
I have a DFM issue in LMS 3.1. I upgraded from LMS 3.0 2007 Update to LMS 3.1, and after a while DFM started doing strange things. It "stops" sending alerts and notifications, and the Alerts and Activities page fails to load. All the DFM-related services seem to be running. I already reinitialized the DFM database and installed the DFM 3.1.1 patch. Both repaired the issue, but after 1-2 days it comes back.
The only thing that seems strange to me is that memory usage is 5.8 GB, whereas in LMS 3.0 it was only about 4 GB. When the problem first occurred, there were "java.lang.OutOfMemoryError: Java heap space" messages in adapterserve.log, and the AdapterServe and AdapterServe1 processes had stopped.
Any ideas what could be causing the problem?
Thanks,
Imre
02-12-2009 08:21 AM
How many devices are you managing in DFM? How many alerts do you typically have in AAD?
02-13-2009 12:40 AM
There are ~250 known devices in DFM and ~1500 records per day in Fault History. There are a lot of BackupActivated alerts because we have voice/data E1 lines, and in LMS 3.0 I had a filter for this type of message. Now it seems that this filter (which suppressed BackupActivated messages) somehow disappeared during the upgrade. Is it possible that the huge number of alerts causes my problem? As far as I remember, in LMS 3.0 I used DFM without filtering for a while, and there was no such problem. I will try to recreate the filtering (if I can figure out how I did it half a year ago...)
02-13-2009 07:08 AM
It's certainly possible. The AdapterServer is responsible for shuttling events from the backend DfmServer to the EPM database. If there are a huge number of events, it could exhaust memory (note: alerts can contain multiple events).
Certain events can be disabled within DFM > Configuration > Polling and Thresholds > Managing Thresholds. You can also unmanage certain interfaces under DFM > Device Management > Device Details. There are even steps documented here for unmanaging interfaces in bulk.
02-14-2009 06:16 AM
Thanks, jclarke, that helped a lot.
I will check which interfaces and events are unimportant and unmanage/disable them. This was already a planned task, but because there were no such problems with LMS 3.0, it was not urgent. Just one more question: are there any differences between 3.0 and 3.1 in the way they handle DFM alerts/events? I am sure there was the same number of alerts in 3.0 without problems...
Thanks again for the very fast response,
Regards,
Imre
02-14-2009 11:28 AM
The event handling piece is shared code between Cisco and EMC. We don't have complete visibility into all the backend pieces of DFM. Therefore, I cannot say for certain what the engine changes were between DFM 3.0 and 3.1.
That said, since the OutOfMemoryError was only seen once, memory may not be the root cause. Without debugging logs, it's hard to know exactly why you're seeing daemon crashes.
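One way to check whether the OutOfMemoryError really appeared only once is to scan the log for the error string. This is a minimal sketch, assuming the log is plain text; the default file name below is taken from the original post, and its actual location on the server is not given here and must be supplied.

```python
import sys
from pathlib import Path

OOM_MARKER = "java.lang.OutOfMemoryError"

def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines mentioning a Java OOM error."""
    return [(i, line.rstrip("\n")) for i, line in enumerate(lines, start=1)
            if OOM_MARKER in line]

def scan_log(path):
    # errors="replace" keeps the scan going if the log contains undecodable bytes
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_oom_lines(f)

if __name__ == "__main__":
    # Assumption: pass the full path to adapterserve.log as the first argument.
    log_path = sys.argv[1] if len(sys.argv) > 1 else "adapterserve.log"
    if Path(log_path).exists():
        for lineno, text in scan_log(log_path):
            print(f"{lineno}: {text}")
    else:
        print(f"log not found: {log_path}")
```

Counting the matches over several days shows whether heap exhaustion lines up with each crash or was a one-off.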
02-17-2009 01:28 AM
OK, I see. I would like to clarify what exactly happens. I know that under CS I can set the log level for debugging. Is this what you mean by "debugging logs"? Which log files can help me find out what happens? I know the function of a few log files, but not all of them. Thanks,
Imre
02-17-2009 08:43 AM
The debugging is enabled under DFM > Configuration > Other Configurations > Logging. You need to enable Event Promulgation Module and Event Processing Adapters debugging. The logs are under NMSROOT/dfmLogs/EPM and epa.
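With debugging enabled, these logs can grow quickly. This is a minimal sketch for spotting the largest files in the two debug log directories, assuming NMSROOT is available as an environment variable on the server and that the directories are NMSROOT/dfmLogs/EPM and NMSROOT/dfmLogs/epa (both are assumptions based on the paths mentioned above).

```python
import os
from pathlib import Path

def largest_files(directories, top=10):
    """Return up to `top` (size_bytes, path) pairs, largest first, across the given directories."""
    sizes = []
    for d in directories:
        d = Path(d)
        if not d.is_dir():
            continue  # skip directories that do not exist on this install
        for p in d.rglob("*"):
            if p.is_file():
                sizes.append((p.stat().st_size, str(p)))
    sizes.sort(reverse=True)
    return sizes[:top]

if __name__ == "__main__":
    # Assumption: NMSROOT is set in the environment; falls back to the current directory.
    root = Path(os.environ.get("NMSROOT", "."))
    dirs = [root / "dfmLogs" / "EPM", root / "dfmLogs" / "epa"]
    for size, path in largest_files(dirs):
        print(f"{size / 1_048_576:10.1f} MB  {path}")
```

Running it periodically shows which debug logs are growing fastest before disk space or log rotation becomes an issue.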
02-19-2009 06:17 AM
I enabled debugging for EPM and EPA today. Two days ago I disabled the BackupActivated and HighDiscardRate alerts, so only a few alerts remained.
There were a few huge log files, so I ran logrot. But this morning DFM "died" again. The last alert was at about 3 AM, and if I click on any event ID or try to run Fault History, it doesn't work.
I restarted LMS and will try to find out from the EPA and EPM logs what happened.
02-19-2009 02:28 PM
It would have been more useful to troubleshoot the server when the problem is occurring. You might also try opening a TAC service request the next time these daemons die so that some live analysis can be done.
02-23-2009 12:53 AM
It crashed again on the 21st of February. Debugging for EPM and EPA was running, so I got the log files. I think I will open a TAC case. Thanks for your help.
Regards,
Imre