Unix director's sapd push functionality broken after S24 application?

brok3n
Level 1

I just applied S24 today to my HP-UX director and everything seems to work OK -- but the sapd log serialization (FTP push) doesn't seem to be working. The config files look the same and there are no sapd errors -- it moved the logfile to /usr/nr/var/tmp and it just sits there. I have bounced the director software (which previously would have forced the serialization and an FTP push) to no avail.

Any ideas Cisco?

Thx,

brkn!


brok3n
Level 1

FYI -- here's the truss output from the running sapd, which doesn't look right:

It loops like this forever --

>truss -p 2826
Received signal #14, SIGALRM, in lwp_sema_wait() [caught]
lwp_sema_wait(0xDF9FB344) Err#91 ERESTART
sigprocmask(SIG_SETMASK, 0xDF504DF0, 0x00000000) = 0
gettimeofday(0xDF504A14) = 0
lwp_sema_post(0xDF403E3C) = 0
setitimer(ITIMER_REAL, 0xDF504988, 0x00000000) = 0
lwp_sema_wait(0xDF403E3C) = 0
sigprocmask(SIG_SETMASK, 0xDFA05E98, 0x00000000) = 0
setcontext(0xDF50485C)
times(0xDF403B0C) = 14731406
sigprocmask(SIG_BLOCK, 0xDF9FB334, 0x00000000) = 0
gettimeofday(0xDF403CE0) = 0
gettimeofday(0xDF403C54) = 0
gettimeofday(0xDF504D18) = 0
lwp_mutex_lock(0xDFA00E7C) = 0
setitimer(ITIMER_REAL, 0xDF504CEC, 0x00000000) = 0
sigprocmask(SIG_UNBLOCK, 0xDF9FB334, 0x00000000) = 0

The directory /usr/nr/var/tmp is a staging area; sapd should attempt the ftp transfer and then either move the log file to /usr/nr/var/dump (success) or /usr/nr/var/new (failure). If the file remains in /usr/nr/var/tmp, then sapd may be unable to execute the scripts that perform the ftp transfer. Are there any entries in the file /usr/nr/var/errors.sapd.* ? Do the contents of /usr/nr/var/messages.sapd look reasonable? Are there any zombie or defunct processes that are owned by nr.sapd?
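A quick way to see which stage a given log file has reached is to check the three directories named above. The following is only a minimal sketch: the directory roles come from the description above, while the function name `locate_log` and the example file name are hypothetical and not part of the product.

```python
from pathlib import Path

# Directory roles as described above (relative to /usr/nr/var).
STAGES = {
    "tmp": "staged, transfer not completed",
    "dump": "transfer succeeded",
    "new": "transfer failed",
}


def locate_log(var_dir, log_name):
    """Return (stage, meaning) pairs for each stage directory holding log_name."""
    base = Path(var_dir)
    return [
        (stage, meaning)
        for stage, meaning in STAGES.items()
        if (base / stage / log_name).is_file()
    ]


if __name__ == "__main__":
    # "log.example" is a placeholder; substitute the real log file name.
    for stage, meaning in locate_log("/usr/nr/var", "log.example"):
        print(f"{stage}: {meaning}")
```

A file that stays under tmp across several polling intervals matches the symptom described at the top of the thread.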

Or perhaps you have configured a very long poll time (PollingDelay) in sapd.conf? This token uses units of 10 seconds, so for example, a value of 100 would cause nr.sapd to wait 1000 seconds (about 17 minutes) before attempting the transfer.
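To make the unit conversion concrete, here is a small sketch that turns a PollingDelay value into a wall-clock wait. The 10-second unit is taken from the reply above; the function names are made up for illustration.

```python
POLLING_UNIT_SECONDS = 10  # PollingDelay counts 10-second units, per the reply above


def polling_delay_seconds(polling_delay):
    """Convert a PollingDelay token value to seconds."""
    return polling_delay * POLLING_UNIT_SECONDS


def describe(polling_delay):
    """Format the wait as seconds plus minutes and seconds."""
    seconds = polling_delay_seconds(polling_delay)
    minutes, rem = divmod(seconds, 60)
    return f"PollingDelay={polling_delay} -> {seconds} s ({minutes} min {rem} s)"


print(describe(100))  # PollingDelay=100 -> 1000 s (16 min 40 s)
```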

There are a number of actions we can take to investigate and resolve the problem, but some of them are complex and may require multiple steps. I would recommend that you open a TAC case and ask the TAC engineer to contact stleary@cisco.com as the DE.

brok3n
Level 1

Rebooted the director after much frustration -- works fine now. Go figure.

-brkn!

Hello, I have a similar problem: after upgrading to S24 on my Solaris 2.6 director, sapd refuses to load the logfiles into the database. Sapx_main even dumps core, and messages.sapd reads: "W deriveContext, no non-space characters in alarm details." Upon inspection of the logfiles, the context field seems normal. A reboot didn't solve the problem.

Any ideas?

rp

The warning "W deriveContext, no non-space characters in alarm details" appears because the details field (the field just before the context field) doesn't have any non-space characters. In the cases I've seen, the Alarm Details field contained only a single space.

Normally this should just be a warning and the software should continue functioning properly. But there is a known bug: when nrdirmap gets this warning, it stops. The same problem may also be present in Sapx_main.

What to do:

Look in the log file that was being processed for any alarms where there is Context Data and an empty Alarm Details field. Remove these alarms from the file and then type nrstop and nrstart to try loading the file again.

(Or you can edit the log file and put something into the Alarm Details field for those alarms to keep them from being empty)
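Finding the offending alarms by eye can be tedious in a large log, so a short script may help. This is only a sketch under loud assumptions: it assumes one comma-separated record per line with Alarm Details as the second-to-last field and Context Data as the last field, which is a guess based on "the field just before the context field" and not a documented format. The function name and the field-index parameters are hypothetical; adjust them to the real layout, and note that a plain split will miscount if the context data itself contains the delimiter.

```python
def find_empty_details(lines, delimiter=",", details_index=-2, context_index=-1):
    """Yield (line_number, record) where context is present but details are blank.

    The field positions are ASSUMED, not taken from product documentation.
    """
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\n")
        fields = record.split(delimiter)
        try:
            details = fields[details_index]
            context = fields[context_index]
        except IndexError:
            continue  # record too short to hold both fields
        if context.strip() and not details.strip():
            yield number, record


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = ["4,1000,sig, ,ctxdata", "4,1001,sig,details,ctxdata"]
    for number, record in find_empty_details(sample):
        print(f"line {number}: {record}")
```

Records reported by the script are the ones to remove or to give a non-empty Alarm Details value before running nrstop and nrstart.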

To prevent this from recurring in new alarms, you will need to fix the signatures that generated the alarms you found.

We have most often seen this with user generated Custom Signatures.

The data from the SigStringInfo parameter for the alarm is what gets placed in the Alarm Details field. The SigStringInfo parameter is optional for packetd, but because of the bug in nrdirmap it should be considered mandatory. Use nrConfigure to tune that signature and add something into the SigStringInfo parameter.

If the signature is a Cisco standard signature, then you can still do the above to fix it.

But please also tell the TAC which signature is causing the issue, and send them a copy of the alarm from your log file. This way they can let development know and get it fixed for the next signature update.

The above is the reason for the Warning in messages.sapd, but may or may not be the reason for the Sapx_main core dumps.

If Sapx_main continues to dump core, validate your configuration entries in sapd.conf and ensure that user netrangr can access the database with its default environment.

If your environment and sapd.conf are correct, and you've corrected the previous warning, then you will likely need to call the TAC and get engineering support from Cisco.

Thank you very much, your suggestions solved my problems. There was a custom signature with an empty SigStringInfo field, and after rectifying this in the logfiles and the signature definition the problems disappeared. No core, no messages.sapd warnings and a happy db full of fresh data.

rgds

rp