Re: Cisco ISE 60057 A PSN node went down

mlaurencik · ‎10-07-2020

Hello, I'd like to ask a question about ISE syslog message 60057. We are trying to set up new monitoring in our network and we get this alarm very often. The message is generated by all the peers of the node that is actually having the issue. We have deployments with around 10-15 nodes, which means we get alarms from all the nodes even though potentially only one is having an issue. However, when we check the node right afterwards, it is up and running and everything looks fine: all services are up, etc.

In order to troubleshoot the whole issue and adjust our monitoring correctly, I need a clear and full understanding of this alarm. Does anyone know:

1. Is message 60057 being sent once or multiple times (by the same peer) after some time?

2. What are the conditions for this alarm to be generated?

3. Are there any options to tune this alarm in any way?

Message Code: 60057
Severity: NOTICE
Message Text: A PSN node went down
Message Description: One of the PSN nodes in the node group has gone down
Local Target Message Format: <timestamp> <seq_num> 60057 NOTICE PSN-Heartbeat: A PSN node went down, <log details>
Remote Target Message Format: <pri_num> <timestamp> <IP address/hostname> <CISE_logging category> <msg_id> <total seg> <seg num><timestamp> <seq_num> 60057 NOTICE PSN-Heartbeat: A PSN node went down, <log details>
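For the monitoring side, a minimal Python sketch of picking this message out of a syslog line, based only on the two formats above. Everything before `<seq_num>` is kept as an opaque prefix, because the exact timestamp and header layout are not shown here. The names `parse_psn_down` and `PsnDownEvent` and the sample line are made up for illustration, and `<log details>` is returned unparsed since its contents are not documented in this thread.

```python
import re
from typing import NamedTuple, Optional

# Matches the tail shared by the local and remote formats:
#   <seq_num> 60057 NOTICE PSN-Heartbeat: A PSN node went down, <log details>
PSN_DOWN = re.compile(
    r"(?P<seq_num>\d+)\s+60057\s+NOTICE\s+PSN-Heartbeat:\s+"
    r"A PSN node went down(?:,\s*(?P<details>.*))?$"
)


class PsnDownEvent(NamedTuple):
    prefix: str    # everything before <seq_num> (timestamp, remote header), left unparsed
    seq_num: str
    details: str   # raw <log details>, left unparsed


def parse_psn_down(line: str) -> Optional[PsnDownEvent]:
    """Return a PsnDownEvent for a 60057 line, or None for any other line."""
    text = line.strip()
    match = PSN_DOWN.search(text)
    if match is None:
        return None
    return PsnDownEvent(
        prefix=text[:match.start()].strip(),
        seq_num=match.group("seq_num"),
        details=match.group("details") or "",
    )


# Invented sample line; the timestamp and details are placeholders, not real ISE output.
sample = ("2020-10-07 10:15:32 0000012345 60057 NOTICE "
          "PSN-Heartbeat: A PSN node went down, detail=placeholder")
event = parse_psn_down(sample)
```

Matching on the fixed part of the message rather than on the timestamp keeps the sketch usable for both the local and the remote target format.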

thanks,

Martin

marce1000 · ‎10-07-2020

- Moved to Network Access Control.

M.

Mike.Cifelli · ‎10-07-2020

1. Is message 60057 being sent once or multiple times (by the same peer) after some time?

AFAIK the message comes from each PSN in the same node group once it has been determined that a PSN node in that group went down.

2. What are the conditions for this alarm to be generated?

The heartbeat has stopped for a PSN in the node group, which in my experience typically means loss of network connectivity. Remember that node group members check on the availability of their peer group members.

3. Are there any options to tune this alarm in any way?

IMO I would not recommend tuning such an alarm, as this could create bigger issues down the road for your deployment. I'm not sure what version you are running, but I know there have been bugs in certain releases relating to node groups. For example, I got bitten by this one in a production environment: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvj47301

My suggestion would be to take a deeper look into why the node continues to go down. However, you can enable and configure alarms here: Administration->System->Settings->Alarm Settings
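Since every peer in the node group reports the same event, the duplicates can also be collapsed in the monitoring tool rather than in ISE. A rough Python sketch, assuming the collector records a receipt time and the reporting node for each 60057 message. The 60-second window is an arbitrary placeholder (the heartbeat interval is not stated in this thread), the node names are hypothetical, and treating messages that arrive close together as one incident is itself an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class Incident(NamedTuple):
    first_seen: datetime
    last_seen: datetime
    reports: Counter  # reporting node -> number of 60057 messages it sent


def group_incidents(
    events: Iterable[Tuple[datetime, str]],
    window: timedelta = timedelta(seconds=60),  # placeholder value, tune to your data
) -> List[Incident]:
    """Group (received_at, reporter) pairs into incidents.

    A message joins the current incident if it arrives within `window`
    of the previous message; otherwise it starts a new incident.
    """
    incidents: List[Incident] = []
    for received_at, reporter in sorted(events):
        if incidents and received_at - incidents[-1].last_seen <= window:
            current = incidents[-1]
            current.reports[reporter] += 1
            incidents[-1] = current._replace(last_seen=received_at)
        else:
            incidents.append(
                Incident(received_at, received_at, Counter({reporter: 1}))
            )
    return incidents


# Hypothetical events: two peers report within seconds, one of them twice.
t0 = datetime(2020, 10, 7, 10, 0, 0)
events = [
    (t0, "psn-a"),
    (t0 + timedelta(seconds=5), "psn-b"),
    (t0 + timedelta(seconds=20), "psn-a"),
    (t0 + timedelta(hours=1), "psn-c"),
]
incidents = group_incidents(events)
```

Raising one alert per incident instead of one per message cuts the noise, and the per-reporter counts in `reports` show from real data whether the same peer sends 60057 more than once for a single event.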

Good luck & HTH!

mlaurencik · ‎10-15-2020

Hi Mike,

thanks for your reply. Still, I'd like to understand the logic behind it. All those nodes are in the same rack. I'm not saying a network issue is impossible, but I'd say the chance is really low: we have several separate customers/deployments that are not related or connected to each other in any way, and the alarm reports an issue with a different node each time (the affected node is effectively "random"). A network problem is very unlikely to be the cause here.

I know the alarm is triggered by a peer of the node that is having an issue, but I still don't know whether it is triggered several times in a row within a specified time, or whether it is a one-time alarm.

Which debug logs should I enable to see detailed logs related to the heartbeat exchange? How often are the heartbeat messages exchanged?