
ISE Alarm : Critical : High Load Average: Server=ISE32P3SANMNT

I have a cluster of ISE 3.2 patch-3:  

node1:  PAN/MNT

node2:  SAN/SMNT

node3:  PSN

node4:  PSN

There is no activity on this cluster, and yet I received this message on the Secondary Admin/Secondary MNT node:

ISE Alarm : Critical : High Load Average: Server=ISE32P3SANMNT

Any ideas?

8 Replies

I am also getting this message: ISE Alarm : Critical : NTP Sync Failure : Server=ISE32P3SANMNT. However, when I do a "show ntp", everything looks legit:

ISE32P3SANMNT/admin#show ntp
Configured NTP Servers:
ntp1.cisco.com
ntp2.cisco.com
ntp3.cisco.com
Reference ID : 0A072896 (ntp1.cisco.com)
Stratum : 2
Ref time (UTC) : Mon Oct 02 20:48:01 2023
System time : 0.010108152 seconds fast of NTP time
Last offset : -0.034693789 seconds
RMS offset : 0.129994914 seconds
Frequency : 38.751 ppm slow
Residual freq : -1538.192 ppm
Skew : 0.067 ppm
Root delay : 0.000583669 seconds
Root dispersion : 0.046592101 seconds
Update interval : 25.0 seconds
Leap status : Normal

210 Number of sources = 2
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* ntp1.cisco.com 1 6 377 9 +5651us[ -29ms] +/- 311us
^- ntp2.cisco.com 1 6 377 34 +5263us[ +48ms] +/- 3277us

M indicates the mode of the source.
^ server, = peer, # local reference clock.

S indicates the state of the sources.
* Current time source, + Candidate, x False ticker, ? Connectivity lost, ~ Too much variability

Warning: Output results may conflict during periods of changing synchronization.
ISE32P3SANMNT/admin#
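
One way to see which of the three configured servers occasionally fails to answer is to query each of them directly from a separate Linux host on the same segment. A minimal sketch using Python's third-party ntplib package; the server names are just the masked placeholders from above, and this runs outside ISE, not on the ISE CLI:

import ntplib

# Placeholder names; substitute the servers actually configured on the ISE node.
NTP_SERVERS = ["ntp1.cisco.com", "ntp2.cisco.com", "ntp3.cisco.com"]

client = ntplib.NTPClient()
for server in NTP_SERVERS:
    try:
        # One NTPv3 query with a short timeout, roughly what a single chrony poll does.
        resp = client.request(server, version=3, timeout=2)
        print(f"{server}: stratum={resp.stratum} offset={resp.offset:+.6f}s delay={resp.delay:.6f}s")
    except Exception as exc:
        # An unanswered query here is the kind of miss that can trigger the ISE NTP alarm.
        print(f"{server}: no response ({exc})")

Running this a few times around the moment the alarm fires should show whether one of the three servers is the odd one out.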

 

Milos_Jovanovic
VIP Alumni

Hi @adamscottmaster2013,

Regarding topic #1 (High Load Average), have you checked Reports / Diagnostics / Health Summary? What values do you see there? I had a similar issue where one node constantly reported warnings, and it turned out that a mistake had been made with the VM resource reservation (a limit had been configured instead of a reservation).

For topic #2 (NTP), I can see that you have three NTP servers configured, but only two are synced. I would assume the alarm is for the third one.

Kind regards,

Milos

@Milos_Jovanovic

#1: I checked, and it looks normal. I haven't had time to open a TAC case yet, because Cisco TAC is not very helpful.

#2: Yes, I have three NTP servers configured, but only two ever show up. I see that with versions 3.0, 3.1, and 3.2 with the latest patches. Therefore, I don't think your assumption is correct.

#1: I meant checking your VM infrastructure, assuming it is a VM. TAC is the next option.

#2: Quite possible. I have never had more than two NTP servers in my installations, so I don't know how it is displayed in that case, but it was my first thought. When I see this alarm, I analyze where the NTP server sits and how the traffic gets there. Most often it is multiple hops away and behind a firewall. NTP is UDP-based, so losing only a small amount of traffic is enough for this alarm to pop up. Since you are using Internet NTP servers, I'm assuming that at some point the sync fails against at least one server, which is enough to raise the alarm. In my case, that was the explanation I settled on, since the alarms were irregular and nothing else could be concluded.
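
To illustrate the lost-UDP theory, repeated probes over a few minutes can show whether queries toward any of the servers occasionally go unanswered, which would line up with the intermittent alarms. A rough sketch, again with ntplib from a separate host rather than the ISE CLI, with placeholder server names and probe counts:

import time
import ntplib

NTP_SERVERS = ["ntp1.cisco.com", "ntp2.cisco.com", "ntp3.cisco.com"]
PROBES = 30        # number of query rounds (arbitrary for the test)
INTERVAL = 10      # seconds between rounds, roughly a short chrony poll interval

client = ntplib.NTPClient()
misses = {server: 0 for server in NTP_SERVERS}

for _ in range(PROBES):
    for server in NTP_SERVERS:
        try:
            client.request(server, version=3, timeout=2)
        except Exception:
            misses[server] += 1   # count an unanswered UDP query
    time.sleep(INTERVAL)

for server, lost in misses.items():
    print(f"{server}: {lost}/{PROBES} queries unanswered")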

Kind regards,

Milos

- The VM infrastructure checked out, and there was no issue at the time of the ISE alarm.

- I had a SPAN port capturing traffic between ISE and the NTP servers and confirmed that the NTP traffic DID reach the NTP server and come back to the ISE interface, so the communication is fine. I saw this issue quite often with ISE 3.2 patch 2 but not as much with patch 3, so I assumed it was resolved, but apparently NOT. I only listed Internet NTP servers above to mask my internal NTP server names. The ISE nodes and NTP servers are on the same network, and the NTP servers are Stratum 1 servers.
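
For reference, the SPAN capture can be checked offline for matching query/reply pairs. A small sketch reading an exported pcap with scapy on an analysis host; the capture filename and the ISE interface IP are placeholders:

from scapy.all import rdpcap, IP, UDP

ISE_IP = "192.0.2.10"                    # placeholder for the ISE node's interface IP
packets = rdpcap("ise_ntp_span.pcap")    # placeholder pcap exported from the SPAN port

queries = replies = 0
for pkt in packets:
    if not (pkt.haslayer(IP) and pkt.haslayer(UDP)):
        continue
    if pkt[UDP].dport == 123 and pkt[IP].src == ISE_IP:
        queries += 1     # ISE -> NTP server query
    elif pkt[UDP].sport == 123 and pkt[IP].dst == ISE_IP:
        replies += 1     # NTP server -> ISE reply

print(f"queries from ISE: {queries}, replies back to ISE: {replies}")

Matching counts point away from the network, which is consistent with what the capture showed here.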

@adamscottmaster2013 MNT performs housekeeping tasks on its data in the early morning hours each day, so you may ignore the alarms if they come around 3 or 4 AM.

As for NTP servers, I hope you are not using Windows servers for that, as they are not as reliable.

@hslai:

#1: It is NOT in the early morning hours; it actually happens in the afternoon.

#2:  I am using Stratum 1 NTP servers from Microsync, not Windows Servers.

@adamscottmaster2013 If they have no obvious impact, then you may ignore them. If they surface for a prolonged period of time, please engage Cisco TAC to troubleshoot.

On #1, we may get a better idea from "show tech". Below are sample outputs from my lab:

...
*****************************************
IO On Host  - under threshold
*****************************************
Threshold %     : 20
Actual IO Wait% : 0.35

Linux 4.18.0-372.9.1.el8.x86_64 (hslai-i32d) 	10/07/2023 	_x86_64_	(4 CPU)

12:00:00 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      7.49      0.00      3.47      3.07      0.00     85.97
12:20:01 AM     all      4.13      0.00      3.01      0.49      0.00     92.37
...

Scheduler Jobs:
==================================================

JOB_NAME             REPEAT_INT ENABLED    STATE      LAST_START_DATE      LAST_RUN_DURATION
-------------------- ---------- ---------- ---------- -------------------- --------------------
STATS_JOB            freq=daily TRUE       SCHEDULED  07-OCT-23 02.00.00.8 +000000000 00:00:02.
                     ;byhour=2;                       11640 AM ETC/UTC     041135
                     byminute=0
                     ; bysecond
                     =0

NORMALISING_RACC_JOB freq=minut TRUE       SCHEDULED  07-OCT-23 06.51.00.5 +000000000 00:00:00.
                     ely;byseco                       70271 PM ETC/UTC     192542
                     nd=0;

NORMALISING_RAUTH_JO freq=minut TRUE       SCHEDULED  07-OCT-23 06.51.00.6 +000000000 00:00:00.
B                    ely;byseco                       03777 PM ETC/UTC     012001
                     nd=0;

HOURLY_STATS_JOB     freq=hourl TRUE       SCHEDULED  07-OCT-23 06.15.58.1 +000000000 00:00:00.
                     y;byminute                       45313 PM UTC         030927
                     =15

COLLATION_JOB        freq=minut TRUE       SCHEDULED  07-OCT-23 06.51.00.3 +000000000 00:00:00.
                     ely;byseco                       66588 PM ETC/UTC     287541
                     nd=0;

COLLATIONPURGE_JOB   freq=hourl TRUE       SCHEDULED  07-OCT-23 06.45.00.6 +000000000 00:00:00.
                     y;byminute                       89337 PM ETC/UTC     032142
                     =0,15,30,4
                     5;bysecond
                     =0;


...

*****************************************
Hourly Database Metrics
*****************************************

DAY                        Avg Redo Per Sec-MB    Avg TPS Avg Read IOPS Avg Write IOPS Avg Read MBPS Avg Write MBPS Max Redo Per Sec-MB    Max TPS Max Read IOPS Max Write IOPS Max Read MBPS Max Write MBPS
-------------------------- ------------------- ---------- ------------- -------------- ------------- -------------- ------------------- ---------- ------------- -------------- ------------- --------------
06-OCT-2023 00:00                            0        .09           .91            .41           .06              0                 .01        .89          3.15           1.53            .3            .03
06-OCT-2023 01:00                            0        .11           .95            .42           .07              0                 .01       1.16          2.76            1.5           .29            .03
06-OCT-2023 02:00                            0
...

*****************************************
db_log info for last 48 hours
*****************************************

TIMESTAMP                      COMPONENT                 TEXT
------------------------------ ------------------------- --------------------------------------------------------------------------------
...
06-OCT-23 04.41.48.623840 AM   collation                 Total Data max_space = 154, threshold_space = 123, total_space = 3, free_space =
06-OCT-23 04.41.48.624914 AM   purge_audit               Total Data threshold_space = 123 GB, used_space = 2 GB
06-OCT-23 04.41.48.974386 AM   purge_audit               purge skipped; no data available for 06-SEP-23
06-OCT-23 04.41.49.073688 AM   purge_tbl                 MNT_AAA_DIAGNOSTICS purging skipped
06-OCT-23 04.41.49.131767 AM   purge_tbl                 MNT_SYSTEM_DIAGNOSTICS purging skipped
06-OCT-23 04.41.49.133845 AM   purge_tbl                 MNT_SECURESYSTEM_DIAGNOSTICS purging skipped
06-OCT-23 04.41.49.463200 AM   purge_tbl                 dropping partition...SYS_P3509 in DB_LOG for 29-SEP-23
06-OCT-23 04.41.49.464342 AM   purge_tbl                 DB_LOG purging completed successfully
06-OCT-23 04.41.49.469887 AM   purge_tbl                 RADIUS_AUTH_AGGR purging skipped
06-OCT-23 04.41.49.470978 AM   purge_tbl                 MISCONFIGURED_NAS purging skipped
06-OCT-23 04.41.49.471978 AM   purge_tbl                 MISCONFIGURED_SUPPL_MONTH purging skipped
06-OCT-23 04.41.49.472986 AM   purge_tbl                 RADIUS_ERRORS_MONTH purging skipped
06-OCT-23 04.41.49.475525 AM   purge_tbl                 RADIUS_ERRORS_48 purging skipped
06-OCT-23 04.41.49.476574 AM   purge_tbl                 MISCONFIGURED_SUPPLICANTS_48 purging skipped
06-OCT-23 04.41.49.477562 AM   purge_tbl                 RADIUS_AUTH_SUPPRESSED purging skipped
06-OCT-23 04.41.49.478815 AM   purge_tbl                 COLLATION_RADIUS_AUTH purging skipped
06-OCT-23 04.41.49.673046 AM   purge_tbl                 dropping partition...SYS_P3485 in RADIUS_AUTH_48_LIVE for 28-SEP-23
06-OCT-23 04.41.49.674725 AM   purge_tbl                 RADIUS_AUTH_48_LIVE purging completed successfully
06-OCT-23 04.41.49.677111 AM   purge_tbl                 RADIUS_ACC_48_LIVE purging skipped
06-OCT-23 04.41.49.859627 AM   purge_tbl                 dropping partition...SYS_P3486 in RADIUS_AUTH_DETAILS for 28-SEP-23
06-OCT-23 04.41.49.859969 AM   purge_tbl                 RADIUS_AUTH_DETAILS purging completed successfully
06-OCT-23 04.41.49.861922 AM   purge_tbl                 ALARM_EVALUATION_DETAILS purging skipped
06-OCT-23 04.41.49.863059 AM   purge_tbl                 TACACS_ACC_48_LIVE purging skipped
06-OCT-23 04.41.49.864088 AM   purge_tbl                 TACACS_AUTH_48_LIVE purging skipped
06-OCT-23 04.41.49.867339 AM   purge_audit               purging Tacacs data older than 06-SEP-23
06-OCT-23 04.41.49.956713 AM   purge_tbl                 TACACS_ACC_MONTH purging skipped
06-OCT-23 04.41.49.957934 AM   purge_tbl                 TACACS_AUTHZ purging skipped
06-OCT-23 04.41.49.959337 AM   purge_tbl                 TACACS_ACC_DETAILS purging skipped
06-OCT-23 04.41.49.960372 AM   purge_tbl                 TACACS_AUTH_MONTH purging skipped
06-OCT-23 04.41.49.961379 AM   purge_tbl                 TACACS_ACC_ARCHIVE purging skipped
06-OCT-23 04.41.49.962411 AM   purge_tbl                 TACACS_AUTH_48_LIVE purging skipped
06-OCT-23 04.41.49.963423 AM   purge_tbl                 TACACS_AUTH_DETAILS purging skipped
06-OCT-23 04.41.49.964397 AM   purge_tbl                 TACACS_ACC_48_LIVE purging skipped
06-OCT-23 04.41.49.965379 AM   purge_tbl                 TACACS_AUTH_AGGR purging skipped
06-OCT-23 04.41.49.966387 AM   purge_tbl                 TACACS_AUTH_ARCHIVE purging skipped
...