01-09-2026 05:15 AM
Morning Cisco Forums,
I'm looking for some guidance on the process of replacing the ISE administration nodes in a 4-node cluster. In our current environment we're running ISE version 3.4p4, with a primary and secondary administration node and two Policy Service Nodes.
After upgrading from 3.3p8 to 3.4p4 (GUI method, not rebuild/restore), we're running into a bunch of problems, specifically on the administration nodes. Disk space (/opt) is filling up within two weeks, going from 20% utilization to over 90%, which forces an M&T reset bi-weekly. The M&T database shows very low utilization even though the /opt directory is filling up, and services like Logstash and Elasticsearch are crashing periodically. We've engaged TAC, and although they were helpful in getting disk space restored by clearing M&T data, they haven't been able to figure out what's going on behind the scenes. They've checked the Linux subsystems for old logs that may be causing the issues, etc., but haven't found anything.
My thought process is to build a new 3.4p4 PAN from scratch, de-register our secondary PAN, and register the new node to the environment in its place. Let the data sync, then do the same for our primary. I've run into conflicting information on the forums as to whether or not I should restore my configuration backup to the new node prior to joining it to the environment, so I'm just looking for some clarification on the "right" way to replace the two administration nodes.
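For context, my rough plan was to take a fresh configuration backup from the CLI before touching either node, along these lines (the repository name and encryption key below are placeholders, not our actual values), and then do the actual registration from the primary's GUI under Administration > System > Deployment:
ise/admin# show repository
ise/admin# backup PreSwapConfig repository MyFTPRepo ise-config encryption-key plain <key>
The part I'm unsure about is whether that backup should ever be restored onto the fresh node before registering it, or whether registration alone pulls down everything it needs.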
As always, any help is appreciated!
01-09-2026 06:19 AM
If you're also seeing crashes with Logstash and Elasticsearch, could you be logging too much data? What are your session re-authentication and accounting update intervals? I mostly use session re-authentications of 8 hours, and the same for accounting updates. I saw a post here this week where someone had accounting updates set to every 5 minutes, which is 12/hour * 24 hours = 288 updates a day per endpoint. The point is that if you have low session re-auth timers and/or low accounting timers, you can overload the logging systems. Do that for 250k endpoints and you'd find you're logging roughly 72 million log lines a day just for one of those (session or accounting) timers.
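For what it's worth, on the switch side these are the knobs I mean, as a rough illustration only (classic IOS-style commands; the interface name and values are just examples, and IBNS 2.0 deployments set the re-authentication timer under the service policy instead):
! send interim accounting updates every 8 hours (480 minutes)
aaa accounting update newinfo periodic 480
!
! re-authenticate each session every 8 hours (28800 seconds)
interface GigabitEthernet1/0/1
 authentication timer reauthenticate 28800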
Regards,
David
01-09-2026 06:26 AM
Thanks for the reply!
Session re-authentication timers are currently 1 hour, and accounting updates are set with: aaa accounting update newinfo periodic 2880 (i.e., every 48 hours).
It's not a large deployment; we're talking less than 1K endpoints. Also, just to be clear, this was not happening prior to the 3.4p4 upgrade in our environment. It's only an issue post-patch/upgrade.
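Rough math on our volume, assuming a worst case of one re-authentication per endpoint per hour: 1,000 endpoints * 24 re-auths per day = about 24,000 authentication events a day, plus at most one interim accounting update per endpoint every 48 hours. Unless I'm missing something, that volume alone shouldn't come close to filling /opt in two weeks.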
01-09-2026 11:24 AM
@Inq_J ,
If I understand correctly, everything worked fine in 3.3 P8, and the issues started in 3.4 P4, am I correct ?
What is your Hardware (VM or SNS, 36xx or 37xx model, HD space is: 200 GB, 300 GB, 600 GB or 2TB, etc) ?
In your PPAN/PMnT and SPAN/SMnT, what is the result of the following command ?
ise/admin# tech top
Invoking tech top. Press Control-C to interrupt.
top - 16:20:53 up 44 days, 19:55, 1 user, load average: 1.82, 1.68, 1.81
Tasks: 954 total, 3 running, 951 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.1 us, 2.0 sy, 0.0 ni, 93.7 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 257403.3 total, 110243.4 free, 61461.1 used, 85698.8 buff/cache
MiB Swap: 8001.0 total, 7637.0 free, 364.0 used. 122545.2 avail Mem
...
Note: please take a look at "ISE - What we need to know about SNS / VM".
Hope this helps !
01-09-2026 11:35 AM - edited 01-09-2026 11:43 AM
Thanks for the response, see below answers to your questions:
Yes, we were not seeing any of these /opt disk utilization errors on 3.3P8; it only started once we upgraded the environment to 3.4P4. That said, there are some "ghosts" we're still in the process of identifying (replication issues, service crashes, etc.) that we are attributing to the upgrade as well. At this point I don't know whether replacing the PPAN and SPAN will make a difference, but that was our initial thought process (to see whether we still run into the disk utilization problem).
ISE VM - "Medium" sizing (600GB Disks for PPAN / SPAN).
PPAN:
Invoking tech top. Press Control-C to interrupt.
top - 19:33:30 up 31 days, 17:28, 1 user, load average: 3.75, 4.60, 2.42
Tasks: 716 total, 1 running, 715 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.9 us, 3.1 sy, 0.0 ni, 89.3 id, 0.2 wa, 0.2 hi, 0.2 si, 0.0 st
MiB Mem : 96127.9 total, 21548.1 free, 27647.8 used, 46932.1 buff/cache
MiB Swap: 7999.9 total, 7994.5 free, 5.4 used. 33427.3 avail Mem
SPAN:
Invoking tech top. Press Control-C to interrupt.
top - 19:31:42 up 34 days, 23:30, 1 user, load average: 2.61, 2.04, 1.64
Tasks: 686 total, 1 running, 685 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.3 us, 1.3 sy, 0.0 ni, 97.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 96127.9 total, 30885.3 free, 35153.8 used, 30088.8 buff/cache
MiB Swap: 7999.9 total, 7992.7 free, 7.2 used. 41894.5 avail Mem
Thanks!
01-10-2026 04:07 AM - edited 01-10-2026 04:07 AM
@Inq_J ,
you have an ISE 3.4P4, Medium Deployment, with 4x Nodes (2x PAN/MnT and 2x PSNs), and installed on a VM compatible with the SNS-3755 (40 vCPUs, 96GB RAM and 600GB HD).
The /opt partition (/dev/sda7) is filling up quickly on both PAN/MnT nodes, am I correct ?
What is the result of the following command ?
ise/admin# show disk
disks
Internal filesystems:
Filesystem Size Used Avail Use% Mounted on
...
/dev/sda7 550G 166G 357G 32% /opt
...
You can start by running the following on one of the two PAN/MnT nodes:
ise/admin# application reset-config ise
Initialize your Application configuration to factory defaults? (y/n): y
Leaving currently connected AD domains if any...
Please rejoin to AD domains from the administrative GUI
Retain existing Application server certificates? (y/n): y
...
If you notice an improvement, you can repeat it on the other PAN/MnT. In this way, we will check whether the problem is solved by redoing the PANs/MnTs.
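Also, before and after the reset, it is useful to see which application log files are actually growing under /opt. You can list them from the CLI (output elided here; the file names and sizes will differ on your nodes):
ise/admin# show logging application
...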
Hope this helps !
01-12-2026 05:09 AM
Thank you very much for the recommendation. I tried this over the weekend, and I'm still seeing the /opt partition filling up relatively quickly (it went from 13% to 24% in a matter of 24 hours). The other post-upgrade "ghosts" I'm seeing are errors like Cannot find device "podman2" when stopping/starting services, and services (Logstash/Elasticsearch) are still crashing every few hours.
I'm going to work on deploying a fresh PPAN/SPAN today to see if some of the issues we're seeing go away, or if they persist.
Much appreciated!
01-12-2026 06:25 AM
I would recommend opening a TAC case. You may be hitting this bug: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCws61409
01-12-2026 06:34 AM
Marvin,
Thanks! I'll read through the bug, and pass it along to our TAC engineer. We already have a case opened.
Much appreciated!