
CSCwk06817 - Known OS Bug causing swap to increase on ISE nodes

amunozmo
Cisco Employee

We are seeing linear swap memory growth on a two-node ISE cluster with all personas running on 3715 appliances: high swap and RAM usage, plus memory leaks in the jsvc and Java admin processes. The critical point for this discussion is that the issue is still happening on ISE 3.3 Patch 6, even though the known fixed release is 3.3 Patch 4 according to this bug: https://bst.cisco.com/bugsearch/bug/CSCwk06817

How is it possible to still have linear swap memory growth on ISE nodes when the defect was fixed in ISE 3.3p4? TAC confirmed evidence of defect CSCwo82042 (LSD MAC handling) in the logs, but there is no evidence of what is causing the swap memory leak and growth.

13 Replies

Greg Gibbs
Cisco Employee

This is not something the Community can help with. You would need to work with TAC to investigate and determine if a new bug needs to be opened.

Hi Greg, 

Thank you very much for your prompt response and thoughtful comments.

While I fully understand and appreciate your perspective (in fact, we've been actively working with TAC to address this issue), I started this conversation in the community to raise awareness and encourage a broader, constructive dialogue. The idea was to understand whether others in the community are also observing a progressive increase in swap memory usage over time, despite the workaround that was expected to resolve this in version 3.3 Patch 4. My customer is now on version 3.3 Patch 6.

I believe this could be a valuable opportunity to share insights and to have a different approach that may benefit customers globally. After all, if someone in the community has identified the root cause and is willing to share a solution, it would greatly support others facing similar challenges.

Thank you once again, and I would love to hear your thoughts on this approach.

I am having the same issue on 3.3p6, and I can watch memory and swap usage climb slowly over time. It looks to be an increase of approximately 1% each day, give or take. Initially I reloaded the entire deployment just to make sure there wasn't something funky going on, but usage has kept increasing slowly since then. I'm going to wait until it gets near 70% usage before opening a new TAC case on it.
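For anyone who wants to estimate how long they have before reaching a threshold like that, here is a minimal sketch. It assumes the growth stays linear, and the 40% starting value is purely illustrative (it is not a measurement from any deployment in this thread); the function name is hypothetical as well.

```python
import math


def days_until_threshold(current_pct, threshold_pct, growth_pct_per_day):
    """Days until usage reaches the threshold, assuming linear growth."""
    if growth_pct_per_day <= 0:
        raise ValueError("growth rate must be positive")
    remaining = threshold_pct - current_pct
    if remaining <= 0:
        return 0  # already at or above the threshold
    return math.ceil(remaining / growth_pct_per_day)


# Assumed starting point: 40% used (illustrative only).
# The ~1% per day growth and the 70% threshold come from the post above.
print(days_until_threshold(40.0, 70.0, 1.0))
```

If the daily growth is not constant, the estimate is only a rough lower or upper bound, so it is worth re-running it with fresh numbers every few days.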

Hi @amunozmo ,

 I have Customers with ISE 3.3 P4 and ISE 3.3 P6, using RADIUS only, with 3755 and 3795 (VMs) and I have not noticed the memory leak related issues that we had in ISE 3.3 P2.

 Are you able to test your scenario on a 3755 VM ?

 

Best regards.

 

Hi,

We ran into the same problem 3 months ago and we are working with TAC. The only solution so far is to add 2 more nodes to separate PAN and MnT. We will upgrade to 3.4 Patch 2 to test, and if the issue gets solved I will let you know. Today we are running ISE 3.3p5, and we have to reload approximately every 15 days.

Hi @scastrilonospina ,

 what is your Hardware ? Appliance or VM ?

 

Note: I also recommend reading: ISE - What we need to know about SNS / VM.

 

Hope this helps !!!

 

Hi Marcelo,

Appliance

Hi @scastrilonospina ,

 thanks.

 What is the output of the tech top command before reloading the Node ?

 

JuanVelez
Level 1

I've been experiencing this same issue consistently. I've been monitoring system processes daily using the "tech top" command and have noticed memory usage increasing by approximately 100 MB per day. The jsvc process appears to be the primary contributor to this memory drain.
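A minimal sketch of how a daily growth figure like this can be estimated from a handful of readings, using a least-squares slope. The sample values below are made up for illustration (they are not my real "tech top" readings), and the function name is hypothetical.

```python
def growth_per_day(samples):
    """Least-squares slope (MiB per day) over (day, used_mib) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    num = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    den = sum((day - mean_day) ** 2 for day, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one day")
    return num / den


# Illustrative readings only: (day number, memory used in MiB).
samples = [(0, 30000.0), (1, 30100.0), (2, 30200.0), (3, 30300.0)]
print(growth_per_day(samples))
```

A fitted slope is less sensitive to a single noisy reading than simply subtracting the first value from the last.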

Over time, I've opened multiple support cases regarding this problem. The recommended solutions have included disabling certain features or applying various patches, but none have resolved the issue. The most recent explanation I received pointed to system sizing. We're currently running two SNS-3715 nodes handling RADIUS, TACACS, and Posture Assessment. No self-provisioning, no CWA, no pxGrid, no SGT, no API integrations. While TAC believes the hardware is insufficient, the official sizing guide indicates that our deployment should be well within supported limits, even oversized for our needs.

We’re currently on version 3.3p5. I plan to test 3.4p2 soon to see if the problem is addressed in that release.

Has anyone else encountered a similar memory problem with jsvc or found a lasting fix?

Hi @JuanVelez ,

 please take a look at: ISE - What we need to know about SNS / VM, pay special attention to the Particularities topic.

 

Are you "running the SNS-3715" as an Appliance or a VM ?

 

Hope this helps !!!

 

Hi @Marcelo Morais, thanks a lot for replying!

We're running ISE on two physical SNS-3715 appliances (no virtual machines involved).

I reviewed the section you referenced: "SNS (Secure Network Server) Appliance > Particularities", and compared it with the official "Performance and Scalability Guide for Cisco Identity Services Engine", which I often read thanks to this issue. According to that guide, our deployment falls under the "small" category. We handle around 2000 to 3000 concurrent sessions at peak, and according to the sizing tables, a shared PSN on an SNS-3715 is rated to support up to 25000 sessions. Both our nodes run PAN/SAN, PSN, and MnT personas.

In theory, we're well under capacity — operating at <20% of the rated limit — yet we’re seeing continuous memory growth, mostly tied to the jsvc process.
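As a quick check of that figure, here is the arithmetic as a small sketch, using the session counts and the rated limit quoted above (the function name is just for illustration):

```python
def utilization_pct(peak_sessions, rated_sessions):
    """Peak concurrent sessions as a percentage of the rated limit."""
    if rated_sessions <= 0:
        raise ValueError("rated limit must be positive")
    return 100.0 * peak_sessions / rated_sessions


# 2000 to 3000 concurrent sessions against a rated 25000.
print(utilization_pct(2000, 25000))  # 8.0
print(utilization_pct(3000, 25000))  # 12.0
```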

So, this leads me to some broader questions:

  • Under what real-world conditions does a two-node architecture for small deployments with shared personas actually remain stable long-term?
  • Are there memory behavior differences between physical appliances and virtual machines that are not reflected in the official documentation?
  • Could there be hardware or firmware issues in the SNS‑3715 platform contributing to memory leaks?
  • Could disk I/O issues on the SNS‑3715 appliances (e.g., logs, database operations, backups) gradually increase memory consumption over time?
  • Are there specific patches or configuration adjustments for the jsvc process recommended for physical appliances, similar to those applied in virtual machine deployments?

Even though we appear to be within specs, we're still encountering this issue. I really appreciate the pointer, and I'd welcome any further insights from you or others in the community.

Hi @JuanVelez ,

 keep in mind the following, described in the Particularities part of the ISE - What we need to know about SNS / VM.

" ... Additional hardware resources such as RAM, CPU, or HDD cannot be added to an SNS Appliance, but additional Power Supplies can be ordered separately for SNS 3615 and SNS 3715 ... "

" ... The SNS 3x15 acting as PAN/MnT are highly recommended for either RADIUS only or TACACS+ only workloads. If a Deployment requires both RADIUS and TACACS+ at scaled workloads, it is recommended to use SNS 3x55 or higher models ... "

 

In other words ... it is easier to make changes to a VM than to an Appliance for testing purposes (in the past, VM testing helped us find a very annoying issue that became a Cisco Field Notice; please take a look at ISE - Slow Replication).

 

About your questions ...

  • Yes, a small deployment with a two-node architecture can remain stable long-term, but let's take a look at your case with RADIUS & TACACS+.
  • IMHO, no, there are no differences in memory behavior between Appliances and VMs.
  • Memory leaks are primarily a software issue, but keeping your hardware updated is important (please take a look at the Update part of the ISE - What we need to know about SNS / VM).
  • Yes, actions like generating logs, backups, Support Bundles, and others increase memory usage.
  • Let's talk first about configuration:
    1. At Operations > Reports > Reports > Endpoints and Users > Authentication Summary, what is your Total AuthC per Day (at Authentications by Day and Quick Link) ?
    2. At Administration > System > Settings > Profiling, are your MFC Profiling and AI Rules and Endpoint Analytics Settings disabled ?
    3. At Operations > System 360 > Settings, can you disable Monitoring and Log Analytics just for testing ?
    4. At Administration > System > Maintenance > Operational Data Purging, what about your Database Utilization ?

 

Hope this helps !!!

 

Hi @amunozmo , @JuanVelez and @scastrilonospina ,

 keeping in mind the CSCwk06817 Known OS Bug causing swap to increase on ISE Nodes .

CSCwk06817.png

 

When you run the tech top command, what is the result on your SNS-3715 ?

Examples of a 3795-VM-PAN and 3755-VM-PSN:

 

3795-VM-PAN/admin# tech top
Invoking tech top. Press Control-C to interrupt.
top - 11:41:47 up 11 days, 16:44, 1 user, load average: 0.54, 0.71, 0.82
Tasks: 819 total, 1 running, 818 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 0.4 sy, 0.0 ni, 97.9 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 257403.3 total, 161230.0 free, 45260.3 used, 50913.0 buff/cache
MiB Swap: 8001.0 total, 8001.0 free, 0.0 used. 190306.5 avail Mem
...

 

3755-VM-PSN/admin# tech top
Invoking tech top. Press Control-C to interrupt.
top - 11:43:57 up 11 days, 15:29, 1 user, load average: 1.51, 1.63, 1.61
Tasks: 878 total, 2 running, 876 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.9 us, 3.1 sy, 0.0 ni, 88.4 id, 0.0 wa, 0.2 hi, 0.3 si, 0.0 st
MiB Mem : 96124.0 total, 46568.2 free, 30508.5 used, 19047.3 buff/cache
MiB Swap: 7999.9 total, 7999.9 free, 0.0 used. 58621.5 avail Mem
...
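
If you save this output to a file, the memory and swap figures can be pulled out of the summary lines with a short script. This is only a sketch: it assumes the "MiB Mem" / "MiB Swap" line layout shown in the examples above, it is meant to run on a workstation rather than on the ISE node itself, and the function and key names are hypothetical.

```python
import re

# Patterns match the summary lines as they appear in the examples above.
MEM_RE = re.compile(r"MiB Mem\s*:\s*([\d.]+) total,\s*([\d.]+) free,\s*([\d.]+) used")
SWAP_RE = re.compile(r"MiB Swap\s*:\s*([\d.]+) total,\s*([\d.]+) free,\s*([\d.]+) used")


def parse_top_summary(text):
    """Extract memory and swap totals/usage (MiB) from saved top output."""
    mem = MEM_RE.search(text)
    swap = SWAP_RE.search(text)
    if mem is None or swap is None:
        raise ValueError("Mem/Swap summary lines not found")
    mem_total, _, mem_used = (float(v) for v in mem.groups())
    swap_total, _, swap_used = (float(v) for v in swap.groups())
    return {
        "mem_total_mib": mem_total,
        "mem_used_mib": mem_used,
        "swap_total_mib": swap_total,
        "swap_used_mib": swap_used,
        # Guard against a node that reports no swap at all.
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }


# Summary lines copied from the 3755-VM-PSN example above.
sample = """MiB Mem : 96124.0 total, 46568.2 free, 30508.5 used, 19047.3 buff/cache
MiB Swap: 7999.9 total, 7999.9 free, 0.0 used. 58621.5 avail Mem"""
print(parse_top_summary(sample))
```

Collecting one such reading per day makes it easy to see whether swap usage is really growing linearly.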

 

Best regards.