
ISE deployment stability

arane0001
Level 1

We are currently experiencing ongoing instability issues during the deployment of Cisco Identity Services Engine (ISE) across multiple environments, including both new and previous deployments. These issues appear to be recurring and are impacting our ability to ensure smooth operations.

Given that similar instability was experienced with previous deployments, we suspect the underlying cause may be systemic, possibly related to resource limits, database synchronization, or network configuration issues. However, further investigation is required to identify the exact root cause.

Critical issues we have had so far:

  • ISE not backing up
  • PAN /opt folder suddenly crossing the 75% threshold
  • Queue link errors
  • High queue memory
  • Log collection errors
  • Application services stuck in initializing
  • SAN server crash
  • Unable to see live logs

I have faced these issues in ISE 2.2, 2.7 and 3.2. We have gone through multiple VM changes across different VLANs, but the above issues still remain.

I would also appreciate any best practices or configuration recommendations to help mitigate such issues moving forward.


The big thing with ISE: do not thin provision, do not vMotion, and do not take snapshots. I have used ISE since 2.1 and it has been extremely stable.

I don't think we take snapshots. I am not sure about the rest, but I will check with the VM team. How many endpoints do you have in your environment?

Due to the size, they didn't want to use VMs, so we have appliances: 2 x 3755s.

ammahend
VIP

Make sure you have the correct resource reservations.

Follow this guide, section:

Virtual Machine Appliance Size Recommendations for Cisco ISE

-hope this helps-

Yes, we do 

Arne Bier
VIP

Depending on the complexity of the Policy Sets and whether you are deeply embedded with DNAC (SDA) etc., it might benefit you to start from a clean slate. Deploy a new pair of PANs (new IPs) from ISE 3.3 OVA images, apply the latest patch, and then build up the config step by step. You can import user accounts, endpoints, network devices, and profiling policies. The Guest Portals (if any) must be rebuilt by hand, as must the RADIUS/TACACS+ Policy Sets. But it's so, so worth it. You get rid of years of crud in the system and any potential file/database corruption. There is no reason for an ISE deployment to be so unstable - it's symptomatic of some underlying technical debt that will be hard to find and eradicate.

A few Best practices that come to mind (in addition to what has already been said ... I'll say it again):

  • No VM snapshots EVER! No need for it, but you MUST inform/configure your VM platform to NOT make snapshots
  • Live vMotion is supported from ISE 3.2 (and later) - but only the compute, not the storage
  • Monitor the VM resources (e.g. in vSphere) - if you are constantly running at the limit, increase the allocation
  • Thick provision the disks - always
  • Use VMXNET3 on vSphere instead of E1000 adapters
  • Reduce logging with ISE Collection Filters
  • Create efficient Policy Sets (break them into binary trees) and put most frequent Rules above less frequent Rules
  • Enable targeted Alarm emails to monitor for top 10 issues (certs, disk, sync) and READ and Action those!
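
To illustrate the point about putting the most frequent Rules above less frequent ones, here is a rough sketch. It assumes a simplified first-match model in which rules are checked top to bottom and every rule costs the same to evaluate; the function name and the traffic mix are made up for illustration and do not reflect ISE internals.

```python
def expected_evaluations(hit_rates):
    """Average number of rules checked per request under first-match evaluation.

    hit_rates: fraction of requests matched by each rule, listed in rule
    order (top rule first). The fractions are assumed to sum to 1.
    """
    # A request matched by the rule at position N costs N rule checks.
    return sum(position * rate for position, rate in enumerate(hit_rates, start=1))


# Hypothetical traffic mix: the busiest rule sits at the bottom.
rare_first = [0.05, 0.15, 0.80]
# Same rules, reordered so the busiest rule is evaluated first.
frequent_first = sorted(rare_first, reverse=True)

print(expected_evaluations(rare_first))
print(expected_evaluations(frequent_first))
```

With this made-up mix, the average drops from about 2.75 rule checks per request to about 1.25 just by reordering. Note that reordering is only safe when the rules do not overlap, since first-match order also decides which rule wins.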

  • Create efficient Policy Sets (break them into binary trees) and put most frequent Rules above less frequent Rules

Can you elaborate on this one? We have pretty standard rule sets for 802.1X and MAB: 802.1X covers Windows and Macintosh, and MAB covers phones, cameras, WAPs, and infrastructure devices, each with its own policy. How do I avoid repeated authentications for a device that has already been authenticated? I know there is an option to suppress that, but I am not sure it is working efficiently. Also, all my traffic is load balanced across an F5 VIP to 3 PSN nodes.

Regarding the RADIUS Policy Set structure, each customer setup might be different. If we assume that a single ISE deployment must handle wired MAB & 802.1X and wireless MAB & 802.1X, as well as a bit of PAP Auth (e.g. for Cisco FMC AAA), I tend to go with the following Top-Level structure:

  • Wired MAB
  • Wired 802.1X
  • Wireless 802.1X
  • Wireless MAB (used only for iPSK)
  • PAP Auth (e.g. AAA for devices that don't support TACACS+)

Each of the above bullet points is a separate Policy Set. Some folks like to combine Wired and Wireless into the same Policy Set - I find that this is not great, because in the Authorization Rules you have to test the condition of "is it wireless or is it wired", and in a large Policy Set this can be very messy and causes a lot of redundant checking. What I meant by "binary tree" is that if you check the condition once, at the top of the Policy Set, then there is no need to check it again. Wired MAB is wired MAB, etc. Your Policy Sets will be simpler to maintain, and ISE will have fewer conditions to process and fall through. It's the same logic a programmer would use when creating an efficient if/then or switch/case statement - there are further enhancements (like Boolean short circuit evaluations) mentioned from page 36 onwards in Cisco Live BRK-3699 (I don't recall the BRK number for the more recent ones - but this one always stuck in my mind).
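
The if/then analogy can be made concrete with a small sketch. This is a toy model only, not how ISE evaluates policy: the `Request` fields, the endpoint groups, and the way condition checks are counted are all assumptions chosen for illustration. It compares one combined Policy Set, where every rule re-tests wired/wireless and MAB/802.1X, against separate top-level Policy Sets that test those conditions once.

```python
from dataclasses import dataclass

MEDIA = ["wired", "wireless"]
METHODS = ["mab", "dot1x"]
GROUPS = ["phone", "camera", "printer"]  # hypothetical endpoint groups


@dataclass(frozen=True)
class Request:
    medium: str
    method: str
    group: str


def entry_cost(req, medium, method):
    """Test medium, then method, with short-circuiting.

    Returns (matched, number_of_condition_checks).
    """
    if req.medium != medium:
        return False, 1
    return req.method == method, 2


def combined_checks(req):
    """One big Policy Set: every rule repeats the medium and method tests."""
    checks = 0
    for medium in MEDIA:
        for method in METHODS:
            for group in GROUPS:
                matched, cost = entry_cost(req, medium, method)
                checks += cost
                if not matched:
                    continue
                checks += 1  # the group test for this rule
                if req.group == group:
                    return checks
    return checks


def split_checks(req):
    """Separate Policy Sets: medium and method are tested once per set."""
    checks = 0
    for medium in MEDIA:
        for method in METHODS:
            matched, cost = entry_cost(req, medium, method)
            checks += cost
            if not matched:
                continue
            for group in GROUPS:
                checks += 1  # only the group test remains inside the set
                if req.group == group:
                    break
            return checks
    return checks


req = Request("wireless", "dot1x", "printer")
print(combined_checks(req), split_checks(req))  # prints "21 9"
```

In this toy model the worst-placed request costs 21 condition checks in the combined layout and 9 in the split layout, and the gap grows with the number of rules inside each set.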

As for F5 load balancers - if you have IOS-XE network devices, you can use the built-in RADIUS load balancing feature - it does an amazing job and you don't need the F5. The F5 still has some advantages (like pool membership management for patching/upgrading ISE nodes) - but functionally, RADIUS load balancing can be achieved without a central load balancer.

Yes, that's how I have the policies separated. All my policies are unique: wired and wireless, each split into 802.1X and MAB, plus VPN and the guest portal.

Hi @arane0001 ,

Please take a look at:

ISE - Slow Replication

ISE - Queue Link Error

Navigating Security in a Chaotic Environment - Part II, search for Periodically Reevaluate your Deployment.

 

Note: ISE 3.3 P4 fixes many memory leak issues:

CSCwh05464
CSCwi47249
CSCwh92614
CSCwm33110
CSCwh92320
CSCwm48867
CSCwk06817
CSCwj72586

 

Hope this helps!