ISE 2.7 patch 6 - large number of devices in disconnected state

Stuart Patton · ‎06-04-2024

Hi,

Got a weird problem and looking to the community for help.

I happened to spot on the dashboard that the number of active endpoints was significantly lower than normal. Digging a little deeper, there are loads of devices showing as disconnected, such as Cisco desk phones. However, on the switch itself the device/port is showing as authenticated and authorised. I'm not aware that anything has changed on the switchport config (we're using SDA, so any change would have had to be made through DNA Center). We've not done any switch upgrades, and devices had been working just fine for months since we rolled out and migrated to C9300/SDA... it's just happened to lots of devices almost overnight.

I'm just worried that the endpoints in this state, some of which have static policy group assignment or static identity group assignment, will eventually get purged. The majority of devices should be ok as we have fairly well-developed profiling policies.

If I bounce one of the affected switchports, the device reauths and shows in the connected state again, but that's not feasible across an estate of our size. If I do a "clear auth sess int x/y/z" the device also comes back in the connected state, but it's disruptive to the endpoint, as a MAB device has to wait out the dot1x timeout first.

I can also select one of the devices and send a CoA reauth and that works too (weirdly it fixes immediately and is not disruptive to the endpoint), but I'm limited by the fact you can't select more than one device to send the CoA request.
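
One possible way around the one-device-at-a-time limit is to script the reauth through the ISE MnT REST API instead of the GUI. The sketch below only builds the request URLs for a list of MACs and sends nothing. The hostnames and MAC are made up, and the `/admin/API/mnt/CoA/Reauth/<psn>/<mac>/<type>` path and reauth type values (0 default, 1 last, 2 rerun) are an assumption based on the MnT API documentation, so check them against your ISE version first.

```python
import re


def normalize_mac(mac: str) -> str:
    """Return a MAC as uppercase colon-separated pairs, e.g. AA:BB:CC:DD:EE:FF."""
    # Drop separators (dots, colons, dashes) and keep only hex digits.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def coa_reauth_urls(mnt_host, psn_hostname, macs, reauth_type=0):
    """Build one CoA reauth URL per MAC (URL layout is an assumption, see above)."""
    return [
        f"https://{mnt_host}/admin/API/mnt/CoA/Reauth/"
        f"{psn_hostname}/{normalize_mac(mac)}/{reauth_type}"
        for mac in macs
    ]


if __name__ == "__main__":
    # Hypothetical hostnames and MAC, for illustration only.
    for url in coa_reauth_urls("mnt.example.com", "psn1", ["0011.22aa.bbcc"]):
        print(url)
```

Each URL would then be requested over HTTPS with an account that has MnT API access, ideally paced in small batches and tried on a handful of endpoints first.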

Anyone got any ideas? I'm conscious that patch 10 is out and that we probably need to apply it. If it fixes the issue, great, but I don't want to make things any worse.

Thanks,

Stuart

Greg Gibbs · ‎06-04-2024

Where are you seeing these disconnected sessions? If it's in the Context Visibility, there may be a mismatch between the CV and endpoints databases. You might try resetting the CV sync using this method to see if that clears the issues.

Stuart Patton · ‎06-05-2024

Hi Greg,

Thanks for the reply. Yes, it is in context visibility that it is showing. I don't see any errors on the endpoint page like in the first screenshot in the URL you sent - for all intents and purposes everything else looks ok from what I can tell. And the CoA reauth from context visibility does not appear to actually reauth the device - MAB devices do not lose connectivity like they would whilst waiting for dot1x timeouts.

Thanks,

Stuart

ahollifield · ‎06-05-2024

https://www.cisco.com/c/en/us/products/collateral/security/identity-services-engine/bulletin-c25-2943876.html

Stuart Patton · ‎06-05-2024

Yep, fully aware of this. Got plans to upgrade imminently.

Arne Bier · ‎06-05-2024

I had a similar issue to this in ISE 3.2 (lots of stale info in Context Visibility that didn't reflect the real status of the endpoints) - I did what Greg suggested and it cleared it. Who knows why this happens? ISE relies on syslog messages to pass information between PSNs and MnT nodes, and perhaps something went screwy. I also wonder if ISE is processing the RADIUS Accounting correctly. If so, then we should never be in that state.

Stuart Patton · ‎06-06-2024

Hi Arne, yes, this is what I couldn't understand. I can see on "show auth sess int x/y/z" that the accounting periodic update timer is decreasing for the endpoints, so I am fairly sure the switches are sending this to ISE.
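
To gauge how many endpoints are affected without touching any ports, one option is to diff the MACs the switches report as authorised against the MACs ISE shows as connected. This is a minimal sketch that assumes both lists have already been exported somehow (the export step is not shown, and the sample MACs are made up). The only real work is reconciling the IOS dotted notation with the colon notation ISE displays.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def mac_key(mac: str) -> str:
    """Reduce any common MAC notation to 12 lowercase hex digits."""
    digits = "".join(ch for ch in mac if ch in HEX_DIGITS).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def stale_in_ise(switch_authorised, ise_connected):
    """Return MACs authorised on a switch but not shown as connected in ISE."""
    connected = {mac_key(m) for m in ise_connected}
    return sorted({mac_key(m) for m in switch_authorised} - connected)


if __name__ == "__main__":
    # Hypothetical sample data: one endpoint is missing from the ISE list.
    on_switch = ["0011.22aa.bbcc", "0011.22aa.bbdd"]
    in_ise = ["00:11:22:AA:BB:CC"]
    print(stale_in_ise(on_switch, in_ise))
```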

We'll try what Greg suggested in a maintenance window and see how we get on.