Solved: ISE Reauth timers and stale endpoints

Josh Morris · 08-23-2022

I have a ton of statically assigned devices. I've tried to get them moved to profile-based assignment, but the endpoints are all over the place and the system owners refuse to even try to standardize. These statically assigned devices typically get a dynamically assigned VLAN and DACL. A few years ago I lost my entire ISE environment for hours during a failed patch. At the time I had a reauthentication timer hardcoded on each switchport. This led to many devices trying to reauth while ISE was down, failing to get their VLAN assignment, and losing access. Yes, I have a critical VLAN configured, but it is the default VLAN, not the dynamic VLAN.

To keep this from happening again, I removed the hardcoded reauth timer from the switches and configured them to take the reauth timer from ISE instead. Currently, any device that does NOT need a dynamic VLAN gets an ISE-provided reauth timer, and these work fine. But for any device that gets a dynamic VLAN, ISE does NOT send a reauth timer. I think this protects me against another situation where devices fail to connect during an ISE outage. But I've found it has left ISE with a lot of 'stale' endpoints: statically assigned endpoints that are active on the network but haven't reauthenticated in a long time, so they show up in my stale reports (endpoints with inactive days > 180). So I have to identify those, go to their switchport, and manually clear the authentication session.

Besides the obvious "move them from static to profiled", do you know of a better way for me to tackle this?

Arne Bier · 08-23-2022

Hi Josh

I hear you! The debate of whether or not to re-auth a wired connection also bothered me for a long time. Not only because of the issue you'd face if there was no RADIUS comms, but also, because wired connections stay wired (until link goes down). So why bother to re-auth, right? But. Stale sessions. I see them too. Especially on NIC Teaming arrangements - device has two links, both are UP/UP but the MAC address lives on one of the interfaces only - so that interface gets a NAC Session - but when NIC Teaming flips to the other link, then the MAC moves - and a new Session is created. The switch retains both sessions - only one is valid - the other is a zombie.
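To make the NIC Teaming "zombie" case concrete, here is a minimal sketch that takes a list of session records and reports any MAC holding a session on more than one interface. The record layout (interface, MAC) and the sample interface names are hypothetical; in practice you would have to pull the session table from the switch yourself, and this sketch does not parse any real CLI output.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical session record: (switch interface, endpoint MAC).
Session = Tuple[str, str]


def find_duplicate_sessions(sessions: List[Session]) -> Dict[str, List[str]]:
    """Return MACs that hold authentication sessions on more than one interface.

    With NIC teaming, at most one of these sessions is current; the rest
    are candidate "zombie" sessions to check and clear.
    """
    by_mac: Dict[str, Set[str]] = defaultdict(set)
    for interface, mac in sessions:
        by_mac[mac.lower()].add(interface)
    return {mac: sorted(ifaces) for mac, ifaces in by_mac.items() if len(ifaces) > 1}


if __name__ == "__main__":
    sample = [
        ("Gi1/0/10", "00:11:22:33:44:55"),
        ("Gi1/0/11", "00:11:22:33:44:55"),  # team flipped to the second link
        ("Gi1/0/12", "AA:BB:CC:DD:EE:FF"),
    ]
    print(find_duplicate_sessions(sample))
```

MACs are lowercased so that the same address written in different cases is grouped together.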

I am of the opinion that if you have two RADIUS servers configured on the switch then you should make sure that the redundancy is such that you will most likely never lose comms to both. But - if the branch has a single WAN router out to the DC then you're snookered.

One potential solution to make the situation "better" (not bulletproof) is to use IBNS 2.0 Role-Based Critical Authorization.

Not only will you return an Access-Accept and VLAN/dACL to the switch, you will also have to return a "Role" for the session. The switch caches this Role along with the EndpointID, and the cool thing is that if RADIUS is down, a re-auth will use the last cached Role for that endpoint. I don't think it helps with NEW connections while RADIUS is down, though.
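The caching behavior described above can be modeled roughly as follows. This is a toy sketch of the idea only: the class, method and role names (`RoleCache`, `authorize`, `PRINTERS`) are hypothetical and say nothing about how IOS-XE actually stores this state. It shows why a known endpoint survives a re-auth during an outage while a brand-new endpoint gets nothing from the cache.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RoleCache:
    """Toy model of a switch remembering the last role ISE assigned per endpoint.

    Conceptual sketch only: names and structure are hypothetical.
    """
    roles: Dict[str, str] = field(default_factory=dict)

    def authorize(self, endpoint_id: str, radius_up: bool,
                  returned_role: Optional[str] = None) -> Optional[str]:
        if radius_up:
            # Normal path: ISE sends Access-Accept with a role; remember it.
            if returned_role is not None:
                self.roles[endpoint_id] = returned_role
            return returned_role
        # RADIUS unreachable: reuse the last cached role for a known endpoint.
        # A brand-new endpoint has nothing cached, so this returns None.
        return self.roles.get(endpoint_id)


if __name__ == "__main__":
    switch = RoleCache()
    print(switch.authorize("00:11:22:33:44:55", radius_up=True, returned_role="PRINTERS"))
    print(switch.authorize("00:11:22:33:44:55", radius_up=False))  # re-auth during outage
    print(switch.authorize("66:77:88:99:AA:BB", radius_up=False))  # new endpoint during outage
```

In this model, a `None` result stands for "no cached role", i.e. whatever critical policy is configured would have to take over.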

There should be a diagram somewhere on the internet showing the inversely proportional relationship between Security and Convenience. As long as there is some central "brains" making the decisions, we will have to make sure that we stay connected to the brains.

Massimo Baschieri · 08-31-2022

Isn't a STALE endpoint the same as what ISE calls an INACTIVE endpoint?

If I'm not mistaken, for ISE an inactive endpoint is one for which it never receives an accounting stop OR accounting updates from the NADs, so if you enable interim accounting on the NADs you should avoid the inactive issue.

As for NIC teaming MAC phantoms, wouldn't configuring an inactivity timeout on the switchport help?

Arne Bier · 09-01-2022

Not sure who defines STALE/INACTIVE. ISE creates an entry in its endpoint database every time a NAD authenticates a new endpoint that ISE has never seen before. From that point onwards, the endpoint is Active or Inactive (based on RADIUS Accounting). If the RADIUS Accounting Stop is never received, the session will eventually go into the Inactive state after 5 days.

I think the problem I was talking about is Authorized sessions on switch interfaces with no record of them in ISE. Those sessions are troublesome, and I think I solved that by using Session-Timeout values returned by ISE during Authorization.

Re-auth comes with its own problems, though. In a bad failure scenario where the NAD cannot reach any RADIUS server (e.g. WAN link down), the re-auth will fail. Of course you can plan for that with a critical auth VLAN etc., but it involves more config. The benefit of not doing re-auth is that the session stays up as long as the Ethernet link is up, which is ideal because the session won't drop if the RADIUS servers are unavailable.

Massimo Baschieri · 09-01-2022

I think ISE sets endpoints as inactive if it doesn't receive accounting stop OR accounting updates in 5 days.

I have deployments without reauthentication for wired hosts and with purge policies in place for inactive endpoints, and I have never seen ISE purge a working endpoint. In my mind this is due to the interim updates configured on the NADs.
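A simplified model of the rule discussed here shows why interim updates would keep a long-lived, never-reauthenticated session from going inactive. Assumptions: the 5-day threshold is the figure quoted in this thread, "activity" means any accounting message (start, interim update or stop), and the NAD sends an interim update at a fixed interval. ISE's actual logic may differ, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold quoted in this thread; ISE's real rules may differ.
INACTIVE_AFTER = timedelta(days=5)


def last_accounting_time(start: datetime, now: datetime,
                         interim_interval: Optional[timedelta] = None) -> datetime:
    """Time of the most recent accounting message for a session that is still up.

    Without interim accounting, only the Accounting-Start is ever sent.
    With it, the NAD is assumed to send an update every interim_interval.
    """
    if interim_interval is None:
        return start
    periods = (now - start) // interim_interval
    return start + periods * interim_interval


def is_inactive(last_accounting: datetime, now: datetime) -> bool:
    """Simplified rule: inactive once no accounting has arrived for INACTIVE_AFTER."""
    return now - last_accounting > INACTIVE_AFTER


if __name__ == "__main__":
    start = datetime(2022, 8, 1)
    now = start + timedelta(days=30)  # link has stayed up, no re-auth
    print(is_inactive(last_accounting_time(start, now), now))
    print(is_inactive(last_accounting_time(start, now, timedelta(hours=24)), now))
```

In this model the same 30-day session is inactive without interim accounting and active with a daily update.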

I've read about a preconfigured report for inactive endpoints but cannot find it. Do you know where I can find it?

Arne Bier · 09-01-2022

You can log into the PAN CLI and export the endpoint database:

application configure ise

Select the option that dumps all the endpoints. I can't remember which number it is.

Then load that CSV file into Excel. You can then filter on the InactiveDays field.
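If you'd rather script the filter than do it in Excel, a minimal sketch could look like this. It assumes the export has a header row containing an `InactiveDays` column (the field named above). The `MACAddress` column in the sample data is hypothetical, and the 180-day default matches the stale-report threshold from the original question.

```python
import csv
import io
from typing import Dict, List, TextIO


def stale_endpoints(csv_file: TextIO, min_days: int = 180) -> List[Dict[str, str]]:
    """Return rows whose InactiveDays value exceeds min_days.

    Rows with a blank or non-numeric InactiveDays value are skipped.
    """
    stale = []
    for row in csv.DictReader(csv_file):
        value = (row.get("InactiveDays") or "").strip()
        if value.isdigit() and int(value) > min_days:
            stale.append(row)
    return stale


if __name__ == "__main__":
    sample = io.StringIO(
        "MACAddress,InactiveDays\n"  # hypothetical column layout
        "00:11:22:33:44:55,200\n"
        "AA:BB:CC:DD:EE:FF,3\n"
        "66:77:88:99:AA:BB,\n"
    )
    for row in stale_endpoints(sample):
        print(row["MACAddress"], row["InactiveDays"])
```

Against a real export you would pass a file opened with `open(path, newline="")` instead of the in-memory sample.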

Massimo Baschieri · 09-02-2022

That's clear, thanks. I can also use ISE EAT for that; I was simply hoping there was something in the Reports section of the GUI.