Solved: Cisco ISE 2.7 - profiling issues - possible corruption of database

AigarsK · 02-26-2023

Hi All,

Apologies for the dramatic title, but I am seeing something odd.

We have a two-node physical (SNS-3515) Cisco ISE deployment running version 2.7 Patch 9: Node 1 (PPAN, PMnT, PSN, pxGrid) and Node 2 (SPAN, SMnT, PSN, pxGrid).

I am seeing authentication failures for devices that worked before, with no changes made. If I take one of the affected endpoints and look at it more closely in Context Visibility - Endpoints, its Endpoint Profile is listed as Unknown, and so is its Identity Group.

However, if I edit the endpoint, the Edit page does list dynamic assignments for both Policy and Identity Group that match my profiling policy.

It appears that, for some reason, the two places displaying this information are not in sync, and policy processing uses the values listed in Context Visibility rather than those on the actual endpoint.
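To make the symptom concrete, here is a minimal Python sketch of the consistency check being described: comparing each endpoint's own record with what Context Visibility shows for it. It assumes both views have already been exported into dictionaries keyed by MAC address; the EndpointView class, the find_mismatches helper and all sample values are hypothetical and are not part of any ISE API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EndpointView:
    """Profiling state of one endpoint as shown in one place (hypothetical structure)."""
    profile: str
    identity_group: str


def find_mismatches(records: Dict[str, EndpointView],
                    context_view: Dict[str, EndpointView]) -> List[str]:
    """Return, sorted, the MACs whose Context Visibility entry disagrees with the endpoint record.

    A MAC that has a record but no Context Visibility entry also counts as a mismatch.
    """
    mismatched = []
    for mac, record in records.items():
        # Frozen dataclasses compare field by field; a missing entry (None) never matches.
        if context_view.get(mac) != record:
            mismatched.append(mac)
    return sorted(mismatched)


# Example data modelled on the symptom in this thread (values are made up).
records = {
    "00:11:22:33:44:55": EndpointView("Windows10-Workstation", "Workstation"),
    "00:11:22:33:44:66": EndpointView("Cisco-IP-Phone", "Cisco-IP-Phone"),
}
context_view = {
    "00:11:22:33:44:55": EndpointView("Windows10-Workstation", "Workstation"),
    "00:11:22:33:44:66": EndpointView("Unknown", "Unknown"),
}
print(find_mismatches(records, context_view))  # ['00:11:22:33:44:66']
```

A non-empty result on a real export would support the "two views out of sync" theory rather than a profiling-policy problem.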

Hope someone has an idea as to what can help to resolve this issue.

davidgfriedman · 02-26-2023

1. Are the endpoints going OUI Unknown and/or Total Certainty 0?
2. Have you added any policy service nodes, or completely rebuilt any, recently?
3. Do you see the OUI going Unknown if you look up one of these "changing" endpoints in the Operations->Reports->Endpoint and Users->Profiled Endpoints Summary report?

We recently had something similar, where a few thousand endpoints using MAB (not 802.1X) were reprofiled to Unknown and booted. It turned out we had just added many new PSNs to a version 2.7 ISE cube, and the build / join / sync had not updated the new PSNs' OUI database properly: those nodes were still running the out-of-date copy from the ISO. So when a device's DHCP / accounting data (DHCP, device-sensor, etc.) went to one of the new nodes, the OUI went Unknown, the total certainty dropped to 0 and, boom, the device was booted at the next authentication or CoA (Change of Authorization).

We have the profiler feed off (we update it periodically, but not daily, because that was breaking something every month), so we couldn't simply re-run the profiler feed to see whether it would sync all nodes. We needed TAC to identify this problem (we had no clue). They had to log in as root to the underlying Linux OS, copy an OUI-update-related file from one of our original members, and put it on every one of our newly built members. Once that was done on every new PSN and we had stopped and restarted services, the issues started clearing up and were gone by morning. We've been stable since that fix.
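The failure chain described above (stale OUI table, vendor lookup returns Unknown, total certainty falls below the policy minimum, endpoint drops to Unknown) can be illustrated with a deliberately simplified Python model. This is a sketch under assumptions, not how ISE is implemented: the policy structure, the oui_of and profile helpers, and the vendor and policy names are all hypothetical. It only borrows the general idea of conditions that add certainty and a per-policy minimum certainty factor.

```python
import string
from typing import Dict, List, Tuple

# Simplified, hypothetical model of certainty-based profiling. Real ISE policies
# also have parent/child hierarchies, exceptions and many more attribute sources.


def oui_of(mac: str) -> str:
    """Return the OUI (first three octets) of a MAC address as six upper-case hex digits."""
    digits = "".join(c for c in mac if c in string.hexdigits).upper()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits[:6]


def profile(mac: str, attrs: Dict[str, str], oui_db: Dict[str, str],
            policies: List[dict]) -> Tuple[str, int]:
    """Return (profile name, total certainty) for an endpoint.

    oui_db maps OUI -> vendor; a stale copy simply lacks newer OUIs.
    Each policy is {"name": str, "minimum": int,
                    "conditions": [(attribute, value, certainty), ...]}.
    """
    # The vendor lookup is the piece that goes "Unknown" on a node with an old OUI table.
    seen = dict(attrs, OUI=oui_db.get(oui_of(mac), "Unknown"))
    best = ("Unknown", 0)
    for policy in policies:
        # Total certainty is the sum of the certainty of every matching condition.
        total = sum(cf for attr, value, cf in policy["conditions"] if seen.get(attr) == value)
        if total >= policy["minimum"] and total > best[1]:
            best = (policy["name"], total)
    return best


# Made-up example: one policy that relies mostly on the OUI.
policies = [{"name": "Example-Camera", "minimum": 20,
             "conditions": [("OUI", "ExampleVendor", 20),
                            ("dhcp-class-identifier", "example-cam", 10)]}]
fresh_db = {"00AABB": "ExampleVendor"}  # node that received the updated OUI data
stale_db = {}                           # node still running the copy from the ISO
mac = "00:aa:bb:12:34:56"
print(profile(mac, {}, fresh_db, policies))  # ('Example-Camera', 20)
print(profile(mac, {}, stale_db, policies))  # ('Unknown', 0)
```

The same MAC and the same policy give different answers depending only on which OUI table the processing node holds, which matches what happened when accounting data landed on a newly built PSN.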

balaji.bandi · 02-26-2023

How long have these ISE nodes been up and running? If they have sync issues, you should see alarms. Do you see any?

You are looking at this from the ISE point of view, where nothing has changed; you need to look at a broader level: what changed when the issue started?

Since this is a high-availability deployment, turn off one of the ISE nodes and see whether that resolves the issue. Then do the same with the other node to confirm everything works as expected.

Note: since you suspect a database issue, this will help confirm or rule out that guess.

Some troubleshooting guidelines:

https://ciscocustomer.lookbookhq.com/iseguidedjourney/ISE-troubleshooting

Check this thread, which may help you:

https://community.cisco.com/t5/network-access-control/ise-posture-status-compliant-to-unknown/td-p/4302481

Posture config reference:

https://community.cisco.com/t5/security-knowledge-base/ise-posture-prescriptive-deployment-guide/ta-p/3680273#toc-hId--232251767

BB

AigarsK · 02-26-2023

Thanks for the reply.

I am still working with Cisco TAC on this.

For a brief moment it looked like the issue was resolved by logging on to the PPAN CLI, running "application configure ise" and then selecting options 5 and 21. This lasted only 5 minutes or so, and then I was again met with the issue where Context Visibility for endpoints did not match what the endpoint reported for its Identity Group assignment.

While waiting for another callback, I found an article that explains how to reset Context Visibility:

https://www.cisco.com/c/en/us/support/docs/security/identity-services-engine-23/213610-ise-2-3-rest-sync-context-visibility.html

This appears to match my issue, where endpoint attributes are out of sync with the Oracle database. Something that might be further hindering this sync is that my Primary PAN is not the node processing authentications: network devices contact the Secondary PAN first (a poor man's way of load balancing), which then needs to sync endpoint attributes over to the Primary PAN.

Either way, it looks like some profiling policies are no longer working, and I need to delete the endpoint and allow it to be rediscovered.

AigarsK · 02-27-2023

Thanks David,

This might be the culprit. I tried to keep this out of the main post, but we did indeed just go through a full-blown recovery: we rebuilt both nodes, patched them directly to Patch 9 and restored the Primary PAN. The Secondary PAN was simply reset, with a clean install and Patch 9 applied before it was joined to the ISE deployment, so it could be classed as a new PSN.

I ran a Feed Update and noticed that it added loads of new OUI entries. We did indeed have some devices showing as Unknown, and some devices that had passed DHCP attributes were classified as Windows Workstation or Linux Workstation, with a few IoT devices listed only as Android.

Since the last session with Cisco TAC, the Feed Update and the Context Visibility reset, we are OK for now. If the issue repeats, I will advise them of the issue you encountered and the fix that was applied.
I guess no bug ID was provided to you?

davidgfriedman · 02-27-2023

I have no bug ID. I also need to correct myself: we were working with our Cisco Professional Services teams, not TAC. They were the ones who suggested what the problem might be. One of them knew how to obtain root access, checked for the file, found it missing, and helped us copy it from a "known good" unit over to the new units, where an app stop ise / app start ise cycle resolved the issue.
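The thread does not name the OUI-related file or say where it lives, and reaching it required root access that, in this case, Cisco Professional Services handled. Assuming you already have a checksum of that file from each node, a quick way to spot nodes that never received the update is to compare every digest against the one from a known-good node. A minimal Python sketch; the node names and file contents below are made up and are not a real OUI file.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of a file's contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def nodes_out_of_sync(reference_digest: str, node_digests: Dict[str, str]) -> List[str]:
    """Return, sorted, the nodes whose digest differs from the known-good digest."""
    return sorted(node for node, digest in node_digests.items() if digest != reference_digest)


# Hypothetical example: in practice each node would report only its digest,
# so the file itself never has to be moved just to check it.
known_good = b"oui-data-v2"
contents = {"psn-01": b"oui-data-v2", "psn-07": b"oui-data-v1", "psn-08": b"oui-data-v1"}
node_digests = {node: sha256_hex(data) for node, data in contents.items()}
print(nodes_out_of_sync(sha256_hex(known_good), node_digests))  # ['psn-07', 'psn-08']
```

Comparing digests rather than whole files keeps the check cheap across many PSNs; any node it flags would be a candidate for the copy-and-restart fix described above.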

I am glad your issue has cleared up. A slick(ly running) ISE cube is a good ISE cube.