Solved: Re: Cisco ISE AWS upgrade 3.1 -> 3.2

sroic · ‎02-22-2023

Hello,

Has anyone already done this, and what would be the best way to approach it? We are running a cloud-native AWS deployment, and according to the docs this kind of deployment cannot be upgraded in place; it has to be redeployed and the configuration restored from backup. We have a small HA deployment of 2 nodes, currently running version 3.1. We took a configuration backup on the primary ISE and exported the certificates.

Our idea was to destroy the secondary ISE instance and redeploy it with version 3.2 from the marketplace. After that we planned to do the initial CLI config (IPs, routes, etc.), import one trusted and one system certificate, and restore the backup without the ADE-OS config, because we are restoring on a different node (different IPs). If this went well, we would check that authentication works on the new secondary ISE, then destroy the primary, redeploy it with 3.2 and add it to the deployment to sync with the secondary. This way we keep one ISE running at all times (so there is no production downtime) and have the option to revert if the upgrade doesn't work.

The thing is, when we restored the config on the new 3.2 ISE, the restore ended with status "success" (#show restore status), but after the services restarted the GUI wasn't accessible. I tried stopping and starting the ISE application (also in safe mode), but without success. The #show ports command shows that nothing is listening on port 443, even though the expected process is running:

ISE PROCESS NAME                      STATE         PROCESS ID
--------------------------------------------------------------------
Database Listener                     running       3842027
Database Server                       running       129 PROCESSES
Application Server                    running       3856115
Profiler Database                     running       3848495
ISE Indexing Engine                   running       3857114
AD Connector                          running       3858478
M&T Session Database                  running       3852639
M&T Log Processor                     running       3856336
Certificate Authority Service         running       3858223
EST Service                           running       3858653
SXP Engine Service                    disabled
TC-NAC Service                        disabled
PassiveID WMI Service                 disabled
PassiveID Syslog Service              disabled
PassiveID API Service                 disabled
PassiveID Agent Service               disabled
PassiveID Endpoint Service            disabled
PassiveID SPAN Service                disabled
DHCP Server (dhcpd)                   disabled
DNS Server (named)                    disabled
ISE Messaging Service                 running       3844843
ISE API Gateway Database Service      running       3847436
ISE API Gateway Service               not running
ISE pxGrid Direct Service             running       3870101
Segmentation Policy Service           disabled
REST Auth Service                     running       3878601
SSE Connector                         disabled
Hermes (pxGrid Cloud Agent)           disabled
McTrust (Meraki Sync Service)         disabled
ISE Node Exporter                     running       3859115
ISE Prometheus Service                running       3860287
ISE Grafana Service                   running       3863957
ISE MNT LogAnalytics Elasticsearch    disabled
ISE Logstash Service                  disabled
ISE Kibana Service                    disabled
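
A long status listing like the one above is easy to misread by eye. As a minimal sketch (not a Cisco tool), the Python below parses pasted status text and lists every service reported as "not running". The function names parse_status and not_running are hypothetical, and the three state keywords are only the ones that appear in this listing; other ISE versions may print additional states that this pattern would skip.

```python
import re

# State keywords exactly as they appear in the listing above (assumption:
# no other states occur; lines with unknown states are simply skipped).
STATE_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<state>not running|running|disabled)(?:\s+(?P<detail>.+))?$"
)


def parse_status(text):
    """Return (name, state, detail) tuples for every line that looks like a service row."""
    rows = []
    for line in text.splitlines():
        m = STATE_RE.match(line.strip())
        if m:
            rows.append((m.group("name"), m.group("state"), m.group("detail") or ""))
    return rows


def not_running(text):
    """Return the names of services whose state is 'not running'."""
    return [name for name, state, _ in parse_status(text) if state == "not running"]


# A few rows copied from the listing above; header and separator do not match.
SAMPLE = """\
ISE PROCESS NAME STATE PROCESS ID
--------------------------------------------------------------------
Database Server running 129 PROCESSES
Application Server running 3856115
SXP Engine Service disabled
ISE API Gateway Service not running
"""

if __name__ == "__main__":
    print(not_running(SAMPLE))
```

Run against the full listing, this singles out ISE API Gateway Service as the only entry in the "not running" state; whether that is the cause of the missing port 443 listener is not confirmed anywhere in this thread.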

I tried to reload/restart the whole VM, and after that I can't log in via either SSH or the AWS console. SSH just closes the window after authentication, and the console shows this after login:

Failed to log in 1 time(s)
Last failed login on Wed Feb 22 13:41:23 2023 from ttyS0
Failed to connect to server
Exit

So I'm locked out of everything right now, and the only option is to destroy this instance and start again. But I don't understand what we did wrong. Is it against best practice to restore a backup of a 2-node HA deployment on the "secondary" node, even though it is the primary (and only) node in its deployment at this moment? Is the backup somehow tied to the instance?

We can't afford to destroy our whole production deployment and have downtime while we deploy a new one. It would also be a huge issue to deploy 2 more ISE nodes for a separate 3.2 deployment in parallel: we would need to change the RADIUS server IPs on all our devices that use them for authentication. Also, I don't see how restoring on some other new VM would differ from what we did now.

Maybe we missed something else, but I'm just wondering whether our approach is right. Is anyone else running an AWS native deployment?

EDIT: What's also interesting is that a RADIUS authentication test pointed at the new ISE 3.2 is successful when run from a switch.
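
The contrast described in the EDIT (RADIUS answers while the GUI does not) can be confirmed from any client machine with a plain TCP connect to port 443. The sketch below uses only the Python standard library; tcp_port_open is a hypothetical helper name and 192.0.2.10 is a documentation placeholder address, not the real node. It shows only whether something accepts connections on the port, not whether the GUI behind it is healthy.

```python
import socket


def tcp_port_open(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the IP of the redeployed node.
    print(tcp_port_open("192.0.2.10"))
```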

Charlie Moreton · ‎02-22-2023

When deploying the 3.2 node, did you complete the User data step?

sroic · ‎02-22-2023

Hi Charlie,

yes, we did, the same as on ISE 3.1, and login works as expected on a fresh install (the only difference is that Cisco changed the default admin username to iseadmin).

Access to ISE breaks after the config restore.

Charlie Moreton · ‎02-22-2023

If you restored from CLI, you might be hitting this bug: CSCwe13974. Otherwise, I'd suggest calling TAC to troubleshoot.

sroic · ‎03-03-2023

Just an update in case anyone else stumbles upon this issue. We fixed the GUI issue with TAC by generating a new self-signed server certificate from the CLI, using the “application configure ise” menu and choosing option “31”. Unfortunately, after connecting to the GUI we ran into an issue where the deployment menu cannot be opened and shows the message:
“java.lang.NullPointerException: null”
This was the first thing we noticed; we have opened a new ticket for it and are waiting for feedback from TAC.