Solved: Re: ISE 3.0 - Node Recovery - DR Simulation

matt.w · ‎08-22-2022

Howdy All,

Hopefully this has not been asked too many times previously. We currently have a two-node ISE 3.0 deployment: a Primary and a Secondary node. We have been asked to document, for DR purposes, how we would recover from losing both nodes and rebuilding.

We do export daily operational backups to an SFTP server, so they are available to restore back to the node, but I have a couple of questions.

To test the scenario we would like to do it in an isolated test Lab environment, where the networking is in a different address range, and is isolated from the rest of the network etc.

If I were to build a new Primary node from the OVA and configure the networking to suit the lab segment so it's listening correctly, and then restore from a backup file on the repository (server cloned into the same network segment), will the restore overwrite the network configuration with the production addresses?

Would I need to re-join the Primary to AD prior to restoring the operational backup file, or is all this stored inside the backup?

Are there documents available somewhere that will answer these questions?

We have simulated a single node failure and have that process down, so that's not an issue; we're just trying to work out the best way to simulate losing both nodes.

Also, if we were to lose both nodes, with the switches configured for 802.1X and the WLC doing the same, all pointing to ISE as the AAA service, does this mean the entire network will be down for anything that requires ISE to be present?

Thanks in advance, hopefully it makes sense.

Arne Bier · ‎08-22-2022

Hello @matt.w

Make sure you perform Config Backups. The Operational Backups are the AAA logs (live logs). You can back those up too if you like, and then restore them to the newly built boxes - but it's the Config Backup that holds the configs.

About the networking. The config backup won't care about the IPv4 addressing on the node on which you restore the config.

So you build a new ISE 3.0 VM and give it a new IPv4 address. And ensure the node stays in STANDALONE mode (i.e. don't promote it to PRIMARY). Then restore the latest Config Backup onto that new node - ISE will restore all the config and leave the IP networking untouched. You will have to put the Admin and EAP certs back on that node. Either create new ones or export them from the old system and keep them somewhere safe (cert and private key). But. Beware. DNS. The Admin cert's Subject will relate to a FQDN - e.g. ise01.company.com. If your prod is called ise01.company.com and DNS points to 10.10.10.10 - but your DR rebuild is on 10.10.20.10 then you will have to change the DNS to point to the new node in the DR.
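The DNS point above is easy to check before a DR test. Here is a minimal Python sketch, not taken from ISE or any Cisco tooling: the function name `resolves_to` is hypothetical, and the FQDN and addresses are just the example values from this post. It only confirms that the name in the Admin cert resolves to the rebuilt node's address.

```python
import socket


def resolves_to(fqdn, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if `fqdn` resolves to `expected_ip`.

    `resolver` defaults to the system resolver; it is a parameter only so
    the check can be exercised without real DNS (an assumption of this sketch).
    """
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        _, _, addresses = resolver(fqdn)
    except OSError:
        # Covers socket.gaierror / socket.herror: name not found, no resolver, etc.
        return False
    return expected_ip in addresses
```

In the lab you would call something like `resolves_to("ise01.company.com", "10.10.20.10")` from a machine that uses the DR DNS servers; a `False` result suggests DNS still points at the production address (or the name does not resolve at all).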

After a config restore you do typically have to rejoin the AD.

If you rely on 802.1X/MAB in your WLC and switches, then the following will happen:

- authorized sessions on switch ports with no session timeout will not notice anything - but if you have set a session timeout then the session will not get re-authorized when it counts down to 0sec and AAA is dead. If you design it correctly, you can authorize those sessions to a Critical Auth VLAN. But you only get one VLAN for everything. And you can also authorize the Voice DOMAIN.

- new clients connecting while AAA is dead will also be authorized to the Critical Auth VLAN, if it is set up.
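The two cases above can be summarised as a small decision function. This is only a simplified Python model of the behaviour described in this post, assuming AAA is unreachable; the function and parameter names are hypothetical and it is not a statement of exact switch or WLC behaviour.

```python
def port_outcome(already_authorized, session_timeout_expired, critical_vlan_configured):
    """Model what happens to an endpoint while AAA is dead.

    already_authorized:       the session was authorized before the outage
    session_timeout_expired:  a configured session timeout has counted down to 0
    critical_vlan_configured: the port has a Critical Auth VLAN set up
    """
    # Existing sessions with no (or not yet expired) session timeout are untouched.
    if already_authorized and not session_timeout_expired:
        return "stays authorized"
    # New clients, and sessions whose re-auth fails, fall back to the critical VLAN.
    if critical_vlan_configured:
        return "critical auth VLAN"
    # No fallback defined: the endpoint cannot be authorized.
    return "no access"
```

For example, `port_outcome(False, False, True)` models a new client plugging in during the outage on a port with a Critical Auth VLAN.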

matt.w · ‎08-23-2022

Howdy, thanks for that. I have lots going on at the moment and got confused between operational and config backups, but we do both. The steps make sense, so now I will go and document them.

Massimo Baschieri · ‎08-23-2022

@Arne Bier "But you only get one VLAN for everything"

In critical environments, if you create a template for each use case, each referring to a different service policy, you can have each host get the right critical VLAN, can't you?

Not very flexible and quite error-prone, but when survival is a must it can help.

Arne Bier · ‎08-24-2022

Hi @Massimo Baschieri

How are you detecting the endpoint type in order to assign it the correct critical VLAN? e.g. printer, phone, WAP, etc.? Unless you create a Service Policy for each of these things and then intentionally apply it to the interface? Wow! That is dedication! The IOS-XE Device Classifier allows you to do some free profiling on the switch but it's not as detailed as the ISE Profiling.

In fact, in situations where we ignore dynamic VLAN override and just use ISE to return an Access-Accept, the situation could be even simpler. We don't need a critical VLAN defined at all - all we need is for IBNS 2.0 to "authorize" the interface in the event that AAA is unavailable. I don't know if the syntax allows that, but it would be like a "fail-open" mode for emergencies without regard for the VLAN ID.

Massimo Baschieri · ‎08-24-2022

As you have understood, I don't make much use of dynamic VLANs in production environments; each department/host type usually has its own VLAN associated with its own virtual profile.

Since in the end there are usually not too many of them, I create an IBNS 2.0 template for each VLAN where the critical VLAN is the same as the access VLAN; all in all it is a simple copy/paste/modify game.

Local ISE PAN

No reauthentication timers, so that in case of an outage only power-cycled devices trigger the critical VLAN.

Not very scalable, but it's the most robust environment I can think of.