Cisco ISE upgrade failed due to Disk Space sanity check failure

dthomaz77
Level 4

I'm working on an ISE upgrade from 2.2 to 2.4.

 

While running the URT (Upgrade Readiness Tool), it shows a failure under the Disk Space Sanity Check.

The error is "% Error: Need at least 47 GB free disk space in /opt"

I created a ticket with Cisco TAC yesterday, and I still don't have feedback.

Does anyone have any documentation on how to fix this?

Thanks,

 

-Dave


7 Replies

dthomaz77
Level 4

SEL-ISE02/admin# application install ise-urtbundle-2.4.0.357-1.0.0.SPA.x86_64.tar.gz backupFTP
Save the current ADE-OS running configuration? (yes/no) [yes] ?
Generating configuration...
Saved the ADE-OS running configuration to startup successfully

Getting bundle to local machine...
Unbundling Application Package...
Verifying Application Signature...
Initiating Application Install...

###########################################
# Installing Upgrade Readiness Tool (URT) #
###########################################

Checking ISE version compatibility
- Successful

Checking ISE persona
- Successful

Along with Administration, other services (MNT,PROFILER,SESSION) are enabled on this node. Installing and running URT might consume additional resources.
Do you want to proceed with installing and running URT now (y/n):y

Checking if URT is recent(<45 days old)
- Note: URT is 356 days old and its version is 1.0.0. There might be a recent URT bundle on CCO, please verify on CCO
Do you want to proceed with this version which is 356 days old (y/n):y
Proceeding with this version of URT itself

Installing URT bundle
- Successful

########################################
# Running Upgrade Readiness Tool (URT) #
########################################
This tool will perform following tasks:
1. Pre-requisite checks
2. Clone config database
3. Copy upgrade files
4. Data upgrade on cloned database
5. Time estimate for upgrade

Pre-requisite checks
====================
Disk Space sanity check
- Failed
% Error: Need at least 47 GB free disk space in /opt
NTP sanity
- Successful
Appliance/VM compatibility
- Successful
Trust Cert Validation
- Successful
System Cert Validation
- Successful
Invalid MDMServerNames in Authorization Policies check
- Successful
5 out of 6 pre-requisite checks passed
Some pre-requisite checks have failed. Hence exiting...

Final cleanup before exiting...

Collecting log files ...
- Encrypting logs bundle...
Please enter encryption password:
Please enter encryption password again to verify:
Encrypted URT logs(urt_logs.tar.gpg) are available in localdisk. Please reach out to Cisco TAC to debug
% Post-install step failed. Please check the logs for more details.
SEL-ISE02/admin# sh version

Cisco Application Deployment Engine OS Release: 3.0
ADE-OS Build Version: 3.0.2.216
ADE-OS System Architecture: x86_64

Copyright (c) 2005-2014 by Cisco Systems, Inc.
All rights reserved.
Hostname: SEL-ISE02


Version information of installed applications
---------------------------------------------

Cisco Identity Services Engine
---------------------------------------------
Version : 2.2.0.470
Build Date : Wed Jan 25 19:52:23 2017
Install Date : Fri Jun 2 17:13:50 2017

Cisco Identity Services Engine Patch
---------------------------------------------
Version : 1
Install Date : Mon Jun 05 11:06:34 2017

Cisco Identity Services Engine Patch
---------------------------------------------
Version : 3
Install Date : Mon Sep 11 11:55:14 2017

 

I finally got it fixed last night, but I really dislike the approach TAC took to fix the issue.

Clearly the issue was the accumulation of log files under the /opt directory.

The TAC engineer acted as if he was doing something secret. It started with applying two special patches:

Dev Patch VERSION INFORMATION

-----------------------------------

Version     : 1.0.0                             Vendor: Cisco Systems, Inc.

Build Date  : February 18 2019  15:07IST

 

Root Patch VERSION INFORMATION

-----------------------------------

Version     : 1.4.0                             Vendor: Cisco Systems, Inc.

Build Date  : October 29 2015  12:21PDT

After that we connected to the console and started deleting different types of log files.

When I asked what those files were, he could not explain exactly what he was doing, but it looks like this could have been a simple script: instead of spending two hours deleting the logs by hand, it would have taken a minute to build and run the script (a rough dry-run sketch is at the end of this post).

What I really don't like is that there is no documentation, so you keep thinking something is going to break and that you may have to rebuild/restore your entire ISE environment. Well, a few hours later all servers were upgraded to v2.4 patch 6. But I tell you, have patience and document everything.
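
For what it's worth, here is the kind of dry-run script I had in mind. It only lists candidates and deletes nothing; the paths, age and size thresholds are my own guesses, since TAC never documented which files they actually removed, and you need the root shell from the root patch to run it:

#!/bin/sh
# Dry-run sketch only - lists space hogs under /opt, deletes nothing.
# Paths and thresholds are assumptions; confirm with TAC before removing anything.

# Twenty largest files/directories under /opt
du -ax /opt 2>/dev/null | sort -rn | head -20

# Old, large log files under /opt - review this list by hand
find /opt -xdev -type f -name '*.log*' -mtime +30 -size +100M \
    -exec ls -lh {} \; 2>/dev/null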

 

Any idea how to remove these Dev patches? Our ISE is having an issue while this patch is installed.

It sounds like the root patch application is still installed from prior troubleshooting with TAC. It's important to note that root access is only possible within the validity period of the root key that was installed with the root patch, and I believe that is typically around 30 days from the generation of the key itself.

You can remove the root patch by using the 'show application' command to find the name (not 'ise') and remove it with the 'application remove <name>' command.
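
For reference, a rough sketch of what that looks like from the admin CLI; the name in angle brackets is a placeholder, use whatever non-'ise' application name 'show application' lists on your node:

SEL-ISE02/admin# show application
SEL-ISE02/admin# application remove <root-patch-name>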

Hi Greg,

 

  Thanks, removed it, but unfortunately this hasn't fixed the issue :(

Damien Miller
VIP Alumni

I worked through this same issue in the past with a customer deployment; the amount of free space required changed slightly for every node in the deployment. To compound this, the disk space requirement it reports is misleading at best. We needed quite a bit more than it was asking for, e.g. it asked for 47 GB in one case but required 53 GB free in /opt before the node would inline upgrade from 2.2 to 2.4.

2.2 is particularly bad for this issue because of the sheer number of patches released. Each patch installed takes an increasingly large amount of disk space; if you have too many of them installed, the patch files and backups for each become massive.
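
Before rerunning the URT, it is worth checking where each node actually stands. A quick way to do that (the df line assumes you have the root shell from TAC):

SEL-ISE02/admin# show disks
ade # df -h /opt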

There are some options here. 

  1. You can contact TAC and have them analyze the disk space usage from root; there is a significant amount of disk space you can free up by dumping logs. If you are only a few GB short, this is an OK option.
  2. You can reimage the node with the 2.2 ISO, install the same top-level patch, join it back to the deployment, then upgrade. 
  3. I don't recommend this, but you can remove patches; there have been plenty of issues encountered while backing patches out, so I wouldn't go down this path in production. 
  4. Perform an alternate upgrade process. Reimage the secondary PAN with 2.4+, restore the 2.2 configuration backup to it, then build out the new deployment by replacing one node at a time. You end with the current primary PAN, keeping two deployments running in parallel during the process. Nodes can keep the same names, IPs, certs, etc. (a rough sketch of the backup/restore commands follows this list).  
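
For option 4, the config backup/restore itself is just the standard CLI commands. A rough sketch, with the repository, file and key names as placeholders (check the exact syntax against the admin guide for your version):

On the 2.2 primary PAN:
ise/admin# backup <backup-name> repository <repository-name> ise-config encryption-key plain <key>

On the freshly imaged 2.4 node:
ise/admin# restore <backup-file-name> repository <repository-name> encryption-key plain <key>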

Here is an example of what happens with the patches; this is the view from root. Every patch builds on the space used by the previous one, so as the deployment carries on with its operations, it just compounds.  

ade # cd /opt/
ade # du -h . | grep [0-9][0-9]*G | sort -n -r
96G     .
54G     ./oracle
53G     ./oracle/base
43G     ./oracle/base/oradata/cpm10
43G     ./oracle/base/oradata
25G     ./storeddata/Installed/ise
25G     ./storeddata/Installed
25G     ./storeddata
9.1G    ./localdisk
6.8G    ./oracle/base/product/12.1.0.2/dbhome_1
6.8G    ./oracle/base/product/12.1.0.2
6.8G    ./oracle/base/product
6.7G    ./storeddata/Installed/ise/14
5.2G    ./storeddata/Installed/ise/14/backup
4.3G    ./CSCOcpm
4.0G    ./oracle/base/diag
3.9G    ./oracle/base/diag/tnslsnr/xrdclpidmise01/listener/alert
3.9G    ./oracle/base/diag/tnslsnr/xrdclpidmise01/listener
3.9G    ./oracle/base/diag/tnslsnr/xrdclpidmise01
3.9G    ./oracle/base/diag/tnslsnr
2.8G    ./storeddata/Installed/ise/12
2.5G    ./storeddata/Installed/ise/8
2.4G    ./storeddata/Installed/ise/9
2.3G    ./storeddata/Installed/ise/6
2.3G    ./storeddata/Installed/ise/12/backup
2.2G    ./storeddata/Installed/ise/3
2.1G    ./storeddata/Installed/ise/4
2.0G    ./storeddata/Installed/ise/8/backup
1.9G    ./storeddata/Installed/ise/9/backup
1.8G    ./storeddata/Installed/ise/6/backup
1.7G    ./storeddata/Installed/ise/4/backup
1.7G    ./storeddata/Installed/ise/3/backup
1.4G    ./TimesTen

 

Very interesting. Makes my blood boil...

 

It's been a fact for a while that ISE 2.2 (and other releases) drag a lot of technical debt around with them, and you can see all this garbage in the config backups (if you expand them with a tool like gpg). But during an upgrade you have the additional burden of all the legacy backups and patches ... this is the main reason I never upgrade. I bite the bullet and rebuild.

One day perhaps Cisco will add some cron jobs to ISE to clean up all the garbage that's left lying around. The way things are now is probably great for someone who wants to perform a forensic investigation (what fun!!), but for most customers, who just want a smooth upgrade, it's a nightmare. When I contrast the bad experience I have with ISE upgrades against other products, I can't think of a single instance where I would rather rebuild than upgrade. 

I look forward to the day when I don't have to build/maintain another appliance like this (including Prime/ISE/etc.) - this stuff should be running in the cloud by now, and we should not have to care about this level of detail. A clear example is Meraki - they are encroaching on ISE territory with their Adaptive Policy. It's not as powerful as ISE (yet), but it's heading that way, and it's a better outcome for customers. Looking under the hood to delete /opt files is how we operated in 1991.