Just wanted to share my experience so far while upgrading to CUC 8.6 and the issues we faced and are still facing.
- Two physical CUC servers (pub/sub)
- Currently running 7.1.3
- Upgrading to 8.6.1a
We read through the release notes and found that our MCS servers (7845i2) fell short of the 6 GB RAM minimum, so early in the morning, before anything else was done, we took them from 4 GB to 8 GB with upgrade kits. This took about 30-45 minutes because each server had to be shut down and powered up separately. Once everything was back up we ran some quick tests to make sure the servers were working as before.
At this point we forgot that we needed to install the COP file first and simply started the 8.6 upgrade on the publisher. The install ran for about 30 minutes and then failed with an error saying the refresh upgrade COP file had to be installed before the upgrade could run. We rebooted the publisher once more, since a warning said a reboot was needed to ensure all the services came back up properly. After this we installed the refresh upgrade COP file on both servers, which took approximately an hour because of the reboots. Once the servers were up we tested functionality again and everything seemed fine. We then started the actual upgrade to 8.6 on the publisher again; this time it ran to completion and took about 2.5 hours.
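For anyone planning the same upgrade, the order that eventually worked for us can be written down as a simple checklist. This is only a minimal sketch in Python: the step names and the `missing_prerequisites` helper are made up for illustration and are not part of any Cisco tool.

```python
from typing import List, Set

# Hypothetical labels for the steps described above, in the order that worked.
STEPS: List[str] = [
    "ram_upgrade",                  # 4 GB -> 8 GB on both servers
    "refresh_cop_on_both_servers",  # refresh upgrade COP file, publisher and subscriber
    "functional_test",              # quick check that everything still works
    "publisher_upgrade",            # the actual 8.6 upgrade on the publisher
]


def missing_prerequisites(step: str, done: Set[str]) -> List[str]:
    """Return the earlier steps that have not been completed yet."""
    idx = STEPS.index(step)  # raises ValueError for an unknown step name
    return [s for s in STEPS[:idx] if s not in done]


if __name__ == "__main__":
    # Our first attempt: only the RAM upgrade was done before starting the upgrade.
    print(missing_prerequisites("publisher_upgrade", {"ram_upgrade"}))
```

Running a check like this before starting would have flagged the missing COP file instead of letting the install run for 30 minutes first.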
This is where we ran into the issue that we worked on for the rest of the 12 hours we were there, and that eventually forced us to switch back to the inactive partition...
The publisher came back up, and we logged into it via SSH and waited for all the services to show as started. We then tried to log into the web page, and after typing in our credentials we received an error at the top of the login screen that said "database communication error". We could not log in no matter what we tried, with either local or LDAP credentials. We could, however, still log into OS admin and any other page that uses the platform admin credentials. We also tried dialing into voicemail, since the publisher seemed to have taken control back, and every way we tried just returned a fast busy.
We then called Cisco and opened a TAC case to figure out what was happening. After quite a few reboots of the publisher we were still in the same spot where we started. While the publisher was down the secondary worked fine and was able to service voicemail calls. At one point we tried taking the secondary down and bringing the publisher up by itself, thinking it might be trying to communicate somehow, but we got the same results. Our TAC engineer said he believed this issue was due to corruption in the database and that the only suggested fix was a complete rebuild of the server and a restore. I'm not sure how he came to this conclusion, since we did not look at anything besides the logged SSH session from the upgrade install, which contained nothing but success messages for each step.
As a last-ditch effort, since we were about 11 hours in, we decided to switch versions back to the inactive 7.1.3 partition to see if it would come up that way. It did come up and started working without any issues at all. This is where we have left it for now.
My questions for all of you:
1. Has anyone else experienced anything like this?
2. The suggested fix is to rebuild the server with 7.1.3, then restore from backups, then go through upgrade process again. What doesn't make sense to me on this is that we would potentially be restoring the same exact data (which seems to have been corrupted by the install?) and then do the same exact things that caused our issues. How is this going to work if there are no changes in the process besides taking the time to rebuild and restoring the same exact data?
3. Would the rebuild/restore method clean up the databases any?
4. Any other thoughts or suggestions?
I tried a clean install of 8.6.1a onto a 7845H2 2333 over the weekend. I was upgrading from version 1.2.
This is original HP server hardware, not MCS.
The install recognised the hardware, but Unity Connection wasn't presented as an install option.
It's interesting that you mention 6 GB of RAM. The document I read stated 4 GB, but I might not have been looking at the latest version.
The install identified the 4 x 146 GB disks as 4 x 72 GB, which may also have been a factor.
I subsequently managed to install 8.0.3 without any problems.
The 6 GB of RAM only became a requirement with 8.6; anything prior to it only needed 4 GB. That was the specific requirement for our servers; yours may be different.
Connection 8.6 and later only:
All MCS 7845 servers must contain 6 GB of RAM.
All MCS 7845 servers must contain four hard disks that are the same size and that are at least 146 GB each.
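As a rough sketch, the two requirements quoted above can be checked like this in Python. The `ServerSpec` class and the `check_mcs7845_for_86` function are hypothetical names made up for this example, and the RAM and disk values have to be entered by hand from whatever the server or installer reports. I am also assuming "four hard disks" means exactly four.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the quoted requirements for Connection 8.6 and later.
MIN_RAM_GB = 6
REQUIRED_DISK_COUNT = 4  # assumption: "four hard disks" means exactly four
MIN_DISK_GB = 146


@dataclass
class ServerSpec:
    """Hand-entered hardware details for one MCS 7845 server (hypothetical type)."""
    ram_gb: int
    disk_sizes_gb: List[int]


def check_mcs7845_for_86(spec: ServerSpec) -> List[str]:
    """Return a list of problems; an empty list means both requirements are met."""
    problems = []
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM is {spec.ram_gb} GB, need at least {MIN_RAM_GB} GB")
    if len(spec.disk_sizes_gb) != REQUIRED_DISK_COUNT:
        problems.append(f"found {len(spec.disk_sizes_gb)} disks, need {REQUIRED_DISK_COUNT}")
    if len(set(spec.disk_sizes_gb)) > 1:
        problems.append("disks are not all the same size")
    if any(size < MIN_DISK_GB for size in spec.disk_sizes_gb):
        problems.append(f"every disk must be at least {MIN_DISK_GB} GB")
    return problems


if __name__ == "__main__":
    # Our servers after the RAM kits: 8 GB, assuming four 146 GB disks.
    print(check_mcs7845_for_86(ServerSpec(8, [146, 146, 146, 146])))
    # 4 GB of RAM with the disks detected as 4 x 72 GB, as reported earlier in the thread.
    print(check_mcs7845_for_86(ServerSpec(4, [72, 72, 72, 72])))
```

Note that the second case uses the disk sizes the installer detected rather than the physical sizes, since that is presumably what the install checks against.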
I can somewhat believe TAC saying you should start fresh. Do you know the history of this database? It could have been an early Unity Connection version that was upgraded over time, or even an early COBRAS import from Unity that brought issues over... it's hard to tell. In the past I have had Unity Windows versions (completely different from yours, of course) go sideways on an upgrade because the customer did not tell me the system had been restored or repaired a couple of years back. It turned out it was hanging on by pins and needles, and when I went to upgrade it, it blew up. (Of course, right?!)
It might be worth doing a restore and then checking RTMT for any signs of database issues, although you may never find the cause now. (Or you have uncovered a big TAC bug.)
Yes, I do know the history of this server.
It is actually fairly new (2 years old) and has never had any health issues before this upgrade. It was installed with the version it is still on now, and there was no migration of old data as it was a brand new install. This is the first time we are upgrading it.
My hesitation on doing a restore is that I will be restoring the same database that was backed up from this server. Won't I just be putting the same exact database back on the server or is there some sort of clean-up that occurs when it is restored? I would be using DRS to do the restore.