
Troubleshooting: Error: Sync after switch version failed (from 8.x to 9.x)


Problem:

Upgrading UCCX and performing a switch version from 8.x to 9.x.

The switch version fails with the message: Error: Sync after switch version failed.

Logs:

Collect the install/upgrade logs and look at the point where the process failed. The most important log files here are uccx-install.log and system-history.log.
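As an illustration only (not an official Cisco tool), a short script like the following could scan a collected copy of uccx-install.log for the failure markers quoted later in this article; the marker strings are taken directly from the log excerpts below.

```python
# Illustrative sketch: scan a collected copy of uccx-install.log for the
# failure markers quoted in this article. The marker strings come from the
# log excerpts below; everything else here is an assumption for demo purposes.
FAILURE_MARKERS = [
    "oninit: Fatal error in shared memory initialization",
    "WARNING: server initialization failed",
    "1809: Server rejected the connection.",
]

def find_failures(log_text):
    """Return (line_number, line) pairs for lines containing a known marker."""
    hits = []
    for num, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append((num, line.strip()))
    return hits

if __name__ == "__main__":
    sample = (
        "Initializing DBSPACETEMP list...succeeded\n"
        "oninit: Fatal error in shared memory initialization\n"
        "WARNING: server initialization failed, or possibly timed out\n"
    )
    for num, line in find_failures(sample):
        print(f"line {num}: {line}")
```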

Issue:

In 9.x, there is a minor change in the Informix IDS version, which leads to a couple of problems during the switch version. The above error can also occur when the system is rebooted in the middle of a switch version; an interrupted switch version cannot be recovered, can cause serious database issues, and may require the system to be rebuilt. Rebooting during a switch version is not supported by Cisco.

Root Cause:

There are three possible causes for this error:

1. CSCue38031 - This occurs when the database on the inactive partition (9.x) is started after moving the dbspaces, which leads to heuristic transactions.

Workaround: Contact TAC to remove these transactions from the DB.

uccx-install.log:

Validating chunks...succeeded

Initialize Async Log Flusher...succeeded

Starting B-tree Scanner...succeeded

Initializing DBSPACETEMP list...succeeded

oninit: Fatal error in shared memory initialization

WARNING: server initialization failed, or possibly timed out

shared memory not initialized for INFORMIXSERVER 'sdipccxprd01_uccx'

online.uccx.log (from partB):

04:48:07  Open transaction detected when changing log versions.

04:48:07  Cannot Rollforward from Checkpoint.

04:48:07  oninit: Fatal error in shared memory initialization

2. Timeout when starting IDS on partB. This is documented in CSCuf48469.

uccx-install.log

Initializing DBSPACETEMP list...succeeded

Checking database partition index...succeeded

Initializing dataskip structure...succeeded

Checking for temporary tables to drop...succeeded

Forking onmode_mon thread...succeeded

Creating periodic thread...succeeded

Verbose output complete: mode = 1

WARNING: server initialization failed, or possibly timed out (if -w was used).

Check the message log, online.log, for errors.

1809: Server rejected the connection.

online.uccx.log (from partB)

listener-thread: err = -27010: oserr = 0: errstr = : Only an administrative user or informix can connect in single user mode.

Workaround: Contact TAC to apply the patch.

3. Reboot in the middle of the switch version - The switch version process takes a long time, depending on the size of the database. Size here means not only the configuration but the historical DB as well (CDS, HDS, RDS, ADS).

Rebooting the server in the middle of the switch version can corrupt the DB. The system goes for an automatic reboot on its own after the switch version completes, so wait for it to finish.

Workaround: The fastest method is to rebuild the server and restore from a DRS backup.

I have seen it take anywhere from 15 minutes to as long as 8 hours!

Comments
Beginner

I tried to upgrade a UCCX HA cluster from 8.0(2)SU4 to 9.0(2)SU1 and got "Error: Sync after switch version failed." when doing the switch version on the secondary/subscriber node. The first node completed fine.

It seems that the errors in the install log for this case do not match any of the possible causes you mention in your post above. Below are the errors I got during this process:

As a side note, I upgraded this same system in a test environment last week (isolated VLAN, same database, just about 4 days older than the one in production, with no calls received during the upgrade), and the switch version completed successfully on both nodes. Should I attempt this in production again? Could the fact that the production system was receiving a few calls on the IVR during the upgrade be a possible cause of the failure?

Would you please help me determine how I can fix this issue? Please let me know if you need the complete install log and the online.uccx.log files.

I have a TAC case open, but I haven't gotten a response in three days now on what happened and the best way forward.

Thanks,

*****************************************************************

.

.

.

12100000 Row(s) loaded so far to table contactcalldetail.

12200000 Row(s) loaded so far to table contactcalldetail.

Table contactcalldetail had 12206558 row(s) loaded into it.

Data Migration done ..

------------ Done ----------------

Sun Jul 14 01:17:56 CDT 2013 :: /opt/cisco/uccx/sql/rds_delta_802_to_803.sql is not available

Sun Jul 14 01:17:56 CDT 2013 :: /opt/cisco/uccx/sql/fds_delta_802_to_803.sql is not available

Sun Jul 14 01:17:56 CDT 2013 :: running file /opt/cisco/uccx/sql/cra_delta_803_to_804.sql

Applying the migration script: /opt/cisco/uccx/sql/cra_delta_803_to_804.sql

Sourcing IDS environment variables

Database selected for migration: db_cra

Removing the older migration command file: cmd_cra_delta_803_to_804_db_cra_20130714

old command file does not exist

alter table contactcalldetail add(dialinglistid int);

SQL here --- output to pipe cat without headings select distinct cdrserver from contactcalldetail

The table contactcalldetail has the replication set on it

SQL command used for unloading the data: unload to "/tmp/contactcalldetail.dat" delimiter ";"  select * from contactcalldetail

Unload of the data from the table : contactcalldetail has succeeded

Truncating the table: contactcalldetail

Truncating the table contactcalldetail failed

Sun Jul 14 01:26:49 CDT 2013 :: Error updating schema

Sun Jul 14 01:26:49 CDT 2013 :: Stopping DB

Sun Jul 14 01:26:49 CDT 2013 :: ------Stopping uccx database-------

Sun Jul 14 01:27:03 CDT 2013 :: Waiting for port to be released

(No info could be read for "-p": geteuid()=0 but you should be root.)

(No info could be read for "-p": geteuid()=0 but you should be root.)

tcp        0      0 10.64.8.26:42652            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:43158            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:1504             10.64.8.26:43142            TIME_WAIT   -                  

Sun Jul 14 01:27:03 CDT 2013 :: Try 0: waiting for 10 seconds

(No info could be read for "-p": geteuid()=0 but you should be root.)

(No info could be read for "-p": geteuid()=0 but you should be root.)

tcp        0      0 10.64.8.26:42652            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:43158            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:1504             10.64.8.26:43142            TIME_WAIT   -                  

Sun Jul 14 01:27:14 CDT 2013 :: Try 1: waiting for 10 seconds

(No info could be read for "-p": geteuid()=0 but you should be root.)

(No info could be read for "-p": geteuid()=0 but you should be root.)

tcp        0      0 10.64.8.26:42652            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:43158            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:1504             10.64.8.26:43142            TIME_WAIT   -                  

Sun Jul 14 01:27:24 CDT 2013 :: Try 2: waiting for 10 seconds

(No info could be read for "-p": geteuid()=0 but you should be root.)

(No info could be read for "-p": geteuid()=0 but you should be root.)

tcp        0      0 10.64.8.26:42652            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:43158            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:1504             10.64.8.26:43142            TIME_WAIT   -                  

Sun Jul 14 01:27:34 CDT 2013 :: Try 3: waiting for 10 seconds

(No info could be read for "-p": geteuid()=0 but you should be root.)

(No info could be read for "-p": geteuid()=0 but you should be root.)

tcp        0      0 10.64.8.26:42652            10.64.8.26:1504             TIME_WAIT   -                  

tcp        0      0 10.64.8.26:43158            10.64.8.26:1504             TIME_WAIT   -                  

Sun Jul 14 01:27:44 CDT 2013 :: Try 4: waiting for 10 seconds

(No info could be read for "-p": geteuid()=0 but you should be root.)

(No info could be read for "-p": geteuid()=0 but you should be root.)

Sun Jul 14 01:27:54 CDT 2013 :: The port is released

Sun Jul 14 01:27:54 CDT 2013 :: ------UCCX database stopped--------

Sun Jul 14 01:27:54 CDT 2013 :: DB upgrade script failed

Sun Jul 14 01:27:54 CDT 2013 :: Restoring repicaition status of the databse

Unified CCX Database is currently not on-line.

The requested operation will not be performed.

Sun Jul 14 01:27:55 CDT 2013 :: ./uccx_sv_db.sh 8.0.2.11005-20 9.0.2.11001-24 8.0.2.11005 rpmdb: Program version 4.2 doesn't match environment version error: db4 error(22) from dbenv->open: Invalid argument error: cannot open Packages index using db3 - Invalid argument (22) error: cannot open Packages database in /partB/var/lib/rpm package UCCX02_lib is not installed /var/log/active/platform/log/cli.log

Script uccx_sv_db.sh failed with exit code 255.

Sun Jul 14 01:27:55 CDT 2013 :: Staring command: /partB/opt/cisco/uccx/bin/uccx_db_l2_rollback.sh installFailureRollBack 8.0.2.11005-20 9.0.2.11001-24

Sun Jul 14 01:27:55 CDT 2013 :: In L2 upgrade DB rollback script running command installFailureRollBack

Sun Jul 14 01:27:55 CDT 2013 :: Setting environment variables after chroot

Sun Jul 14 01:27:55 CDT 2013 :: INFORMIXSERVER=ipccserv2_uccx

Sun Jul 14 01:27:55 CDT 2013 :: ------Stopping uccx database-------

shared memory not initialized for INFORMIXSERVER 'ipccserv2_uccx'

Sun Jul 14 01:27:55 CDT 2013 :: Waiting for port to be released

Sun Jul 14 01:27:55 CDT 2013 :: The port is released

Sun Jul 14 01:27:55 CDT 2013 :: ------UCCX database stopped--------

Sun Jul 14 01:27:55 CDT 2013 :: Attempting Restore of DB from backup.

Sun Jul 14 01:27:55 CDT 2013 :: comparing uccx versions

Sun Jul 14 01:27:55 CDT 2013 :: result of comparing version = 1

Sun Jul 14 01:27:55 CDT 2013 :: Restoring backup of DB

Sun Jul 14 01:27:55 CDT 2013 :: Restoring from upgrade backup

Sun Jul 14 01:36:34 CDT 2013 :: Stopping DB after performing the DB restore from backup

Sun Jul 14 01:36:34 CDT 2013 :: ------Stopping uccx database-------

Sun Jul 14 01:36:38 CDT 2013 :: Waiting for port to be released

tcp        0      0 10.64.8.26:43253            10.64.8.26:1504             TIME_WAIT   -                  

Sun Jul 14 01:36:38 CDT 2013 :: Try 0: waiting for 10 seconds

tcp        0      0 10.64.8.26:43253            10.64.8.26:1504             TIME_WAIT   -                  

Sun Jul 14 01:36:48 CDT 2013 :: Try 1: waiting for 10 seconds

tcp        0      0 10.64.8.26:43253            10.64.8.26:1504             TIME_WAIT   -                  

Sun Jul 14 01:36:58 CDT 2013 :: Try 2: waiting for 10 seconds

tcp        0      0 10.64.8.26:43253            10.64.8.26:1504             TIME_WAIT   -                  

Sun Jul 14 01:37:09 CDT 2013 :: Try 3: waiting for 10 seconds

tcp        0      0 10.64.8.26:43253            10.64.8.26:1504             TIME_WAIT   -                  

Sun Jul 14 01:37:19 CDT 2013 :: Try 4: waiting for 10 seconds

Sun Jul 14 01:37:29 CDT 2013 :: The port is released

Sun Jul 14 01:37:29 CDT 2013 :: ------UCCX database stopped--------

Sun Jul 14 01:37:29 CDT 2013 :: Staring command: /partB/opt/cisco/uccx/bin/uccx_db_l2_rollback.sh cleanupTempFiles 8.0.2.11005-20 9.0.2.11001-24

Sun Jul 14 01:37:29 CDT 2013 :: In L2 upgrade DB rollback script running command cleanupTempFiles

Sun Jul 14 01:37:29 CDT 2013 :: Cleaning up temporary files

Sun Jul 14 01:37:29 CDT 2013 :: comparing uccx versions

Sun Jul 14 01:37:29 CDT 2013 :: result of comparing version = 1

*********************************************************************

Cisco Employee

Hi Sanotto,

Looking at the log snippets that have been posted, I can see that the contactcalldetail table is huge! The switch version would have taken a couple of hours before it failed:

12200000 Row(s)

During the switch version, a DB load utility runs that unloads the contents of the DB tables to .dat files and then truncates the tables:

unload to "/tmp/contactcalldetail.dat" delimiter ";"  select * from contactcalldetail;

After this, the tables are truncated using:

truncate table contactcalldetail;

Now, while this is being done, if there are active calls in the system, records will be written to contactcalldetail, which can cause the truncate to fail.
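The effect of a concurrent writer can be sketched with SQLite, used here purely as a stand-in for Informix (an assumption for illustration; the table name is borrowed from the logs above, and DELETE stands in for TRUNCATE): a connection holding an open write transaction on the table blocks the cleanup statement, just as active calls writing records can block the upgrade script's truncate.

```python
import os
import sqlite3
import tempfile

# Sketch under assumptions: SQLite stands in for Informix purely to show how
# an in-flight write on a table can block a table-clearing statement.
def truncate_blocked_by_active_writer():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE contactcalldetail (sessionid INTEGER)")
    setup.commit()
    setup.close()

    # Simulates an active call holding an open write transaction.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("INSERT INTO contactcalldetail VALUES (1)")

    # Simulates the upgrade script; DELETE stands in for TRUNCATE here.
    cleaner = sqlite3.connect(path, timeout=0.1)
    try:
        cleaner.execute("DELETE FROM contactcalldetail")
        result = "cleanup succeeded"
    except sqlite3.OperationalError as exc:   # "database is locked"
        result = f"cleanup failed: {exc}"
    finally:
        writer.rollback()
        writer.close()
        cleaner.close()
    return result

print(truncate_blocked_by_active_writer())
```

With the writer's transaction rolled back (or committed) first, the same DELETE goes through, which is the point of taking a call-free downtime window.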

This (along with several other reasons) is why we ask for downtime during the switch version. I would advise you to try again with downtime, and if it still fails, we will need to look at it from the TAC case perspective.

Regards,

Arundeep

Beginner

Thanks for your response Arundeep,

The switch version did indeed take about 2.5 hours per node. Also, there was downtime; it's just that the system received a few calls on the IVR while the upgrades were being performed (no agents, supervisors, or admins active, etc.).

A few questions:

1.- Would you recommend purging the db_hist database, even though the free space percentage on this database is 65%? How small does the database have to be for the switch version to complete successfully?

admin:show uccx dbserver disk

SNO. DATABASE NAME      TOTAL SIZE (MB) USED SIZE (MB) FREE SIZE (MB) PERCENT FREE

---- ------------------ --------------- -------------- -------------- ------------

1    rootdbs                     358.4           58.4          300.0          83%

2    log_dbs                     317.4          307.3           10.1           3%

3    db_cra                      512.0           34.7          477.3          93%

4    db_hist                   34508.6        11875.5        22633.2          65%

5    db_cra_repository            10.2            3.8            6.4          62%

6    db_frascal                  512.0          153.4          358.6          70%

7    temp_uccx                  1572.9            0.3         1572.6          99%

8    uccx_sbspace               3145.7         2988.1          157.6           5%

9    uccx_er                     204.8            3.7          201.1          98%

10   uccx_ersb                  1572.9         1537.2           35.7           2%

11   sadmin                      102.4            3.9           98.5          96%
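For what it's worth, the `show uccx dbserver disk` output above can be post-processed to flag dbspaces that are nearly full, such as log_dbs (3%) and uccx_sbspace (5%) in this table. A rough sketch follows; the 10% threshold is my own assumption, not a Cisco recommendation.

```python
# Rough sketch: flag dbspaces from `show uccx dbserver disk` output whose
# PERCENT FREE falls below a threshold. The 10% default is an assumption.
def low_free_dbspaces(output, threshold=10):
    flagged = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows end with a percentage like "65%"; headers/rules do not.
        if parts and parts[-1].endswith("%") and parts[-1][:-1].isdigit():
            name, pct = parts[1], int(parts[-1][:-1])
            if pct < threshold:
                flagged.append((name, pct))
    return flagged

sample = """\
1    rootdbs                     358.4           58.4          300.0          83%
2    log_dbs                     317.4          307.3           10.1           3%
8    uccx_sbspace               3145.7         2988.1          157.6           5%
"""
print(low_free_dbspaces(sample))  # flags log_dbs and uccx_sbspace
```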

2.- By downtime, do you mean 0 calls going through the system during an upgrade? If so, I think you need to be more specific about this in the documentation.

Also, as a side comment: I did open a TAC case for this issue (I engaged everyone possible, duty managers, BU engineers, etc.), and still, three days later, I don't have a response; I'm actually receiving faster feedback from you. This is unacceptable in a production environment with maybe a 12-hour maintenance window, so I just decided to roll back after maybe 8 hours of work. I need to figure out a plan before thinking of opening a TAC case. Getting to the proper resources takes, in the best-case scenario, more than 4-5 hours, and under normal circumstances, days.

thanks,

Beginner

Arun,

We tried the upgrade one more time in the production environment, with no calls during the whole upgrade sequence, and got exactly the same results. We have tried this upgrade on the same system during the daytime (in a test environment) successfully under the same conditions (0 calls), and the upgrade went through without any issues.

For the production environment upgrade, the only difference I can see is the time we performed the upgrades and switch versions, as we are doing the upgrades around midnight.

My question for you is: are there any dependencies of the upgrades or switch versions on the day, i.e., do the upgrade and switch version for a node need to be completed on the same day?

To work around this issue, we gave up troubleshooting the switch version failure on the second node and rebuilt it from scratch.

Thanks,

Otto

Cisco Employee

Hi Otto,

There is no requirement regarding the time of day for the switch version. The SV shell script does not look at the system clock, so there is no such dependency.

I can think of scheduled backups running at midnight, which may be the cause of the failure. Can you confirm whether backups were running at that time?

Also, could I have a reference to the TAC case that was opened for this? You can unicast it to me for privacy.

Regards,

Arundeep

Beginner

We checked that scheduled backups were not interfering with the upgrade. They do not run until 3:00am Central, and the two times we faced issues with the switch version were between 1:00 and 2:00pm Central. Also, during the second upgrade the backup server was not reachable (the first time it was), with the same results both times.

I will send you the TAC case number.

Thanks,

Otto

Beginner

Hi Arundeep,

Were you able to take a look at this TAC case?

Thanks for your help.

Otto
