
Failed CSPC Upgrade to 2.8.1.2 and .4

Jazz
Level 1

My CSPC system has not collected anything since March, when it failed the automatic upgrade to 2.8.1.2.  Not realizing that, I tried to perform a manual upgrade to 2.8.1.4, but this has not gone well.

  • The VM will not boot cleanly; it drops to maintenance mode.
  • Initially, the disk was full.  Logging in as root, I was able to delete old upgrade ZIP files, pare down large log files, and free almost 10 GB of space.
  • Booting still fails to maintenance mode.
  • Using Ctrl+D to continue, the VM appears to boot, but after 10-30 minutes, it seems to reload on its own to a new system.
  • SSH works and the web GUI partly works, but anything to do with updating/upgrading locks up in the GUI.
  • During this time, both CPUs are running at 40-70% constantly, but a listing of processes using top (as root) does not show enough CPU in the listed processes to make the math work.
  • When trying to manually install the upgrade, it errors with “adminshell service is not up , please try installation after some time”.
  • After leaving the system in this state overnight (at least 15 hours), the CPU is still running high without an identifiable process in top, and the adminshell is still not running.
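The manual cleanup of old upgrade ZIP files and large logs described in the list above can be guided by a quick listing of the biggest files. The following is only a minimal sketch: the function name `largest_files` and its default of 10 entries are made up for illustration and are not part of CSPC.

```shell
# largest_files DIR [N]: list the N largest regular files under DIR
# (default 10), without crossing into other filesystems.
# Sizes are disk usage in KiB as reported by du -k.
largest_files() {
    dir=$1
    n=${2:-10}
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$n"
}
```

For example, `largest_files /var/log 5` run as root would list the five largest files under /var/log, largest first.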

Is there any way to fix this current installation such that it works as expected?

If I need to start from scratch with a new image, is there any way to save the configuration from the existing installation first such that it may be applied to the new installation and avoid the long process of configuring everything?

 

Thank you for your help.

 

Sincerely,

 

Michael


adias
Cisco Employee

Hello Michael,

 

Let me gather some data to see whether this can be fixed.

In maintenance mode, have you run a disk check (fsck)?

 

As root, please run the following commands and post the output:

 

# ls  -larth /var/log

# df -k

# ps -ef

# tail -50 /opt/LCM/logs/install
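When the output of `df -k` is going to be read by a script rather than by eye, note that some versions of df wrap a long device name onto its own line, which shifts the columns. The POSIX output format (`df -P`) keeps each filesystem on one line. A minimal sketch, assuming a hypothetical helper named `use_percent` and mount points without spaces:

```shell
# use_percent MOUNT: read `df -Pk` output on stdin and print the Use%
# figure (digits only) for the given mount point.
use_percent() {
    awk -v mnt="$1" 'NR > 1 && $6 == mnt { sub(/%$/, "", $5); print $5 }'
}

# Example: df -Pk | use_percent /
```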

 

 

 


No, I have not run an fsck.  I guess I should have thought of that one after I created some free space.

Here is the putty log:

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2019.07.18 10:29:36 =~=~=~=~=~=~=~=~=~=~=~=

ls -larth /var/log
total 1.3G
drwx------+ 2 root root 4.0K Mar 22 2017 aide
drwxr-xr-x. 18 root root 4.0K Mar 19 2018 ..
drwx------+ 4 root root 4.0K Jun 19 2018 samba
drwxr-xr-x+ 2 ntp ntp 4.0K Dec 19 2018 ntpstats
-rw-r-x---+ 1 root root 12K Feb 11 08:14 yum.log-20190717
-rw-r-x---+ 1 root root 0 Jul 17 03:07 spooler-20190718
-rw-r-x---+ 1 root root 0 Jul 17 03:07 unused.log-20190718
-rw-r-xr--+ 1 root root 72K Jul 17 12:35 dmesg.old
-rw-r-x---+ 1 root utmp 768 Jul 17 12:50 btmp-20190717
-rw-r-xr--+ 1 root utmp 34K Jul 17 12:50 wtmp-20190717
-rw-r-x---+ 1 root utmp 0 Jul 17 13:02 btmp
-rw-r-----+ 1 root root 197K Jul 17 14:08 dracut.log-20190717
-rw-r-x---+ 1 root root 15K Jul 17 14:11 secure.old
-rw-r-----+ 1 root root 197K Jul 17 14:12 dracut.log
-rw-r-x---+ 1 root root 23K Jul 17 14:27 yum.log
-rw-r--r--+ 1 root root 72K Jul 17 14:45 dmesg
-rw-r-----+ 1 root root 32K Jul 17 14:45 vmware-vgauthsvc.log.0
-rw-r--r--+ 2 root root 22K Jul 17 14:57 boot.log-20190718
-rw-r-x---+ 1 root root 4.2K Jul 17 16:01 maillog-20190718
-rw-r-x---+ 1 root root 20K Jul 18 03:32 kern.log-20190718.gz
-rw-r-x---+ 1 root root 7.3K Jul 18 03:32 syslog-20190718
-rw-r-x---+ 1 root root 9.7M Jul 18 03:33 daemon.log-20190718
-rw-r-----+ 1 root root 83K Jul 18 03:33 secure-20190718
-rw-r-x---+ 1 root root 90K Jul 18 03:33 cron-20190718
-rw-r-x---+ 1 root root 102M Jul 18 03:33 messages-20190718.gz
-rw-r-----+ 1 root root 0 Jul 18 03:33 spooler
-rw-r-----+ 1 root root 0 Jul 18 03:33 boot.log
-rw-r-----+ 1 root root 0 Jul 18 03:33 unused.log
drwxr-xr-x+ 6 root root 4.0K Jul 18 03:33 .
-rw-r-----+ 1 root root 620 Jul 18 03:33 syslog
-rw-r-----+ 1 root root 590 Jul 18 04:05 maillog
-rw-r-----+ 1 root root 136K Jul 18 07:39 vmware-vmsvc.log
-rw-r-----+ 1 root root 284 Jul 18 10:24 kern.log
-rw-rw-r--+ 1 root utmp 9.7M Jul 18 10:24 wtmp
-rw-r--r--+ 1 root root 144K Jul 18 10:24 lastlog
-rw-r-----+ 1 root root 1.3M Jul 18 10:24 tallylog
-rw-r-----+ 1 root root 48K Jul 18 10:29 secure
-rw-r-----+ 1 root root 48K Jul 18 10:29 cron
-rw-r-----+ 1 root root 8.6K Jul 18 10:29 daemon.log
drwxr-x---+ 2 root casusers 4.0K Jul 18 10:29 audit
-rw-r-x---+ 1 root root 1.2G Jul 18 10:29 messages
[root@cspc ~]#
[root@cspc ~]#
[root@cspc ~]#
[root@cspc ~]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root
32549752 26963868 3925776 88% /
tmpfs 1476020 0 1476020 0% /dev/shm
/dev/sda1 487652 53253 408799 12% /boot
[root@cspc ~]#
[root@cspc ~]#
[root@cspc ~]#
[root@cspc ~]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jul17 ? 00:00:18 /sbin/init
root 2 0 0 Jul17 ? 00:00:00 [kthreadd]
root 3 2 0 Jul17 ? 00:00:04 [migration/0]
root 4 2 0 Jul17 ? 00:00:00 [ksoftirqd/0]
root 5 2 0 Jul17 ? 00:00:00 [stopper/0]
root 6 2 0 Jul17 ? 00:00:00 [watchdog/0]
root 7 2 0 Jul17 ? 00:00:04 [migration/1]
root 8 2 0 Jul17 ? 00:00:00 [stopper/1]
root 9 2 0 Jul17 ? 00:00:00 [ksoftirqd/1]
root 10 2 0 Jul17 ? 00:00:00 [watchdog/1]
root 11 2 0 Jul17 ? 00:00:37 [events/0]
root 12 2 0 Jul17 ? 00:00:22 [events/1]
root 13 2 0 Jul17 ? 00:00:00 [events/0]
root 14 2 0 Jul17 ? 00:00:00 [events/1]
root 15 2 0 Jul17 ? 00:00:00 [events_long/0]
root 16 2 0 Jul17 ? 00:00:00 [events_long/1]
root 17 2 0 Jul17 ? 00:00:00 [events_power_ef]
root 18 2 0 Jul17 ? 00:00:00 [events_power_ef]
root 19 2 0 Jul17 ? 00:00:00 [cgroup]
root 20 2 0 Jul17 ? 00:00:00 [khelper]
root 21 2 0 Jul17 ? 00:00:00 [netns]
root 22 2 0 Jul17 ? 00:00:00 [async/mgr]
root 23 2 0 Jul17 ? 00:00:00 [pm]
root 24 2 0 Jul17 ? 00:00:00 [sync_supers]
root 25 2 0 Jul17 ? 00:00:00 [bdi-default]
root 26 2 0 Jul17 ? 00:00:00 [kintegrityd/0]
root 27 2 0 Jul17 ? 00:00:00 [kintegrityd/1]
root 28 2 0 Jul17 ? 00:00:06 [kblockd/0]
root 29 2 0 Jul17 ? 00:00:00 [kblockd/1]
root 30 2 0 Jul17 ? 00:00:00 [kacpid]
root 31 2 0 Jul17 ? 00:00:00 [kacpi_notify]
root 32 2 0 Jul17 ? 00:00:00 [kacpi_hotplug]
root 33 2 0 Jul17 ? 00:00:00 [ata_aux]
root 34 2 0 Jul17 ? 00:00:00 [ata_sff/0]
root 35 2 0 Jul17 ? 00:00:00 [ata_sff/1]
root 36 2 0 Jul17 ? 00:00:00 [ksuspend_usbd]
root 37 2 0 Jul17 ? 00:00:00 [khubd]
root 38 2 0 Jul17 ? 00:00:00 [kseriod]
root 39 2 0 Jul17 ? 00:00:00 [md/0]
root 40 2 0 Jul17 ? 00:00:00 [md/1]
root 41 2 0 Jul17 ? 00:00:00 [md_misc/0]
root 42 2 0 Jul17 ? 00:00:00 [md_misc/1]
root 43 2 0 Jul17 ? 00:00:00 [linkwatch]
root 46 2 0 Jul17 ? 00:00:00 [khungtaskd]
root 47 2 0 Jul17 ? 00:00:00 [lru-add-drain/0]
root 48 2 0 Jul17 ? 00:00:00 [lru-add-drain/1]
root 49 2 0 Jul17 ? 00:00:02 [kswapd0]
root 50 2 0 Jul17 ? 00:00:00 [ksmd]
root 51 2 0 Jul17 ? 00:00:00 [khugepaged]
root 52 2 0 Jul17 ? 00:00:00 [aio/0]
root 53 2 0 Jul17 ? 00:00:00 [aio/1]
root 54 2 0 Jul17 ? 00:00:00 [crypto/0]
root 55 2 0 Jul17 ? 00:00:00 [crypto/1]
root 62 2 0 Jul17 ? 00:00:00 [kthrotld/0]
root 63 2 0 Jul17 ? 00:00:00 [kthrotld/1]
root 64 2 0 Jul17 ? 00:00:00 [pciehpd]
root 66 2 0 Jul17 ? 00:00:00 [kpsmoused]
root 67 2 0 Jul17 ? 00:00:00 [usbhid_resumer]
root 68 2 0 Jul17 ? 00:00:00 [deferwq]
root 101 2 0 Jul17 ? 00:00:00 [kdmremove]
root 102 2 0 Jul17 ? 00:00:00 [kstriped]
root 135 2 0 Jul17 ? 00:00:00 [ttm_swap]
root 280 2 0 Jul17 ? 00:00:00 [scsi_eh_0]
root 281 2 0 Jul17 ? 00:00:00 [scsi_eh_1]
root 352 2 0 Jul17 ? 00:00:00 [mpt_poll_0]
root 353 2 0 Jul17 ? 00:00:00 [mpt/0]
root 354 2 0 Jul17 ? 00:00:00 [scsi_eh_2]
root 388 2 0 Jul17 ? 00:00:00 [kdmflush]
root 390 2 0 Jul17 ? 00:00:00 [kdmflush]
root 408 2 0 Jul17 ? 00:01:05 [jbd2/dm-0-8]
root 409 2 0 Jul17 ? 00:00:00 [ext4-dio-unwrit]
root 499 1 0 Jul17 ? 00:00:00 /sbin/udevd -d
root 654 2 0 Jul17 ? 00:00:00 [vmmemctl]
root 806 499 0 Jul17 ? 00:00:00 /sbin/udevd -d
root 814 499 0 Jul17 ? 00:00:00 /sbin/udevd -d
root 839 2 0 Jul17 ? 00:00:00 [jbd2/sda1-8]
root 840 2 0 Jul17 ? 00:00:00 [ext4-dio-unwrit]
root 864 2 0 Jul17 ? 00:00:05 [flush-253:0]
root 879 2 0 Jul17 ? 00:00:54 [kauditd]
root 1118 1 0 Jul17 ? 00:00:41 /usr/sbin/vmtoolsd
root 1130 1 0 Jul17 ? 00:00:00 /usr/lib/vmware-vgauth/VGAuthSer
rpcuser 1325 1 0 Jul17 ? 00:00:00 rpc.statd
ntp 1389 1 0 Jul17 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd
501 4152 23619 0 09:07 tty1 00:00:07 top
root 6280 1 0 Jul17 ? 00:01:38 /bin/bash /opt/LCM/bin/Resume.sh
root 7486 6280 0 10:29 ? 00:00:00 /bin/sh /opt/LCM/bin/sqliteToDer
root 7490 7486 0 10:29 ? 00:00:00 /bin/sh /opt/LCM/bin/sqliteToDer
root 7491 7490 0 10:29 ? 00:00:00 /usr/bin/expect /opt/LCM/bin/db.
root 7492 7490 0 10:29 ? 00:00:00 grep -vE ij>|ij version|spawn
root 7493 7490 0 10:29 ? 00:00:00 sed 1,2d
root 7494 7490 0 10:29 ? 00:00:00 head -n -2
root 7495 7491 28 10:29 pts/0 00:00:00 java org.apache.derby.tools.ij
root 7509 12635 0 10:29 pts/1 00:00:00 ps -ef
casuser 8036 1 0 Jul17 ? 00:01:13 /opt/cisco/ss/adminshell/applica
root 8169 1 0 Jul17 ? 00:00:00 /sbin/dhclient -H cspc -1 -q -lf
root 8230 1 0 Jul17 ? 00:04:38 auditd
root 8232 8230 0 Jul17 ? 00:05:15 /sbin/audispd
rpc 8308 1 0 Jul17 ? 00:00:00 rpcbind
root 8392 1 0 Jul17 ? 00:00:00 /usr/sbin/sshd
root 8397 1 0 Jul17 ? 00:00:01 crond
root 8413 1 0 Jul17 ? 00:00:00 /usr/sbin/atd
casuser 8446 1 0 Jul17 ? 00:00:55 /opt/java/jre/bin/java -Djetty.h
root 9024 8392 0 10:24 ? 00:00:00 sshd: collectorlogin [priv]
root 11002 1 0 Jul17 ? 00:00:00 su -
root 11541 11002 0 Jul17 ? 00:00:00 -bash
501 11565 9024 0 10:24 ? 00:00:00 sshd: collectorlogin@pts/1
501 11566 11565 0 10:24 pts/1 00:00:00 -bash
casuser 11686 1 0 Jul17 ? 00:01:09 /opt/java/jre/bin/java -Djava.li
root 11816 1 0 Jul17 ? 00:00:00 /usr/libexec/postfix/master
postfix 11838 11816 0 Jul17 ? 00:00:00 qmgr -l -t fifo -u
root 11859 11566 0 10:24 pts/1 00:00:00 su -
root 12635 11859 0 10:24 pts/1 00:00:00 -bash
root 20771 1 0 Jul17 ? 00:00:00 login -- collectorlogin
root 21832 1 0 03:33 ? 00:01:30 /sbin/rsyslogd -i /var/run/syslo
501 23619 20771 0 08:47 tty1 00:00:00 -bash
postfix 23766 11816 0 10:21 ? 00:00:00 pickup -l -t fifo -u
root 26223 11541 0 Jul17 ? 00:00:00 tail -f /opt/LCM/logs/apply
[root@cspc ~]#
[root@cspc ~]#
[root@cspc ~]#
[root@cspc ~]# tail -50 /opt/LCM/logs/install
inflating: /opt/LCM/bin/autoupdate-init.exp
inflating: /opt/LCM/bin/cmd-exec.tcl
inflating: /opt/LCM/bin/decryptor.sh
inflating: /opt/LCM/bin/downloadResultUpdate.sh
inflating: /opt/LCM/bin/getDependencyList.sh
inflating: /opt/LCM/bin/ida-download.sh
inflating: /opt/LCM/bin/lcmagent-apply-check.tcl
inflating: /opt/LCM/bin/lcmagent-apply-count.tcl
inflating: /opt/LCM/bin/lcmagent-apply.tcl
inflating: /opt/LCM/bin/lcmagent-apply-wrapper.tcl
inflating: /opt/LCM/bin/lcmagent-download.tcl
inflating: /opt/LCM/bin/lcmagent-system.tcl
inflating: /opt/LCM/bin/LCMDecryptor.jar
inflating: /opt/LCM/bin/packaging.sh
inflating: /opt/LCM/bin/restart.sh
inflating: /opt/LCM/bin/Resume.sh
inflating: /opt/LCM/bin/status_update.sh
inflating: /opt/LCM/bin/updateApplyCheckData.sh
inflating: /opt/LCM/bin/updator.sh
lcmuptime.sh cronjob doesnot exit , so creating lcmuptime.sh cronjob
remotelogin exists!
removing remotelogin from ssh config
============================
Rebooting the appliance now
============================

============================================================================
Finished execution of post-install scripts in package jeos-30.1.1-4-lnx64 ...
============================================================================

INSTALLER: Tail-end Gateway installed successfully!!
INSTALLER: Starting the Tail-end Gateway with the command '/sbin/service concsotgw start'

============================================================================
Started execution of pre-install scripts in package sp-30.1.1-4-0-lnx64 ...
============================================================================

Inside preinstall script
Fetching Version info
adminshell service is not up , please try installation after some time
use 'service adminshell status' command to check the status of the adminshell service

============================================================================
Started execution of pre-install scripts in package sp-30.1.1-4-0-lnx64 ...
============================================================================

Inside preinstall script
Fetching Version info
adminshell service is not up , please try installation after some time
use 'service adminshell status' command to check the status of the adminshell service
[root@cspc ~]#
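The installer log above asks to retry "after some time" and to check the service with `service adminshell status`. That wait can be scripted as a simple polling loop. This is a sketch only: `wait_until` is a hypothetical helper, the retry count and delay are arbitrary, and it assumes the status command exits with 0 once the service is up, which has not been confirmed for CSPC. Polling does not address why the service stays down; in this thread it was still not running after 15 hours.

```shell
# wait_until TRIES DELAY CMD...: run CMD up to TRIES times, sleeping DELAY
# seconds between attempts. Returns 0 as soon as CMD succeeds, 1 otherwise.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumes exit status 0 means "running"):
#   wait_until 30 60 service adminshell status && echo "adminshell is up"
```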

adias
Cisco Employee

Thank you for the detailed information.

 

Based on the output, you have deployed the tiny version of the collector, which is meant only for testing (just a few devices, not ongoing collection). If it was only "sort of" working as the tiny version, the box cannot handle the extra load of the upgrade process.

Please let me know which version of VMware ESXi your host is running, then run the command below and post the output:

# free -m

 

If it shows only 2 GB of memory, then you do have the tiny version, which does not have enough resources overall to run an official SNTC CSPC.
Please confirm and let me know.
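The memory check described above can also be scripted. The sketch below reads `free -m` output on stdin and takes the total from the second column of the `Mem:` row. The helper names are hypothetical, and the 2048 in the usage line is only a placeholder taken from the "2 gig" remark, not a documented CSPC requirement.

```shell
# mem_total_mb: read `free -m` output on stdin, print total RAM in MB.
mem_total_mb() {
    awk '$1 == "Mem:" { print $2 }'
}

# mem_at_least MIN_MB: read `free -m` output on stdin; succeed only if
# total RAM is at least MIN_MB.
mem_at_least() {
    total=$(mem_total_mb)
    [ -n "$total" ] && [ "$total" -ge "$1" ]
}

# Example (threshold is a placeholder):
#   free -m | mem_at_least 2048 || echo "less memory than expected"
```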

smallcspc.JPG

Hmm, that could be, but it looks like I have 3 GB of RAM, only 2 CPUs, and a 40 GB HD.

[root@cspc ~]# free -m
total used free shared buffers cached
Mem: 3011 2726 285 0 45 175
-/+ buffers/cache: 2505 505
Swap: 8031 24 8007
[root@cspc ~]#

I also found that MySQL would not launch until I deleted the zero byte tc.log file.
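For anyone hitting the same MySQL symptom, a zero-byte tc.log can be detected before deciding whether to delete it. This is a minimal sketch: `empty_tc_log` is a hypothetical helper, and the data directory in the usage line is an assumption (a common MySQL default), since the thread does not say where the file lives on a CSPC appliance.

```shell
# empty_tc_log DATADIR: if DATADIR/tc.log exists and is zero bytes,
# print its path and return 0; otherwise print nothing and return 1.
empty_tc_log() {
    f=$1/tc.log
    if [ -f "$f" ] && [ ! -s "$f" ]; then
        printf '%s\n' "$f"
        return 0
    fi
    return 1
}

# Example (path is an assumption; check the datadir setting in my.cnf):
#   empty_tc_log /var/lib/mysql
```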

adias
Cisco Employee

 

 

Total memory is showing less than required, so my recommendation is to redeploy as the "small" version of the collector. Even if you fix the current installation so that it works temporarily, this problem will recur frequently. I did not see your response about the VMware ESXi version on the host.

I think the only things you essentially need to export for a redeployment are the credentials and the list of managed devices; then you should be able to redeploy. To be safe, just power off the original and deploy a new image. If you need to return to the old one, you can; if not, you can delete it after the redeploy. The resource size of the appliance is selected from the same install file for all types.

Export device list: page 8, chapter 3

Export credentials: page 2, chapter 6 (it is not in the headers, but the button is right beside the Import one)

https://www.cisco.com/c/dam/en/us/support/docs/cloud-systems-management/common-services-platform-collector-cspc/CSPC-User-Guide.pdf