01-30-2018 04:31 AM - edited 03-04-2019 02:27 AM
Hello! We have 2 x UCS-FI-6248UP + 1 x Cisco UCS 5108 Chassis (7 servers) with 2 x Nexus 5548UP switches with L3 modules (Data Center Routed Access scheme).
In UCS Manager, FI A shows the state "inapplicable" and FI B is currently "primary".
From the CLI (in the connect local-mgmt A context) I see:
NSX6248UP1-A(local-mgmt)# show cluster extended-state
Cluster Id: 0x5a479a88bbd711e2-0xbab0002a6a07b044
Start time: Tue Jan 30 11:42:09 2018
Last election time: Tue Jan 30 11:46:41 2018
A: UP, INAPPLICABLE, (Management services: DOWN)
B: UP, PRIMARY
A: memb state UP, lead state INAPPLICABLE, mgmt services state: DOWN
B: memb state UP, lead state PRIMARY, mgmt services state: UP
   heartbeat state PRIMARY_OK
INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP
HA NOT READY
Management services are unresponsive on local Fabric Interconnect
Waiting for response from device. Device count, expected: 1, active: 0
Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1702GT4F, state: inactive
Fabric A, Unable to connect to local chassis-shared-storage management interface : FOX1702GT4F
Warning: there are pending management I/O errors on one or more devices, failover may not complete
Additional pmon data:
NSX6248UP1-A(local-mgmt)# show pmon state
SERVICE NAME            STATE     RETRY(MAX)    EXITCODE    SIGNAL    CORE
------------            -----     ----------    --------    ------    ----
svc_sam_controller      running   0(4)          0           0         no
svc_sam_dme             failed    5(4)          0           6         yes
svc_sam_dcosAG          running   0(4)          0           0         no
svc_sam_bladeAG         running   0(4)          0           0         no
svc_sam_portAG          running   0(4)          0           0         no
svc_sam_statsAG         running   0(4)          0           0         no
svc_sam_hostagentAG     running   0(4)          0           0         no
svc_sam_nicAG           running   0(4)          0           0         no
svc_sam_licenseAG       running   0(4)          0           0         no
svc_sam_extvmmAG        running   0(4)          0           0         no
httpd.sh                running   0(4)          0           0         no
httpd_cimc.sh           running   0(4)          0           0         no
svc_sam_sessionmgrAG    running   0(4)          0           0         no
svc_sam_pamProxy        running   0(4)          0           0         no
dhcpd                   running   0(4)          0           0         no
sam_core_mon            running   0(4)          0           0         no
svc_sam_rsdAG           running   0(4)          0           0         no
svc_sam_svcmonAG        running   0(4)          0           0         no
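For anyone who wants to check this kind of output automatically, here is a minimal Python sketch that picks out the services that are not in the "running" state. It assumes the column layout shown in the output above; the function names (parse_pmon, unhealthy) and the abbreviated SAMPLE text are made up for illustration and are not part of any Cisco tool.

```python
import re

# Abbreviated copy of the `show pmon state` output from this thread.
SAMPLE = """\
SERVICE NAME            STATE     RETRY(MAX)    EXITCODE    SIGNAL    CORE
------------            -----     ----------    --------    ------    ----
svc_sam_controller      running   0(4)          0           0         no
svc_sam_dme             failed    5(4)          0           6         yes
httpd.sh                running   0(4)          0           0         no
"""

# One data row: name, state, retry(max), exit code, signal, core flag.
# The header and the dashed separator do not match this pattern and are skipped.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<state>\S+)\s+(?P<retry>\d+)\((?P<max>\d+)\)\s+"
    r"(?P<exitcode>\d+)\s+(?P<signal>\d+)\s+(?P<core>yes|no)$"
)


def parse_pmon(text):
    """Return one dict per service row found in the pasted output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def unhealthy(rows):
    """Keep only the services whose state is anything other than 'running'."""
    return [row for row in rows if row["state"] != "running"]


if __name__ == "__main__":
    for row in unhealthy(parse_pmon(SAMPLE)):
        print(
            f'{row["name"]}: {row["state"]}, retries {row["retry"]}/{row["max"]}, '
            f'signal {row["signal"]}, core dumped: {row["core"]}'
        )
```

On the sample this reports only svc_sam_dme (failed, 5 retries against a maximum of 4, signal 6, core dumped), which matches the "Management services: DOWN" line in the cluster state.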
During boot there is also the message "ERROR: MGMT partition has unrecoverable error".
I cannot open a TAC case because we have no service subscription.
Thanks a lot for any help.
01-30-2018 03:50 PM
Solved by a full rebuild of FI A (reference steps: https://supportforums.cisco.com/t5/data-center-documents/how-to-recover-from-a-software-failure-on-the-6120-fabric/ta-p/3121751).
Some tips:
1. Downloaded the infrastructure firmware bundle ucs-k9-bundle-infra.3.2.2d.A.bin (it must be exactly the same version as on the healthy FI, or the rebuilt FI cannot rejoin the cluster) and extracted these files with 7-zip:
ucs-6100-k9-kickstart.5.0.3.N2.3.22c.bin
ucs-6100-k9-system.5.0.3.N2.3.22c.bin
ucs-manager-k9.3.2.2d.bin
2. After booting the kickstart image from TFTP, ran the init system command to re-initialize the file systems. This is very destructive: it wipes everything on the FI, including any installed licenses.
3. Copied all .bin files to bootflash, then copied ucs-manager-k9.3.2.2d.bin to nuova-sim-mgmt-nsg.0.1.0.001.bin (a special fixed name). I used Open TFTP Server (https://sourceforge.net/projects/tftp-server).
4. After the rebuild, set the boot version for FI A in UCS Manager; otherwise the next boot stops at the loader prompt.
Quick steps to copy and paste, using my version and TFTP IP:
... reboot FI ...
NSX6248UP1-A# connect local-mgmt A
NSX6248UP1-A(local-mgmt)# reboot
Break into the boot loader (when the system begins to boot): CTRL+L
...
loader> set ip 10.1.253.241 255.255.255.0
loader> set gw 10.1.253.1
loader> boot tftp://10.3.11.68/ucs-6100-k9-kickstart.5.0.3.N2.3.22c.bin
switch(boot)# init system
switch(boot)# conf terminal
switch(boot)(config)# interface mgmt 0
switch(boot)(config-if)# ip address 10.1.253.241 255.255.255.0
switch(boot)(config-if)# no shut
switch(boot)(config-if)# exit
switch(boot)(config)# ip default-gateway 10.1.253.1
switch(boot)(config)# exit
switch(boot)# copy tftp://10.3.11.68/ucs-6100-k9-kickstart.5.0.3.N2.3.22c.bin bootflash:
switch(boot)# copy tftp://10.3.11.68/ucs-6100-k9-system.5.0.3.N2.3.22c.bin bootflash:
switch(boot)# copy tftp://10.3.11.68/ucs-manager-k9.3.2.2d.bin bootflash:
switch(boot)# copy bootflash:ucs-manager-k9.3.2.2d.bin bootflash:nuova-sim-mgmt-nsg.0.1.0.001.bin
switch(boot)# exit
... reboots to loader ...
loader> boot ucs-6100-k9-kickstart.5.0.3.N2.3.22c.bin ucs-6100-k9-system.5.0.3.N2.3.22c.bin
... wait for init and complete the initial configuration to rejoin the cluster ...
... set boot version for FI in UCS Manager ...
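If your addresses or firmware version differ from mine, the command list above can be regenerated from a few parameters. The following is only a sketch in Python: the RebuildParams class and the function names are invented for this example, it just prints text to paste at the matching prompt, and it does not talk to the FI. It reproduces the same commands as in the steps above, nothing more.

```python
from dataclasses import dataclass

# Fixed file name the manager image is copied to (taken from the steps above).
MGMT_FIXED_NAME = "nuova-sim-mgmt-nsg.0.1.0.001.bin"


@dataclass
class RebuildParams:
    """Hypothetical container for the site-specific values."""
    fi_ip: str
    netmask: str
    gateway: str
    tftp_ip: str
    kickstart: str
    system: str
    manager: str


def loader_commands(p):
    """Commands for the first loader> prompt: set IP and boot kickstart via TFTP."""
    return [
        f"set ip {p.fi_ip} {p.netmask}",
        f"set gw {p.gateway}",
        f"boot tftp://{p.tftp_ip}/{p.kickstart}",
    ]


def kickstart_commands(p):
    """Commands for the switch(boot)# prompt. 'init system' wipes the FI."""
    cmds = [
        "init system",
        "conf terminal",
        "interface mgmt 0",
        f"ip address {p.fi_ip} {p.netmask}",
        "no shut",
        "exit",
        f"ip default-gateway {p.gateway}",
        "exit",
    ]
    for image in (p.kickstart, p.system, p.manager):
        cmds.append(f"copy tftp://{p.tftp_ip}/{image} bootflash:")
    cmds.append(f"copy bootflash:{p.manager} bootflash:{MGMT_FIXED_NAME}")
    cmds.append("exit")
    return cmds


def final_boot_command(p):
    """Command for the second loader> prompt: boot kickstart and system from bootflash."""
    return f"boot {p.kickstart} {p.system}"


if __name__ == "__main__":
    # Values from this thread; replace them with your own.
    params = RebuildParams(
        fi_ip="10.1.253.241",
        netmask="255.255.255.0",
        gateway="10.1.253.1",
        tftp_ip="10.3.11.68",
        kickstart="ucs-6100-k9-kickstart.5.0.3.N2.3.22c.bin",
        system="ucs-6100-k9-system.5.0.3.N2.3.22c.bin",
        manager="ucs-manager-k9.3.2.2d.bin",
    )
    for cmd in loader_commands(params):
        print("loader>", cmd)
    for cmd in kickstart_commands(params):
        print("switch(boot)#", cmd)
    print("loader>", final_boot_command(params))
```

The printed prompt prefix is only a label for where each command belongs (the real prompt changes inside config mode). Check every line before pasting, because init system cannot be undone.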
01-30-2018 04:34 AM
You can attempt a reboot of the FI or a restart of pmon services from the CLI in an effort to recover FI-A. Unfortunately, if your management database has been corrupted or needs to be recovered, it would require a TAC case to load a debug image to conduct the repair.
01-30-2018 04:38 AM