10-14-2013 09:51 PM - edited 03-01-2019 11:18 AM
Hello,
A colleague of mine performed a UCS firmware upgrade a few days ago and ran into an odd situation.
He was upgrading from UCS 2.0(1w) to 2.1(1f).
He followed the guide posted here, using the manual option.
He activated UCS Manager.
Then he updated the IOM firmware with the "Set Startup Version Only" option enabled, as indicated in the guide. After this, he lost access to the UCSM GUI and found the following errors in the logs:
2013 Oct 2 22:40:54 MCM-ACS-UCS01-A %$ VDC-1 %$ %SYSMGR-2-SERVICE_CRASHED: Service "Ldap Daemon" (PID 4825) hasn't caught signal 9 (no core).
2013 Oct 2 22:41:19 MCM-ACS-UCS01-A %$ VDC-1 %$ %SYSMGR-2-SERVICE_CRASHED: Service "PMon" (PID 4842) hasn't caught signal 9 (no core).
2013 Oct 2 22:41:33 MCM-ACS-UCS01-A %$ VDC-1 %$ %SYSMGR-2-SERVICE_CRASHED: Service "snmpd" (PID 4829) hasn't caught signal 11 (core will be saved).
2013 Oct 2 22:41:50 MCM-ACS-UCS01-A %$ VDC-1 %$ %SYSMGR-2-SERVICE_CRASHED: Service "snmpd" (PID 27869) hasn't caught signal 9 (no core).
2013 Oct 2 22:42:29 MCM-ACS-UCS01-A %$ VDC-1 %$ %UCSM-2-VERSION_INCOMPATIBLE: [F0430][critical][version-incompatible][sys/mgmt-entity-A] Fabric Interconnect A, management services, incompatible versions
2013 Oct 2 22:42:29 MCM-ACS-UCS01-A %$ VDC-1 %$ %UCSM-2-VERSION_INCOMPATIBLE: [F0430][critical][version-incompatible][sys/mgmt-entity-B] Fabric Interconnect B, management services, incompatible versions
2013 Oct 2 22:42:29 MCM-ACS-UCS01-A %$ VDC-1 %$ %UCSM-2-MANAGEMENT_SERVICES_UNRESPONSIVE: [F0452][critical][management-services-unresponsive][sys/mgmt-entity-B] Fabric Interconnect B, management services are unresponsive
2013 Oct 2 22:43:04 MCM-ACS-UCS01-A %$ VDC-1 %$ %UCSM-2-MANAGEMENT_SERVICES_FAILURE: [F0451][critical][management-services-failure][sys/mgmt-entity-B] Fabric Interconnect B, management services have failed
2013 Oct 2 22:43:04 MCM-ACS-UCS01-A %$ VDC-1 %$ %UCSM-2-VERSION_INCOMPATIBLE: [F0430][cleared][version-incompatible][sys/mgmt-entity-A] Fabric Interconnect A, management services, incompatible versions
2013 Oct 2 22:43:04 MCM-ACS-UCS01-A %$ VDC-1 %$ %UCSM-2-VERSION_INCOMPATIBLE: [F0430][cleared][version-incompatible][sys/mgmt-entity-B] Fabric Interconnect B, management services, incompatible versions
2013 Oct 2 22:43:04 MCM-ACS-UCS01-A %$ VDC-1 %$ %UCSM-2-MANAGEMENT_SERVICES_UNRESPONSIVE: [F0452][cleared][management-services-unresponsive][sys/mgmt-entity-B] Fabric Interconnect B, management services are unresponsive
2013 Oct 2 23:04:09 MCM-ACS-UCS01-A %$ VDC-1 %$ %PFMA-2-PFM_SYSTEM_RESET: Manual system restart from Command Line Interface
2013 Oct 2 23:04:11 MCM-ACS-UCS01-A %$ VDC-1 %$ %PFMA-2-FEX_STATUS: Fex 2 is offline
2013 Oct 2 23:04:11 MCM-ACS-UCS01-A %$ VDC-1 %$ %NOHMS-2-NOHMS_ENV_FEX_OFFLINE: FEX-2 Off-line (Serial Number QCI1548A020)
2013 Oct 2 23:04:11 MCM-ACS-UCS01-A %$ VDC-1 %$ %NOHMS-2-NOHMS_ENV_FEX_OFFLINE: FEX-1 Off-line (Serial Number QCI1547A0WQ)
2013 Oct 2 23:04:11 MCM-ACS-UCS01-A %$ VDC-1 %$ %PFMA-2-FEX_STATUS: Fex 1 is offline
2013 Oct 2 23:04:13 MCM-ACS-UCS01-A %$ VDC-1 %$ Oct 2 23:04:13 %KERN-0-SYSTEM_MSG: Shutdown Ports.. - kernel
2013 Oct 2 23:04:13 MCM-ACS-UCS01-A %$ VDC-1 %$ Oct 2 23:04:13 %KERN-0-SYSTEM_MSG: writing reset reason 9, - kernel
It seems that the FIs were rebooted after hitting some condition. The only indicator we have is the "SERVICE_CRASHED" messages that occurred some minutes before the FI reset. I found in the 2.1 release notes that a similar condition causes similar behavior (both FIs reset) and is fixed in 2.1(2a)A.
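To make the timing between the crashes and the reset easier to see, here is a minimal Python sketch that parses syslog lines in the format quoted above and reports how long before the PFM_SYSTEM_RESET message each SERVICE_CRASHED message was logged. The function names (`parse_line`, `crashes_before_reset`) are my own and hypothetical, and the regular expressions are assumptions based only on the lines in this post, not on any official log format specification.

```python
import re
from datetime import datetime

# Assumed layout, taken from the lines quoted in this post:
# "2013 Oct 2 22:40:54 HOST %$ VDC-1 %$ %FACILITY-SEVERITY-MNEMONIC: text"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) .*?"
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<text>.*)$"
)

# Detail of a SERVICE_CRASHED message: service name, PID, and signal number.
CRASH_RE = re.compile(
    r"""Service "([^"]+)" \(PID (\d+)\) hasn't caught signal (\d+)"""
)


def parse_line(line):
    """Return a dict for one syslog line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Collapse repeated spaces so single-digit days parse consistently.
    ts = " ".join(m.group("ts").split())
    return {
        "time": datetime.strptime(ts, "%Y %b %d %H:%M:%S"),
        "host": m.group("host"),
        "facility": m.group("facility"),
        "severity": int(m.group("severity")),
        "mnemonic": m.group("mnemonic"),
        "text": m.group("text"),
    }


def crashes_before_reset(lines):
    """List (service, signal, time_before_reset) for each crash that was
    logged at or before the first PFM_SYSTEM_RESET message."""
    events = [e for e in map(parse_line, lines) if e is not None]
    reset = next((e for e in events if e["mnemonic"] == "PFM_SYSTEM_RESET"), None)
    if reset is None:
        return []
    result = []
    for e in events:
        if e["mnemonic"] != "SERVICE_CRASHED" or e["time"] > reset["time"]:
            continue
        detail = CRASH_RE.search(e["text"])
        if detail is None:
            continue
        result.append((detail.group(1), int(detail.group(3)), reset["time"] - e["time"]))
    return result
```

Feeding it the lines above (for example via `crashes_before_reset(open(path).read().splitlines())`, where `path` is wherever the log was saved) gives one tuple per crash, which makes it clear that the crashes were logged roughly 20 minutes before the reset message.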
Could the situation we faced be a variant of the bug?
Unfortunately, we don't have any dump.
We have reviewed the logs several times and couldn't find any additional information about what happened to the FIs. Is there a specific file in the tech support bundle where we should look for more information?
Any advice will be appreciated.
Thanks.
10-14-2013 10:32 PM
Hello Gabriel,
Any chance you can connect to nxos a/b and run "show system reset-reason"? (That is the command off the top of my head, at this time of day.)
-Kenny
10-15-2013 08:01 AM
Thank you Kenny,
From the tech support file:
`show system reset-reason`
----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) At 59578 usecs after Wed Oct 2 23:04:19 2013
Reason: Reset Requested by CLI command reload
Service:
Version: 5.0(3)N2(2.1w)
2) No time
Reason: Unknown
Service:
Version: 5.0(3)N2(2.1w)
3) No time
Reason: Unknown
Service:
Version: 5.0(3)N2(2.1w)
4) At 215436 usecs after Thu Apr 12 09:38:19 2012
Reason: Reset Requested by CLI command reload
Service:
Version: 5.0(3)N2(2.1q)
The person performing the upgrade told me that he lost access to both FIs, so nobody could have reset them from the CLI.
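For comparing this output against the syslog timestamps, a small Python sketch like the following could parse the reset-reason entries and check whether an entry falls close to a given log time. The names (`parse_reset_reasons`, `is_close`) and the 60-second tolerance are hypothetical choices of mine, and the parsing assumes only the layout shown in the output above.

```python
import re
from datetime import datetime, timedelta

# Assumed entry header layout, based on the output quoted above:
# "1) At 59578 usecs after Wed Oct 2 23:04:19 2013" or "2) No time"
HEADER_RE = re.compile(r"^(\d+)\) (?:At (\d+) usecs after (.+)|No time)\s*$")
FIELD_RE = re.compile(r"^(Reason|Service|Version):\s*(.*)$")


def parse_reset_reasons(text):
    """Parse 'show system reset-reason' output into a list of dicts."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            when = None
            if header.group(2) is not None:
                stamp = " ".join(header.group(3).split())
                base = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
                when = base + timedelta(microseconds=int(header.group(2)))
            entries.append({"index": int(header.group(1)), "time": when})
            continue
        field = FIELD_RE.match(line)
        if field and entries:
            # Attach Reason/Service/Version to the most recent entry.
            entries[-1][field.group(1).lower()] = field.group(2)
    return entries


def is_close(entry_time, log_time, tolerance=timedelta(seconds=60)):
    """True if the two timestamps are within the (assumed) tolerance."""
    return abs(entry_time - log_time) <= tolerance
```

With the output above, the first entry parses to Oct 2 2013 23:04:19, which is about ten seconds after the PFM_SYSTEM_RESET message logged at 23:04:09, so `is_close` would report the two as the same event under the assumed tolerance.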
10-15-2013 08:26 AM
Gabriel,
Does Wed Oct 2 23:04:19 2013 match the time the reboot took place?
Also, that output is from the perspective of one FI. Do you see the same on the other FI? You can choose which FI to connect to by typing either connect nxos a or connect nxos b.
By checking the logs we can determine whether the behavior was expected during the update; however, given the message below, we might need to examine the process in depth:
UCSM-2-MANAGEMENT_SERVICES_FAILURE: [F0451][critical][management-services-failure][sys/mgmt-entity-B] Fabric Interconnect B, management services have failed
I recommend that you open a TAC case so we can gather a show tech and analyze this further; otherwise I might end up asking for a lot of commands here in the community.
You should now be able to open cases from these threads; try it with this one.
-Kenny
Cisco Support Community is also present in Spanish:
https://supportforums.cisco.com/community/spanish/data_center