
Walter Dey · ‎06-29-2015

Customer is running 2.2.1b and removed the NTP server from the configuration because of the approaching leap second issue.

Then UCS Manager apparently crashed. A check of the NX-OS uptime on both FIs showed no reboot, so it seems only UCSM was affected.

Q. Is this normal behaviour?

Q. Has this been seen before?

Keny Perez · ‎06-29-2015

Did you see any interruption of the services?

Do you see any crash/core when you go to "local mgmt" and run "show pmon state"?

Any useful info from the "show log log" & "show nvram" in NXOS?

-Kenny

Walter Dey · ‎07-01-2015

It seems that the FIs didn't crash; I have no access to the systems. From that point of view, it would be a cosmetic bug only. No time to open a TAC case.

https://tools.cisco.com/bugsearch/bug/CSCus83447

doesn't mention anything about this?

2015 Jun 29 16:19:48 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_FAILURE: [F0451][critical][management-services-failure][sys/mgmt-entity-B] Fabric Interconnect B, management services have failed

2015 Jun 29 16:20:12 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_FAILURE: [F0451][cleared][management-services-failure][sys/mgmt-entity-B] Fabric Interconnect B, management services have failed

2015 Jun 29 16:20:48 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_UNRESPONSIVE: [F0452][critical][management-services-unresponsive][sys/mgmt-entity-B] Fabric Interconnect B, management services are unresponsive

2015 Jun 29 16:21:08 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_UNRESPONSIVE: [F0452][cleared][management-services-unresponsive][sys/mgmt-entity-B] Fabric Interconnect B, management services are unresponsive
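For anyone who wants to see how long each fault stayed raised, here is a minimal Python sketch that pairs each raised line with its "cleared" line. The function name `fault_durations` and the regular expression are my own assumptions based on the four lines above, not part of any Cisco tool, and other UCSM syslog formats may need a different pattern.

```python
import re
from datetime import datetime

# Hypothetical helper (not a Cisco tool): the pattern is derived only from
# the four syslog lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"%UCSM-\d-\w+: \[(?P<code>F\d+)\]\[(?P<state>\w+)\]"
)

def fault_durations(lines):
    """Return (fault_code, seconds) for every fault that was raised and later cleared."""
    raised = {}      # fault code -> time it was raised
    durations = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a UCSM fault line
        ts = datetime.strptime(m["ts"], "%Y %b %d %H:%M:%S")
        code, state = m["code"], m["state"]
        if state == "cleared":
            start = raised.pop(code, None)
            if start is not None:
                durations.append((code, (ts - start).total_seconds()))
        else:
            # any other severity (e.g. "critical") marks the fault as raised
            raised[code] = ts
    return durations

LOG = """\
2015 Jun 29 16:19:48 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_FAILURE: [F0451][critical][management-services-failure][sys/mgmt-entity-B] Fabric Interconnect B, management services have failed
2015 Jun 29 16:20:12 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_FAILURE: [F0451][cleared][management-services-failure][sys/mgmt-entity-B] Fabric Interconnect B, management services have failed
2015 Jun 29 16:20:48 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_UNRESPONSIVE: [F0452][critical][management-services-unresponsive][sys/mgmt-entity-B] Fabric Interconnect B, management services are unresponsive
2015 Jun 29 16:21:08 ch01u201-A %UCSM-2-MANAGEMENT_SERVICES_UNRESPONSIVE: [F0452][cleared][management-services-unresponsive][sys/mgmt-entity-B] Fabric Interconnect B, management services are unresponsive
"""

for code, secs in fault_durations(LOG.splitlines()):
    print(f"{code}: raised for {secs:.0f} s")
```

On the lines above this gives 24 s for F0451 and 20 s for F0452, so both faults on FI B cleared within about half a minute.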

show pmon state shows no crashes/cores, but on B there are two entries with signal 15:

ch01u201-B(local-mgmt)# show pmon state

SERVICE NAME          STATE      RETRY(MAX)   EXITCODE   SIGNAL   CORE
------------          -----      ----------   --------   ------   ----
svc_sam_controller    running          1(4)          0       15     no
svc_sam_dme           running          1(4)          0       15     no
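As a side note, signal 15 is SIGTERM on Linux, i.e. a termination request rather than a fault such as SIGSEGV, which would fit the missing cores (who sent it is something only the logs could tell). Below is a rough Python sketch for picking such rows out of pasted "show pmon state" output. The function name `signalled_services` is hypothetical, and the six-column layout is assumed from the two rows shown here.

```python
import signal

def signalled_services(pmon_output):
    """Return (service, signal_name, core) for rows whose SIGNAL column is non-zero."""
    hits = []
    for line in pmon_output.splitlines():
        parts = line.split()
        # Assumed layout: name, state, retry(max), exitcode, signal, core.
        # The header has 7 tokens and the dashed separator is not numeric,
        # so both are skipped by this check.
        if len(parts) != 6 or not parts[4].isdigit():
            continue
        name, _state, _retry, _exitcode, sig, core = parts
        if sig == "0":
            continue
        try:
            sig_name = signal.Signals(int(sig)).name
        except ValueError:
            sig_name = f"signal {sig}"  # number unknown on this platform
        hits.append((name, sig_name, core))
    return hits

PMON = """\
SERVICE NAME STATE RETRY(MAX) EXITCODE SIGNAL CORE
------------ ----- ---------- -------- ------ ----
svc_sam_controller running 1(4) 0 15 no
svc_sam_dme running 1(4) 0 15 no
"""

for name, sig_name, core in signalled_services(PMON):
    print(f"{name}: last exit by {sig_name}, core dumped: {core}")
```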

Keny Perez · ‎07-01-2015

Hard to tell, Walter... You would need to check the sam_dme file in the UCSM tech support, but what you see there will be just too much. You will need to know the specific time and then see if it makes sense according to the behavior seen (hoping the logs have not rolled over).

-Kenny

ssumichrast · ‎07-01-2015

Interesting. I was configuring our domains for UCS Central last week and was adjusting their NTP servers. When I changed the NTP servers their time drifted and I lost connection to UCS-M. I could see a time drift causing that potentially, but it sure scared the crap out of me when I thought my FIs had reloaded. Luckily it was just UCS-M (I have to assume it's time sensitive for replication).

Keny Perez · ‎07-02-2015

In my limited experience with Central, I have seen that Central is really sensitive to NTP (I want to stress that this is based on my very limited experience with UCSC).

-Kenny

ssumichrast · ‎07-02-2015

Yes, which is why I was fixing our NTP. UCSM did not work with our round-robin DNS entry for NTP, so I was setting them to some IP addresses instead. Our domains were unable to join UCS Central because the time was just different enough -- I'm talking less than 30 seconds.

When I removed the DNS entry and then supplied the IPs, UCSM saved the config and then crashed about 20 seconds later, probably because of a clock adjustment. I don't think UCSM likes huge clock changes, which is understandable for the database synchronization.
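If anyone else needs to swap a round-robin NTP name for fixed IPs, a small Python sketch like this lists the addresses currently behind a name. The helper name `resolve_all` is my own, "localhost" is only a stand-in for the real NTP name, and a round-robin record may return a different order or subset on each query, so treat the result as a snapshot.

```python
import socket

def resolve_all(hostname):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    # Port 123/UDP is NTP; the port does not affect which addresses are returned.
    infos = socket.getaddrinfo(hostname, 123, proto=socket.IPPROTO_UDP)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the round-robin NTP name.
    for addr in resolve_all("localhost"):
        print(addr)
```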

Keny Perez · ‎07-02-2015

Gotcha, thanks for the feedback!

-Kenny

Leap second: UCSM crash after NTP removal