04-12-2020 12:21 AM
Hello,
On our ASR 9922, we see the following alarms from time to time on different Delta V3 PSUs. The alarms clear themselves about 5 seconds later, before any of us can log into the router and run 'adm show env power' to verify that the alarm is accurate.
0/RP0/ADMIN0:Mar 18 14:36:07.485 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT2-PM2: Power tray 2 power module 2 is under HW_OUTPUT_DISABLED condition.
0/RP0/ADMIN0:Mar 18 14:36:12.567 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT2-PM2: Power tray 2 power module 2 condition HW_OUTPUT_DISABLED is cleared.
0/RP0/ADMIN0:Mar 20 03:09:16.662 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT1-PM0: Power tray 1 power module 0 is under HW_OUTPUT_DISABLED condition.
0/RP0/ADMIN0:Mar 20 03:09:21.757 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT1-PM0: Power tray 1 power module 0 condition HW_OUTPUT_DISABLED is cleared.
0/RP0/ADMIN0:Mar 21 02:31:35.212 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT1-PM0: Power tray 1 power module 0 is under HW_OUTPUT_DISABLED condition.
0/RP0/ADMIN0:Mar 21 02:31:40.354 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT1-PM0: Power tray 1 power module 0 condition HW_OUTPUT_DISABLED is cleared.
0/RP0/ADMIN0:Mar 22 20:22:07.844 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT1-PM0: Power tray 1 power module 0 is under HW_OUTPUT_DISABLED condition.
0/RP0/ADMIN0:Mar 22 20:22:12.929 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT1-PM0: Power tray 1 power module 0 condition HW_OUTPUT_DISABLED is cleared.
0/RP0/ADMIN0:Mar 23 08:13:29.090 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT1-PM0: Power tray 1 power module 0 is under HW_OUTPUT_DISABLED condition.
0/RP0/ADMIN0:Mar 23 08:13:34.171 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT1-PM0: Power tray 1 power module 0 condition HW_OUTPUT_DISABLED is cleared.
0/RP0/ADMIN0:Mar 30 11:25:29.879 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT2-PM2: Power tray 2 power module 2 is under HW_OUTPUT_DISABLED condition.
0/RP0/ADMIN0:Mar 30 11:25:34.973 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT2-PM2: Power tray 2 power module 2 condition HW_OUTPUT_DISABLED is cleared.
0/RP0/ADMIN0:Apr 12 00:12:37.884 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT2-PM2: Power tray 2 power module 2 is under HW_OUTPUT_DISABLED condition.
0/RP0/ADMIN0:Apr 12 00:12:42.967 EDT: envmon[4316]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled (PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT2-PM2: Power tray 2 power module 2 condition HW_OUTPUT_DISABLED is cleared.
Initially, we saw this with eXR 6.6.3. Then we found CSCvi75892 and CSCvi62014 on BST, which describe issues similar to what I'm seeing above. Both bug reports state that eXR 7.1.1 is a fixed release.
So we upgraded the SW to eXR 7.1.1, and even then we're _still_ seeing the above PSU alarms appear and then clear within 5 seconds of being declared.
We checked with our data center facility staff, and they swear that there were absolutely no power outages or voltage drops in the data center. We checked other nearby equipment fed by the same UPS and distribution PDU, and none of it saw any power events. We then sent a technician to the site where the ASR9922 is located and had him unplug one of the AC power input feeds to 0/PT1-PM0 (one of the PSUs that kept alarming in the above logs). With the input disconnected, the alarm condition is different: NO_INPUT_DETECTED is the error reported when data center power is actually removed. But the alarms we keep seeing for 5 seconds every once in a while are PM_OUTPUT_EN_PIN_HI, which seems to suggest that it's not a power issue but a HW failure. But a HW failure on 2-3 PSUs across different PM trays?
Are CSCvi75892 and CSCvi62014 *really* fixed in eXR 7.1.1? It doesn't seem to be the case. Or should we open an RMA support case for all PSUs in the chassis? o_O
05-06-2020 11:27 AM
It looks like this is happening even on eXR 7.1.15. Time to open a TAC case.
03-23-2022 07:24 AM
Has this problem ever been fixed? According to BST it should already be fixed in 6.6.3, where you initially discovered it. I am also seeing the issue on 6.6.3 from time to time, in rare cases.
03-30-2022 09:39 AM
In the case of "rare" or "infrequent" messages like those in this thread, you can ignore them. They are simply cosmetic and non-service-impacting. These power modules are very sensitive and can detect small changes in voltage or other attributes of the power coming into them.
In the case of alarms coming every 5s like the original poster had, that would need further investigation, as it is not expected.
Sam
03-30-2022 10:00 AM
Just to close the loop on this, we do continue to see these messages, but they are infrequent: we see them occur once a week or once every 2-3 weeks.
We used to see them more often in the past, once a week or once every few days. The messages appear to have become less frequent when we started installing more line cards (and thus placed more load on the chassis). I'm not sure if that has anything to do with it, but nowadays we see the message about once every 2-3 weeks.
Each time this message occurs, the error self-clears within 5 seconds or less without us taking any action. So right now, we're filing this as a cosmetic issue and will continue to monitor.
I wanted to mention that we're seeing this only on ASR9922 in our installation. ASR9906s on the same PDU/RPU and circuit breaker panel (so, same power distribution circuits) do not have this issue, even though they're sitting on the same power distribution gear.
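For anyone who wants to track how quickly these alarms clear rather than eyeballing the syslog, here is a minimal Python sketch that pairs each DECLARE with the next CLEAR for the same power module and reports the gap in seconds. The regex is written against the envmon lines quoted in this thread only. The function name `alarm_durations` and the `year` parameter are hypothetical (the log timestamps carry no year), and month-name parsing assumes an English locale. Treat it as a starting point, not a vetted tool.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches one envmon alarm entry. finditer() is used so it works whether
# entries sit on separate lines or are run together on one line.
ENTRY = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+: envmon\[\d+\]: "
    r".*?\((?P<cond>\w+)\) :(?P<state>DECLARE|CLEAR) :(?P<pm>[\w/-]+):"
)


def alarm_durations(log_text, year=2020):
    """Return {(module, condition): [seconds from DECLARE to CLEAR, ...]}.

    `year` is an assumption supplied by the caller, since the log
    timestamps do not include one.
    """
    open_alarms = {}                # (module, condition) -> DECLARE time
    durations = defaultdict(list)
    for m in ENTRY.finditer(log_text):
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
        key = (m["pm"], m["cond"])
        if m["state"] == "DECLARE":
            open_alarms[key] = when
        elif key in open_alarms:    # CLEAR with a matching earlier DECLARE
            started = open_alarms.pop(key)
            durations[key].append((when - started).total_seconds())
    return dict(durations)


# First DECLARE/CLEAR pair from the logs in the original post.
sample = (
    "0/RP0/ADMIN0:Mar 18 14:36:07.485 EDT: envmon[4316]: "
    "%PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled "
    "(PM_OUTPUT_EN_PIN_HI) :DECLARE :0/PT2-PM2: Power tray 2 power module 2 "
    "is under HW_OUTPUT_DISABLED condition. "
    "0/RP0/ADMIN0:Mar 18 14:36:12.567 EDT: envmon[4316]: "
    "%PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :Power Module Output Disabled "
    "(PM_OUTPUT_EN_PIN_HI) :CLEAR :0/PT2-PM2: Power tray 2 power module 2 "
    "condition HW_OUTPUT_DISABLED is cleared."
)

for (pm, cond), secs in alarm_durations(sample).items():
    print(pm, cond, secs)
```

Feeding it a longer syslog export would show whether the declare-to-clear gap stays near 5 seconds for every module, which may be worth including if a TAC case is opened.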