A9K-MOD80-SE unexpected reboot

surasit.t1
Level 1

My A9K-MOD80-SE line card rebooted by itself for an unknown reason. Details are below.

Please help!

 

RP/0/RSP0/CPU0:PE_AYA#  show reboot history location 0/1/cpu0
No  Time                      Cause Code  Reason
--------------------------------------------------------------------------------
01  Fri Feb 23 00:37:35 2018  0x2c000015  Cause: Excess Machine Check Condition
                                                                              
                                            Process: p40x0mc    

 

RP/0/RSP0/CPU0:PE_AYA#dir harddisk:/dumper
25229       -rwx  47046       Fri Feb 23 00:37:39 2018  LC3.180222-173739.crashinfo.by.p40x0mc


-----------------------------Log----------------------------------

LC/0/1/CPU0:Feb 23 00:37:33.651 : p40x0mc[74]: PCI3 PEX_ERR_DR  : 0x80000020 [ME|LDDE] Count = 68
LC/0/1/CPU0:Feb 23 00:37:33.699 : p40x0mc[74]: %C-MC-CHECKER-3-DEBUG : Excess 128 MC errors
LC/0/1/CPU0:Feb 23 00:37:33.701 : p40x0mc[74]: 29/00/00/000: Vendor/Device ID           : ffffffff [-]
LC/0/1/CPU0:Feb 23 00:37:33.800 : p40x0mc[74]: 29/00/00/004: Command                    : 0000ffff [INTDIS|SERR|PERR|BM|MEM|IO|UNDOCUMENTED]
LC/0/1/CPU0:Feb 23 00:37:33.923 : p40x0mc[74]: 29/00/00/006: Status                     : 0000ffff [DETPE|SIGSE|RECMA|RECTA|SIGTA|MPEDET|CAPLST|INTST|UNDOCUMENTED]
LC/0/1/CPU0:Feb 23 00:37:33.983 : p40x0mc[74]: 00/02/00/000: Vendor/Device ID           : 04081957 [-]
LC/0/1/CPU0:Feb 23 00:37:34.060 : p40x0mc[74]: 00/02/00/10c: Uncorrectable Error Sev.   : 00062010 [MTLP|RXO|FCPE|DLPE]
LC/0/1/CPU0:Feb 23 00:37:34.125 : p40x0mc[74]: 00/02/00/114: Correctable Error Mask     : 00002000 [ADV_NFE]
LC/0/1/CPU0:Feb 23 00:37:34.214 : p40x0mc[74]: 00/02/00/118: Adv. Error Cap. & Control  : 000000a0 [ECRCCC|ECRCGC|FIRST_ERR_PTR=0]
LC/0/1/CPU0:Feb 23 00:37:34.276 : p40x0mc[74]: 00/02/00/12c: Root Error Command         : 00000004 [FERE]
LC/0/1/CPU0:Feb 23 00:37:34.336 : p40x0mc[74]: 00/02/00/404: LTSSM State Status Register: 00000004 [-]
LC/0/1/CPU0:Feb 23 00:37:34.444 : p40x0mc[74]: 00/02/00/054: Device Control             : 0000281f [NSE|RO|URR|FER|NFER|CER|MAX_READ=2|MAX_PAYLOAD=0]
LC/0/1/CPU0:Feb 23 00:37:34.536 : p40x0mc[74]: 00/02/00/05e: Link Status                : 0000c011 [LABS|LBMS|NEG_LINK_W=1|LINK_SP=1]
LC/0/1/CPU0:Feb 23 00:37:34.597 : p40x0mc[74]: 00/02/00/066: Slot Status                : 00000040 [PDS]
LC/0/1/CPU0:Feb 23 00:37:34.661 : p40x0mc[74]: 00/02/00/068: Root Command               : 00000004 [SEFEE]
LC/0/1/CPU0:Feb 23 00:37:34.746 : p40x0mc[74]: 00/02/00/004: Command                    : 00000547 [INTDIS|SERR|PERR|BM|MEM|IO]
LC/0/1/CPU0:Feb 23 00:37:34.811 : p40x0mc[74]: 00/02/00/006: Status                     : 00000010 [CAPLST]
LC/0/1/CPU0:Feb 23 00:37:35.820 : p40x0mc[74]: EISR0: 0x00040000 [PCI3]
LC/0/1/CPU0:Feb 23 00:37:35.874 : p40x0mc[74]: PCI3 PEX_ERR_DR  : 0x80000020 [ME|LDDE] Count = 69
LC/0/1/CPU0:Feb 23 00:37:35.949 : p40x0mc[74]: %C-MC-CHECKER-3-DEBUG : Shutting down node due to excess 129 MC errors
LC/0/1/CPU0:Feb 23 00:37:37.958 : p40x0mc[74]: reboot_internal: Incomplete graceful reboot cleanup (Connection timed out)
LC/0/1/CPU0:Feb 23 00:37:37.958 : p40x0mc[74]: Fri Feb 23 00:37:35 2018:sync start
LC/0/1/CPU0:Feb 23 00:37:37.958 : p40x0mc[74]: Fri Feb 23 00:37:35 2018:sync end
LC/0/1/CPU0:Feb 23 00:37:37.958 : p40x0mc[74]: Fri Feb 23 00:37:35 2018:platform_reboot_op start
RP/0/RSP0/CPU0:Feb 23 00:37:58.350 : shelfmgr[383]: %PLATFORM-SHELFMGR-6-NODE_CPU_RESET : Node 0/1/CPU0 CPU reset detected.
RP/0/RSP0/CPU0:Feb 23 00:37:58.351 : shelfmgr[383]: %PLATFORM-SHELFMGR-6-NODE_STATE_CHANGE : 0/1/CPU0 A9K-MOD80-SE state:BRINGDOWN
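The register decodes that p40x0mc prints, for example `Link Status : 0000c011 [LABS|LBMS|NEG_LINK_W=1|LINK_SP=1]` and `Device Control : 0000281f [...]`, can be reproduced by hand from the field layouts in the PCI Express Base Specification. The Python sketch below is an illustrative aid, not a Cisco tool. The helper names (`decode_link_status`, `decode_device_control`, `device_absent`) are hypothetical. It also checks the all-ones `Vendor/Device ID : ffffffff` read for device 29/00/00, which on PCI usually means the endpoint did not answer the configuration read. The `Command : 0000ffff` and `Status : 0000ffff` values for the same device fit that reading.

```python
# Minimal sketch: decode two PCIe capability registers seen in the p40x0mc log.
# Field layouts follow the PCI Express Base Specification (Device Control and
# Link Status registers). Function names are hypothetical helpers.


def decode_link_status(value: int) -> dict:
    """Decode the 16-bit PCIe Link Status register."""
    return {
        "current_link_speed": value & 0xF,             # bits 3:0 (speed code)
        "negotiated_link_width": (value >> 4) & 0x3F,  # bits 9:4 (lanes)
        "link_training": bool(value & (1 << 11)),
        "dll_link_active": bool(value & (1 << 13)),    # only valid if supported
        "lbms": bool(value & (1 << 14)),  # Link Bandwidth Management Status
        "labs": bool(value & (1 << 15)),  # Link Autonomous Bandwidth Status
    }


def decode_device_control(value: int) -> dict:
    """Decode the 16-bit PCIe Device Control register."""
    return {
        "cer": bool(value & 0x1),   # Correctable Error Reporting Enable
        "nfer": bool(value & 0x2),  # Non-Fatal Error Reporting Enable
        "fer": bool(value & 0x4),   # Fatal Error Reporting Enable
        "urr": bool(value & 0x8),   # Unsupported Request Reporting Enable
        "ro": bool(value & 0x10),   # Relaxed Ordering Enable
        # bits 7:5 encode Max_Payload_Size as 128 << n bytes
        "max_payload_bytes": 128 << ((value >> 5) & 0x7),
        "nse": bool(value & (1 << 11)),  # Enable No Snoop
        # bits 14:12 encode Max_Read_Request_Size as 128 << n bytes
        "max_read_bytes": 128 << ((value >> 12) & 0x7),
    }


def device_absent(vendor_device_id: int) -> bool:
    """An all-ones config read usually means the device did not respond."""
    return vendor_device_id == 0xFFFFFFFF


if __name__ == "__main__":
    print(decode_link_status(0xC011))
    print(decode_device_control(0x281F))
    print(device_absent(0xFFFFFFFF), device_absent(0x04081957))
```

For 0xC011 this gives link speed code 1 and a negotiated width of x1 with both bandwidth-status bits set, which matches the log. The Data Link Layer Link Active bit is clear, but that bit is only meaningful if the port advertises support for it. For 0x281F it gives MAX_PAYLOAD=0 (128 bytes) and MAX_READ=2 (512 bytes).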

Accepted Solution

Aleksandar Vidakovic
Cisco Employee

If this was a single occurrence, you don't need to take any action. If the card keeps reporting this failure, it should be replaced through an RMA. It would be best to open a TAC SR for this.
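The advice above depends on whether the failure is a one-off or keeps recurring. One way to check this is to count how many entries in `show reboot history location 0/1/cpu0` carry the same cause code (0x2c000015, "Excess Machine Check Condition"). The Python sketch below is a hypothetical helper, not a Cisco utility. It assumes the column layout shown in this thread (No, Time, Cause Code, Reason), and other IOS XR releases may format the output differently. The thread does not say how many repeats count as "continuously reporting", so that judgment stays with TAC.

```python
# Minimal sketch (hypothetical helper): count reboot-history entries with a
# given cause code, using text pasted from "show reboot history location <node>".
# Assumes the column layout shown in this thread; other releases may differ.
import re

# Entry lines start with the entry number, the timestamp, and the cause code.
ENTRY = re.compile(r"^\s*(\d+)\s+(\w{3} \w{3}\s+\d+ [\d:]+ \d{4})\s+(0x[0-9a-fA-F]+)")


def count_cause(history_text: str, cause_code: str) -> int:
    """Return the number of history entries whose cause code matches."""
    wanted = int(cause_code, 16)
    count = 0
    for line in history_text.splitlines():
        m = ENTRY.match(line)
        if m and int(m.group(3), 16) == wanted:
            count += 1
    return count


SAMPLE = """\
No  Time                      Cause Code  Reason
--------------------------------------------------------------------------------
01  Fri Feb 23 00:37:35 2018  0x2c000015  Cause: Excess Machine Check Condition
                                            Process: p40x0mc
"""

if __name__ == "__main__":
    # One hit is the "single occurrence" case. Repeated hits are the case
    # where the reply recommends a TAC SR and RMA.
    print(count_cause(SAMPLE, "0x2c000015"))
```

Header, separator, and `Process:` continuation lines do not match the entry pattern, so only numbered entries are counted.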

 

/Aleksandar


Thank you for your suggestion.