
I have a question about a Nexus module reload issue.

lars96.song
Level 1

Symptom summary

 

1. Same N9K version on both: 7.0(3)I6(1)

2. Different module deployment:

    one N9516 chassis uses N9K-X97160YC-EX, the other N9516 chassis uses N9K-X9732C-EX

 

3. Problem: the N9516 chassis with N9K-X97160YC-EX deployed is suffering module reloads.

                   The log shows an EOBC heartbeat failure.

                   The N9516 chassis with N9K-X9732C-EX deployed is normal.

 

 

Question: are there any known bugs in the N9K-X97160YC-EX driver or code?

                Or has anyone else had the same problem?


Vinit Jain
Cisco Employee

Hi,

Can you paste the complete log messages from before the reload?

Thanks
--Vinit

Unfortunately, I have no logs from before the reload.

I have several logs from after the reload.

Can you provide your e-mail address?

The logs include business-sensitive information.

 

I had a chance to look at the logs that you shared. Based on them, I think it is better to replace the module if it is reloading continuously. If it's not reloading or crashing continuously, then I would recommend physically reseating the card (pull it out and reinsert it).

Hope this helps.

Thanks
--Vinit

Andrea Testino
Cisco Employee

Hi Lars,

 

More than likely you saw something as follows:

 

`show logging log`
   Timestamp: 2017-10-01 07:55:51.000000
      2017 Oct  1 07:55:51 N9K Oct  1 07:55:51 %KERN-0-SYSTEM_MSG: [27786873.639221] [1506844550] EMON: module 6 is not responding on EOBC path. Reloading module. - kernel
      2017 Oct  1 07:55:51 N9K %MODULE-2-MOD_DIAG_FAIL: Module 6 (Serial number: XXXXXXXX) reported failure due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a1b137)
      2017 Oct  1 07:58:53 N9K %PLATFORM-2-MOD_DETECT: Module 6 detected (Serial number XXXXXXXX) Module-Type 48x1/10G-T 4x40G Ethernet Module Model N9K-X9564TX
      2017 Oct  1 07:58:53 N9K %PLATFORM-2-MOD_PWRUP: Module 6 powered up (Serial number XXXXXXXX)
 
`show module internal exceptionlog`
   Timestamp: 2017-10-01 07:55:51.000000
      Module Slot Number: 6
      Device Name       : eobc
      System Errorcode  : 0x4042004e EOBC heartbeat failure
      Error Type        : FATAL error
      Time              : Sun Oct  1 07:55:51 2017

The EOBC is the Ethernet Out-of-Band Channel, which the switch's supervisor engine uses to periodically send keepalive packets to all linecards. In some scenarios, keepalives can be missed, triggering the card to reload. If a single heartbeat is missed, it will be corrected automatically; however, when multiple heartbeats are lost simultaneously, the supervisor will power the module off and on again in an attempt to resolve the diagnostic issue.
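To make the keepalive behaviour concrete, here is a toy model in Python. It is only a sketch: it treats "multiple heartbeats lost" as a run of consecutive misses, and the `miss_threshold` value and the function name are hypothetical, since the real thresholds and timers are not stated in this thread.

```python
def first_reload_index(heartbeats, miss_threshold=3):
    """Return the index of the keepalive at which a reload would trigger.

    heartbeats     -- iterable of booleans, True if the linecard answered
    miss_threshold -- hypothetical number of consecutive misses tolerated
                      before the supervisor power-cycles the module
    Returns None if the threshold is never reached.
    """
    consecutive_misses = 0
    for index, answered in enumerate(heartbeats):
        if answered:
            # An isolated miss is forgiven as soon as a reply arrives.
            consecutive_misses = 0
        else:
            consecutive_misses += 1
            if consecutive_misses >= miss_threshold:
                return index
    return None
```

With this model, a single missed keepalive followed by a reply never triggers a reload, while a run of misses that reaches the threshold does.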

Typically these issues are a one-time occurrence, and the module stabilizes on its own after the SUP reloads it, as you can see in your environment. On rare occasions, this occurs due to a true HW failure caused by multiple parity or uncorrectable errors.

 

I'd monitor the linecard, reseating it first if it did not come back online on its own. If you have multiple occurrences in the future, you should then open a TAC case to get it replaced.
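To keep track of repeat occurrences, one option is to count the EOBC heartbeat failures per module in saved `show logging log` output. The following Python sketch assumes the message format shown in the sample above; the function name and the regular expression are my own, not a Cisco tool.

```python
import re
from collections import Counter

# Matches the %MODULE-2-MOD_DIAG_FAIL line from the sample log above and
# captures the module (slot) number.
EOBC_FAIL = re.compile(
    r"%MODULE-2-MOD_DIAG_FAIL: Module (\d+) .*EOBC heartbeat failure"
)

def count_eobc_failures(log_text):
    """Return a Counter mapping module number to EOBC failure count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EOBC_FAIL.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

A module that shows up more than once in the result is a candidate for the TAC case mentioned above.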

 

Hope that helps.

 

- Andrea

- Andrea, CCIE #56739 R&S

Yes, unfortunately if you reboot you will lose the config if it's not saved. There are known bugs on certain versions. Did you try wr and copy run start? Or try saving the config to another location, so that after the reboot you can transfer the file straight back in. As a last resort, use PuTTY to capture the whole running config locally and then reapply it.

Every bug, no matter what the ID, says to contact TAC regarding these issues. In most cases it's a process that gets jammed, and that's why it requires the reboot.

Because this is a 7K and such a seriously affecting bug, I would contact TAC, though. You don't want to go to all the trouble of rebooting only to find it doesn't fix it.

If you reboot the active sup and the config is not saved, you lose it, because the config needs to be synced to the standby before the switchover.
