11-01-2017 07:26 AM - edited 03-05-2019 09:24 AM
Symptom summary
1. Same N9K version on both chassis: 7.0(3)I6(1)
2. Different module deployment:
one N9516 chassis uses N9K-X97160YC-EX modules, the other N9516 chassis uses N9K-X9732C-EX modules.
3. Problem: the N9516 chassis with N9K-X97160YC-EX is experiencing module reloads.
The log shows an EOBC heartbeat failure.
The N9516 chassis with N9K-X9732C-EX is running normally.
Question: are there any known bugs in the N9K-X97160YC-EX driver or code?
Or has anyone else seen the same problem?
11-01-2017 07:33 AM
Hi,
Can you paste the complete log messages from before the reload?
11-01-2017 08:44 AM
Unfortunately, I have no logs from before the reload.
I have several logs from after the reload.
Can you provide your e-mail address?
The logs contain business-sensitive information.
11-01-2017 02:58 PM
I had a chance to look at the logs that you shared. Based on those logs, I think it is better to replace the module if it is reloading continuously. If it's not reloading or crashing continuously, then I would recommend physically reseating the card (pull it out and reinsert it).
Hope this helps.
11-02-2017 07:24 AM
Hi Lars,
More than likely you saw something as follows:
`show logging log`
Timestamp: 2017-10-01 07:55:51.000000
2017 Oct 1 07:55:51 N9K Oct 1 07:55:51 %KERN-0-SYSTEM_MSG: [27786873.639221] [1506844550] EMON: module 6 is not responding on EOBC path. Reloading module. - kernel
2017 Oct 1 07:55:51 N9K %MODULE-2-MOD_DIAG_FAIL: Module 6 (Serial number: XXXXXXXX) reported failure due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a1b137)
2017 Oct 1 07:58:53 N9K %PLATFORM-2-MOD_DETECT: Module 6 detected (Serial number XXXXXXXX) Module-Type 48x1/10G-T 4x40G Ethernet Module Model N9K-X9564TX
2017 Oct 1 07:58:53 N9K %PLATFORM-2-MOD_PWRUP: Module 6 powered up (Serial number XXXXXXXX)
`show module internal exceptionlog`
Timestamp: 2017-10-01 07:55:51.000000
Module Slot Number: 6
Device Name : eobc
System Errorcode : 0x4042004e EOBC heartbeat failure
Error Type : FATAL error
Time : Sun Oct 1 07:55:51 2017
The EOBC is the Ethernet Out-of-Band Channel, which the switch's supervisor engine uses to periodically send keepalive packets to all linecards. In some scenarios, keepalives can be missed, triggering the card to reload. If a single heartbeat is missed, it is corrected automatically; however, when multiple heartbeats are lost, the supervisor will power the module off and back on in an attempt to resolve the diagnostic issue.
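To make the difference between a single missed heartbeat and repeated losses concrete, here is a toy sketch in Python. It is not NX-OS code: the function name, the idea of counting consecutive misses, and the threshold of 3 are all assumptions for illustration, since the actual value the supervisor uses is not given in this thread.

```python
def supervisor_action(heartbeats, max_consecutive_misses=3):
    """Replay heartbeat results and decide what a supervisor might do.

    heartbeats: iterable of bools, True = linecard answered, False = missed.
    max_consecutive_misses: hypothetical threshold, NOT the real NX-OS value.
    Returns "reload" if the miss streak reaches the threshold, else "ok".
    """
    misses = 0
    for answered in heartbeats:
        if answered:
            misses = 0  # an isolated miss is forgiven once a reply arrives
        else:
            misses += 1
            if misses >= max_consecutive_misses:
                return "reload"  # power-cycle the module
    return "ok"
```

With these assumed values, `supervisor_action([True, False, True])` leaves the module alone, while `supervisor_action([True, False, False, False])` asks for a reload.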
Typically these issues are a one-time occurrence, and the module stabilizes on its own after the SUP reloads it, as you can see in your environment. On rare occasions, this occurs due to a true HW failure caused by multiple parity or uncorrectable errors.
If the linecard did not come back online on its own, I'd reseat it and then monitor it. If you see multiple occurrences in the future, you should then open a TAC case to get it replaced.
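If you save `show logging log` output to a file from time to time, a small script can help track repeat occurrences per slot. The following is a rough sketch in Python that only assumes the `%MODULE-2-MOD_DIAG_FAIL` line format shown in the log excerpt above; the function names and the cutoff of 2 occurrences are my own choices for illustration, not a Cisco guideline.

```python
import re
from collections import Counter

# Matches the MOD_DIAG_FAIL line format from the log excerpt in this thread.
EOBC_FAIL = re.compile(
    r"%MODULE-2-MOD_DIAG_FAIL: Module (\d+) .*?EOBC heartbeat failure"
)

def count_eobc_failures(log_text):
    """Return a Counter mapping module slot number to EOBC failure count."""
    counts = Counter()
    for match in EOBC_FAIL.finditer(log_text):
        counts[int(match.group(1))] += 1
    return counts

def modules_to_escalate(counts, threshold=2):
    """List slots with at least `threshold` failures (hypothetical cutoff)."""
    return sorted(slot for slot, n in counts.items() if n >= threshold)
```

For example, feed it the saved log text with `counts = count_eobc_failures(open("n9k.log").read())` (the file name is a placeholder) and check `modules_to_escalate(counts)` before deciding whether to open a TAC case.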
Hope that helps.
- Andrea
03-27-2018 01:17 AM
Yes, unfortunately, if you reboot you will lose the config if it's not saved. There are known bugs on certain versions. Did you try wr and copy run start, or try saving the config to another location so that after the reboot you can transfer the file straight back in? As a last resort, use PuTTY to capture the whole running config locally and then reapply it.
For every bug, no matter what the ID says, contact TAC regarding these issues. In most cases it's a process that gets jammed, and that's why it requires the reboot.
Because this is a 7K and such a seriously affecting bug, I would contact TAC. You don't want to go to all the trouble of rebooting only to find it doesn't fix the problem.
If you reboot the active sup and the config is not saved, you lose the config, because the config needs to be synced to the standby before switchover.