
Module on N7K suddenly restarted

Hi Guys,

One of the modules in our Cisco Nexus 7010 suddenly restarted:

2014 Nov 21 11:47:39 WIB MASTER %MODULE-2-MOD_NOT_ALIVE: Module 3 not responding... resetting (Serial number: JAF1612ASPA)
2014 Nov 21 11:47:50 WIB MASTER %PLATFORM-2-MOD_DETECT: Module 3 detected (Serial number JAF1612ASPA) Module-Type 10/100/1000 Mbps Ethernet XL Module Model N7K-M148GT-11L
2014 Nov 21 11:47:50 WIB MASTER %PLATFORM-2-MOD_PWRUP: Module 3 powered up (Serial number JAF1612ASPA)
2014 Nov 21 11:47:50 WIB MASTER %PLATFORM-5-MOD_STATUS: Module 3 current-status is MOD_STATUS_POWERED_UP
2014 Nov 21 11:50:04 WIB MASTER %PLATFORM-5-MOD_STATUS: Module 3 current-status is MOD_STATUS_ONLINE/OK
2014 Nov 21 11:50:04 WIB MASTER %MODULE-5-MOD_OK: Module 3 is online (Serial number: JAF1612ASPA)
2014 Nov 21 11:50:03 WIB MASTER %SYSMGR-SLOT3-5-MODULE_ONLINE: System Manager has received notification of local module becoming online.
2014 Nov 21 11:51:49 WIB MASTER %BIOS_DAEMON-SLOT3-5-BIOS_DAEMON_LC_PRI_BOOT:  System booted from Primary BIOS Flash

Please tell me what the root cause of this problem is.

Thank you 

Best regards,

hery


Hello Hery,

 

Generally, the supervisor probes all linecards using heartbeat messages. If for some reason the supervisor does not receive a keepalive reply from a linecard, it resets that linecard to recover it.
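As a rough illustration of that keepalive logic, here is a minimal Python sketch. It is only a model of the idea, not how NX-OS implements it: the `MISSED_LIMIT` threshold, the `Linecard` class and the `poll` function are hypothetical, and the real timers and counters are not stated in this thread.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only; not the real NX-OS value.
MISSED_LIMIT = 3


@dataclass
class Linecard:
    slot: int
    missed: int = 0  # consecutive heartbeats without a reply
    resets: int = 0  # how many times the supervisor reset this card


def poll(card: Linecard, replied: bool) -> str:
    """Process one heartbeat interval and return the action taken."""
    if replied:
        card.missed = 0
        return "ok"
    card.missed += 1
    if card.missed >= MISSED_LIMIT:
        # Too many unanswered heartbeats: reset the card to recover it.
        card.missed = 0
        card.resets += 1
        return "reset"
    return "missed"


card = Linecard(slot=3)
actions = [poll(card, replied) for replied in (True, False, False, False, True)]
print(actions)
```

In this model a single lost reply does nothing; only a run of consecutive misses triggers the reset, which is why congestion or drops on the control path can look the same to the supervisor as a dead card.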

 

2014 Nov 21 11:47:39 WIB MASTER %MODULE-2-MOD_NOT_ALIVE: Module 3 not responding... resetting (Serial number: JAF1612ASPA)

 

From the above log, it looks like the same thing happened in your case. This can have multiple causes, but most of the time it is EOBC congestion, EOBC drops, or a bad fabric connection. It could be either a hardware issue or a transient issue.

 

You can set the diagnostic level to complete and reseat the linecard once (a physical reseat: remove the linecard and insert it back). If you see any diagnostic failure or another crash after the reseat, you need to replace the card; otherwise it was a transient issue.

 

HTH
Regards,
VS.Suresh.
*Please rate the useful posts.

Hi Suresh,

This is a production environment; I cannot just reseat the module and take down all 48 connections.

There was no change or activity on the Nexus beforehand. I am just wondering how this could happen, and I don't want to face this problem in the future.

What should I do if I want to know exactly what the problem is?

thank you

Regards,

hery

Hi,

For this issue, I think the best thing to do is to open a ticket with TAC, send them the output, and have them log on to the device and troubleshoot to figure out the root cause.

This way, they can help you with an OS upgrade or an RMA of the module if you need one.

HTH

Hi Reza,

Thank you for your response. Maybe I will.

Regards,

hery

Hi Hery,

 

I would suggest opening a TAC case, as they will check your core file and other internal logs to find the reason for the crash.

 

HTH
Regards,
VS.Suresh.
