02-16-2011 04:15 AM
Hello
I have an ACE10-6500-K9 module installed in slot 8 of a Catalyst 6509, and yesterday it logged the error below.
The workaround is to reset the slot manually, is that right? I tried a reload and the problem persists. Is a hardware reset necessary to solve this problem?
Is this due to a bug or a hardware problem?
%C6KPWR-SP-4-DISABLED: power to module in slot 8 set off (Module not responding to Keep Alive polling)
Thank you very much
02-16-2011 10:45 PM
Hi, a.serrano
The message means what it says. The Supervisor (Sup), specifically its Switch Processor (SP), sent repeated keepalives over the EOBC path and
did not receive any keepalive responses from the ACE in slot 8, so the Sup reset the ACE blade in slot 8.
From the message alone I can only say that the cause could be h/w related, s/w related, or a loosely seated blade.
If it is h/w related (the chassis slot, the chassis EOBC path, or the ACE blade itself), the first thing to check is whether
the Sup has recorded any failures in Generic Online Diagnostics (GOLD).
Let's see which diagnostics are running on the ACE blade. (The sample output below is from a blade in slot 1; substitute your own slot number.)
Router#show diagnostic content module 1
Module 1: Application Control Engine Module
Diagnostics test suite attributes:
M/C/* - Minimal bootup level test / Complete bootup level test / NA
B/* - Basic ondemand test / NA
P/V/* - Per port test / Per device test / NA
D/N/* - Disruptive test / Non-disruptive test / NA
S/* - Only applicable to standby unit / NA
X/* - Not a health monitoring test / NA
F/* - Fixed monitoring interval test / NA
E/* - Always enabled monitoring test / NA
A/I - Monitoring is active / Monitoring is inactive
R/* - Power-down line cards and need reload supervisor / NA
K/* - Require resetting the line card after the test has completed / NA
T/* - Shut down all ports and need reload supervisor / NA
                                                       Test Interval   Thre-
  ID   Test Name                          Attributes   day hh:mm:ss.ms shold
  ==== ================================== ============ =============== =====
    1) TestEobcStressPing --------------> ***D*X**I*** not configured  n/a
    2) TestFirmwareDiagStatus ----------> M**N****I*** 000 00:00:15.00 10
    3) TestAsicSync --------------------> ***N****A*** 000 00:00:15.00 10
For the ACE blade, "3) TestAsicSync" has the "A" flag, which means "Monitoring is active".
The SP of the Sup sends polling packets at a set interval (every 15 seconds in the output above) to check the health of an ASIC on the ACE blade.
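To read the attribute column without counting asterisks by hand, a small decoder can help. This is only a sketch: it assumes that each of the 12 characters corresponds, in order, to one line of the attribute legend printed above, which is inferred from the sample output rather than taken from Cisco documentation. The names `LEGEND` and `decode_attributes` are made up for illustration.

```python
# Assumption: character N of the attribute string maps to legend line N
# ("M/C/*" first, "T/*" last), and "*" means not applicable.
LEGEND = [
    {"M": "minimal bootup level test", "C": "complete bootup level test"},
    {"B": "basic ondemand test"},
    {"P": "per port test", "V": "per device test"},
    {"D": "disruptive test", "N": "non-disruptive test"},
    {"S": "only applicable to standby unit"},
    {"X": "not a health monitoring test"},
    {"F": "fixed monitoring interval test"},
    {"E": "always enabled monitoring test"},
    {"A": "monitoring is active", "I": "monitoring is inactive"},
    {"R": "power-down line cards and need reload supervisor"},
    {"K": "require resetting the line card after the test has completed"},
    {"T": "shut down all ports and need reload supervisor"},
]


def decode_attributes(flags):
    """Return the meanings of the flags set in a 12-character attribute string."""
    if len(flags) != len(LEGEND):
        raise ValueError(f"expected {len(LEGEND)} characters, got {len(flags)}")
    meanings = []
    for position, (char, options) in enumerate(zip(flags, LEGEND), start=1):
        if char == "*":
            continue  # not applicable for this test
        if char not in options:
            raise ValueError(f"unexpected flag {char!r} at position {position}")
        meanings.append(options[char])
    return meanings


# Attribute string of TestAsicSync from the output above.
print(decode_attributes("***N****A***"))
# → ['non-disruptive test', 'monitoring is active']
```

Run against TestEobcStressPing (`***D*X**I***`), the same function reports a disruptive test that is not a health monitoring test and whose monitoring is inactive.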
Now let's look at the failure count for that test.
Router#show diagnostic result module 1 detail
___________________________________________________________________________
3) TestAsicSync --------------------> .
Error code ------------------> 0 (DIAG_SUCCESS)
Total run count -------------> 47297
Last test execution time ----> Feb 17 2011 05:52:34
First test failure time -----> n/a
Last test failure time ------> n/a
Last test pass time ---------> Feb 17 2011 05:52:34
Total failure count ---------> 0
Consecutive failure count ---> 0
___________________________________________________________________________
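If you collect this output from several modules, the failure counters can be pulled out with a short script. The following is a rough sketch that assumes the output has been saved as text and that every field keeps the "name ---> value" layout shown above; `parse_diag_result` and `failing_tests` are hypothetical helper names, not part of any Cisco tool.

```python
import re

# Sample copied from the "show diagnostic result module 1 detail" output above.
SAMPLE = """\
3) TestAsicSync --------------------> .
Error code ------------------> 0 (DIAG_SUCCESS)
Total run count -------------> 47297
Last test execution time ----> Feb 17 2011 05:52:34
First test failure time -----> n/a
Last test failure time ------> n/a
Last test pass time ---------> Feb 17 2011 05:52:34
Total failure count ---------> 0
Consecutive failure count ---> 0
"""

# "3) TestAsicSync -----> ." starts a new test; other lines are "field ---> value".
TEST_HEADER = re.compile(r"^\s*(\d+)\)\s+(\S+)\s+-+>\s*(\S+)\s*$")
FIELD = re.compile(r"^\s*(.+?)\s+-+>\s*(.*?)\s*$")


def parse_diag_result(text):
    """Return {test name: {field name: value}} for each test in the output."""
    tests = {}
    current = None
    for line in text.splitlines():
        header = TEST_HEADER.match(line)
        if header:
            current = {"result": header.group(3)}
            tests[header.group(2)] = current
            continue
        field = FIELD.match(line)
        if field and current is not None:
            current[field.group(1)] = field.group(2)
    return tests


def failing_tests(tests):
    """Names of tests whose total failure count is greater than zero."""
    return [
        name
        for name, fields in tests.items()
        if int(fields.get("Total failure count", "0")) > 0
    ]


results = parse_diag_result(SAMPLE)
print(results["TestAsicSync"]["Total run count"])  # → 47297
print(failing_tests(results))                      # → []
```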
If you see the failure counters incrementing, check the same counters on the other blades in the chassis to find out
whether the problem is specific to slot 8 or seen on multiple slots. (Different blade types run different sets of diagnostic tests.)
Also check the retry and dropped counters for SCP, as below.
Router#remote command switch show scp status
Rx 22492903, Tx 11717042, scp_my_addr 0x5
Id Sap Channel name current/peak/retry/dropped/total time(queue/process/ack)
-- ---- ------------------- -------------------------------- ----------------------
0 20 SCP Unsolicited:20 0/ 0/ 0/ 0/ 0 0/ 0/ 0
1 0 SCP Unsolicited:0 0/ 3/ 0/ 0/8179027 0/ 0/10036
2 2 SCP Unsolicited:2 0/ 2/ 0/ 0/8205700 0/ 0/ 0
3 21 SCP Unsolicited:21 0/ 0/ 0/ 0/ 0 0/ 0/ 0
4 1 SCP Unsolicited:1 0/ 2/ 0/ 0/109393 0/ 0/ 252
5 18 SCP Unsolicited:18 0/ 0/ 0/ 0/ 0 0/ 0/ 0
6 17 SCP Unsolicited:17 0/ 0/ 0/ 0/ 0 0/ 0/ 0
7 16 SCP Unsolicited:16 0/ 0/ 0/ 0/ 0 0/ 0/ 0
8 33 SCP async: LCP#6 0/ 37/ 0/ 0/1779208 172/ 240/ 28
9 32 SCP async: LCP#4 0/ 24/ 0/ 0/2234291 296/ 604/ 236
10 37 SCP async: LCP#5 0/ 61/ 0/ 0/1381933 1040/ 716/ 236
11 36 SCP async: LCP#1 0/ 1008/ 0/ 0/455925 1192/1184/ 236
12 39 SCP async: LCP#2 0/ 150/ 0/ 0/252763 696/ 456/ 224
Router#
LCP#n means "the Line Card Processor in slot n".
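Whether the retry and dropped counters are growing is easiest to judge by comparing two captures taken some time apart. Here is a minimal sketch of that comparison, assuming the rows keep the "current/peak/retry/dropped/total" layout shown above. The helper names are hypothetical, and the second snapshot is fabricated purely to show what a growing counter would look like.

```python
import re

# A few rows copied from the "show scp status" output above.
SAMPLE = """\
Id Sap Channel name current/peak/retry/dropped/total time(queue/process/ack)
-- ---- ------------------- -------------------------------- ----------------------
1 0 SCP Unsolicited:0 0/ 3/ 0/ 0/8179027 0/ 0/10036
8 33 SCP async: LCP#6 0/ 37/ 0/ 0/1779208 172/ 240/ 28
11 36 SCP async: LCP#1 0/ 1008/ 0/ 0/455925 1192/1184/ 236
"""

ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<sap>\d+)\s+(?P<name>.+?)\s+"
    r"(?P<current>\d+)/\s*(?P<peak>\d+)/\s*(?P<retry>\d+)/\s*"
    r"(?P<dropped>\d+)/\s*(?P<total>\d+)\s+"
    r"\d+/\s*\d+/\s*\d+\s*$"
)
SLOT = re.compile(r"LCP#(\d+)")


def parse_scp_status(text):
    """Return {channel name: counters} for each channel row in the output."""
    channels = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator, or prompt line
        slot = SLOT.search(m["name"])
        channels[m["name"]] = {
            "slot": int(slot.group(1)) if slot else None,
            "retry": int(m["retry"]),
            "dropped": int(m["dropped"]),
            "total": int(m["total"]),
        }
    return channels


def growing_channels(before, after):
    """Channels whose retry or dropped counter rose between two snapshots."""
    grown = {}
    for name, now in after.items():
        old = before.get(name, {"retry": 0, "dropped": 0})
        delta = (now["retry"] - old["retry"], now["dropped"] - old["dropped"])
        if delta[0] > 0 or delta[1] > 0:
            grown[name] = delta  # (retry increase, dropped increase)
    return grown


first = parse_scp_status(SAMPLE)
# Hypothetical later capture: retry and dropped have grown on the LCP#6 channel.
second = parse_scp_status(
    SAMPLE.replace("0/ 37/ 0/ 0/1779208", "0/ 37/ 4/ 1/1779300")
)
print(growing_channels(first, second))
# → {'SCP async: LCP#6': (4, 1)}
```

For the problem in this thread, the row of interest would be the one whose channel name ends in LCP#8.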
If you see the counters mentioned above incrementing continually for the ACE blade in slot 8,
try removing and re-inserting the blade. If the problem persists, consider moving the ACE blade to another slot.
If it still persists after that, consider a h/w replacement.
If neither moving slots nor replacing the h/w stops the keepalive-failure resets or the incrementing counters,
it might be a s/w related issue.
I do not know which s/w version you are running, but we always recommend running the latest
version to pick up bug fixes and enhancements.
In fact, some time ago we had a control plane issue on the ACE that could cause it to stop responding to keepalives.
First rule out a bad chassis and a loosely seated blade, then try a s/w upgrade.
If all of those efforts fail, please consider a h/w replacement.
If a s/w upgrade is not an easy option for you, try replacing the ACE blade first
and keep the s/w upgrade as the last option, depending on your environment.
Regards,
Kim