10-16-2013 09:31 AM - edited 03-07-2019 04:04 PM
Dear Sirs!
I have a WS-X6708-10GE module. It is working well, but when I run all tests with the command "diagnostic start module 3 test all", I see the following messages:
Router#sh mod
Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
3 8 CEF720 8 port 10GE with DFC WS-X6708-10GE SAL16084SR5
5 2 Supervisor Engine 720 (Active) WS-SUP720-3B SAD081500MC
Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
3 f0f7.556a.05b0 to f0f7.556a.05b7 3.5 12.2(18r)S1 15.1(2)S Ok
5 000d.ed92.a2c0 to 000d.ed92.a2c3 1.0 8.5(3) 15.1(2)S Ok
Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
3 Distributed Forwarding Card WS-F6700-DFC3CXL SAL1436T3MM 1.6 Ok
5 Policy Feature Card 3 WS-F6K-PFC3BXL SAD084503T4 1.8 Ok
5 MSFC3 Daughterboard WS-SUP720 SAD102307UN 0.101 Ok
Mod Online Diag Status
---- -------------------
3 Pass
5 Pass
Router#diagnostic start module 3 test all
*Oct 16 10:41:29.899: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestOBFL{ID=1} ...
*Oct 16 10:41:29.899: %DIAG-SP-6-TEST_OK: Module 3: TestOBFL{ID=1} has completed successfully
*Oct 16 10:41:29.899: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestFabricCh0Health{ID=2} ...
*Oct 16 10:41:30.263: %DIAG-SP-6-TEST_OK: Module 3: TestFabricCh0Health{ID=2} has completed successfully
*Oct 16 10:41:30.263: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestFabricCh1Health{ID=3} ...
*Oct 16 10:41:30.631: %DIAG-SP-6-TEST_OK: Module 3: TestFabricCh1Health{ID=3} has completed successfully
*Oct 16 10:41:30.631: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestTransceiverIntegrity{ID=4} ...
*Oct 16 10:41:30.631: %DIAG-SP-3-TEST_SKIPPED: Module 3: TestTransceiverIntegrity{ID=4} is skipped
*Oct 16 10:41:30.631: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestLoopback{ID=5} ...
*Oct 16 10:41:33.435: %DIAG-SP-6-TEST_OK: Module 3: TestLoopback{ID=5} has completed successfully
*Oct 16 10:41:33.435: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestScratchRegister{ID=6} ...
*Oct 16 10:41:33.459: %DIAG-SP-6-TEST_OK: Module 3: TestScratchRegister{ID=6} has completed successfully
*Oct 16 10:41:33.459: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestSynchedFabChannel{ID=7} ...
*Oct 16 10:41:33.459: %DIAG-SP-6-TEST_OK: Module 3: TestSynchedFabChannel{ID=7} has completed successfully
*Oct 16 10:41:33.459: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestDontLearn{ID=8} ...
*Oct 16 10:41:34.707: %DIAG-SP-6-TEST_OK: Module 3: TestDontLearn{ID=8} has completed successfully
*Oct 16 10:41:34.707: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestConditionalLearn{ID=9} ...
*Oct 16 10:41:35.399: %DIAG-SP-6-TEST_OK: Module 3: TestConditionalLearn{ID=9} has completed successfully
*Oct 16 10:41:35.399: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestNewLearn{ID=10} ...
*Oct 16 10:41:36.475: %DIAG-SP-6-TEST_OK: Module 3: TestNewLearn{ID=10} has completed successfully
*Oct 16 10:41:36.475: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestStaticEntry{ID=11} ...
*Oct 16 10:41:37.255: %DIAG-SP-6-TEST_OK: Module 3: TestStaticEntry{ID=11} has completed successfully
*Oct 16 10:41:37.255: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestIndexLearn{ID=12} ...
*Oct 16 10:41:38.031: %DIAG-SP-6-TEST_OK: Module 3: TestIndexLearn{ID=12} has completed successfully
*Oct 16 10:41:38.031: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestCapture{ID=13} ...
*Oct 16 10:41:39.135: %DIAG-SP-6-TEST_OK: Module 3: TestCapture{ID=13} has completed successfully
*Oct 16 10:41:39.135: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestTrap{ID=14} ...
*Oct 16 10:41:39.923: %DIAG-SP-6-TEST_OK: Module 3: TestTrap{ID=14} has completed successfully
*Oct 16 10:41:39.923: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestMacNotification{ID=15} ...
*Oct 16 10:41:40.299: %DIAG-SP-6-TEST_OK: Module 3: TestMacNotification{ID=15} has completed successfully
*Oct 16 10:41:40.299: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestFibDevices{ID=16} ...
*Oct 16 10:41:42.687: %DIAG-SP-6-TEST_OK: Module 3: TestFibDevices{ID=16} has completed successfully
*Oct 16 10:41:42.687: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestIPv4FibShortcut{ID=17} ...
*Oct 16 10:41:43.367: %DIAG-SP-6-TEST_OK: Module 3: TestIPv4FibShortcut{ID=17} has completed successfully
*Oct 16 10:41:43.367: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestIPv6FibShortcut{ID=18} ...
*Oct 16 10:41:44.263: %DIAG-SP-6-TEST_OK: Module 3: TestIPv6FibShortcut{ID=18} has completed successfully
*Oct 16 10:41:44.263: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestNATFibShortcut{ID=19} ...
*Oct 16 10:41:45.031: %DIAG-SP-6-TEST_OK: Module 3: TestNATFibShortcut{ID=19} has completed successfully
*Oct 16 10:41:45.031: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestMPLSFibShortcut{ID=20} ...
*Oct 16 10:41:45.935: %DIAG-SP-6-TEST_OK: Module 3: TestMPLSFibShortcut{ID=20} has completed successfully
*Oct 16 10:41:45.935: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestL3Capture{ID=21} ...
*Oct 16 10:41:46.939: %DIAG-SP-6-TEST_OK: Module 3: TestL3Capture{ID=21} has completed successfully
*Oct 16 10:41:46.939: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestL3VlanMet{ID=22} ...
*Oct 16 10:41:48.023: %DIAG-SP-6-TEST_OK: Module 3: TestL3VlanMet{ID=22} has completed successfully
*Oct 16 10:41:48.023: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestIngressSpan{ID=23} ...
*Oct 16 10:41:48.743: %DIAG-SP-6-TEST_OK: Module 3: TestIngressSpan{ID=23} has completed successfully
*Oct 16 10:41:48.743: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestEgressSpan{ID=24} ...
*Oct 16 10:41:48.975: %DIAG-SP-6-TEST_OK: Module 3: TestEgressSpan{ID=24} has completed successfully
*Oct 16 10:41:48.975: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestAclPermit{ID=25} ...
*Oct 16 10:41:49.863: %DIAG-SP-6-TEST_OK: Module 3: TestAclPermit{ID=25} has completed successfully
*Oct 16 10:41:49.863: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestAclDeny{ID=26} ...
*Oct 16 10:41:57.127: %DIAG-SP-6-TEST_OK: Module 3: TestAclDeny{ID=26} has completed successfully
*Oct 16 10:41:57.127: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestQos{ID=27} ...
*Oct 16 10:41:58.199: %DIAG-SP-6-TEST_OK: Module 3: TestQos{ID=27} has completed successfully
*Oct 16 10:41:58.199: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestNetflowShortcut{ID=28} ...
*Oct 16 10:41:58.891: %DIAG-SP-6-TEST_OK: Module 3: TestNetflowShortcut{ID=28} has completed successfully
*Oct 16 10:41:58.891: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestFirmwareDiagStatus{ID=31} ...
*Oct 16 10:41:58.903: %DIAG-SP-6-TEST_OK: Module 3: TestFirmwareDiagStatus{ID=31} has completed successfully
*Oct 16 10:41:58.903: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestAsicSync{ID=32} ...
*Oct 16 10:41:58.903: %DIAG-SP-3-TEST_SKIPPED: Module 3: TestAsicSync{ID=32} is skipped
*Oct 16 10:41:58.903: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestFibTcamSSRAM{ID=29} ...
*Oct 16 10:41:58.903: DFC3: ******************************************************************
*Oct 16 10:41:58.903: DFC3: * WARNING:
*Oct 16 10:41:58.903: DFC3: * FIB TCAM/SSRAM Memory test on module 3 may take up to .
*Oct 16 10:41:58.903: DFC3: * During this time, please DO NOT perform any packet switching.
*Oct 16 10:41:58.903: DFC3: ******************************************************************
*Oct 16 10:42:04.803: DFC3: test_144bit_lookup: Tcam test walking 0, dev 0
*Oct 16 10:48:41.975: DFC3: test_144bit_lookup: Tcam test walking 1, dev 0
*Oct 16 10:55:18.715: DFC3: test_144bit_lookup: Tcam test walking 0, dev 1
*Oct 16 11:01:55.363: DFC3: test_144bit_lookup: Tcam test walking 1, dev 1
*Oct 16 11:08:32.067: DFC3: test_144bit_lookup: Tcam test walking 0, dev 2
*Oct 16 11:15:08.727: DFC3: test_144bit_lookup: Tcam test walking 1, dev 2
*Oct 16 11:21:45.651: DFC3: test_144bit_lookup: Tcam test walking 0, dev 3
*Oct 16 11:28:22.339: DFC3: test_144bit_lookup: Tcam test walking 1, dev 3
*Oct 16 11:34:59.359: DFC3: test_72bit_lookup: Tcam test walking 0, dev 0
*Oct 16 11:43:38.027: DFC3: test_72bit_lookup: Tcam test walking 1, dev 0
*Oct 16 11:52:16.219: DFC3: test_72bit_lookup: Tcam test walking 0, dev 1
*Oct 16 12:00:54.671: DFC3: test_72bit_lookup: Tcam test walking 1, dev 1
*Oct 16 12:09:33.311: DFC3: test_72bit_lookup: Tcam test walking 0, dev 2
*Oct 16 12:18:11.823: DFC3: test_72bit_lookup: Tcam test walking 1, dev 2
*Oct 16 12:26:50.363: DFC3: test_72bit_lookup: Tcam test walking 0, dev 3
*Oct 16 12:35:28.291: DFC3: test_72bit_lookup: Tcam test walking 1, dev 3
*Oct 16 12:44:06.731: DFC3: FIB TCAM Test Passed.
*Oct 16 12:44:15.063: DFC3: FIB SSRAM Test Passed.
****************************************************************************
* WARNING: TCAM/SSRAM are filled with test pattern. Module 3 MUST be reset *
****************************************************************************
*Oct 16 12:44:15.190: %DIAG-SP-6-TEST_OK: Module 3: TestFibTcamSSRAM{ID=29} has completed successfully
*Oct 16 12:44:15.190: %DIAG-SP-6-TEST_RUNNING: Module 3: Running TestEobcStressPing{ID=30} ...
*Oct 16 12:44:15.190: %DIAG-SP-3-TEST_SKIPPED: Module 3: TestEobcStressPing{ID=30} is skipped
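The console warning does not say how long the FIB TCAM/SSRAM test takes, but the log timestamps above do. A minimal sketch of the arithmetic, assuming both timestamps fall on the same day (the helper name `elapsed_seconds` is made up for illustration):

```python
from datetime import datetime

def elapsed_seconds(start, end):
    """Seconds between two same-day log timestamps like '10:41:58.903'."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

# Timestamps copied from the TEST_RUNNING and TEST_OK lines
# for TestFibTcamSSRAM{ID=29} in the log above.
secs = elapsed_seconds("10:41:58.903", "12:44:15.190")
hours, rem = divmod(int(secs), 3600)
print(f"{hours} h {rem // 60} min {rem % 60} s")
```

On this module the test ran for roughly two hours, during which the module should not switch packets.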
And then I get an error:
*Oct 16 12:46:40.187: %HA_EM-6-LOG: Mandatory.go_fabrich1.tcl: GOLD EEM TCL policy for TestFabricCh1Health
Router#sh diagnostic result module 3
Current bootup diagnostic level: minimal
Module 3: CEF720 8 port 10GE with DFC SerialNo : SAL16084SR5
Overall Diagnostic Result for Module 3 : MAJOR ERROR
Diagnostic level at card bootup: minimal
Test results: (. = Pass, F = Fail, U = Untested)
1) TestOBFL ------------------------> .
2) TestFabricCh0Health -------------> .
3) TestFabricCh1Health -------------> F
4) TestTransceiverIntegrity:
Port 1 2 3 4 5 6 7 8
----------------------------
U U U U U U U U
5) TestLoopback:
Port 1 2 3 4 5 6 7 8
----------------------------
. . . . . . . .
6) TestScratchRegister -------------> .
7) TestSynchedFabChannel -----------> .
8) TestDontLearn -------------------> .
9) TestConditionalLearn ------------> .
10) TestNewLearn --------------------> .
11) TestStaticEntry -----------------> .
12) TestIndexLearn ------------------> .
13) TestCapture ---------------------> .
14) TestTrap ------------------------> .
15) TestMacNotification -------------> .
16) TestFibDevices ------------------> .
17) TestIPv4FibShortcut -------------> .
18) TestIPv6FibShortcut -------------> .
19) TestNATFibShortcut --------------> .
20) TestMPLSFibShortcut -------------> .
21) TestL3Capture -------------------> .
22) TestL3VlanMet -------------------> .
23) TestIngressSpan -----------------> .
24) TestEgressSpan ------------------> .
25) TestAclPermit -------------------> .
26) TestAclDeny ---------------------> .
27) TestQos -------------------------> .
28) TestNetflowShortcut -------------> .
29) TestFibTcamSSRAM ----------------> .
30) TestEobcStressPing --------------> U
31) TestFirmwareDiagStatus ----------> .
32) TestAsicSync --------------------> U
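For anyone who wants to pick the failed tests out of output like the above without reading all 32 lines, here is a minimal sketch. It assumes the single-result line layout shown above ("N) TestName ----> status") and deliberately ignores the per-port tests (TestTransceiverIntegrity, TestLoopback). The function names are hypothetical and not part of any Cisco tool.

```python
import re

# Matches single-result lines such as "3) TestFabricCh1Health -------------> F"
RESULT_LINE = re.compile(r"^\s*(\d+)\)\s+(\S+)\s+-+>\s+([.FU])\s*$")

def parse_diag_results(text):
    """Return {test_name: status} for single-result lines ('.', 'F' or 'U')."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(2)] = match.group(3)
    return results

def failed_tests(text):
    """Return the names of tests marked F, in output order."""
    return [name for name, status in parse_diag_results(text).items() if status == "F"]

# A few lines copied from the output above
sample = """\
1) TestOBFL ------------------------> .
2) TestFabricCh0Health -------------> .
3) TestFabricCh1Health -------------> F
30) TestEobcStressPing --------------> U
"""

print(failed_tests(sample))
```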
But after reloading the module and running the minimal diagnostics, I get:
Router#sh diagnostic result
Current bootup diagnostic level: minimal
Module 1: CEF720 8 port 10GE with DFC SerialNo : SAL16084SR5
Overall Diagnostic Result for Module 1 : PASS
Diagnostic level at card bootup: minimal
Test results: (. = Pass, F = Fail, U = Untested)
1) TestOBFL ------------------------> .
2) TestFabricCh0Health -------------> .
3) TestFabricCh1Health -------------> .
4) TestTransceiverIntegrity:
Port 1 2 3 4 5 6 7 8
----------------------------
U U U U U U U U
5) TestLoopback:
Port 1 2 3 4 5 6 7 8
----------------------------
. . . . . . . .
6) TestScratchRegister -------------> .
7) TestSynchedFabChannel -----------> .
8) TestDontLearn -------------------> U
9) TestConditionalLearn ------------> .
10) TestNewLearn --------------------> U
11) TestStaticEntry -----------------> U
12) TestIndexLearn ------------------> U
13) TestCapture ---------------------> U
14) TestTrap ------------------------> U
15) TestMacNotification -------------> .
16) TestFibDevices ------------------> .
17) TestIPv4FibShortcut -------------> .
18) TestIPv6FibShortcut -------------> .
19) TestNATFibShortcut --------------> .
20) TestMPLSFibShortcut -------------> .
21) TestL3Capture -------------------> U
22) TestL3VlanMet -------------------> .
23) TestIngressSpan -----------------> .
24) TestEgressSpan ------------------> .
25) TestAclPermit -------------------> .
26) TestAclDeny ---------------------> U
27) TestQos -------------------------> .
28) TestNetflowShortcut -------------> .
29) TestFibTcamSSRAM ----------------> U
30) TestEobcStressPing --------------> U
31) TestFirmwareDiagStatus ----------> .
32) TestAsicSync --------------------> U
I tried changing the IOS and the Sup720, but I still get this error.
But if I run this test with an RSP720-3C-GE, it completes and all tests pass.
Current bootup diagnostic level: minimal
Module 1: CEF720 8 port 10GE with DFC SerialNo : SAL16084SR5
Overall Diagnostic Result for Module 1 : PASS
Diagnostic level at card bootup: minimal
Test results: (. = Pass, F = Fail, U = Untested)
1) TestOBFL ------------------------> .
2) TestFabricCh0Health -------------> .
3) TestFabricCh1Health -------------> .
4) TestTransceiverIntegrity:
Port 1 2 3 4 5 6 7 8
----------------------------
U U U U U U U U
5) TestLoopback:
Port 1 2 3 4 5 6 7 8
----------------------------
. . . . . . . .
6) TestScratchRegister -------------> .
7) TestSynchedFabChannel -----------> .
8) TestDontLearn -------------------> .
9) TestConditionalLearn ------------> .
10) TestNewLearn --------------------> .
11) TestStaticEntry -----------------> .
12) TestIndexLearn ------------------> .
13) TestCapture ---------------------> .
14) TestTrap ------------------------> .
15) TestMacNotification -------------> .
16) TestFibDevices ------------------> .
17) TestIPv4FibShortcut -------------> .
18) TestIPv6FibShortcut -------------> .
19) TestNATFibShortcut --------------> .
20) TestMPLSFibShortcut -------------> .
21) TestL3Capture -------------------> .
22) TestL3VlanMet -------------------> .
23) TestIngressSpan -----------------> .
24) TestEgressSpan ------------------> .
25) TestAclPermit -------------------> .
26) TestAclDeny ---------------------> .
27) TestQos -------------------------> .
28) TestNetflowShortcut -------------> .
29) TestFibTcamSSRAM ----------------> .
30) TestEobcStressPing --------------> U
31) TestFirmwareDiagStatus ----------> .
32) TestAsicSync --------------------> U
33) TestErrorCounterMonitor ---------> .
What do you recommend?
Thanks!
------------------------------------------------------
Helping seriously ill children, all together. All information about this, is posted on my blog
10-16-2013 10:43 AM
Hi Oleg,
The test constantly monitors the health of the ingress and egress data paths for fabric channel 1 on 10-gigabit modules.
The test runs every five seconds. Ten consecutive failures are treated as fatal and the module resets.
Three consecutive resets power down the module and may result in a fabric switchover.
Your issue matches the known DDTS CSCtq54730:
CSCtq54730 - TestFabricCh1Health fails on WS-X6708-10GE
Symptom:
TestFabricCh1Health fails when "diagnostic start module <> test all" is executed, resulting in a Major error.
Conditions:
When you execute "diagnostic start module 8 test all", it runs disruptive tests, including tests that require a reset of the card.
There is a clear message on the console that the card has to be reset after the FIB TCAM memory test and that normal operation may be affected while it runs. In this case the FabricCh1 health-monitoring (HM) test is affected, because it keeps running in parallel with the manually started FibTcam test.
Ideally, while a test that requires a reset is running, all other tests should be skipped. Because of a software bug, this particular FabricCh1 HM test is not skipped, so it fails and reports a Major error, which is misleading.
Workaround:
This workaround can be suggested to customers running SRD.
We run all these tests as part of boot-up, so I don't think they need to be run again. If you still want to run them, I suggest disabling all the HM tests first and then running the tests manually so that there are no conflicts, i.e. in configuration mode: no diagnostic monitor module <> test all.
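If you script the workaround, the order of the steps matters more than anything else. Below is a rough sketch that only builds the command list for one module. The abbreviated "diagn" is expanded to "diagnostic", and the reset step uses `hw-module module <n> reset`, which is an assumption on my part (the console warning only says the module must be reset, not how). The helper name is hypothetical, and the commands should be checked against your IOS release before use.

```python
def workaround_commands(module):
    """Build the command order for the workaround on one module (sketch only)."""
    if not isinstance(module, int) or module < 1:
        raise ValueError("module must be a positive slot number")
    return [
        "configure terminal",
        # Stop the health-monitoring tests so they cannot overlap the manual run
        f"no diagnostic monitor module {module} test all",
        "end",
        # Disruptive; wait until every test has finished before going further
        f"diagnostic start module {module} test all",
        # Assumed reset command; TCAM/SSRAM hold test patterns after the run
        f"hw-module module {module} reset",
        f"show diagnostic result module {module}",
    ]

for cmd in workaround_commands(3):
    print(cmd)
```

Health monitoring stays disabled in the running configuration after this, so it should be restored afterwards to whatever was configured before.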
Regards,
Aru
*** Please rate if the post is useful ***
10-17-2013 02:38 AM
Thanks, dear Arumugam!
You really helped me!