The purpose of this document is to provide a step-by-step process to determine whether an N7K-M132XP-12 or N7K-M132XP-12L module in a Nexus 7000 switch needs to be RMAed.
This document lists the symptoms and relevant troubleshooting steps in determining the health of the module.
Scenario 1: N7K-M132XP-12 or N7K-M132XP-12L “TestPortLoopback” diagnostic test failed
When the diagnostic test fails, the following syslog message is observed:
2011 Dec 10 11:47:11 %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:18 TestPortLoopback failed 10 consecutive times. Faulty module:Module 18 affected ports:23 Error:Loopback test failed. Packets lost on the LC at the Queueing engine ASIC
N7K2# show diagnostic result module 18
Current bootup diagnostic level: complete
Module 18: 10 Gbps Ethernet Module
Test results: (. = Pass, F = Fail, I = Incomplete,
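When many of these syslog messages have to be reviewed, it can help to pull the module number, affected ports, and reporting ASIC out of each line. The following is a minimal Python sketch, not a Cisco tool. The regular expression is inferred only from the sample messages in this document, and the function name `parse_portloopback_failure` is hypothetical; other NX-OS releases may format the message differently.

```python
import re

# Pattern inferred from the sample syslog lines in this document (an assumption).
PATTERN = re.compile(
    r"%DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: "
    r"Module:(?P<module>\d+) TestPortLoopback failed "
    r"(?P<count>\d+) consecutive times\. "
    r"Faulty module:(?P<faulty>.*?)\s*affected ports:(?P<ports>[\d,]+) "
    r"Error:(?P<error>.*)$"
)


def parse_portloopback_failure(line):
    """Return a dict describing one PORTLOOPBACK_TEST_FAIL line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    error = match.group("error").strip()
    # The samples end with "... at the <ASIC name>"; keep that part if present.
    marker = " at the "
    asic = error.split(marker, 1)[1].strip() if marker in error else None
    return {
        "module": int(match.group("module")),
        "consecutive_failures": int(match.group("count")),
        # The "Faulty module:" field is empty in some of the samples.
        "faulty_module": match.group("faulty").strip() or None,
        "ports": [int(p) for p in match.group("ports").split(",")],
        "error": error,
        "asic": asic,
    }
```

Lines that do not match the pattern return `None`, so the function can be applied to a full "show log" capture and the non-matching lines skipped.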
To verify whether you are hitting these bugs, perform the following checks:
(1) Check whether the running NX-OS version matches the version in which the DDTS was found. Both bugs are fixed and verified in 5.2(4) and later releases.
(2) Determine when the diagnostic message was observed. “show log” gives the time stamp of the diagnostic test failure; then check whether any CPU issue occurred around the same time. An overwhelmed CPU can sometimes cause the port loopback diagnostic test to fail. This is not conclusive, but it is a good data point to collect.
(3) Collect additional Logs:
“tac-pac bootflash:tech.txt” <<== This command may take several minutes to complete
“show tech module 1”
“show tech gold”
“show hardware internal errors module 1 | diff -s” <<== execute this command a few times
(4) Clear the diagnostic results and re-run the tests while the CPU is not overwhelmed:
# show diagnostic result module 1
# diagnostic clear result module all
(config)# no diagnostic monitor module 1 test 5 (check the test number using "show diagnostic content module X")
(config)# diagnostic monitor module 1 test 5
# diagnostic start module 1 test 5
# show diagnostic result module 1 test 5 (the test may take a few minutes to complete)
# show module internal exceptionlog module 1
# show module internal event-history errors
# show hardware internal errors module 1
If the module recovers and the diagnostic test passes, the failure was most likely due to the DDTSs mentioned above, because an actual hardware failure should fail the diagnostic consistently.
If the module fails the diagnostic consistently, please contact Cisco TAC for further analysis.
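The version check in step (1) can be sketched in Python as follows. This is only an illustration: the helper names `parse_nxos_version` and `has_fix` are hypothetical, and the comparison assumes version strings of the form `5.2(4)` or `5.2(3a)` and a simple numeric ordering. Whether a given release in another train actually carries the fix should be confirmed against the bug's fixed-release information.

```python
import re

# Assumed format: major.minor(maintenance[letter]), for example "5.2(4)" or "5.2(3a)".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_nxos_version(text):
    """Split an NX-OS version string into a tuple that sorts in release order."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized NX-OS version string: {text!r}")
    major, minor, maint, suffix = match.groups()
    return (int(major), int(minor), int(maint), suffix)


# First release that this document states contains both fixes.
FIXED_IN = parse_nxos_version("5.2(4)")


def has_fix(running_version):
    """Return True if running_version sorts at or after 5.2(4)."""
    return parse_nxos_version(running_version) >= FIXED_IN
```

For example, `has_fix("5.1(3)")` returns `False` and `has_fix("5.2(4)")` returns `True`.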
Scenario 2: M1 module gets reset and/or links flap
2012 Jun 13 15:51:30 MDT Q93-7010-A %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3 TestPortLoopback failed 10 consecutive times. Faulty module: affected ports:3,5,7,11,13,15,19,21,23,27,29,31 Error:Loopback test failed. Packets lost on the LC at the MAC ASIC
2012 Jun 13 15:51:30 MDT Q93-7010-A %$ VDC-1 %$ %DIAG_PORT_LB-2-PORTLOOPBACK_TEST_FAIL: Module:3 TestPortLoopback failed 10 consecutive times. Faulty module: affected ports:4,6,8,12,14,16,20,22,24,26,28,30,32 Error:Loopback test failed. Packets lost on the LC at the Queueing engine ASIC
Error Description : LM_INT_CL1_TCAM_B_PARITY_ERR <== uncorrectable parity error in TCAM B
DSAP : 211
UUID : 382
Time : Tue Jul 10 04:05:40 2012
(840168 usecs 4FFC0C84(H) jiffies)
(4) Compare the TCAM exception above with the exception log record at the time of the crash
Exception Log Record : Tue Jul 10 04:17:02 2012 (270399 us) <== about 11 minutes after the TCAM parity error
Device Id : 34304
Device Name : 0x8600
Device Error Code : 7e010000(H)
Device Error Type : NULL
Device Error Name : NULL
Device Instance : 0
Sys Error : (null)
Errtype : CATASTROPHIC
PhyPortLayer : 0x0
Port(s) Affected :
Error Description : lamira_usd hap reset <== lamira crashed because the number of TCAM parity errors in TCAM B violated the HA Policy (HAP)
DSAP : 0
UUID : 16777216
Time : Tue Jul 10 04:17:02 2012
(270399 usecs 4FFC0F2E(H) jiffies)
(5) If the lamira crash was caused by a HAP reset due to multiple TCAM parity errors, the LC should be RMAed/EFAed. Otherwise, if lamira crashed for some other reason, continue normal troubleshooting. The onboard lamira log (command #2) will help Cisco TAC root-cause it.
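The comparison in step (4) comes down to measuring how long after the TCAM parity error the HAP reset was logged. A minimal Python sketch, assuming the "Time :" fields always use the format shown in the records above and an English locale for the day and month names (the function names are hypothetical):

```python
from datetime import datetime

# Format of the "Time :" field as it appears in the exception log records above.
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_exception_time(text):
    """Parse a time stamp such as 'Tue Jul 10 04:05:40 2012'."""
    return datetime.strptime(text.strip(), TIME_FORMAT)


def time_between(parity_error_time, reset_time):
    """Return the timedelta from the parity error to the HAP reset."""
    return parse_exception_time(reset_time) - parse_exception_time(parity_error_time)


if __name__ == "__main__":
    gap = time_between("Tue Jul 10 04:05:40 2012", "Tue Jul 10 04:17:02 2012")
    print(gap)  # prints 0:11:22
```

A short gap between the two records supports, but does not prove, that the parity errors led to the reset; the "Error Description" fields still need to be read together.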
Scenario 4: All M1 modules fail specific diagnostic tests, like PortLoopback or RewriteEngineLoopback test
In the Nexus 7000, if there is an issue between the active supervisor engine and an xbar module that causes diagnostic packets to be dropped, the supervisor engine may report diagnostic test failures for multiple or all ports on multiple or all modules.
This issue requires manual investigation and isolation of the faulty supervisor engine.
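As a first triage step, it can be useful to see how widely the failures are spread before suspecting any single line card. The Python sketch below groups reported failures by module and labels the spread. The function names, the labels, and the rules are illustrative assumptions, not Cisco guidance; the result only indicates where to start the manual investigation.

```python
from collections import defaultdict


def summarize_failures(failures):
    """Group (module, port) failure pairs into {module: sorted list of ports}."""
    ports_by_module = defaultdict(set)
    for module, port in failures:
        ports_by_module[module].add(port)
    return {module: sorted(ports) for module, ports in ports_by_module.items()}


def failure_scope(failures, installed_modules):
    """Label how widely failures are spread (hypothetical labels)."""
    failing = set(summarize_failures(failures))
    installed = set(installed_modules)
    if not failing:
        return "no failures"
    if len(failing) == 1:
        # One module only: the line card itself is the first suspect.
        return "single module"
    if installed and failing >= installed:
        # Every installed module reports failures: look at the common path
        # (supervisor engine and xbar) before any individual line card.
        return "all modules"
    return "multiple modules"
```

For example, failures on modules 1, 3, and 4 of a chassis whose installed I/O modules are 1, 3, and 4 return "all modules", which matches the pattern described in this scenario.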
The condition which caused the tests to go into errdisabled state may be transient.
It is recommended to run the tests on-demand and see if the condition is persistent.
First, try to clear the ErrDisabled state of the test:
N7K# diagnostic clear result module 1 test ?
  <1-6>  Test ID(s)
  all    Select all
To run on-demand test:
N7K# diagnostic start module <mod#> test <test#>
To stop the test:
N7K# diagnostic stop module <mod#> test <test#>
The supervisor engine does not trigger a failover or reset as a corrective action to recover from this condition. An enhancement request has been filed to request corrective action:
CSCth03474 - n7k/GOLD:Improve Fault Isolation of N7K-GOLD