
Should I now flash the running config memory on a Nexus5K switch?

Mic_Jameson
Level 1

Hello folks.

-Can you please tell me if you agree with my troubleshooting idea of flashing the running config memory on a Nexus 5K switch that shows abundant errors communicating with a FEX?

-Also, how do I test a switchport for physical defects?

Thank you!

--- 

 

After inspection, I am convinced that the cause of the symptom below is either a misconfiguration error in the primary Nexus 5K (possibly in the secondary Nexus 5k), or a physical defect in a switchport on the primary Nexus 5K (possibly in the secondary Nexus 5k). See hardware and symptom below...

Hardware:
Switch1 is a Nexus 5K 56128P (primary)
Switch2 is a Nexus 5K 56128P (secondary)
Two N2K-C2232TM-E-10GE Fabric Extender (FEX) devices

Symptom:
"On Switch1 we have a lot of connectivity problems with the FEX 150 and 151.
-In October 2021 we replaced the FEX 150 and the twinax cable, but the problem persists.
-Every time we try to connect any server on the FEX 150, the interface flaps many times and never comes UP.
-As you can see in the configuration, at the moment all the ports on FEX 150 have the description "Port NOK". That means the port has already been tested but never comes UP, and we had to move the server to another port."

 

The symptoms suggest a major issue at OSI data-link layer or physical layer. After investigation this is confirmed with evidence of...
1. extreme amounts of jumbo frames and CRC errors received on port eth1/17 of Switch1...
"234923790 input packets , 109581999 jumbo packets"

2. abundant extreme data-link errors of various sorts, for example...

Port      IntMacTx-Er
Eth1/1    133074
Eth1/2    153015
(null output omitted)
Eth1/9    85
Eth1/10   42
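
The counters above can be put in proportion with a few lines of Python. This is only a sketch: it assumes the counter line and the IntMacTx-Er rows are pasted in as plain text exactly as quoted in this post, and the helper names (`jumbo_share`, `rank_ports`) are made up for illustration and are not part of any Cisco tool.

```python
import re

# Counter line quoted above for eth1/17 on Switch1.
COUNTER_LINE = "234923790 input packets , 109581999 jumbo packets"

# IntMacTx-Er rows quoted above (the omitted rows are not guessed).
ERROR_ROWS = """\
Eth1/1 133074
Eth1/2 153015
Eth1/9 85
Eth1/10 42
"""


def jumbo_share(line: str) -> float:
    """Return jumbo packets as a fraction of input packets."""
    total = int(re.search(r"(\d+)\s+input packets", line).group(1))
    jumbo = int(re.search(r"(\d+)\s+jumbo packets", line).group(1))
    return jumbo / total


def rank_ports(rows: str) -> list:
    """Parse 'port count' rows and sort them by error count, highest first."""
    parsed = []
    for row in rows.splitlines():
        parts = row.split()
        # Skip anything that is not exactly '<port> <integer>'.
        if len(parts) == 2 and parts[1].isdigit():
            parsed.append((parts[0], int(parts[1])))
    return sorted(parsed, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(f"jumbo share: {jumbo_share(COUNTER_LINE):.1%}")
    for port, errors in rank_ports(ERROR_ROWS):
        print(f"{port:8} {errors}")
```

On the quoted numbers, a little under half of the input packets on eth1/17 are counted as jumbo, and Eth1/2 tops the IntMacTx-Er list, followed by Eth1/1.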

Only certain ports on the FEX devices display symptoms.

Also, only the 10G models (bottom 2) are exhibiting symptoms...

Switch1# sh fex
Number  Description  State   Model               Serial
------------------------------------------------------------------------
100     FEX0100      Online  N2K-C2248TP-E-1GE   FOX(output censored)
101     FEX0101      Online  N2K-C2248TP-E-1GE   FOX(output censored)
102     FEX0102      Online  N2K-C2248TP-E-1GE   FOX(output censored)
103     FEX0103      Online  N2K-C2248TP-E-1GE   FOX(output censored)
106     FEX0106      Online  N2K-C2248TP-E-1GE   FOX(output censored)
107     FEX0107      Online  N2K-C2248TP-E-1GE   SSI(output censored)
150     FEX0150      Online  N2K-C2232TM-E-10GE  SSI(output censored)
151     FEX0151      Online  N2K-C2232TM-E-10GE  SSI(output censored)


RELEVANT NOTES:

-Both Nexus 5K switches are running the identical NX-OS version, 7.3(8)N1(1), from the same file location. A healthy Nexus 5K in a different module is also running this same version.
-The link speed and duplex settings throughout the circuit have been confirmed as set to auto.

-It is confirmed that the physical architecture of the module represents that within the submitted diagram.

-It is confirmed through simple visual inspection that there exists no unusual electromagnetic radiation, such as emanating from a power supply or magnet, affecting the environment.

-These symptoms could result from FCoE technology (Ethernet traffic and Fibre Channel (FC) sharing the same Ethernet wire), but the client says FC is not used in this environment.

-It is possible that this is caused by a hardware bug in the FEXs. The bug in the link below manifests different symptoms, but its root cause relates to physical defects in 10G interfaces in some N2K-C2232TM-E-10GE devices; thus it is logical that such physical defects could cause the symptom the client experiences.
https://www.cisco.com/c/en/us/support/docs/field-notices/641/fn64178.html

-Various misconfigurations could cause this issue, but a different instance of this same technology structure (same device models, architecture, and NX-OS version) was shown to operate healthily. All these devices received their configurations from automated Ansible templates. This architectural module is replicated many times in the network center.
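
Since a healthy copy of the same module exists, the template question could be checked by comparing configurations directly rather than by inference. Below is a minimal sketch, assuming the running configs of the healthy and the suspect switch have each been saved as text. The sample config lines and the function name `config_diff` are hypothetical placeholders, not output from these devices.

```python
import difflib

# Hypothetical sample snippets, NOT taken from the real switches.
HEALTHY = """\
interface Ethernet1/17
  switchport mode fex-fabric
  fex associate 150
"""

SUSPECT = """\
interface Ethernet1/17
  switchport mode fex-fabric
  fex associate 151
"""


def config_diff(healthy: str, suspect: str) -> list:
    """Return only the lines that differ between two config texts.

    Lines prefixed '- ' exist only in the healthy config,
    lines prefixed '+ ' exist only in the suspect config.
    """
    lines = difflib.ndiff(healthy.splitlines(), suspect.splitlines())
    # Drop unchanged lines ('  ') and ndiff's '? ' hint lines.
    return [line for line in lines if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    for line in config_diff(HEALTHY, SUSPECT):
        print(line)
```

An empty result would suggest the two configurations match line for line, while a short list of differences gives concrete lines to review. Nothing on the switch is changed by this comparison.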

While there is strong circumstantial evidence that the configurations within the Nexus 5Ks are sound, there is stronger direct evidence that the fault lies in the Nexus 5Ks themselves, in either configuration or hardware, because the client has already replaced a FEX device with a new one and has experienced the same symptom.

THUS, logic concludes that the root cause must be a HARDWARE FAILURE or a MISCONFIGURATION within the Nexus 5K switches.

First, a hardware test must be performed on both Nexus 5Ks. If that reveals nothing unusual, then logic concludes that the fault must exist in the running configuration of one or both switches. To solve this, the flash memory where the running config resides must be wiped clean. If there is any other NVRAM in device line cards, etc., it must be noted and wiped clean as well.
Lastly, it is possible yet unlikely that this symptom is caused by the hardware failure mentioned earlier in a bug. To check this, the suspect FEX must be swapped out with a proven healthy FEX.

ACTION PLAN
1. execute diagnostic tests on BOTH Nexus 5k devices, especially the switchports
2. if 1. confirms healthy hardware, then on both devices erase the flash memory where lives the running config. Also verify existing possible flash memory in modular components such as line cards.
3. reboot both 5Ks. Using Ansible, inject fresh configuration into both Nexus 5Ks. Pay close attention to use the same configuration template as was used in the healthy Nexus 5k located in the different healthy module of Nexus 5Ks.
4. Check whether the symptom has been eliminated.

3 Replies

marce1000
VIP

 

 >....2. if 1. confirms healthy hardware, then on both devices erase the flash memory where lives the running config. Also verify existing possible flash memory in modular components such as line cards.

  ??? This is not related to FEX problems, if any, and don't do that (you would lose all config, to start with). Keep it simple for the moment: look at the logs if you think you have a FEX problem (show logging), or use a syslog server (often preferred). Diagnostics on FEX devices are not possible from a network manager viewpoint. You may try to connect the N5Ks with https://cway.cisco.com/cli through SSH; at the top left you can then press or run System Diagnostics.

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

Thank you for your reply Marce.

 

I would prefer to first run a port hardware diagnostic, but I don't know exactly how to do that. Does anyone know how to run a port hardware diagnostic without rebooting the switch?

 

I am still convinced that erasing the flash is the most practical course of action, because I conclude that the issue must be either in the configuration or in the port hardware. My theory is that the system administrator used the wrong Ansible template to provision the switch. If the source of the symptom is in the configuration, then finding the individual configuration errors is of minimal importance (there could be a lot of them). It is best to wipe the flash (running config) clean and load a clean running config, thereby ensuring no bad configuration is left over.

Of course, it is always possible that I am incorrect. I'd appreciate knowing why my plan would be incorrect.

And if anyone knows how to check hardware port health (without rebooting the system), please let me know.

 

Thank you!

 

 

 > ... Check hardware port health

 I already replied on that: you may try to connect the N5Ks with https://cway.cisco.com/cli through SSH; at the top left you can then press or run System Diagnostics. And don't erase the flash, it is not related to port problems. You may, if desired, erase the config (write erase) and start to configure the Nexus from scratch again (take care of the impact).

 M.


