Nexus 3172T MultiChassis EC port failure

lcaruso · ‎09-28-2020

I have been dealing with a possibly bad Nexus 3172T switch in a VPC pair. After an upgrade, an uplink port in the MEC from the Secondary Nexus to the Core failed. Both the Primary and the Secondary in the VPC pair have 2x10G connections to the Core.

We have tried everything including swapping QSFP/10Gbase-LR-S from the Primary to the Secondary, fiber paths, patch cables, SFPs on both sides. A QSFP/10Gbase-LR-S that works on the Primary does not function on the Secondary.

To conclusively prove that the Secondary switch is bad, I want to move a working uplink fiber from a Primary MEC uplink to the failed port on the Secondary.

Before I do this, I just want to confirm this does not pose ANY risk to the entire MEC core connection, that it will not bounce or tear down completely as this is where all of the most important servers are hosted. From the Core side, it is just one EC, so I don't see how it could cause an issue, but stranger things have happened.

Thanks for your comments.

Christopher Hart · ‎09-28-2020

Hello!

To get started on this issue, I'd like to clarify a handful of points in your original post:

1. You mentioned that after an upgrade, you encountered this issue - which specific device(s) was/were upgraded? The Nexus switches, or the upstream core switches? From what software release to what software release was this device upgraded?
2. You mentioned that the uplink port on the secondary Nexus failed. Can you describe this failure in a bit more detail? Does the physical link refuse to come up/up (showing as Link Failure in syslogs and/or interface status)? What is the exact status of the interface according to the CLI output of the switch?
3. Are the uplink interfaces of both the Primary and Secondary Nexus switches configured in a vPC? (e.g. Eth1/49 is configured as a member of port-channel10, which has vpc 10 configured under it on both switches.)

Assuming that the answer to Question #3 above is "Yes", then migrating a working uplink interface from the Primary Nexus to a suspect uplink interface on the Secondary Nexus will not cause an issue. If the Secondary Nexus truly holds the Secondary or Operational Secondary vPC role, then if there is any mismatch in vPC consistency parameters, the Secondary Nexus would simply suspend its local interface.

Thank you!

-Christopher

lcaruso · ‎09-28-2020

Hello and thanks for your response.

Only the Nexus VPC pair was upgraded. The Core was not upgraded; we would not do that in the same window.

From Software
BIOS: version 4.0.0
NXOS: version 7.0(3)I4(6)
BIOS compile time: 12/05/2016
NXOS image file is: bootflash:///nxos.7.0.3.I4.6.bin
NXOS compile time: 3/9/2017 22:00:00 [03/10/2017 01:05:18]

To Software
BIOS: version 5.3.1
NXOS: version 7.0(3)I7(8)
BIOS compile time: 05/17/2019
NXOS image file is: bootflash:///nxos.7.0.3.I7.8.bin
NXOS compile time: 3/3/2020 20:00:00 [03/03/2020 22:49:49]

We opened a ticket with TAC

Status of port is

Ethernet1/54/1 is down (Link not connected)
admin state is up, Dedicated Interface

The log messages for that port from the reboot during the upgrade suggest the port failed at that point: it spontaneously reports a different transceiver type, even though the transceiver was not changed for the upgrade (the upgrade was done remotely).

2020 May 25 16:43:07 %ETHPORT-5-IF_DOWN_NONE: Interface Ethernet1/54/1 is down (None)
2020 May 25 16:43:13 %ETHPORT-5-IF_HARDWARE: Interface Ethernet1/54/1, hardware type changed to 40G
2020 May 25 16:43:13 %ETHPORT-3-IF_UNSUPPORTED_TRANSCEIVER: Transceiver on interface Ethernet1/54/1 is not supported
2020 May 25 16:43:13 %ETHPORT-5-IF_DOWN_NONE: Interface Ethernet1/54/1 is down (None)
2020 May 25 16:43:30 %ETHPORT-5-IF_DOWN_NONE: Interface Ethernet1/54/1 is down (None)
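
For anyone checking several ports or switches for the same pattern, the log review above can be sketched in Python. This is a minimal sketch, not a Cisco tool: the function name `flag_transceiver_events` and the choice of which mnemonics count as suspicious are assumptions made for illustration, and the sample lines are the ones pasted above.

```python
import re
from collections import defaultdict

# Matches NX-OS syslog lines of the form
#   %FACILITY-SEVERITY-MNEMONIC: ... Interface EthernetX/Y/Z ...
# "interface" appears in both upper and lower case in the messages above.
LOG_RE = re.compile(
    r"%(?P<facility>[A-Z_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+): "
    r".*?[Ii]nterface (?P<interface>Ethernet[\d/]+)"
)

# Assumption: these two mnemonics are the ones worth flagging here,
# because they show the port re-detecting its hardware/transceiver type.
SUSPICIOUS = {"IF_HARDWARE", "IF_UNSUPPORTED_TRANSCEIVER"}


def flag_transceiver_events(lines):
    """Return {interface: [mnemonic, ...]} for the flagged events, in log order."""
    hits = defaultdict(list)
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group("mnemonic") in SUSPICIOUS:
            hits[match.group("interface")].append(match.group("mnemonic"))
    return dict(hits)


SAMPLE_LOG = [
    "2020 May 25 16:43:07 %ETHPORT-5-IF_DOWN_NONE: Interface Ethernet1/54/1 is down (None)",
    "2020 May 25 16:43:13 %ETHPORT-5-IF_HARDWARE: Interface Ethernet1/54/1, hardware type changed to 40G",
    "2020 May 25 16:43:13 %ETHPORT-3-IF_UNSUPPORTED_TRANSCEIVER: Transceiver on interface Ethernet1/54/1 is not supported",
    "2020 May 25 16:43:13 %ETHPORT-5-IF_DOWN_NONE: Interface Ethernet1/54/1 is down (None)",
    "2020 May 25 16:43:30 %ETHPORT-5-IF_DOWN_NONE: Interface Ethernet1/54/1 is down (None)",
]

if __name__ == "__main__":
    print(flag_transceiver_events(SAMPLE_LOG))
```

Feeding it the output of show logging from each switch would list every interface that logged a hardware type change or an unsupported transceiver, which makes it easier to see whether only Ethernet1/54/1 is affected.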

The port does not report the details of the inserted transceiver, which has twice been visually checked after removal. Instead, it reports these incomplete details, which are the QSFP+ host port details:

show interface e1/54/1 transceiver details
Ethernet1/54/1
transceiver is present
type is QSFP-40G-SR4
name is UNKNOWN
part number is --
revision is --
serial number is --
nominal bitrate is 10300 MBit/sec per channel
Link length supported for 50/125um OM2 fiber is 30 m
Link length supported for 50/125um OM3 fiber is 100 m
cisco id is 13
cisco extended id number is 16

Lane Number:1 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current              Alarms                  Warnings
              Measurement     High        Low         High        Low
----------------------------------------------------------------------------
Temperature   N/A             N/A         N/A         N/A         N/A
Voltage       N/A             N/A         N/A         N/A         N/A
Current       N/A             32.25 mA    0.00 mA     0.00 mA     0.00 mA
Tx Power      N/A             N/A         N/A         N/A         N/A
Rx Power      N/A             N/A         N/A         N/A         N/A
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning

*** This QSFP support partial diagnostic data! ***
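
The same check can be scripted when comparing the Primary and Secondary outputs side by side. The following is a rough Python sketch under stated assumptions: the helper names `parse_transceiver_fields` and `identity_unreadable` are hypothetical, and treating a name of UNKNOWN or "--" placeholders as "identity not read" is an interpretation of the output above, not documented NX-OS behavior.

```python
# Identity fields that appear as "<key> is <value>" lines in the output above.
IDENTITY_KEYS = {"type", "name", "part number", "revision", "serial number"}


def parse_transceiver_fields(output):
    """Extract the identity fields from 'show interface ... transceiver details' text."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(" is ")
        if sep and key in IDENTITY_KEYS:
            fields[key] = value.strip()
    return fields


def identity_unreadable(fields):
    """True when the switch shows a module but none of its identity data.

    Assumption: name UNKNOWN, or "--" in part number, revision and serial
    number together, means the module's identity data was not read.
    """
    placeholders = all(
        fields.get(key) == "--"
        for key in ("part number", "revision", "serial number")
    )
    return fields.get("name") == "UNKNOWN" or placeholders


SAMPLE_OUTPUT = """\
Ethernet1/54/1
transceiver is present
type is QSFP-40G-SR4
name is UNKNOWN
part number is --
revision is --
serial number is --
nominal bitrate is 10300 MBit/sec per channel
cisco id is 13
"""

if __name__ == "__main__":
    fields = parse_transceiver_fields(SAMPLE_OUTPUT)
    print(fields)
    print(identity_unreadable(fields))
```

Running the same two functions against the Primary's output for its working port should return real part and serial numbers, which would make the difference between the two switches explicit.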

The uplink ports are in a VPC, and the show vpc consistency-parameters output matches perfectly.

Christopher Hart · ‎09-29-2020

Hello!

I appreciate the answers - a few additional questions:

Can you confirm the specific transceiver(s) that you are using for the uplinks to your core switches?
Would you be willing to share the output of show running-config interface Ethernet1/54/1 from both the Primary and Secondary switches?
Would you be willing to share the output of show interface Ethernet1/54/1 transceiver details from the Primary switch?

Thank you!

-Christopher

lcaruso · ‎09-29-2020

TAC just called me and said it could be a bug related to this upgrade path. They are specifying some commands to run to verify this status. I will share once this is determined.

lcaruso · ‎09-29-2020

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvp54772/?rfs=iqvred