04-14-2023 01:18 PM
Hi. On a big industrial plant we've replaced an old HP switch with a pair of brand-new C2960X switches in a stack configuration, and ever since then, roughly every 6 to 8 hours, the two fiber-optic links on switch #2 go down at once. These links connect to a ring of three other similar access switches, which never detect their own interfaces going down, so the issue is on the rx side from the point of view of switch #2. Also note that the links on switch #1, connected to the main network, never go down.
Traffic is very low on average, as the switches are connected to industrial equipment that requires very little bandwidth. There are no errors on the interfaces, and the fiber links have been checked and are in good condition. The IOS is the latest recommended release. Both CPU and memory usage are low. Optical tx and rx power levels are good. No processes stop running after the event occurs.
We've replaced the SFP modules. We've replaced switch #2. We've installed a UPS and checked the power source. I've been looking at several kinds of debug output, but so far nothing has helped; I can't find anything unusual. Note that the *only* way to bring the links back up is to reload switch #2. Shut/no shut does nothing. Disconnecting and reconnecting the cables does nothing. Removing and reinserting the SFPs does nothing.
It seems as if the whole fiber-optic controller stops working due to some external cause, as if something troubles the controller to the point that it simply hangs after a while, perhaps after a buffer overflows. We can't disconnect the hosts on the Ethernet ports as a test, since the connected equipment needs to run 24x7. None of this ever happened with the previous HP switch; whatever is going on with the network didn't bother it one bit.
We've been plagued by this issue for months and nothing seems to solve it. How do I troubleshoot it further?
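As a stopgap while we troubleshoot, I've been considering an EEM applet to at least capture state automatically the moment the links drop, before anyone reloads the switch. A sketch, untested on our stack; the applet name, output file, and choice of captured commands are just my picks:

```
event manager applet RING-LINKS-DOWN
 event syslog pattern "Interface GigabitEthernet2/0/25, changed state to down"
 action 1.0 cli command "enable"
 ! Capture PHY/controller state for the failed port before it is touched
 action 2.0 cli command "show controllers ethernet-controller gigabitEthernet 2/0/25 phy detail | append flash:linkdown.txt"
 action 3.0 cli command "show interfaces gigabitEthernet 2/0/25 transceiver detail | append flash:linkdown.txt"
 action 4.0 syslog msg "Captured port state after ring link went down"
```

The idea is simply to snapshot the controller/PHY registers at failure time rather than hours later, since the state is lost on reload.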
Solved!
04-18-2023 04:50 AM
SOLVED. This is mental. There was a PLC connected to a fiber-optic port on stack switch ONE that somehow managed to disrupt the rx side of the optical controller on stack switch TWO, and for some reason it never bothered the previously installed HP switch. See attached workaround.
04-14-2023 03:21 PM
"the two fiber optics links of switch #2 go down at once" < Can you post the configuration? Which module is this? What is connected at the far end? Do you see any logs on either end?
Are the SFPs in uplink modules? What IOS code is it running? (Check for the latest available IOS, upgrade, and test.)
As a test, if you move one of the fiber links and its SFP to switch 1, what is the outcome?
Why do you think the issue is only on the 2960 side? It may also be on the other side. You also mentioned it's a ring: how is your STP running? (Most importantly, look at the logs before you reboot.)
04-14-2023 06:14 PM
There's nothing relevant in the configuration, these are simply two trunk links with a few VLANs passing through.
interface GigabitEthernet2/0/25
 switchport trunk allowed vlan 10,15,99
 switchport mode trunk
interface GigabitEthernet2/0/26
 switchport trunk allowed vlan 10,15,99
 switchport mode trunk
This isn't a matter of configuration: there are hundreds of other switches of the same model in the plant, with the same IOS version [the latest recommended, 15.2(7)E7] and the same configuration, and they work flawlessly. There are dozens of other instances of this topology in the plant, all working with zero issues.
If I move one of the links to switch #1 [this was actually the initial standard configuration, for switch-redundancy purposes, before I moved both ring links to switch #2 as a test], the outcome is that when the link coming from the ring goes down, the other link on switch #1, coming from the distribution network, goes down as well. Similarly, when the link on switch #2 coming from the ring goes down, the other link on switch #2, coming from the distribution network, goes down as well.
Regardless of where you connect the links coming from the ring, whenever they fail they bring down the whole optical component of that particular switch. I've consistently tested this by connecting another switch on the 3rd port of any of the two stack switches: whenever the link coming from the ring goes down, it causes every single other optical port _on that specific switch_ to shut down.
So, in either case, the links that fail are always the ones facing the ring. The outcome is the same if I physically open the ring by shutting down a link along the way, so spanning tree is not at work in that area in that case. Rapid-pvst is enabled everywhere, with the root bridge being the core switch. Under normal conditions everything works without issues: no registered errors, no broadcast storms, no packet loss. It just happens that at some point stack switch #2 stops receiving/processing signal on all optical ports at once. Note that we've also replaced the SFP modules on the ring side. When the problem arises, a show interfaces transceiver command on the ring switches shows that they keep transmitting as usual. The same command on stack switch #2 shows nothing for the interfaces that went down [as expected; this is common IOS behavior].
Nothing relevant in the logs either: there are just the entries for the interfaces going down and nothing more, and as said, they go down on switch #2 only; they never go down on the ring switches. To put it simply, switch #2 apparently stops receiving signal, while the ring switches do not. So it seems that, when the issue occurs, switch #2 stops processing incoming signal but keeps transmitting as usual. It *could* be a case where both ring switches connected to switch #2 stop transmitting, and we could test this by connecting an optical instrument on the switch #2 side when the problem occurs, but as said, everything worked without any issues for years with the old HP switch in place of the new stack. That remains to be seen; I'll ask the tech guys on site.
I'm not saying that the issue lies in the 2960 stack switch itself; I'm saying that something is happening somewhere else, possibly on the hosts attached to the ring, that travels all the way up to the switch and prevents it from acting normally. I've rarely seen a case where a switch reload was needed to get interfaces back up, and it was always due to faulty hardware. We've replaced everything.
I'm looking for specific debug commands that would show what is going on with the optical components, or with the processes governing them, when the problem arises; so far I've found nothing. A debug interface states doesn't turn up anything relevant, and the same goes for debug transceiver error|detail.
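For reference, this is the state I plan to capture the next time the links drop, before anyone reloads the switch. A sketch; exact availability and output of the controller-level commands can vary with the 2960X IOS release:

```
! PHY/controller registers for the two failed ports
show controllers ethernet-controller gigabitEthernet 2/0/25 phy detail
show controllers ethernet-controller gigabitEthernet 2/0/26 phy detail
! Optical levels as seen by switch #2 (may show nothing on a down port)
show interfaces gigabitEthernet 2/0/25 transceiver detail
! Any log entries mentioning either port
show logging | include 2/0/2[56]
```

The point is to compare the controller-level view against the clean interface counters, since the standard show interfaces output has shown nothing so far.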
Ugh. Sorry for the wall of text.
04-15-2023 09:24 AM
Topology attached. To sum it up, G2/0/25 and G2/0/26 on SWSTACK #2 are the links that go down at once every 6/8 hours. None of the other links ever go down, not even SWRING1 G0/1 and SWRING3 G0/1, which are currently both connected to SWSTACK #2.
As said, in the original topology the SWSTACK and SWRING were connected so that one SWSTACK link of any stack member was towards the ring, the other towards the distribution network. I have switched the links between the two SWSTACK switches as a test.
04-15-2023 09:33 AM
Did you configure any broadcast storm control?
04-15-2023 09:37 AM
Yes, on all switches it's set to level 1.0 on all access ports connected to the end hosts. No storm ever occurs anyway.
04-15-2023 09:41 AM
I mean: are these two links themselves configured with broadcast storm control?
04-15-2023 09:45 AM
Not currently, but as a test I also enabled storm control at level 1.0 on both links; it never detected any storm for days, and both links kept going down as usual.
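For completeness, the test configuration on the two uplinks was along these lines (level as described above; the default action, which drops excess traffic, was left in place):

```
interface GigabitEthernet2/0/25
 storm-control broadcast level 1.00
interface GigabitEthernet2/0/26
 storm-control broadcast level 1.00
```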
04-15-2023 01:00 PM
Can you share the output of the following for both links?
show spanning-tree interface <SFP interface>
04-15-2023 01:20 PM
sh span int g2/0/25

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----
VLAN0010            Desg FWD 19        128.81   Shr
VLAN0015            Desg FWD 19        128.81   Shr
VLAN0099            Desg FWD 19        128.81   Shr

sh span int g2/0/26

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----
VLAN0010            Desg FWD 19        128.82   Shr
VLAN0015            Desg FWD 19        128.82   Shr
VLAN0099            Desg FWD 19        128.82   Shr
SWRING2 G0/2 is the one in BLK state. Oh, I forgot to mention the SFP models: it's GLC-GE-100FX-SO everywhere. These are old multimode fibers that, due to the distance, will only support 100 Mb, but they are in good condition.
04-15-2023 01:35 PM
show spanning-tree interface x/x detail
04-15-2023 02:06 PM
sh span int g2/0/25 det
Port 81 (GigabitEthernet2/0/25) of VLAN0010 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.81.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.81, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 202, received 54
Port 81 (GigabitEthernet2/0/25) of VLAN0015 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.81.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.81, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 202, received 54
Port 81 (GigabitEthernet2/0/25) of VLAN0099 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.81.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.81, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 202, received 54
sh span int g2/0/26 det
Port 82 (GigabitEthernet2/0/26) of VLAN0010 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.82.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.82, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 137, received 28
Port 82 (GigabitEthernet2/0/26) of VLAN0015 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.82.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.82, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 137, received 28
Port 82 (GigabitEthernet2/0/26) of VLAN0099 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.82.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.82, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 137, received 28
04-15-2023 02:38 PM
VLAN0010            Desg FWD 19        128.81   Shr
VLAN0015            Desg FWD 19        128.81   Shr
VLAN0099            Desg FWD 19        128.81   Shr
As far as I know, the Shared link type only appears on half-duplex links, not full-duplex ones, and some STP modes, such as RSTP, do not work well with this link type.
So please check the duplex setting.
NOTE: some SFPs do not support autonegotiation, so you may need to hard-code the speed/duplex on the interface.
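For example, hard-coding might look like this (a sketch; which speed/duplex keywords are accepted depends on the SFP and platform, and 100FX optics are typically fixed at 100 Mb):

```
interface GigabitEthernet2/0/25
 speed 100
 duplex full
```

With full duplex forced on both ends, the spanning-tree link type should change from Shr to P2p.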
04-15-2023 02:41 PM
It's just me experimenting with different duplex modes; the area is currently in forced half-duplex mode. It makes no difference; the problem persists.
04-15-2023 02:46 PM
Sorry, I don't follow: you configured the interfaces for half duplex?