
What would cause all fiber optic ports on a switch to go down at once?

DaniloB
Level 1

Hi. At a big industrial plant we've replaced an old HP switch with a brand new pair of C2960X switches in a stack configuration, and ever since then, every 6-8 hours or so, the two fiber optic links of switch #2 go down at once. These are connected to a ring of 3 similar access switches, which never detect their own interfaces going down, so the issue is on the rx side from the point of view of switch #2. Also note that the links on switch #1, connected to the main network, never go down.

Traffic is on average very low, as the switches are connected to industrial equipment that requires very little bandwidth. There are no errors on the interfaces, and the fiber links have been checked and are in good condition. The IOS is the latest recommended release. Both CPU and memory usage are low. Optical tx and rx power levels are good. There are no processes that stop running after the event occurs.

We've replaced the SFP modules. We've replaced switch #2. We've installed a UPS and checked the power source. I've been looking at several kinds of debug output, but so far nothing has been of any help; I can't find anything unusual. Note that the *only* way to make the links come up again is to reload switch #2. Shut/no shut does nothing. Disconnecting and reconnecting the cables does nothing. Removing and reinserting the SFPs does nothing.

It seems that the whole fiber optic controller stops working for some external cause, as if something troubles the controller to the point that it just hangs after a while, after a buffer overflows or something. We can't disconnect the hosts on the Ethernet ports as a test, as the connected equipment needs to run 24x7. None of this ever happened with the previous HP switch; whatever is going on with the network didn't bother it one bit.

We've been plagued by this issue for months and nothing seems to solve it. How do I troubleshoot it further?
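In case it helps: since the only recovery is a reload, I'm thinking of letting the switch capture its own state the moment the links drop, with an EEM applet along these lines (the applet name, output file and syslog pattern are just placeholders I'd still need to verify on this platform and release):

event manager applet CAPTURE-LINKDOWN
 event syslog pattern "UPDOWN.*GigabitEthernet2/0/25.*down"
 action 1.0 cli command "enable"
 action 2.0 cli command "show clock | append flash:linkdown.txt"
 action 3.0 cli command "show interfaces GigabitEthernet2/0/25 | append flash:linkdown.txt"
 action 4.0 cli command "show interfaces transceiver detail | append flash:linkdown.txt"
 action 5.0 cli command "show controllers ethernet-controller GigabitEthernet2/0/25 | append flash:linkdown.txt"
 action 6.0 syslog msg "link-down capture appended to flash:linkdown.txt"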

Accepted Solution

DaniloB
Level 1

SOLVED. This is mental. There was a PLC connected to a fiber optic port on stack switch ONE that somehow managed to disrupt the functionality of the rx side of the optical controller on stack switch TWO, and for some reason it never bothered the previously installed HP switch. See attached workaround.


26 Replies

balaji.bandi
Hall of Fame

"The two fiber optic links of switch #2 go down at once" - can you post the configuration? What module is it, and what is connected at the far end? Do you see any logs on either end?

Are these uplink modules where the SFPs are connected? What IOS code is it running? (Check whether a newer IOS is available, upgrade, and test.)

As a test, if you move one of the fibre links and its SFP to switch 1, what is the outcome?

Why do you think the issue is only on the 2960 side? It may be an issue on the other side as well. You also mentioned it's a ring; how is your STP running? (Above all, look at the logs before you reboot.)
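For a start, something like this from both ends (taken while the problem is present, before any reload) would help; adjust the interface names to your uplink ports:

show version
show inventory
show logging | include UPDOWN
show spanning-tree summary
show spanning-tree root
show interfaces GigabitEthernet2/0/25 transceiver detail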

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

DaniloB
Level 1

There's nothing relevant in the configuration, these are simply two trunk links with a few VLANs passing through.

interface GigabitEthernet2/0/25
 switchport trunk allowed vlan 10,15,99
 switchport mode trunk

interface GigabitEthernet2/0/26
 switchport trunk allowed vlan 10,15,99
 switchport mode trunk

This isn't a matter of configuration: there are hundreds of other switches of the same model in the plant, with the same IOS version [the latest recommended, 15.2(7)E7] and the same configuration, and they are working flawlessly. There are dozens of other instances of this topology in the plant, working with zero issues.

If I move one of the links to switch #1 [this was actually the initial standard configuration, for switch redundancy, before I moved both ring links to switch #2 as a test], the outcome is that when the link coming from the ring goes down, the other link on switch #1, coming from the distribution network, goes down as well. Similarly, when the link on switch #2 coming from the ring goes down, the other link on switch #2, coming from the distribution network, goes down as well.

Regardless of where you connect the links coming from the ring, whenever they fail they bring down the whole optical component of that particular switch. I've consistently reproduced this by connecting another switch to a third port of either stack switch: whenever the link coming from the ring goes down, every single other optical port _on that specific switch_ shuts down.

So, in either case, the links that go down are always the ones facing the ring. The outcome is the same if I physically open the ring by shutting down a link along the way, so no spanning tree is at work in that area in that case. Rapid-PVST is enabled everywhere, with the root bridge being the core switch. In normal conditions everything works without issues, without registered errors, without broadcast storms, without packet loss. It just happens that at some point stack switch #2 stops receiving/processing signal on all optical ports at once. Note that we've also replaced the SFP modules on the ring side. When the problem arises, a show interfaces transceiver command on the ring switches shows that they keep transmitting as usual. The same command on stack switch #2 shows nothing for the interfaces that went down [as expected; this is common IOS behavior].

There's nothing relevant in the logs either; there are just the entries of the interfaces going down and nothing more, and as said they go down on switch #2 only, never on the ring switches. To put it simply, switch #2 apparently stops receiving signal, while the ring switches do not. So it seems that, when the issue occurs, switch #2 stops processing incoming signal but keeps transmitting as usual. It *could* be a case where both ring switches connected to switch #2 stop transmitting, and we could test this by connecting an optical instrument on the switch #2 side when the problem occurs, but as said, everything worked without any issues for years with the old HP switch in place of the new stack. That remains to be seen; I'll ask the tech guys on site.

I'm not saying that the issue lies in the 2960 stack switch itself; I'm saying that something is happening somewhere else, possibly on the hosts attached to the ring, that is traveling all the way up to the switch and preventing it from acting normally. I've rarely seen a case where a switch reload was needed to get interfaces back up, and it was always because of faulty hardware. We've replaced everything.

I'm looking for specific debug commands that would show what is going on with the optical components, or with the processes governing them, when the problem arises; so far I've found nothing. A debug interface states doesn't turn up anything relevant, and neither does debug transceiver error|detail.
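For the record, this is the list of show commands I plan to collect the next time it happens, on top of the debugs above; I'm not sure every one of them is supported on the 2960-X, so treat it as a to-try list rather than verified output:

show interfaces GigabitEthernet2/0/25 transceiver detail
show controllers ethernet-controller GigabitEthernet2/0/25
show controllers ethernet-controller GigabitEthernet2/0/25 phy
show platform port-asic stats drop
show idprom interface GigabitEthernet2/0/25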

Ugh. Sorry for the wall of text.

DaniloB
Level 1

Topology attached. To sum it up, G2/0/25 and G2/0/26 on SWSTACK #2 are the links that go down at once every 6-8 hours. None of the other links ever go down, not even SWRING1 G0/1 and SWRING3 G0/1, which are currently both connected to SWSTACK #2.

As said, in the original topology the SWSTACK and SWRING switches were connected so that one SWSTACK link of each stack member faced the ring and the other faced the distribution network. I swapped the links between the two SWSTACK switches as a test.

Did you configure any broadcast storm control?

DaniloB
Level 1

Yes, on all switches it's set to level 1.0 on all access ports connected to the end hosts. No storm ever occurs anyway.

I mean, are these two links configured with broadcast storm control?

DaniloB
Level 1

Not currently, but as a test I also enabled storm control at level 1.0 on both links; it never detected any storm for days, and both links kept going down as usual.
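For completeness, this is roughly what I applied on the two uplinks during that test (level 1.00, trap action only, so nothing gets shut down):

interface range GigabitEthernet2/0/25 - 26
 storm-control broadcast level 1.00
 storm-control action trap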

Can you share the output for both?
show spanning-tree interface <SFP>

DaniloB
Level 1

sh span int g2/0/25

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----
VLAN0010            Desg FWD 19        128.81   Shr
VLAN0015            Desg FWD 19        128.81   Shr
VLAN0099            Desg FWD 19        128.81   Shr

sh span int g2/0/26

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----
VLAN0010            Desg FWD 19        128.81   Shr
VLAN0015            Desg FWD 19        128.81   Shr
VLAN0099            Desg FWD 19        128.81   Shr

SWRING2 G0/2 is the one in BLK state. Oh, I forgot to mention the SFP model: it's GLC-GE-100FX-SO everywhere. These are old MM fibers that, due to the distance, will only allow 100 Mb, but they are in good condition.

show spanning-tree interface x/x detail

DaniloB
Level 1

sh span int g2/0/25 det

Port 81 (GigabitEthernet2/0/25) of VLAN0010 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.81.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.81, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 202, received 54

Port 81 (GigabitEthernet2/0/25) of VLAN0015 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.81.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.81, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 202, received 54

Port 81 (GigabitEthernet2/0/25) of VLAN0099 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.81.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.81, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 202, received 54


sh span int g2/0/26 det

Port 82 (GigabitEthernet2/0/26) of VLAN0010 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.82.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.82, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 137, received 28

Port 82 (GigabitEthernet2/0/26) of VLAN0015 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.82.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.82, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 137, received 28

Port 82 (GigabitEthernet2/0/26) of VLAN0099 is designated forwarding
Port path cost 19, Port priority 128, Port Identifier 128.82.
Designated root has priority 6, address 0cd0.f814.4b80
Designated bridge has priority 32769, address 0059.dcef.4100
Designated port id is 128.82, designated path cost 58
Timers: message age 0, forward delay 0, hold 0
Number of transitions to forwarding state: 1
Link type is shared by default
BPDU: sent 137, received 28


As far as I know, the Shared link type only appears on half-duplex links, not full-duplex ones, and some STP modes like RSTP do not work well with this type of link, so please check the duplex.
NOTE: some SFPs do not support auto-negotiation, so you need to hardcode the duplex/speed on the interface.
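Something like this, assuming the platform and SFPs accept a forced speed/duplex on those ports (otherwise the link-type override alone should at least change the Shr type):

interface GigabitEthernet2/0/25
 speed 100
 duplex full
 spanning-tree link-type point-to-point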

DaniloB
Level 1

It's just me experimenting with different duplex modes; currently the area is in forced half-duplex mode. It makes no difference, the problem persists.

Sorry, I don't get it. You configured the interface with half duplex?
