
What would cause all fiber optic ports on a switch to go down at once?

DaniloB
Level 1

Hi. At a large industrial plant we replaced an old HP switch with a brand-new pair of C2960X switches in a stack configuration. Ever since, every 6 to 8 hours or so, the two fiber optic links of switch #2 go down at once. These are connected to a ring of three other similar access switches, which never detect their interfaces going down, so the issue is on the rx side from the point of view of switch #2. Also note that the links on switch #1, connected to the main network, never go down.

Traffic is on average very low, as the switches are connected to industrial equipment that requires very little bandwidth. There are no errors on the interfaces, and the fiber links have been checked and are in good condition. IOS is the latest recommended release. Both CPU and memory usage are low. Optical tx and rx power are at good levels. There are no processes that stop running after the event occurs.

We've replaced the SFP modules. We've replaced switch #2. We've installed a UPS and checked the power source. I've been looking at several kinds of debug output, but so far nothing seems to be of any help; I can't find anything unusual. Note that the *only* way to make the links come up again is to reload switch #2. Shut/no shut does nothing. Disconnecting and reconnecting the cables does nothing. Removing and reinserting the SFP does nothing.

It seems that the whole fiber optic controller stops working for some external cause, as if something troubles the controller to the point that it just hangs after a while, after a buffer overflows or something. We can't disconnect the hosts on the Ethernet ports as a test, as the connected equipment needs to run 24x7. None of this ever happened with the previous HP switch; whatever is going on with the network didn't bother it one bit.

We've been plagued by this issue for months and nothing seems to solve it. How do I troubleshoot it further?

26 Replies

DaniloB
Level 1

As I said, it's just an experiment on my part. I tried them all: auto, full, half. No difference. The speed itself can't be set. I have now forced full duplex on the whole area.

sh span int g2/0/25

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----
VLAN0010            Desg FWD 19        128.81   P2p
VLAN0015            Desg FWD 19        128.81   P2p
VLAN0099            Desg FWD 19        128.81   P2p

sh span int g2/0/26

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- ----
VLAN0010            Desg FWD 19        128.81   P2p
VLAN0015            Desg FWD 19        128.81   P2p
VLAN0099            Desg FWD 19        128.81   P2p

Then please share the output of:

show udld

show errdisable recovery

show interfaces counters errors

show interfaces x/x

DaniloB
Level 1

What the hell, why are posts being removed? I pasted the output of show spanning-tree detail and it's gone. Here: https://pastebin.com/juZL3mDT

DaniloB
Level 1

Alright, there's something wrong with these forums. I received the notification email saying that you replied "Did it solve your problem?", but the reply itself is nowhere to be found here. Anyway, no: as expected, forcing full duplex does not change anything. The problem persists. I had already tried that before.

Sorry, but the link does not open. Can you share the output here again?

DaniloB
Level 1

See attached file.

show errdisable recovery
ErrDisable Reason            Timer Status
-----------------            --------------
arp-inspection               Enabled
channel-misconfig (STP)      Enabled
dhcp-rate-limit              Enabled 
dtp-flap                     Enabled
gbic-invalid                 Enabled
inline-power                 Enabled
link-flap                    Enabled
pagp-flap                    Enabled
psecure-violation            Enabled
sfp-config-mismatch          Enabled
udld                         Enabled
vmps                         Enabled

I see that errdisable auto-recovery is enabled for many causes. Disable the auto-recovery and let's see what is causing the links to go down.

DaniloB
Level 1

I have disabled the auto recovery, but errdisable itself does not detect anything. It never did.

It just happened again after roughly 8 hours of uptime, both links went down at the same time:

Apr 16 22:49:53.706 CEDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/0/26, changed state to down
Apr 16 22:49:53.748 CEDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/0/25, changed state to down
Apr 16 22:49:54.709 CEDT: %LINK-3-UPDOWN: Interface GigabitEthernet2/0/26, changed state to down
Apr 16 22:49:54.751 CEDT: %LINK-3-UPDOWN: Interface GigabitEthernet2/0/25, changed state to down

As usual, there are no other entries in the logs, the above is all that is being detected: both links go down at once on stack switch #2. With absolutely zero state changes on the ring switches' interfaces. Again, I had to reload switch #2 to make the links come up again.

Also note that we cannot seem to reproduce the problem in any way. If I either physically disconnect or manually shut down one link on the stack switch, the other stays up. The same is true if I shut down either link on the ring switches, or if I reload them.

As a very temporary workaround I have now scheduled event manager to automatically reload switch #2 every 6 hours.
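
For anyone collecting these logs over several days, the "both links at once" pattern described above can be checked mechanically. The following is only a minimal Python sketch, not something used in this thread: the function names, the 2-second grouping window, and the placeholder year (the syslog lines carry no year) are all assumptions. It parses %LINK-3-UPDOWN lines in the format pasted above and reports groups of different interfaces that went down close together.

```python
import re
from datetime import datetime, timedelta

# Matches lines in the format pasted in this thread, e.g.
# "Apr 16 22:49:54.709 CEDT: %LINK-3-UPDOWN: Interface Gi..., changed state to down"
LINK_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d\.\d+)\s+\S+:\s+"
    r"%LINK-3-UPDOWN: Interface (?P<intf>\S+), changed state to (?P<state>up|down)"
)

# The four log lines quoted in the post above.
SAMPLE_LOG = [
    "Apr 16 22:49:53.706 CEDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/0/26, changed state to down",
    "Apr 16 22:49:53.748 CEDT: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/0/25, changed state to down",
    "Apr 16 22:49:54.709 CEDT: %LINK-3-UPDOWN: Interface GigabitEthernet2/0/26, changed state to down",
    "Apr 16 22:49:54.751 CEDT: %LINK-3-UPDOWN: Interface GigabitEthernet2/0/25, changed state to down",
]


def parse_link_events(lines, year=2000):
    """Return sorted (timestamp, interface, state) tuples for LINK-3-UPDOWN lines.

    `year` is a placeholder (assumption): the log lines do not include one.
    """
    events = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m:
            continue  # skips LINEPROTO and any unrelated messages
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S.%f")
        events.append((when, m["intf"], m["state"]))
    return sorted(events)


def simultaneous_downs(events, window_s=2.0):
    """Group link-down events that start within `window_s` seconds of each other.

    Only groups that involve more than one interface are returned.
    The 2-second default is an arbitrary assumption, not a Cisco value.
    """
    groups, current = [], []
    for when, intf, state in events:
        if state != "down":
            continue
        if current and when - current[0][0] > timedelta(seconds=window_s):
            groups.append(current)
            current = []
        current.append((when, intf))
    if current:
        groups.append(current)
    return [g for g in groups if len({intf for _, intf in g}) > 1]


if __name__ == "__main__":
    for group in simultaneous_downs(parse_link_events(SAMPLE_LOG)):
        print(", ".join(f"{intf} at {when.time()}" for when, intf in group))
```

Run against the four lines quoted above, it reports a single group containing GigabitEthernet2/0/26 and GigabitEthernet2/0/25. Feeding it the combined logs of both stack members would also show whether any other interface goes down in the same window as the uplinks.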

DaniloB
Level 1

SOLVED. This is mental. There was a PLC connected to a fiber optic port on stack switch ONE that somehow managed to disrupt the functionality of the rx side of the optical controller on stack switch TWO, and that for some reason never bothered the previous HP switch installed. See attached workaround.

Can I ask how you found the solution?

DaniloB
Level 1

By sheer chance. Last night, out of the blue, the fiber link of that specific PLC went down, and with it both fiber uplinks on the switch, as usual. So, suspecting an issue with that specific PLC, I kept it disconnected, and it's been two days without issues. I have absolutely no idea what the heck is going on with that PLC, but apparently whatever it does is able to disrupt the whole optical controller on the stack switches.

Congratulations on your find, Danilo. That was a very difficult problem to fix.
