02-20-2016 03:24 PM - edited 03-08-2019 04:39 AM
I'm having an intermittent problem with a pair of 1000BASE-LH layer-3 fiber links from a 3750X stack to a VSS pair of 4500 core switches. The links work fine and then, spontaneously, both fail at exactly the same time. The failure mode is loss of link at the 3750X. The 4500 reports no link loss, only EIGRP and PIM reports of lost connectivity. I don't see anything unusual in the port counters or status, but I'm not sure what I should be looking for. Service can be restored by bouncing (shut, no shut) the 3750X ports. This has happened twice in the past month, with the failures 7 days apart. I have "udld aggressive" configured, but there are no UDLD reports in the logs. Software is 15.0(2)SE6 with IP Services on the 3750Xs and 3.4.4SG with Enterprise Services on the 4500.
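For reference, the recovery step described above, bouncing both 3750X uplinks, can be sketched as the following IOS session. The port names GigabitEthernet1/1/1 and GigabitEthernet2/1/1 are hypothetical stand-ins for the two fiber uplinks on different stack members, and the trailing UDLD checks are an assumption about what is worth confirming afterward, not part of the original procedure.

```
! Hypothetical port names: one SFP uplink on each of two stack members
configure terminal
 interface range GigabitEthernet1/1/1 , GigabitEthernet2/1/1
  shutdown
  no shutdown
 end
! Check that both links are back up and UDLD reports a bidirectional neighbor
show interfaces status
show udld GigabitEthernet1/1/1
show udld GigabitEthernet2/1/1
```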
02-20-2016 03:50 PM
Post the complete output of "sh interface <interface>" for both interfaces.
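Concretely, that means running the command against each link on both platforms. The port names below are hypothetical. The thread doesn't say what to look for, but reasonable candidates (a suggestion, not a confirmed diagnosis) are the line protocol state, the input error and CRC counters, interface resets, and the lost carrier/no carrier counts, compared between the 3750X side and the 4500 side.

```
! On the 3750X stack (hypothetical uplink ports)
show interfaces GigabitEthernet1/1/1
show interfaces GigabitEthernet2/1/1
! On the 4500 VSS pair (hypothetical ports, switch/slot/port numbering)
show interfaces GigabitEthernet1/3/1
show interfaces GigabitEthernet2/3/1
```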
02-20-2016 04:31 PM
I should be able to follow up with that information on Monday. As I said, I didn't see anything unusual there. What would you be looking for in this output?
08-24-2016 12:23 PM
I ran into a similar issue with a pair of 3750-X in a stack configuration today. We found that the distant end showed a link light, but the ports on the 3750-X showed no link, no err-disable, no issues at all as far as we could tell.
It was certainly puzzling that the distant end showed a link light. We confirmed it by unplugging the cable at the 3750-X, and the distant end's link status did go down.
We ended up moving the cables to different ports on the 3750s, and they worked, so some ports had this issue and others did not. We'll be rebooting the switch this evening to see whether the bad ports recover, but I was curious what the resolution was for your similar issue.
Thanks
08-24-2016 01:29 PM
I was able to clear the condition with shut/no shut, so I expect a reboot will clear it as well. The problem has not recurred since the initial install. We were running 15.0(2)SE6 when this happened and have since updated to 15.0(2)SE9.
08-24-2016 01:50 PM
We also tried shut/no shut without success, so perhaps we have slightly different issues. Also, our IOS is 12.2(58)SE2. Well, we'll see what happens after the reboot this evening. Thanks again.
02-21-2016 12:46 AM
Err-disabled on either side? Is there any "show interface ... transceiver" information available?
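On the 3750X, both questions can be checked with the commands below (the port name is hypothetical). Note that the transceiver output only contains readings for SFPs that support digital optical monitoring (DOM).

```
! Any ports currently err-disabled, and which causes auto-recover
show interfaces status err-disabled
show errdisable recovery
! DOM readings (Tx/Rx power, temperature); empty for non-DOM SFPs
show interfaces GigabitEthernet1/1/1 transceiver detail
```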
02-21-2016 06:39 AM
Thanks for these suggestions.
Unfortunately, we have the older GLC-LH-SM transceivers on all 4 ports, so no DOM information is available and "show interface ... transceiver" comes back empty. I have asked to have these SFPs upgraded, but that hasn't happened yet. I don't hold out much hope for useful information here. Since both ports, which are on different switches in the same stack, go down at the same time, I think it has to be a deeper problem than a strange link failure.
I probably would have noticed an err-disabled state in the interface status, but maybe not. That's definitely something I will look for, and I'll report back with my findings.
One really strange thing here is that both the LEDs on the 3750X and the port status show "down", and yet I get a link light at the other end of the fiber. Whether this is a link-down or an err-disable down, the fact that the other side reports differently tells me that the 3750X is, to some extent, lying to me.
I have reviewed the switch logs and syslogs and don't see any err-disable events. I can't completely trust this, though: by the time we noticed the failure, the information would have rolled out of the switch logs, and because both uplinks quit at the same time and our syslog server is on the other side of these links, I may have lost some messages from the 3750X side.
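One way to reduce the risk of losing messages, sketched here as an assumption rather than something tried in this thread, is to keep a larger local log buffer on the 3750X and add millisecond timestamps. That way, whether both uplinks dropped in the same instant can be read from the switch itself rather than from a syslog server that sits behind the failed links. The buffer size is an arbitrary example value.

```
configure terminal
 ! Larger local buffer (example size, in bytes) so events are not overwritten as quickly
 logging buffered 256000 informational
 ! Millisecond timestamps to compare the two link-down events precisely
 service timestamps log datetime msec localtime
 end
! Review the local buffer after the next failure
show logging
```

The local buffer does not survive a reload, so it is most useful when the links are recovered with shut/no shut rather than a reboot.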
02-21-2016 02:32 AM
Hello
What do the logging buffers show?
Is this possibly a physical issue? Have you swapped the cabling, SFPs, ports, etc.?
As asked above, are any err-disable messages showing?
Is this occurring only on these particular interfaces? If not, I would also suspect a buggy IOS, in which case changing the IOS version may be worthwhile.
Res
Paul
02-21-2016 06:55 AM
See my response to Iulian Vaideanu for information on the logs. I see no err-disable messages, but for the reasons detailed in that response, I can't be sure I've seen all the log messages.
We have not done much cable and SFP swapping, for two reasons: 1/ the problem has occurred only twice in three weeks, and 2/ since both ports (on different switches in the same stack) go down at the same time, I think it has to be a deeper problem than a strange, independent link failure.
I am trying to get all 4 SFPs replaced with the newer GLC-LH-SMD, both to rule out SFP issues and to see whether the DOM information shows anything that might help. I'm not sure when that will happen.
There are two fiber ports on this 5-switch 3750X stack, and both are affected. I'm not having problems with any of the copper ports. I have several other 3750X equipment stacks in this network running the same 15.0(2)SE6 IOS version and the same basic configuration, and they have been up for months without issue. The only significant difference here is that this stack runs IP Services, while my other stacks run IP Base.