06-17-2015 06:14 PM
Hi,
I need help resolving the issue below, and also answers to a few questions.
The setup of the 2G FC link is:
MDS 9506(A)--FC 2G--(ONS15454-------------------------via fiber-----------------------------------ONS15454)---FC 2G---MDS 9222i (B)
The MDS port at site A repeatedly goes into the error-disabled state, and the link goes down without warning.
Action Taken
We tried changing the client ports and the patch cords connected to the MDS switch, but the issue persists.
The issue does not occur on a regular schedule; it recurs at varying intervals, e.g. after 2 days or after 6 days.
What could be the issue? I need help resolving it.
I am new to the SAN switch domain, so please also help clarify the following doubts:
1) What is meant by trunk mode, and what is the difference between the modes?
2) Will there be issues on the link if the trunk mode configuration differs between the two sides?
3) Can a trunk mode difference bring the port into the error-disabled state?
4) Based on the configuration of the switch ports at both ends below, is there a trunk mode configuration difference?
The configuration of the switches at both ends is provided below:
A end config:
fc1/1 is down (Error disabled - bit error rate too high)
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN)
Port WWN is 20:01:00:05:73:be:d1:c0
Admin port mode is auto, trunk mode is on
snmp link state traps are enabled
Port vsan is 1
Receive data field Size is 2112
Beacon is turned off
5 minutes input rate 0 bits/sec,0 bytes/sec, 0 frames/sec
5 minutes output rate 0 bits/sec,0 bytes/sec, 0 frames/sec
76160488214 frames input,123189668377920 bytes
0 discards,373 errors
72 invalid CRC/FCS,0 unknown class
0 too long,1 too short
18703422421 frames output,1131886741588 bytes
29 discards,0 errors
309 input OLS,418 LRR,12971 NOS,1323 loop inits
1972 output OLS,1348 LRR, 12720 NOS, 1840 loop inits
Interface last changed at Wed Jun 17 07:33:25 2015
B end config:
fc2/1 is down (Link failure or not-connected)
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN)
Port WWN is 20:41:00:05:73:c8:5c:00
Admin port mode is auto, trunk mode is auto
snmp link state traps are enabled
Port vsan is 1
Receive data field Size is 2112
Beacon is turned on
5 minutes input rate 0 bits/sec,0 bytes/sec, 0 frames/sec
5 minutes output rate 0 bits/sec,0 bytes/sec, 0 frames/sec
11797765811 frames input,837737713296 bytes
0 discards,5 errors
5 invalid CRC/FCS,0 unknown class
0 too long,0 too short
41473764404 frames output,64343622174936 bytes
503 discards,0 errors
59 input OLS,65 LRR,17 NOS,113 loop inits
12064 output OLS,91 LRR, 6071 NOS, 12020 loop inits
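Comparing the two ends is easier once the counter lines are in a structured form. The sketch below is a hypothetical helper (not a Cisco tool) that assumes the exact line layout pasted above: it tags each counter as rx or tx based on the "frames input"/"frames output" and "input OLS"/"output OLS" lines, then computes CRC errors per received frame for the A end.

```python
import re

# Hypothetical helper, not a Cisco tool: pulls "<number> <label>" pairs out of
# the counter lines of an MDS "show interface" listing laid out as above.
PAIR_RE = re.compile(r"(\d+)\s+([A-Za-z][A-Za-z/ ]*[A-Za-z])")

def parse_counters(text: str) -> dict:
    counters = {}
    direction = None  # "rx" or "tx"; lines before the first frames line are skipped
    for line in text.splitlines():
        if "frames input" in line or "input OLS" in line:
            direction = "rx"
        elif "frames output" in line or "output OLS" in line:
            direction = "tx"
        if direction is None:
            continue
        for value, label in PAIR_RE.findall(line):
            # Drop "input"/"output" from the label; the rx/tx prefix already says it.
            name = " ".join(w for w in label.split() if w not in ("input", "output"))
            counters[f"{direction} {name}"] = int(value)
    return counters

# A-end counter lines copied from the output above.
A_END = """\
76160488214 frames input,123189668377920 bytes
0 discards,373 errors
72 invalid CRC/FCS,0 unknown class
0 too long,1 too short
18703422421 frames output,1131886741588 bytes
29 discards,0 errors
309 input OLS,418 LRR,12971 NOS,1323 loop inits
1972 output OLS,1348 LRR, 12720 NOS, 1840 loop inits
"""

c = parse_counters(A_END)
print(c["rx NOS"], c["tx NOS"])                   # 12971 12720
print(c["rx invalid CRC/FCS"] / c["rx frames"])   # about 9.5e-10 CRC errors per frame
```

The same function can be run on the B-end lines to put both sides next to each other; the NOS and loop-init counts are the ones that stand out.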
I would appreciate help resolving the issue and answering these questions. Thanks in advance.
06-18-2015 12:02 AM
Hi
fc1/1 is down (Error disabled - bit error rate too high)
clearly indicates that something is wrong on the link between the two MDS switches; I'm 100% convinced that it has nothing to do with the MDS itself.
It's a problem with the ONS 15454 and/or the dark fibre in between. Because it happens randomly, I would rather believe it's ONS-related.
Check the ONS interfaces and transceivers.
Measure the bit error rate on the dark fibre link. What is the distance? Has there been a fibre cut recently, with potential new splices?
06-18-2015 03:17 AM
Hi Walter,
You may be right, but the media between the ONS nodes has been error-free at all times.
On the ONS client interface, we saw 15 TX LCV (line code violation) errors towards the error-disabled MDS port.
1) Are 15 LCV errors enough to bring the port into the error-disabled state?
From what I have read about the MDS switch, the bit error rate threshold is reached when 15 error bursts occur within a 5-minute period. By default, the switch disables the interface when the threshold is reached.
2) How do we correlate LCVs on the ONS with the bit error rate/error bursts on the MDS switch?
3) Does it have anything to do with trunking? I see that trunk mode is on at one end and auto at the other end of the SAN switches.
Thanks,
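The threshold quoted above (15 error bursts within 5 minutes) is a sliding-window count. The sketch below models only that rule as stated; it is not Cisco's implementation, and whether one ONS LCV corresponds to one MDS "error burst" is not established in this thread, so the model counts bursts, not LCVs.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # 5-minute period, as quoted above
BURST_THRESHOLD = 15     # error bursts per period that trip errdisable

class BitErrorMonitor:
    """Hypothetical model of the documented threshold; not Cisco's code."""

    def __init__(self, window: float = WINDOW_SECONDS, threshold: int = BURST_THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.bursts = deque()  # timestamps (seconds) of recent error bursts

    def record_burst(self, t: float) -> bool:
        """Record a burst at time t; return True if the port would now be error-disabled."""
        self.bursts.append(t)
        # Forget bursts that have aged out of the sliding window.
        while t - self.bursts[0] >= self.window:
            self.bursts.popleft()
        return len(self.bursts) >= self.threshold

# 15 bursts, 10 s apart: all fall inside 5 minutes, so the threshold is hit.
clustered = BitErrorMonitor()
print(any(clustered.record_burst(t) for t in range(0, 150, 10)))  # True

# 15 bursts, 30 s apart: spread over 7 minutes, never 15 in one window.
spread = BitErrorMonitor()
print(any(spread.record_burst(t) for t in range(0, 450, 30)))     # False
```

The point of the comparison: the same total number of errors can either trip the port or go unnoticed, depending purely on how tightly they cluster in time.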
George
06-18-2015 03:47 AM
Hi George !
I think this has absolutely nothing to do with the trunk definition; otherwise the error message would look different, and your link would not work at all.
You could disable this feature, which of course does not solve the root cause but simply prevents the link from going down. See below:
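On the trunking question: as I understand the trunk-mode table in Cisco's MDS configuration guides, "off" on either side gives a non-trunking E port, "on" paired with "on" or "auto" gives a trunking TE port, and "auto" on both sides gives no trunking. The sketch encodes that reading (an assumption to verify against the guide for your release) and shows that the on/auto pair in this thread is a valid combination, not a mismatch.

```python
VALID_MODES = {"on", "off", "auto"}

def trunk_outcome(local: str, remote: str) -> str:
    """Resulting ISL type for two trunk-mode settings (my reading of the MDS rules)."""
    if local not in VALID_MODES or remote not in VALID_MODES:
        raise ValueError(f"unknown trunk mode: {local!r}/{remote!r}")
    if "off" in (local, remote):
        return "E (no trunking)"   # off on either side disables trunking
    if "on" in (local, remote):
        return "TE (trunking)"     # on + on, or on + auto
    return "E (no trunking)"       # auto + auto: neither side requests trunking

# The pair from this thread: A end "trunk mode is on", B end "trunk mode is auto".
print(trunk_outcome("on", "auto"))  # TE (trunking)
```

A negotiation problem of this kind would affect how the link comes up, not produce a "bit error rate too high" errdisable on a link that had been passing traffic.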
To enable error-disable detection on bit errors, use the errdisable detect cause bit-errors command. To disable this feature, use the no form of the command.
errdisable detect cause bit-errors num-times {flaps number} duration {sec}
no errdisable detect cause bit-errors num-times {flaps number} duration {sec}
06-18-2015 10:41 PM
Hi Walter,
Thanks for the info shared.
1. What is the command to check the current status of error-disable detection?
I have also found another command: "no switchport ignore bit-errors".
Which would be the better choice for analysing the issue without the link flapping?
2. Is it possible that LCV errors are carried from the TX on one side to the RX on the other side through the existing ONS setup?
I see RX line code violations on the other side whenever there are TX line code violations on one side, and the MDS port connected where the TX line code violations are observed is the one in error-disabled mode.
Thanks,
George
06-19-2015 01:22 AM
Hi George
Transmission errors can of course show up in different ways, be it CRC, LCV, etc.
The root cause is a physical (OSI Layer 1) issue.
"switchport ignore bit-errors" is a workaround; it does not fix the underlying problem.
It will result in the upper layers doing the recovery, meaning timeouts and retransmissions; this could of course hurt performance if it happens often.
Walter.
06-19-2015 01:59 AM
Hi Walter,
Can you tell me whether errors from the MDS switch on one side propagate to the MDS switch on the other side over the current transmission setup?
i.e. will a Tx error on one side show up as an Rx error on the other end?
Meanwhile, we have replaced the patch cables connected to the ports on both sides, and the fiber link between the two locations is fine and free of errors, but the issue still persists.
Thanks,
George
06-19-2015 05:18 AM
Hi George
What do you mean by "...are fine and free of errors"?
Did you really measure this with a bit error rate (BER) test device? And of course not for a few minutes, but for one or more days.
You also have to measure this with the ONS included in the path!
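The arithmetic below shows why a short BER test says little here. It assumes the 2GFC line rate of 2.125 Gbaud and a Poisson error model, and uses 10^-12 (the figure commonly quoted for Fibre Channel links) as the target. It computes how long an error-free run must last to claim that target, and what average BER 15 clustered errors in a 4-day window would produce.

```python
import math

LINE_RATE_2GFC = 2.125e9  # 2G Fibre Channel line rate, bits/s on the wire (8b/10b encoded)

def observed_ber(bit_errors: int, seconds: float, line_rate: float = LINE_RATE_2GFC) -> float:
    """Average bit error ratio over a test of the given duration."""
    return bit_errors / (line_rate * seconds)

def error_free_seconds(target_ber: float, confidence: float = 0.95,
                       line_rate: float = LINE_RATE_2GFC) -> float:
    """Error-free test time needed to claim BER < target_ber at the given confidence.

    Assumes errors arrive as a Poisson process: P(0 errors in n bits) = exp(-n * BER).
    """
    bits = -math.log(1.0 - confidence) / target_ber
    return bits / line_rate

print(round(error_free_seconds(1e-12)))  # 1410 (seconds, about 23.5 minutes)
# 15 errors clustered once in a 4-day window still average out to a "clean" BER:
print(observed_ber(15, 4 * 86400))       # about 2e-14
```

So a test of about 24 minutes can certify 10^-12 on average, yet a link that throws one tight burst every few days still looks far better than that average while tripping the 15-bursts-in-5-minutes threshold. The measurement has to run across the failure interval and record when errors happen, not only how many.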
It is a well-known situation that DWDM systems cause problems by not being fully transparent! The counters you show in the original post look very sick indeed.
By the way, at which speed are you running your FC ports, 2 Gbps? Could you try a lower speed?
What is the length of the dark fibre?
Do you have other applications, e.g. Ethernet/IP, running over the DWDM? If yes, do you see any errors, retransmissions, etc.?
Walter.
06-19-2015 06:28 AM
Hi Walter,
There are no errors in the performance monitoring (PM) of the fiber path, neither in the OTN PM nor in the GFP frame PM (FC is transported over OTN).
We are running at 2 Gbps. This fiber carries only this link, dropped between these two pieces of equipment.
Can you please answer the question left unanswered in your earlier reply:
Can you tell me whether errors from the MDS switch on one side propagate to the MDS switch on the other side over the current transmission setup?
i.e. will a Tx error on one side show up as an Rx error on the other end?
Thanks,
George
06-19-2015 08:43 AM
George
Please run "clear counters interface all".
I would also be very interested in the MDS log file: "show logging log".
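George's earlier question about correlating ONS LCVs with MDS errors can be approached empirically: after clearing the counters, take periodic snapshots and note in which interval each error counter grew, then line those intervals up with the ONS PM timestamps. The sketch assumes snapshots have already been collected (for example by a script capturing "show interface" at fixed intervals, which is not shown); the counter key names are hypothetical.

```python
from typing import Dict, List, Tuple

Snapshot = Tuple[float, Dict[str, int]]  # (unix time, counter name -> value)

# Hypothetical counter names; use whatever keys your collection script produces.
WATCHED = ("rx errors", "rx invalid CRC/FCS", "rx NOS", "rx LRR", "rx OLS")

def error_intervals(snapshots: List[Snapshot], watched=WATCHED):
    """Return (start, end, increases) for each interval in which a watched counter grew.

    Counters that went down (e.g. cleared between snapshots) are ignored.
    """
    events = []
    for (t0, before), (t1, after) in zip(snapshots, snapshots[1:]):
        grew = {k: after.get(k, 0) - before.get(k, 0)
                for k in watched if after.get(k, 0) > before.get(k, 0)}
        if grew:
            events.append((t0, t1, grew))
    return events

# Snapshots taken once a minute after "clear counters interface all".
snaps = [
    (0,   {"rx errors": 0, "rx NOS": 0}),
    (60,  {"rx errors": 0, "rx NOS": 0}),
    (120, {"rx errors": 3, "rx NOS": 1}),
]
for start, end, grew in error_intervals(snaps):
    print(f"{start}-{end} s: {grew}")  # 60-120 s: {'rx errors': 3, 'rx NOS': 1}
```

Running the same collection on both MDS switches makes it visible whether a TX-side LCV on one ONS client port coincides with RX errors on the far-end MDS, which is the propagation question asked in the thread.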
The following document gives an overview of FC link initialization and of error indications such as NOS, of which you have thousands!
Yes, an error could propagate from one side to the other!
Try to fix the ports to 2G and TE mode instead of auto!
http://www.cisco.com/c/en/us/td/docs/storage/san_switches/mds9000/sw/rel_3_x/troubleshooting/guide/trblgd/ts_port.pdf
NOS received
A NOS received condition is detected. If the other end is an MDS port, then
the NOS is transmitted by the other end in one of the following conditions:
• A signal loss or sync loss condition is detected.
• The port is administratively shut down.
• The port is operationally down.
Cheers
Walter.
06-21-2015 11:11 PM
Hi Walter,
Thanks for the detailed information. Yesterday the port at the other end went into error-disabled mode. On checking the OTN and GFP PM at the ONS line port, no errors were reported, as usual. But TX and RX LCV errors were reported on the client ports on both sides.
This shows that both sides have issues, and the client ports on the MDS switches go into the error-disabled state in different scenarios. Kindly help me narrow down and fix the issue.
Additional information regarding the link: we have enabled squelching on the ONS port.
The MDS switches are managed by our customer, and we have asked them to hard-code the port and trunk configuration. They are asking why they should do that; it should work in auto mode too! Do you have a good justification for it?
Thanks,
George
06-21-2015 11:53 PM
Hi George
I haven't used ONS for many years now! But I know that problems usually occur if the ONS system is not transparent to all of the MDS network management traffic (link init, etc.).
Can you disable squelching on the ONS port and check again?
Walter.
06-22-2015 03:24 AM
Hi Walter ,
In this scenario squelching doesn't seem to be an issue.
Reason:
Squelch is triggered only by a signal loss on the client port of the 15454 at the other side, which is how squelching generally works.
This signal loss occurs because of the error-disabled state on the MDS switch, for which I have no explanation. As I mentioned, the PM on the trunk and on GFP is clean.
2) Why did you ask to hard-code the port and trunk configuration on the MDS switch?
Could you give me the reason? Have you come across any issue related to this?
Thanks,
George
06-22-2015 09:58 AM
Hi George
Reason for hard-coding: link initialization in auto mode can run into problems; however, in your case the link comes up properly and fails after some time, correct?
Q. Please clear all counters on the MDS switches on both sides, and then monitor the system.
Q. Could it be that at certain times there is heavy traffic over the link, e.g. sync or backup?
I would like to see the MDS log file entries from when the link flaps.
I'm still quite convinced, that this problem is not MDS related.
Walter.