We have a 9509 at the HQ site and a 9216 in the DR site connected via a DS3.
Periodically the FCIP link will flap and come back up; sometimes this continues for minutes, and it has gone on for several hours, with this error:
%PORT-5-IF_DOWN_TCP_MAX_RETRANSMIT: %$VSAN 50%$ Interface fcip1 is down(TCP conn. closed - retransmit failure).
My manager is telling me it is a setting on the FCIP profile and that the DS3 cannot possibly be congested because it is a point-to-point link.
Verizon is also telling me it is a Cisco problem having to do with fragmentation.
Originally the fcip profile was configured for 1000 max bandwidth; it was suggested to change it to something closer to the DS3 rate, so this change was made:
fcip profile 1
ip address 10.1.2.1
tcp max-bandwidth-mbps 41 min-available-bandwidth-mbps 30 round-trip-time-ms 4
tcp send-buffer-size 5000
It made no difference at all. There is still connectivity to the DR site during the flapping; I can see the bandwidth spike up, but there should be plenty of headroom to handle the load.
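For what it's worth, those profile numbers can be sanity-checked against the bandwidth-delay product: at 41 Mbps and a 4 ms RTT there is only about 164 kbit (roughly 21 KB) in flight, so a 5000 KB send buffer lets the switch queue around a second's worth of data toward the circuit, which would line up with ping times climbing under load. A sketch along those lines (values are illustrative, not a tested recommendation):

fcip profile 1
  ip address 10.1.2.1
  ! max/min bandwidth sized to the DS3 payload rate; 4 ms is the measured RTT
  tcp max-bandwidth-mbps 41 min-available-bandwidth-mbps 30 round-trip-time-ms 4
  ! BDP is only ~21 KB at 41 Mbps / 4 ms; 0 lets SAN-OS size the buffer automatically
  tcp send-buffer-size 0

A send buffer far larger than the BDP mostly adds queuing delay on a short-RTT link rather than throughput.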
The last time I had this problem, the serial interface of the DS3 was showing a tx load of 65/255 and an rx load of 5/255.
There are no errors on any of the interfaces for the DS3 or the fcip, but I have seen the ping response times go from 3 ms to 25 ms during these episodes, an indication of congestion somewhere.
I have tried numerous different profile settings and nothing has made a difference.
I have opened a TAC case and Cisco could find no reason for the flapping and suggested a carrier issue.
Does anyone have any ideas?
Any suggestion would be appreciated.
This happened because the interface went down after reaching the maximum allowed number of retransmission failures.
This may be caused by a loss of IP connectivity. Enter the show interface and show ip route CLI commands, or use the equivalent Fabric Manager/Device Manager procedure, to find the IP address of the peer and the route used to reach it. Then enter the traceroute CLI command against that peer address, or use the equivalent Fabric Manager/Device Manager procedure, to check connectivity to the peer. (This message was introduced in Cisco MDS SAN-OS Release 1.2(2a).)
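From the SAN-OS CLI, those checks look something like the following (10.1.2.2 is an assumed peer address; substitute the DR-side IP actually configured on your tunnel):

show interface fcip 1
show fcip profile 1
show ip route
ping 10.1.2.2
traceroute 10.1.2.2

If the ping or traceroute shows loss or large delay swings during a flap, that points at the transport rather than the FCIP configuration.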
What version of SAN-OS are you running?
Are you using IP compression, and if so, what mode?
What alerts does the peer switch send when the link flaps?
We ran into a new bug when upgrading to SAN-OS 3.1(3a) with IP compression set to 'auto'. 'Auto' uses a combination of software- and hardware-based compression depending on the type of traffic, load, bandwidth, etc. The software portion of IP compression in SAN-OS 3.1(3a) misbehaves when dealing with packet buffers, and this caused our fcip profile to bounce randomly anywhere from 10-20 times a day. EMC SAC and Cisco TAC had me enable IPS full core dumps to a TFTP server; from those they were able to determine that the problem was specifically the 'auto' setting for IP compression on the MPS-14/2 blades in the MDS 9216i's. We then set IP compression to 'mode1', which is all hardware-based compression, on all switches, and that resolved our issue with the fcip profile flapping.
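For reference, the compression mode is set per FCIP interface; the change described above looks roughly like this (interface number is an assumption; exact syntax can vary by release):

interface fcip1
  ! force hardware-based compression instead of 'auto'
  ip-compression mode1

Disabling compression entirely (no ip-compression) is another way to rule the feature out while testing.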
Not sure about the SAN-OS version.
Compression is set to "Mode 1".
It has not happened in several months now and nothing (as far as I know) has changed.
There was a rash of incidents and now it has stopped.
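If you want to pin down the SAN-OS version and confirm the compression mode while things are quiet, something like the following should show both (the Gigabit Ethernet slot/port is an assumption; use whichever IPS port carries your tunnel):

show version
show interface fcip 1
show ips stats tcp interface gigabitethernet 2/1

The TCP statistics are also worth saving now as a baseline, so retransmit counters can be compared the next time the link flaps.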
What I would see on the HQ side (95% of the traffic is sourced from HQ) is:
%PORT-5-IF_DOWN_TCP_MAX_RETRANSMIT: %$VSAN 50%$ Interface fcip1 is down(TCP conn. closed - retransmit failure)
The DR switch would have:
%PORT-5-IF_DOWN_PEER_CLOSE: %$VSAN 50%$ Interface fcip1 is down (TCP conn. closed by peer)
I am wondering if it is a carrier issue.
We recently had someone tell us the 3275 router cannot handle a full DS3 with any other services running, and that this could have caused the issue, but I have seen this link peak at around 40 Mbps without any problems.
It seems I saw somewhere that this could be caused by the frame size: since FC uses a larger frame, it doesn't always work well when transmitted across an IP link. The suggested resolution was to set the packet size to a smaller number.
I have seen this as well; the recommended solution was to allow jumbo frames (MTU 4500) across the FCIP link.
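As a sketch, that change would be made on the IPS Gigabit Ethernet port carrying the tunnel, assuming your release supports setting the port MTU and, critically, that every hop across the carrier path also passes the larger MTU (slot/port and size here are illustrative):

interface gigabitethernet 2/1
  ! 2300 is enough for a full 2148-byte FC frame plus FCIP/TCP/IP overhead
  switchport mtu 2300

If any intermediate hop cannot carry the larger frames, raising the MTU on the endpoints alone will not help.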
I have not tried this because the issue has not happened since I was looking into it.
It was happening quite a bit for a while, but not in a few months.
Won't you still end up fragmenting the traffic at some hop along the way, though? All that fragmentation and reassembly to get it to the other side sounds like a headache....
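That is the concern; if the carrier path cannot carry the larger MTU end to end, an alternative to relying on fragmentation is letting TCP discover the path MTU so FCIP segments are never larger than the path can carry. A sketch, assuming path MTU discovery is available under the FCIP profile in your SAN-OS release:

fcip profile 1
  ! let TCP discover the smallest MTU along the path and size segments to fit
  tcp pmtu-enable

With this enabled the tunnel avoids IP fragmentation even when an intermediate hop has a smaller MTU than the endpoints.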