08-08-2018 06:49 AM - edited 08-08-2018 08:31 AM
I have a long-distance ISL (60 miles) between a couple of 9513s with 32-port 8G modules.
The port in the Production Data Center often runs out of Tx buffer credits. The corresponding port at the DR site does not run out of Rx or Tx buffers.
Each port has 500 Tx & 500 Rx buffers.
Is this actually causing a delay? If the port runs out of buffers, won't it just get more buffers from the global pool?
If it is causing a delay, what can I do to improve things? As far as I can work out, extended buffer credits only apply to Rx buffers and not Tx.
I have read some of the Interfaces Config Guide and the Slow Drain Device Detection manual but if there are other sources of info please let me know.
08-08-2018 08:25 AM
Hi Frank,
Typically, b2b credits do not need to be changed for ISLs. The only time b2b credits need to be changed for an ISL is when the connected switches are separated by many miles.
It sounds like what you are dealing with is a slow-drain (congested) device in your fabric. A slow-drain device is basically a host or storage device that cannot keep up with the data it is requesting or being sent, so it is slow in returning buffer credits.
Buffer credits are basically the flow-control mechanism in the FC world.
When a device tells the switch it has x credits, the switch can send x frames to the device before it has to wait for the device to signal that it can send more frames.
When a device takes a long time to tell the switch it can now send another frame, congestion ensues because the switch holds on to frames in its queue longer than desired.
That one device can have a cascading effect on the fabric, especially when ISLs are in play.
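To make the credit exchange above concrete, here is a toy simulation. It is a minimal sketch under simplifying assumptions, not how the switch hardware works: the sender offers one frame per tick, the receiver returns one credit (in FC this is the R_RDY primitive) every `return_every` ticks, and `CreditLink` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CreditLink:
    """Toy model of one direction of an FC link using buffer-to-buffer credits.

    Hypothetical sketch: `credits` starts at the value the receiver advertised,
    and each R_RDY from the receiver returns one credit.
    """
    credits: int
    frames_sent: int = 0

    def try_send(self) -> bool:
        # A frame may only be transmitted while at least one credit remains.
        if self.credits == 0:
            return False
        self.credits -= 1
        self.frames_sent += 1
        return True

    def receive_r_rdy(self) -> None:
        # The receiver freed a buffer and returned one credit.
        self.credits += 1


def simulate(advertised: int, ticks: int, return_every: int) -> tuple[int, int]:
    """Return (frames_sent, ticks_stalled).

    Each tick the sender wants to send one frame; the receiver returns one
    credit every `return_every` ticks (larger = slower-draining device).
    """
    link = CreditLink(credits=advertised)
    stalled = 0
    for t in range(1, ticks + 1):
        if not link.try_send():
            stalled += 1  # frame stays queued in the switch: congestion
        if t % return_every == 0:
            link.receive_r_rdy()
    return link.frames_sent, stalled


print(simulate(8, 100, 1))  # healthy device: (100, 0)
print(simulate(8, 100, 4))  # slow device: (32, 68)
```

With 8 credits, a device that returns a credit every tick never stalls the sender, while one that returns a credit only every fourth tick leaves frames waiting in the queue for most of the run, which is the congestion described above.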
Here are some steps to help you investigate:
First, clear the interface counters so they are current (clear counters interface fcx/y or clear interface counters all).
1) Look for discards on interfaces over time (show interface |i fc|channel|discard|CRC).
2) Look for timeout_drops in the OBFL log (show logging onboard error-stats |i timeout|credit_loss|tx_wt).
If you see timeout drops with current dates, the switch was unable to deliver a frame to the connected device within 500 ms (the default), so it dropped the frame to prevent further congestion.
Credit loss means the switch was unable to deliver a frame for 1 sec on an F port or 1.5 secs on an ISL port.
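The two OBFL conditions in step 2 can be expressed as a small classifier. This is a hypothetical helper for reasoning about the thresholds quoted in the reply, not anything the switch exposes; the inputs `frame_age_ms` and `zero_credit_ms` are assumed measurements, and the 500 ms drop timeout is only the default.

```python
# Thresholds as described in the reply: the 500 ms drop timeout is a default
# and may be configured differently; credit loss is 1 s on F ports, 1.5 s on ISLs.
TIMEOUT_DROP_MS = 500
CREDIT_LOSS_F_MS = 1000
CREDIT_LOSS_ISL_MS = 1500


def obfl_conditions(frame_age_ms: float, zero_credit_ms: float, port_type: str) -> list[str]:
    """Hypothetical helper: which conditions a port would hit.

    frame_age_ms: how long the oldest queued frame has waited for this port
    zero_credit_ms: how long the port has continuously had zero Tx credits
    port_type: "F" for a host/target port, "E" or "TE" for an ISL
    """
    if port_type == "F":
        credit_loss_limit = CREDIT_LOSS_F_MS
    elif port_type in ("E", "TE"):
        credit_loss_limit = CREDIT_LOSS_ISL_MS
    else:
        raise ValueError(f"unknown port type: {port_type!r}")

    conditions = []
    if frame_age_ms >= TIMEOUT_DROP_MS:
        # The frame could not be delivered in time, so the switch drops it.
        conditions.append("timeout_drop")
    if zero_credit_ms >= credit_loss_limit:
        # No credit came back for the whole interval: treated as credit loss.
        conditions.append("credit_loss")
    return conditions


print(obfl_conditions(600, 1200, "F"))   # ['timeout_drop', 'credit_loss']
print(obfl_conditions(600, 1200, "TE"))  # ['timeout_drop']
```

The same 1.2 s without credits counts as credit loss on an F port but not on an ISL, because ISLs get the longer 1.5 s window.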
Remember that wherever there is ingress congestion, there is corresponding egress port congestion. This means congestion seen on TE/E (ISL) ports is usually related to congestion at an F (host/target) port.
Follow each hop in the fabric data path until you find an F port that is showing congestion/slow-drain signs.
The problem is rarely with the ISLs. The ISLs are the victims.
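The hop-by-hop search can be sketched as a walk over counter snapshots taken after clearing the counters. This is a minimal sketch with made-up data: the `Hop` fields, the port names, and the rule "discards grew or fresh timeout drops appeared" are assumptions standing in for what you would read from the show commands above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hop:
    """One port along a data path, with counter snapshots (hypothetical data)."""
    port: str             # interface name, e.g. "fc1/1" (made-up values below)
    port_type: str        # "F" for host/target ports, "E" or "TE" for ISLs
    discards_start: int   # discards right after the counters were cleared
    discards_later: int   # discards at a later check
    timeout_drops: int    # current-dated timeout drops found in OBFL

    def shows_congestion(self) -> bool:
        # Discards growing over time, or fresh timeout drops, are warning signs.
        return self.discards_later > self.discards_start or self.timeout_drops > 0


def trace_path(path: list[Hop]) -> tuple[Optional[Hop], list[str]]:
    """Walk the hops in order and return (first congested F port, ISL victims)."""
    victims = []
    for hop in path:
        if not hop.shows_congestion():
            continue
        if hop.port_type == "F":
            return hop, victims  # likely slow-drain device found
        victims.append(hop.port)  # congested ISL: usually a victim, keep going
    return None, victims


path = [
    Hop("fc1/1", "TE", 0, 120, 3),
    Hop("fc2/1", "TE", 0, 40, 0),
    Hop("fc3/9", "F", 5, 90, 2),
]
culprit, victims = trace_path(path)
print(culprit.port, victims)  # fc3/9 ['fc1/1', 'fc2/1']
```

Congested ISLs are collected as victims rather than reported as the cause, matching the advice that the ISLs are rarely the problem.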
Here is a doc that explains further and provides details:
08-08-2018 08:30 AM
Hi Eappiah,
Thanks for your reply. Sorry, I should have mentioned that the ISL is a long-distance connection, about 60 miles. Does this make a difference?
I am just about to leave work for the day but will try all your suggestions tomorrow.
08-09-2018 07:01 AM
Hello Frank,
Yes, this makes a difference.
Some things to consider:
http://blogs.cisco.com/datacenter/storage-distance-by-protocol-fc-fcoe-and-fcip-part-i/
https://blogs.cisco.com/datacenter/storage-distance-by-protocol-part-ii-physical-layer
https://blogs.cisco.com/datacenter/storage-distance-by-protocol-part-iii-fibre-channel
Good Luck,
Regards,
08-09-2018 11:16 AM
Hi Frank,
For long hauls, b2b credits come more into play. 60 miles = ~96 km. The rule of thumb for b2b credits by distance is below. Assuming your links are 8G, all you need for b2b between sites is about 384 (96 km x 4 credits per km). You have 500, so you should be good when it comes to credits. Do you know if the connections are direct or if there is DWDM/CWDM in the middle? If DWDM gear is involved, find out if any buffer spoofing is being done on the gear.
More than likely it's a slow device that is causing the problem.
1G FC: 1 BB credit per 2 km
2G FC: 1 BB credit per 1 km
4G FC: 2 BB credits per 1 km
8G FC: 4 BB credits per 1 km
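The rule of thumb above is simple arithmetic, sketched here with hypothetical helper names. Like the rule itself, it assumes full-size frames; links carrying many small frames need more credits than this estimate.

```python
import math

KM_PER_MILE = 1.609344

# Rule of thumb from the reply: BB credits needed per km at each FC speed (Gbps).
CREDITS_PER_KM = {1: 0.5, 2: 1, 4: 2, 8: 4}


def miles_to_km(miles: float) -> float:
    return miles * KM_PER_MILE


def bb_credits_needed(distance_km: float, speed_gfc: int) -> int:
    """Rule-of-thumb estimate, rounded up to a whole credit."""
    return math.ceil(distance_km * CREDITS_PER_KM[speed_gfc])


print(bb_credits_needed(96, 8))               # 384, the figure quoted above
print(bb_credits_needed(miles_to_km(60), 8))  # 387 using the exact ~96.56 km
```

For the 60-mile link in this thread, the exact conversion gives 387 credits, essentially the same as the ~384 estimate and still below the 500 configured per port.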
08-10-2018 05:41 AM
Thanks for the responses, they have all been very helpful. I am going to reset all counters on a number of switches and check again after the weekend.