09-13-2013 02:04 AM - edited 03-07-2019 03:27 PM
Hello everyone,
For the past few weeks we have been having problems where backups from all our app servers in our data centre run very slowly; some are still running the next morning, into working hours. We have been 'around the houses' trying to solve it but cannot. Here is the background and what the issues are:
We have a NAS box directly connected to our Cisco 6509 VSS distribution switch. All our access switches are also connected to this 6509 distribution switch. All our app servers back up to the NAS box, and we then have a NetBackup server (also connected to the 6509 VSS switch) which backs the NAS server up. The NAS and NetBackup servers talk over a layer 3 private network (172.16.xx.xx). All app servers back up to the NAS share over VLAN 13 (a CIFS-share VLAN stretched across to our second data centre). The stretched VLAN 13 runs over a 4 Gb/s bundled EoMPLS connection.
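One thing worth keeping in mind about that bundle (and an assumption on my part, since I still need to confirm the four links are actually joined as an EtherChannel on the 6509): a single backup flow will always hash onto just one member link, so each individual transfer tops out around 1 Gb/s no matter how big the bundle is. To check the bundle and its load-balancing hash I am planning to run roughly the following ("port-channel 1" is a placeholder for whatever the DCI bundle is actually called):

show etherchannel summary
show etherchannel load-balance
show interfaces port-channel 1 counters

If one member carries most of the backup traffic while the others sit near idle, the load-balance method (src-dst-ip vs. src-dst-mac) is the first thing to revisit.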
The Apps team have noticed that transmit speeds for the backup transfers are very slow. What I have begun to notice is that:
a) I am getting input queue drops on VLAN 13
b) I was getting output drops on the interfaces that make up the 4 Gb/s EoMPLS connection (however, since clearing the counters this has not been occurring).
I am pretty sure we have ip cef enabled as standard globally on the switch (how can I check?). I was just wondering whether I could get some help as to why I am suddenly getting input queue drops, and how this could be affecting the backup transfer speeds.
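From what I have read, the usual way to confirm CEF on this platform, and to see what kind of traffic is actually filling the VLAN 13 input queue, is something along these lines (a sketch only; the exact output wording varies by IOS release):

show ip cef summary
! should report that IPv4 CEF is enabled and running, plus prefix counts
show ip interface vlan 13 | include CEF
! per-SVI check - look for "IP CEF switching is enabled"
show mls cef summary
! 6500-specific: shows the routes programmed into the hardware FIB
show buffers input-interface vlan 13 packet
! samples what is sitting in the SVI's input hold queue - those drops count
! packets punted to the CPU, so they usually point at process-switched traffic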
1. Output of VLAN 13:
Vlan13 is up, line protocol is up
Hardware is EtherSVI, address is 0008.e3ff.fc50 (bia 0008.e3ff.fc50)
Description: CIFS-SHARE STRETCHED
Internet address is 10.206.13.2/24
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 10/255, rxload 17/255
Encapsulation ARPA, loopback not set
Keepalive not supported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 1d00h
Input queue: 0/75/161/161 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 67294000 bits/sec, 9992 packets/sec
5 minute output rate 42858000 bits/sec, 8751 packets/sec
L2 Switched: ucast: 1530009283 pkt, 1226839999505 bytes - mcast: 14326467 pkt, 2257397576 bytes
L3 in Switched: ucast: 961668671 pkt, 855561546932 bytes - mcast: 0 pkt, 0 bytes mcast
L3 out Switched: ucast: 840604143 pkt, 789765062088 bytes mcast: 0 pkt, 0 bytes
962289231 packets input, 856374223843 bytes, 0 no buffer
Received 658153 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
840734518 packets output, 789788694959 bytes, 0 underruns
0 output errors, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
2. Layer 3 config of Vlan 13:
interface Vlan13
description CIFS-SHARE STRETCHED
ip address 10.xx.13.2 255.255.255.0 secondary
ip address 10.xx.13.2 255.255.255.0
no ip redirects
ip flow ingress
ip flow egress
standby version 2
standby 13 ip 10.xx.13.1
standby 25 ip 10.xx.13.1
3. Output from the NAS box interface on the 6509:
TenGigabitEthernet1/7/3 is up, line protocol is up (connected)
Hardware is C6k 10000Mb 802.3, address is 8843.e1d1.3602 (bia 8843.e1d1.3602)
Description: Channel to OASSES-NAS001 DM3
MTU 9216 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 10Gb/s, media type is 10Gbase-SR
input flow-control is on, output flow-control is off
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:03, output hang never
Last clearing of "show interface" counters 1w1d
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 57748000 bits/sec, 4888 packets/sec
5 minute output rate 18809000 bits/sec, 7138 packets/sec
13895604647 packets input, 31280724820311 bytes, 0 no buffer
Received 25639 broadcasts (25637 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
10206307458 packets output, 8224789073771 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
4. Output from one of the EoMPLS interfaces:
GigabitEthernet1/6/1 is up, line protocol is up (connected)
Hardware is C6k 1000Mb 802.3, address is 0021.a07f.3b14 (bia 0021.a07f.3b14)
Description: EoMPLS DCI
MTU 9216 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 3/255, rxload 3/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is SX
input flow-control is off, output flow-control is off
Clock mode is auto
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:42, output 00:00:30, output hang never
Last clearing of "show interface" counters 1d17h
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 13543000 bits/sec, 4079 packets/sec
5 minute output rate 11843000 bits/sec, 2224 packets/sec
645709185 packets input, 613639329616 bytes, 0 no buffer
Received 8861962 broadcasts (8848456 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
118241941 packets output, 75156978761 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
5. Line card details:
Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
1 8 CEF720 8 port 10GE with DFC WS-X6708-10GE SAL1538QNUU
2 8 CEF720 8 port 10GE with DFC WS-X6708-10GE SAL1538QLUH
3 8 CEF720 8 port 10GE with DFC WS-X6708-10GE SAL1538QNUW
4 8 CEF720 8 port 10GE with DFC WS-X6708-10GE SAL1538QNV6
5 5 Supervisor Engine 720 10GE (Active) VS-S720-10G SAL1538QTEX
6 24 CEF720 24 port 1000mb SFP WS-X6724-SFP SAL1249CA5T
7 8 CEF720 8 port 10GE with DFC WS-X6708-10GE SAL1421J6LB
Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
1 0007.7d38.7ef8 to 0007.7d38.7eff 2.3 12.2(18r)S1 12.2(33)SXI8 Ok
2 0007.7d38.7ea8 to 0007.7d38.7eaf 2.3 12.2(18r)S1 12.2(33)SXI8 Ok
3 0007.7d38.7cf0 to 0007.7d38.7cf7 2.3 12.2(18r)S1 12.2(33)SXI8 Ok
4 0007.7d38.7d18 to 0007.7d38.7d1f 2.3 12.2(18r)S1 12.2(33)SXI8 Ok
5 8843.e18a.8304 to 8843.e18a.830b 4.0 8.5(4) 12.2(33)SXI8 Ok
6 0021.a07f.3b14 to 0021.a07f.3b2b 3.3 12.2(18r)S1 12.2(33)SXI8 Ok
7 8843.e1d1.3600 to 8843.e1d1.3607 2.1 12.2(18r)S1 12.2(33)SXI8 Ok
Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
1 Distributed Forwarding Card WS-F6700-DFC3C SAL1538QNQ1 1.4 Ok
2 Distributed Forwarding Card WS-F6700-DFC3C SAL1538QNRP 1.4 Ok
3 Distributed Forwarding Card WS-F6700-DFC3C SAL1538QJKW 1.4 Ok
4 Distributed Forwarding Card WS-F6700-DFC3C SAL1538QNJD 1.4 Ok
5 Policy Feature Card 3 VS-F6K-PFC3C SAL1538QQSD 1.1 Ok
5 MSFC3 Daughterboard VS-F6K-MSFC3 SAL1538QTX8 5.1 Ok
6 Centralized Forwarding Card WS-F6700-CFC SAL1249BWW1 4.1 Ok
7 Distributed Forwarding Card WS-F6700-DFC3C SAL1423JV73 1.4 Ok
Mod Online Diag Status
---- -------------------
1 Pass
2 Pass
3 Pass
4 Pass
5 Pass
6 Pass
7 Pass
Line card 6 has the connections for the EoMPLS; line card 7 is where the NAS and NetBackup servers are attached.
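Since line card 6 (the WS-X6724-SFP, which only has the CFC) carries the DCI links while the NAS and NetBackup ports sit on a DFC-equipped 10GE card, I also want to rule out the switch fabric or that 1G module being the bottleneck. Roughly what I intend to check (standard 6500 commands, nothing specific to this chassis):

show fabric utilization all
show platform hardware capacity fabric
show interfaces gigabitEthernet 1/6/1 counters errors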
Many thanks.
09-13-2013 02:52 AM
Yes, one of your interfaces shows input queue drops, but it's 161 drops out of 962,289,231 packets input, roughly 0.00002 percent. Normally such a minuscule drop percentage would not impact performance.
What I would investigate is whether there's some issue with the EoMPLS. Be prepared for the service provider to say there's nothing wrong even if there is; if there is a service provider issue, you normally have to prove it to them. (The best way to verify a service provider link is to use a traffic generator that can transmit at the contracted maximum bandwidth and then confirm the other side receives all of the traffic at the expected latency.)
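As one example of such a test (just a sketch; iperf is only one tool you might use, and the host name below is a placeholder, not something from your environment), put a test host at each end of the stretched VLAN and compare what you send with what actually arrives:

# on a test host in the remote data centre
iperf -s
# on a test host in the local data centre: 4 parallel TCP streams for 60 seconds
iperf -c <remote-test-host> -P 4 -t 60
# UDP variant (start the remote end with "iperf -s -u"); offers ~900 Mb/s and reports loss and jitter
iperf -c <remote-test-host> -u -b 900M -t 60

If the TCP results fall well short of the UDP results, or the UDP test shows loss at rates the circuit should comfortably handle, that is the sort of evidence providers tend to respond to.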
09-13-2013 03:06 AM
Hi there, thanks for your response. I did think that the handful of drops would not cause such poor performance on its own.
Luckily I have a good relationship with our WAN provider, so I will talk with them, see what they come back with, and update here.
09-13-2013 06:46 AM
It's wonderful that you have a good relationship with your service provider, but in their experience most problems that are not already clearly theirs are customer related. They don't want to spend time investigating a performance issue when, from the get-go, they don't believe they're causing it.
A good example: I once spent three months telling one of our WAN providers that one of its circuits didn't seem to deliver full capacity, just a little less than what I believed we should be obtaining, and the difference was so small it was even difficult to document. To make a long story short, with my constant prodding they did find the problem: out-of-date (and buggy) firmware on one of their line cards. Under load, that line card would drop a few more packets than it should. They updated the card's firmware and the problem was resolved.
The moral of the story is that you often need to be able to demonstrate or document that performance across their WAN cloud isn't right (if in fact it isn't).
09-23-2013 01:59 AM
They are currently looking into any errors on their kit in between the layer 2 stretched VLAN. We have been doing some replication tests this morning over the stretched VLAN: when the server guys change the TCP load window size down to, say, 5 minutes, the transfer rate suddenly increases to 3gb/sec but then starts dropping to near 300mb/sec. Read speed is around 900k/sec and write speed is around 2mb/sec.
I don't understand the reason for the dramatic drop in transfer rate. Replication traffic is over a separate VLAN (14) and I can see input queue drops on there too (although very minimal).
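One rough back-of-the-envelope I want to sanity-check (the figures below are assumptions for illustration, not measurements from our kit): a single TCP stream can carry at most about window size divided by round-trip time. With a 64 KB receive window and 2 ms of round trip across the DCI, that is roughly 512 kbit / 0.002 s, or about 260 Mb/sec per stream; at 10 ms it falls to about 52 Mb/sec. So if the effective window or the latency changes partway through a transfer, the throughput of each backup stream can swing dramatically even when the links themselves are clean. We will measure the actual RTT between the data centres before reading anything into those numbers.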