03-09-2012 01:37 PM - edited 03-07-2019 05:28 AM
With Matt Blanshard
Welcome to the Cisco Support Community Ask the Expert conversation. This is an opportunity to ask your toughest Layer 2 questions to one of the technical leaders of the San Jose LAN Switching team, Matt Blanshard. Learn more about Spanning Tree, VTP, Trunking, Resilient Ethernet Protocol, IGMP Snooping, Private VLANs, Q-in-Q Tunneling, QoS, and various switching platforms including all desktop switches, Metro Ethernet switches, 4500 and 6500 switches, Blade Center switches, and Nexus 7000 switches.
Matt Blanshard began his Cisco career as an intern in 2007. He is now a technical leader at the Cisco Technical Assistance Center on the LAN Switching team. He holds a bachelor's degree in computer science from the University of Phoenix and is CCNA certified.
Remember to use the rating system to let Matt know if you have received an adequate response.
Matt might not be able to answer each question due to the volume expected during this event. Remember that you can continue the conversation on the discussion forum shortly after the event. This event lasts through March 23rd, 2012. Visit this forum often to view responses to your questions and the questions of other community members.
03-21-2012 06:55 PM
My last question is a common issue, but I've yet to see a good solution so I'll ask.
We have some high-utilization Linux servers connected to WS-X6748-GE-TX cards at 1 Gb/s. These servers run jobs that talk to 10 Gb/s NAS filers on a different 6509-E, but in the same VLAN. The two switches are connected via 10 Gb/s trunks. The access switch side uses a WS-X6708-10GE and the distribution side uses a WS-X6716-10GE. All switches have QoS globally enabled.
At times when the traffic bursts, we see "Total output drops" and "OutDiscards" increase on the 1 Gb/s access ports to the servers. Despite the traffic being TCP based, the application sees the dropped packets and reports a broken stream (or something along those lines).
We know it's not oversubscription, as the WS-X6748-GE-TX blades are not that densely populated. So the conclusion we reached is that we're exhausting output buffers on the port. Essentially, the 10 Gb/s NAS sends traffic at a higher rate than the 1 Gb/s port can process it. The recommended solution is to upgrade the servers to 10 Gb/s, which obviously has time, hardware, and cost implications.
Is there any way to work around this via QoS? Since both the source and destination are on the same VLAN, it would have to be Layer 2, and I've never configured that type of rate limiting.
03-21-2012 09:27 PM
Hello Johnny,
This is actually turning into quite a common problem as the speed of network equipment increases. You are spot on with your statement that the 10 Gb/s NAS is sending traffic at a faster rate than the gig port can handle. Given the packet rate needed to sustain wire rate at 10 Gb/s versus 1 Gb/s, the NAS is very much capable of overwhelming the buffers on the gigabit module.
You can try to increase the buffers on the 6748 port. How customized is your QoS model currently on this switch? Can you share the output of show run on one of the gig interfaces, along with show queueing for that interface?
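Something like the following would cover it (GigabitEthernet1/1 is just an example; substitute one of the affected server-facing ports):
show running-config interface GigabitEthernet1/1
show queueing interface GigabitEthernet1/1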
We might be able to tweak the transmit buffers to give that class some more available buffer, but this will be a band-aid; unfortunately the only real fix is what you already mentioned, which is upgrading to 10 Gb/s across the board.
-Matt
03-22-2012 03:08 PM
We currently only have QoS policies applied to WAN links, so at the switch level all we have configured is "mls qos" and that's it. At the application level, all traffic has CoS = 0. Here is a typical access switch interface configuration:
mls qos
mls cef error action freeze
!
interface GigabitEthernet1/1
description Linux Server
switchport
switchport access vlan 123
switchport mode access
switchport nonegotiate
spanning-tree portfast
!
As expected, we see all the drops in Output Queue #1 on the 1 Gb/s port (see attached).
03-22-2012 09:56 PM
I am going to make this recommendation based upon the assumption that you are only using voice and data and are not marking any additional traffic. My first recommendation would be to move CoS 0 to queue 1 threshold 2. From the following output you can see that at 70% queue utilization we start tail-dropping the traffic in the CoS 0 queue. Since this is the only traffic assigned to this queue, there is no reason not to give it 100%.
queue tail-drop-thresholds
--------------------------
1 70[1] 100[2] 100[3] 100[4] 100[5] 100[6] 100[7] 100[8]
2 70[1] 100[2] 100[3] 100[4] 100[5] 100[6] 100[7] 100[8]
3 100[1] 100[2] 100[3] 100[4] 100[5] 100[6] 100[7] 100[8]
The next recommendation I would make is to tweak the queue limits a bit.
Here's your current configuration:
queue-limit ratios: 50[queue 1] 20[queue 2] 15[queue 3] 15[Pri Queue]
I would recommend going to something like this:
queue-limit ratios: 70[queue 1] 5[queue 2] 10[queue 3] 15[Pri Queue]
A key thing to remember here is that these changes will propagate to all 12 ports on the same ASIC, so if you make the changes to g1/1, it will change g1/1-12. So we need to be sure that the servers connected there have similar needs (i.e., all data).
Here are the exact commands to add to the interface:
wrr-queue cos-map 1 2 0
wrr-queue queue-limit 70 5 10
Without knowing more details on exactly how your network is set up, what you are marking, and how much voice you have going over the network, it's hard to make a more specific recommendation. Again, this will also just be a band-aid, but it might give us just enough buffer that not enough packets are dropped for the application to call it a broken stream, or it might even stop the drops altogether.
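Once applied, show queueing for the interface will confirm the new CoS-to-threshold mapping and queue-limit ratios (the interface here is just an example):
show queueing interface GigabitEthernet1/1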
-Matt
03-22-2012 11:56 PM
Ah yes, I remember from the TAC case that the command replicates to all ports on the ASIC. At the time we felt it was too drastic a change, and held off until we could consider all options.
The other concern was that increasing the ratio of queue #1 from 50% to 70% was unlikely to be enough to alleviate the problem. That being said, our environment is 100% data, so we have the option to make queue #1 as large as possible without any ill effect.
Is there any way to just make the queues use SRR? So a frame with CoS = 0 hits a random queue, rather than queue #1 every time?
03-23-2012 07:50 AM
To be honest, since your network is completely data, I would disable QoS. That will allow the traffic to be queued in all queues. On the 6500 (non-Sup2T) there is no concept of SRR, so each queue is limited to its configured maximum queue depth.
I would implement the first command for sure, though; that will put CoS 0 into a higher threshold and stop the tail dropping. No downside to that one at all.
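As a rough sketch of the two options (g1/1 is an example port; remember the wrr-queue change applies to all 12 ports on that ASIC):
! Option 1: disable QoS globally so all transmit buffers are available to all traffic
no mls qos
!
! Option 2: keep QoS enabled and just move CoS 0 into threshold 2 of queue 1
interface GigabitEthernet1/1
 wrr-queue cos-map 1 2 0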
-Matt
03-22-2012 11:40 PM
Hi Matt,
Why is it that when I extend the Layer 2 VLANs for servers from DC-1 to DC-2, I see a huge amount of traffic going from DC-1 to DC-2 on the interfaces (a Layer 2 port-channel) connecting the two DCs, even though no servers are deployed in DC-2? I also see heavy traffic on the DC-2 core interfaces. Please explain what the reason for this could be.
Thanks,
Jamil
03-23-2012 06:13 AM
Hi Matt,
I have the dumbest question of the day.
Currently I am working with an MPLS carrier who supports our internal network. On one of their routers we see constant packet drops, which they say are related to their CoS settings.
Their router is a 2821 running 12.4(25d). They use DSCP for packet shaping. There is one default queue, a management queue of theirs, an AF31 queue for which we have an ACL defined, and a tail queue. The AF31 queue is set to 1 Meg on a 4 Meg line. The management queue is set to 8 K.
Their class-default class (match any) has a bandwidth statement set to 360 K.
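Based on what they have described, I assume their policy looks roughly like the following sketch (the class names, match criteria, and shaping structure are my guesses; only the 1 Meg, 8 K, 360 K, and 4 Meg figures come from their description):
! our traffic - in reality selected by the ACL we defined with them
class-map match-all CM-AF31
 match ip dscp af31
! their management traffic - marking assumed
class-map match-all CM-MGMT
 match ip dscp cs6
!
policy-map PM-CHILD
 class CM-AF31
  bandwidth 1000
 class CM-MGMT
  bandwidth 8
 class class-default
  bandwidth 360
!
! parent shaper to the 4 Meg line rate, with the queuing policy nested inside
policy-map PM-SHAPE
 class class-default
  shape average 4000000
  service-policy PM-CHILD
!
interface GigabitEthernet0/0
 service-policy output PM-SHAPE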
We are seeing packets dropped like crazy, approximately 100,000 per day, without the line being overutilised. Now the telco claims this is because we are overutilising the default class, as we are sending more than 360 K over the line. I think this is a ridiculous statement, as we have a 4 Meg line for a reason.
Now here's the question:
When using DSCP to shape the line, shouldn't this only kick in when we reach the maximum line bandwidth?
In other words, shaping should never be a reason packets are dropped when you are not using the full capacity of the line, and this should be independent of which class of traffic we are using the line for, right?
Thanks
03-16-2018 08:30 PM
Hi Cisco Experts,
I have a problem: I am about to deploy 4 new Cisco 2960-X 24-port PoE switches, but 2 of them are experiencing intermittent connectivity because the uplink/trunk ports facing the core switch keep auto-resetting. I hope somebody can help me and enlighten me as to why only those 2 switches are experiencing auto-resets on their trunk port(s) while the other 2 are okay, given that they all have the same configuration and use the same LAN connection to the core switch.
Best Regards,
Harley Acha
06-08-2018 02:30 PM
I have two data centers, and an ISP has configured an EPL between them. I'm having issues because VLAN 1 works but none of my other VLANs do. STP doesn't seem to be passing either, as BOTH sides of the link show as root. The ISP says they aren't blocking anything, but if only the management VLAN is working, what else can I check?
On my end, I have a simple config on each switch:
1) Nexus 7700 vdc
int E1/5
switchport
switchport mode trunk
no shut
2) 3650
int g1/1/1
switchport
switchport mode trunk
no shut
I can't think of a simpler configuration to troubleshoot.
Adva 114 -> ASR 920 -> into their network..
Any troubleshooting steps you can tell me would greatly be appreciated...
It's as if they are blocking STP, CDP, etc. I can't get them to look at it. All they keep saying is, "Looks good on our end." I see the MACs on the other side, but when I configure a port on that specific VLAN and give my laptop an IP, I can't reach them.
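In case it helps, the sort of thing I'm checking on both ends looks like this (VLAN 10 is just an example for one of the non-working VLANs; the syntax differs slightly between the Nexus and the 3650):
show interface trunk
show spanning-tree vlan 10
show mac address-table vlan 10
show cdp neighbors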
Frustrated...