
Cisco 9200 packet loss when using 10G SFP

carl_townshend
Spotlight

Hi All

We are having an issue whereby, when we plug a 10G SFP into our 9200L-48P-4X switch, we get severe performance loss; if we put in a 1G SFP the connection is fine.

We have tried multiple SFPs, different ports, different fibers, a software upgrade, etc., and we still get lots of packet loss. This is backed up by some Wireshark testing.

Software is 17.09.05 (Cupertino).

Any known issues here?


Hi All

So, we have fixed the issue, but I am not happy with the fix and want to know why it's happening and what we can do.

Basically, we run auto qos voip cisco-phone on the user ports, which run at 1G, and auto qos voip trust on the trunk port that goes to the distribution switch, core, etc.

I have removed the policy maps from the edge port and the uplink:

Edge port:
no service-policy input AutoQos-4.0-CiscoPhone-Input-Policy
no service-policy output AutoQos-4.0-Output-Policy

Uplink:
no service-policy input AutoQos-4.0-Trust-Cos-Input-Policy
no service-policy output AutoQos-4.0-Output-Policy

You can see the drops in Q1 drop Threshold 2 below

carl_townshend_0-1739976159656.png

The problem has gone away now that we have removed the policies. What is the cause, and what is the best course of action here?
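
For anyone wanting to pull the same counters, here is a sketch of the commands that typically show per-queue, per-threshold egress drops on a Catalyst 9000. The interface name GigabitEthernet1/0/48 is an assumption (the user port described later in the thread), the per-class output only appears while a service policy is still attached, and the exact output columns vary by release:

```
! Per-queue enqueue and drop counters, broken down by drop threshold
show platform hardware fed switch active qos queue stats interface GigabitEthernet1/0/48
!
! Per-class drops for the service policy attached to the port
show policy-map interface GigabitEthernet1/0/48
!
! Overall output-drop counter for the port
show interfaces GigabitEthernet1/0/48 | include drops
```

Clearing the counters before a test run and reading them again afterwards makes it easier to tie the drops to a specific transfer.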

@carl_townshend it would be helpful if you would clarify the topology.  Those stats appear to be from a gig interface, so is this a host connection?  If so, does the performance issue affect only this particular host?

Is this the first time you have noticed and/or checked the interface stats?

As to what you can do about it, you could tweak your QoS configuration to make it more suitable for your QoS needs.  (One of the problems with AutoQoS is that it assumes its policy works for all cases, which it does not.  However, I recall [???] that some of the later AutoQoS implementations support an auto tuning option.  If that is true on your platform, it could possibly resolve this without manual tuning.)

Hi Joseph

We have the 9200L, and the user is connected to port 48 at 1G. The 9200L is uplinked to a 9500 distribution switch on a 10G port, which then goes to a 9600 on a 10G port, then to a server switch on a 10G port, then to the server on a 10G port. It is the same for the internet too: the core goes to the firewalls on 10G and the internet is 10G. So it's always 10G between the switches, then 1G to the user.

It's the first time we have had a good look at the interface stats. The fact that it's fine with QoS off means it has to be the QoS policy, buffers, etc.

In the output of the show policy-map interface command I can see there are drops showing in the class-default class too.

Basically, I need to allow for more traffic in the default class.

It's currently like this:

Class-map: class-default (match-any)
0 packets
Match: any
Queueing

(total drops) 893011474
(bytes output) 4869745327
bandwidth remaining 25%
queue-buffers ratio 25

Is this a buffer issue? Nothing else would be coming in on that interface in any class other than class-default.

How would we set up this auto tuning option, or what manual settings would we tweak?

If not already configured try: 

qos queue-softmax-multiplier 1200

If that does the trick, probably don't need to do anything else.

If it doesn't, then we get into custom policies (which might only be needed for that one edge port).
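
As a rough sketch of the suggestion above (not verified on a 9200L running 17.09.05; the interface name GigabitEthernet1/0/48 is an assumption based on the user port mentioned in this thread), the multiplier is a global configuration command, and the resulting per-queue soft maximums can be inspected afterwards:

```
configure terminal
 qos queue-softmax-multiplier 1200
 end
!
! Inspect the per-queue buffer allocation on the edge port
show platform hardware fed switch active qos queue config interface GigabitEthernet1/0/48
```

Comparing the soft-maximum values in that output before and after the change should show whether the multiplier has taken effect on the port.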

As to the auto tuning option I vaguely recalled, I've done some quick research on AutoQoS, and didn't find such a feature.  Possibly it might have been a new feature related to NBAR.  I'll have to do some more digging.

Hi Joseph

I'll give this a try, cheers for that.

Alternatively, we don't use IP phones anymore; we use the Webex soft client. Maybe it's worth switching auto qos off on the edge ports? Or is it recommended we leave it on?


@carl_townshend wrote:

Alternatively, we don't use IP phones anymore; we use the Webex soft client. Maybe it's worth switching auto qos off on the edge ports? Or is it recommended we leave it on?


Since the introduction of AutoQoS, I've never used it, nor much studied it beyond noting the reasons why I don't want to use it, which basically boils down to it doesn't do what I want QoS to do.  So, I'm unable to recommend for it or against it.

That said, I believe on the later Cisco switches, like the 9K series, QoS is always active, just the default policy is different (simpler too, I believe) from the AutoQoS policy.  So, your question is really which policy is better for your requirements.  Again, cannot say, or whether either is optimal for your requirements (probably neither).

Further, though, on LAN ports, with typical LAN buffer latencies, most of the time, even for something like VoIP, all traffic kinds can usually work well without any QoS.  However, the keyword is "usually".  QoS provides predictable performance when and if there's congestion, severe enough to be problematic.

Personally, if possible, I do advise QoS everywhere, to provide service guarantees.  For something like a soft client that's providing real-time voice or video, such QoS guarantees are "nice" insurance.  (Of course, if you don't have QoS to guarantee voice quality, at least when it breaks, possibly no one will be able to call you to complain; laugh.)

If you create your own QoS model, it can be very simple, if that meets your needs.  For example, initially perhaps just two classes, one for real-time traffic (like VoIP) and one for everything else.

Personally, I find a simple four-class QoS model handles most QoS situations pretty well, especially on Cisco platforms that support class FQ.
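
A minimal sketch of what the two-class egress policy described above might look like on a Catalyst 9000 edge port. Everything named here is hypothetical: the class and policy names, the assumption that real-time traffic arrives marked DSCP EF, the buffer ratio, and the interface. Treat it as a starting point to test in a lab, not a recommended configuration:

```
! Real-time traffic, assumed to be marked DSCP EF by the soft client
class-map match-any REALTIME
 match dscp ef
!
! REALTIME uses the strict-priority queue; everything else shares
! the remaining bandwidth and most of the port's queue buffers
policy-map EDGE-2CLASS-OUT
 class REALTIME
  priority level 1
 class class-default
  bandwidth remaining percent 100
  queue-buffers ratio 80
!
interface GigabitEthernet1/0/48
 service-policy output EDGE-2CLASS-OUT
```

The idea is that the bulk of the buffers go to the class that actually absorbs the 10G-to-1G bursts, while the priority queue stays short so real-time traffic is not delayed.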

Since you're already using AutoQoS, try the softmax configuration first.  If it works, you can decide whether you want to determine whether the device's QoS defaults are adequate too, or whether you want to be able to customize your QoS.

If the configuration doesn't "cure" AutoQoS, but default QoS works (with or without that command), it is again your choice how you want to approach QoS going forward.

If you still need to "cure" your issue, or you want to pursue "better" QoS, that's an option too.

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-9300-switch/216236-troubleshoot-output-drops-on-catalyst-90.html#toc-hId--1961081611

Sisira

@siskum , could you further expand upon the specific relevance of the document snippet you posted, either as a reply to my reply or, in general, this late in the discussion?

Joseph W. Doherty
Hall of Fame

Oh, if it was the 9200L<>9500 link that was upgraded, and you're wondering why an edge egress issue is happening now when the equivalent issue did not happen on the 9500 while the link was gig, the most likely reason is that the 9500 probably has more buffer resources than a 9200L.  There are other possible factors too, not mutually exclusive, but platform differences are probably the most significant.

Again, there are some counterintuitive cases where increasing bandwidth causes issues.

Also again, if this is just an edge port buffer issue, there's a good chance some resource tuning will mitigate or even eliminate the issue.

carl_townshend
Spotlight

Hi All

We have tried the softmax multiplier command and it seems to have done the trick.

This has now led me to more questions.

1. Why isn't the setting (1200) like this by default? Most people have high-speed uplinks with slower edge ports.

2. If QoS isn't enabled on the port and the port is FIFO, does the softmax command still have an effect? I assume it does?

3. If a 1 Gig port is maxing out and using most of the buffer space, what will happen if I then do the same on another port? Will the second port suffer more, i.e. is it on a first come, first served basis?

I see there is an article from Cisco Live saying this is a "famous" command. If it's that famous, why isn't it in any of the Cisco courses? I have done the 9K one, ENCOR and many others, and this has never been brought up before.

carl_townshend_0-1740050256809.png

Your #3 is yes, which also answers the "why not default" for #1.

For #2, I'm unsure you can fully disable QoS on many small Catalyst switches since the introduction of the 3650/3850 series.  On the 3560/3750, I don't believe its similar command was active when QoS was disabled.

As to why softmax is only famous at Cisco Live, laugh, Cisco Live has to have some "juicy" info, otherwise why have it?

Hello @carl_townshend ,

You are right, some certification courses are not tied to real-world troubleshooting. As a Cisco partner I can tell you there are other courses that we need to attend, called Blackbelt Academy, where all this kind of info is covered.

I have to pass general-purpose certifications to keep my certs alive, and at the same time I have to study and pass these Blackbelt Academy courses. In addition, for advanced programs Cisco wants to know how we solve issues in multi-vendor environments, so we had to develop a software integration between our ITSM and Cisco TAC using an intermediate cloud-based Linux appliance.

For FTD there is the Firepower program in three levels, which again does not count for recertification purposes.

Most of the issues with Cat 9300 are:

licensing: the need for a DNA license makes selling difficult compared to other vendors, because they do not charge for something most customers do not use (the DNA license is for SD-Access onboarding of the devices)

troubleshooting: they are presented as a single family, but the Cat 9200L and Cat 9300X are actually totally different in terms of buffering and architecture

number of ports per ASIC

renaming of features: VSS is now SVL, which generates a lot of questions, but as with VSS, only a single pair of devices can be in an SVL pair

support of third-party optics

and so on

I recommend using Cisco Live, where you can access architecture presentations and troubleshooting presentations led by Cisco TAC engineers. You can use the same CCO account you use to access the forums, and access is free.

Hope to help

Giuseppe

carl_townshend
Spotlight

Juicy info, I like it

OK, so where do we go from here? In your opinion, what would you do? I see the options as:

1. Use the softmax multiplier and risk starvation for other clients

2. Change buffer ratios per queue

3. Switch off auto qos and create our own QoS policy

4. Switch off QoS on the LAN and only use it where needed, i.e. on WAN links etc.
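
For option 2, a sketch of a custom egress policy that gives class-default all of the port's queue buffers. The policy name and interface are hypothetical, and this single-queue form gives up the separate priority queue that AutoQoS provides, so it only suits ports where that trade-off is acceptable:

```
! All egress traffic in one queue that owns all of the port's buffers
policy-map EDGE-BUFFER-OUT
 class class-default
  bandwidth percent 100
  queue-buffers ratio 100
!
! Swap the AutoQoS output policy for the custom one on the edge port
interface GigabitEthernet1/0/48
 no service-policy output AutoQos-4.0-Output-Policy
 service-policy output EDGE-BUFFER-OUT
```

The current class-default allocation quoted earlier in the thread is queue-buffers ratio 25, so this is the opposite extreme; an intermediate ratio inside a multi-class policy is also possible.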


@carl_townshend wrote:

OK, so where do we go from here? In your opinion, what would you do? I see the options as:

Cannot say what I would do, because I don't have enough information to make a recommendation for you, beyond considering your goals, and the options and costs to meet them, or not.

As an analogy: I don't drive the car I would really like to drive.  I drive one that meets many of my goals, to various degrees, and which I can afford.  But you're asking me, sort of, what car I would recommend you drive.

Personally, I'm a huge booster of QoS, because it can, in many cases, do so much for perceived network usage quality, and less expensively than bandwidth upgrades, especially MAN/WAN bandwidth.

But QoS isn't a panacea for all situations, and it has its own care and feeding costs, not to mention its startup (learning it) costs.

This is what makes AutoQoS so attractive.  I.e. with just one configuration statement, you obtain a complex (IMO often too complex) QoS policy.

For #1, overall, I believe the risk is rather low, which is why it's so often recommended to try it as the first "fix", but as there's some risk, it's not the default.

For numbers 2 through 4, you need to "marry" the knowledge of what you want your network to accomplish with the knowledge of what QoS can do.