Daniel Davidson · ‎05-26-2016

We have Cisco 3850 stacks at our access layer. If someone transfers a large amount of data through one, after some time all of that data starts to get broadcast to all ports on the switch, causing all active ports to report that they are running at nearly 100% of their 1 Gb/s bandwidth. I visited one of these stacks and can confirm that SSH traffic not destined for a given access port is actually being sent out of it.

It is almost as if the stack is acting like a hub and not a switch. Does anyone know what can cause this or have an idea as to what the fix could be?

Dan

Carlos Villagran · ‎05-26-2016

Hi Daniel!

Can you tell me what version you are currently running on your switches?

Also, I am thinking this could be an issue with unknown unicasts. Does this effect happen as a spike, or once it starts does it not stop until the switch is reloaded?

Best regards!

JC

Daniel Davidson · ‎05-27-2016

sw-2n-3850#show ver
Cisco IOS Software, IOS-XE Software, Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 03.03.03SE RELEASE SOFTWARE (fc2)

When this occurs, it lags the transfer a bit, but then the transfer rate on all ports stays high until it stops.

Carlos Villagran · ‎05-27-2016

Hi!

Usually the switch replicates traffic over all its ports (flooding) when there is unknown unicast (the switch does not know which port leads to the destination MAC address).

There may be some traffic pattern on your switch that causes the MAC address to be erased from the switch's MAC address table; however, flooding at 1 Gb/s is just too much.
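
To check whether this is what is happening, one could look up the destination host in the MAC address table while the flood is in progress. A minimal sketch, assuming a placeholder MAC address (aaaa.aaaa.aaaa) for the receiving host:

```
! Is the destination MAC currently learned, and on which port?
switch#show mac address-table address aaaa.aaaa.aaaa
! How long does an idle entry stay in the table? (the IOS default is 300 seconds)
switch#show mac address-table aging-time
```

If the first command returns no entry while the transfer is running, the switch has no port for that destination and floods the frames within the VLAN.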

First, I would upgrade to version 03.06.04, since that is the most stable version for this platform.

Do you have some kind of server with high transfer rates connected to that switch?

Hope it helps, best regards!

JC

Daniel Davidson · ‎05-27-2016

Upgrading the firmware is a possibility, but that could take a while to schedule.

There is a large amount of data (multiple TB) that is being transferred through the switch at near 1Gb speeds when this is happening.

Carlos Villagran · ‎05-27-2016

Well, we can run this test and see how it behaves over the course of the day. Can you statically configure, on the switch, the MAC addresses of the nodes that are receiving this heavy traffic load?

switch(config)#mac address-table static aaaa.aaaa.aaaa vlan x interface giga x/x

This will stop the unknown unicast flooding, and you should stop seeing this slowness.
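
For illustration, the command above with the placeholders filled in might look like the following. The MAC address, VLAN 10 and port GigabitEthernet1/0/5 are made-up values; substitute the ones for the affected host.

```
switch#configure terminal
switch(config)#mac address-table static 0011.2233.4455 vlan 10 interface GigabitEthernet1/0/5
switch(config)#end
! Confirm the entry is present and listed as STATIC
switch#show mac address-table address 0011.2233.4455
```

A static entry does not age out, so it has to be updated by hand if the host moves to another port.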

Hope it helps, best regards!

JC

pwwiddicombe · ‎05-27-2016

You probably have the 2 nodes in different subnets, and they happen to point to different physical L3 switches for their default gateway (or active gateway in HSRP). There is a condition where A sends to sw1, which routes the packet into a different VLAN and forwards it to sw2 and on to destination B. B then replies via sw2's gateway, and the routed packet makes it back to A.

(This is in an environment with multiple L3 core switches that are trunked.)

However, sw2 never actually saw the MAC address of A as a source in this exchange, so it doesn't know which port A is on and floods the traffic. This continues until A happens to send something (a broadcast, or a packet to another server on sw2 that happens to be in the same subnet). So flooding will happen periodically and mysteriously.

One fix is to make sure the same physical gateway is used for all VLANs that interact (possibly by adjusting HSRP priorities). Another, which is easier but may be less effective, is to increase the MAC aging time to, say, more than 60 minutes, in the hope that all stations will send SOME broadcast periodically.
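
As a rough sketch of both options (VLAN 10, HSRP group 10, the hostname core1 and the timer value are assumptions for illustration, not values from this thread): raise the HSRP priority on the L3 switch that should be the active gateway, repeating this for each VLAN involved, or lengthen the MAC aging time on the switch that floods. The 14400 seconds used here matches the default IOS ARP timeout of 4 hours.

```
! On the L3 switch that should be the active gateway for the VLAN
! (the default HSRP priority is 100; the higher value wins)
core1(config)#interface Vlan10
core1(config-if)#standby 10 priority 110
core1(config-if)#standby 10 preempt

! On the switch that floods: age MAC entries after 4 hours instead of the default 5 minutes
switch(config)#mac address-table aging-time 14400
```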

Daniel Davidson · ‎06-01-2016

Carlos had the fix with the static ARP entry. For some reason, when a large amount of data goes through the switch for an extended period of time, the ARP table gets wiped and this results. We will also do the IOS upgrade once we get our E911 people to approve it.

Philip D'Ath · ‎05-28-2016

I'm with cavillag. I would not run 03.03.03SE, full stop. It was too flaky. I had a *lot* of issues with that software release. It also had a lot of ARP-related bugs (which lead to flooding when the ARP entry is lost).

I strongly recommend you find a time to upgrade to 03.06.04.E as well.

I was able to mitigate some of the more serious issues I faced by putting the below command on every interface.

nmsp attachment suppress
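
Applied to a range of ports, this might look like the following sketch. The interface range is an assumption for a 48-port stack member, so adjust it to the actual ports.

```
switch(config)#interface range GigabitEthernet1/0/1 - 48
switch(config-if-range)#nmsp attachment suppress
switch(config-if-range)#end
```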

I also agree with cavillag that a forced static mapping may be your best temporary solution.

3850 stack with data broadcast to all ports