Strange phenomenon in the campus network

andrepeter · 06-02-2020

Hi support community,

we have a very strange phenomenon in our company campus network and we don't know how to deal with it.

Maybe you have an idea?

Our campus network consists of 3 core switches (2 C6880-X-LE in a VSS network) and several access switches (mostly WS-C2960X-48FPD-L).
A new data center housing storage systems has now been connected to one of the access switches.
The storage systems are attached to a Nexus switch, which is connected to the access switch with a 2x1G port channel.
The entire rest of the network is connected by 2x10G port channels.
VLANs are used in the data center that are not used anywhere else on campus.
The "data center" VLANs are routed to the rest of the campus network on one of the cores.

Now the phenomenon:
When data is copied to the storage systems in the data center, the entire campus network is impaired.
Ping dropouts sometimes occur within the campus.
For example, access points lose connection to the WLC and restart.
As soon as the copying process is stopped or ended, the ping times normalize to less than 1 ms.
Prime shows that uplinks / trunks are partially utilized to over 90% even though they have nothing to do with the copying process.
Even under full load, the 2x1G uplink to the data center should not affect the campus LAN.
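
The expectation above can be checked with some quick arithmetic. This is only a rough sketch: it assumes the utilization figure is measured against the full capacity of a 2x10G port channel, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the load figures described in the post.
# Assumption: utilization is relative to the full port-channel capacity.

DC_UPLINK_GBPS = 2 * 1.0       # 2x1G port channel towards the Nexus
CAMPUS_TRUNK_GBPS = 2 * 10.0   # 2x10G port channels used in the rest of the campus


def max_share_of_trunk(source_gbps: float, trunk_gbps: float) -> float:
    """Highest utilization (0..1) a source of this size can cause on one trunk."""
    return min(source_gbps / trunk_gbps, 1.0)


def load_at_utilization(utilization: float, trunk_gbps: float) -> float:
    """Traffic volume in Gbit/s that corresponds to a utilization on a trunk."""
    return utilization * trunk_gbps


expected = max_share_of_trunk(DC_UPLINK_GBPS, CAMPUS_TRUNK_GBPS)
observed = load_at_utilization(0.90, CAMPUS_TRUNK_GBPS)

print(f"Worst case caused by the data center uplink: {expected:.0%}")
print(f"Load at 90% utilization: {observed:.0f} Gbit/s")
print(f"Factor over the data center uplink: {observed / DC_UPLINK_GBPS:.0f}x")
```

Under these assumptions the copy alone could account for roughly 10% of a campus trunk, so a 90% reading would mean the traffic is being multiplied somewhere along the way.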

What could be the cause here?

The storage systems in the data center use the SVI of the Nexus switch as a gateway since routing should only take place within the data center.
The clients use the Campus Core as a gateway because they should of course get everywhere.
Can it be problematic if the server and client use different gateways, i.e., sent packets may be routed differently than received packets?

pieterh · 06-02-2020

>>> Can it be problematic if the server and client use different gateways,<<<
No, using different gateways is a normal setup! Each device needs to use the gateway within its own VLAN.

>>> A new data center in which storages are located has now been connected to one of the access switches <<<
This is an unusual setup.
I suggest connecting the Nexus switch directly to the core switch instead of passing through an extra access switch.

The most likely cause is a routing loop or even a bridging loop.

Another possibility is that the Nexus runs a different spanning-tree mode than the connected access switch.

Please provide more detail (diagram, configs).

andrepeter · 06-02-2020

Hi... thanks for the reply.

When I have a little time I will draw a diagram with all the important devices.

That every device needs a gateway in its VLAN is clear... but what if there are 2 or 3 routers and gateways in one VLAN?

And the client is using one gateway and the server the other... that shouldn't be a problem either?

Sure, connecting the Nexus directly to one of the cores would be better... but that would be too easy ;-)

A few devices in the datacenter are supported by our "global" IT department... they are connected directly to our access switch.

The nexus-switch is supported by a "divisional" IT department and the trunk between our access-switch and the nexus is limited to a few VLANs they are using within the datacenter.

The switches (including the nexus) are configured with RSTP.

pieterh · 06-03-2020

>>>
That every device needs a gateway in its VLAN is clear... but what if there are 2 or 3 routers and gateways in one VLAN?
And the client is using one gateway and the server the other... that shouldn't be a problem either?
<<<

No, it should not be a problem. A client and server in the same VLAN should be able to communicate directly without using the gateway. But it is a curious setup.
Don't you use a first-hop redundancy protocol like HSRP or VRRP? Then server and client would use the same virtual IP address as gateway.

>>>
The nexus-switch is supported by a "divisional" IT department and the trunk between our access-switch and the nexus is limited to a few VLANs they are using within the datacenter.
<<<

And you are sure this Nexus is not connected to the campus core?

Maybe you created a loop between the Nexus VLANs and other access-switch VLANs.
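
To make the loop idea concrete, here is a minimal sketch that checks a list of links for a second path within one VLAN. The device names, VLAN numbers and the extra Nexus-to-core link are purely hypothetical; the real topology has not been posted yet. Spanning tree would normally block such a path, so this only shows where a physical loop would exist.

```python
def redundant_links(links, vlan):
    """Return the links that close a cycle for the given VLAN (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    closing = []
    for a, b, vlans in links:
        if vlan not in vlans:
            continue  # this trunk does not carry the VLAN
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            closing.append((a, b))  # both ends already connected: second path
        else:
            parent[root_a] = root_b
    return closing


# Hypothetical link list: (device, device, VLANs allowed on the trunk)
LINKS = [
    ("core-vss", "access-dc", {10, 20, 300}),
    ("access-dc", "nexus", {300, 301}),
    ("nexus", "core-vss", {300}),  # assumed undocumented link, for illustration
]

print(redundant_links(LINKS, 300))
print(redundant_links(LINKS, 301))
```

In this made-up example VLAN 300 has two paths between the Nexus and the core, while VLAN 301 has none.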

Richard Burts · 06-02-2020

I am interested in the observation that "uplinks / trunks are partially utilized to over 90%". Is this only when the copy is active? Or does it occur at other times?

I wonder if there is something about the copy traffic that requires the CPU and software forwarding rather than hardware forwarding?

HTH

Rick

andrepeter · 06-02-2020

Hi...

That is exactly the strange thing here... it only appears while the copy is active.

If you stop the copy process, the whole network stabilizes within a few seconds.

And now the strangest thing... If we copy to one specific storage system, the error occurs. If we copy to another storage system that is also located in the datacenter, it does not occur.

Could it really be that a server sends out some kind of broadcasts or something similar while a data transfer is active, and that this affects a whole campus network with 3 core switches and over 100 access switches?

Leo Laohoo · 06-02-2020

Look at the CPU of all your core switches during the time of the issue.
Look at the links of your core switches for any "Total Output Drops" in very large quantities.
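
With many links to check, a small script can pull the counter out of saved "show interfaces" output. This is only a sketch: the sample text and interface names are made up, the threshold is an arbitrary assumption, and the line format is assumed to match what IOS normally prints.

```python
import re

# Assumed line formats from IOS "show interfaces" output:
#   "TenGigabitEthernet1/5/1 is up, line protocol is up (connected)"
#   "  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0"
HEADER = re.compile(r"^(\S+) is .*line protocol is ")
DROPS = re.compile(r"Total output drops:\s*(\d+)")


def output_drops(show_interfaces: str) -> dict[str, int]:
    """Map each interface name to its 'Total output drops' counter."""
    drops = {}
    current = None
    for line in show_interfaces.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        found = DROPS.search(line)
        if found and current is not None:
            drops[current] = int(found.group(1))
    return drops


def suspicious(drops: dict[str, int], threshold: int = 10_000) -> list[str]:
    """Interfaces whose drop counter reaches the (arbitrary) threshold."""
    return sorted(name for name, count in drops.items() if count >= threshold)


# Made-up sample, not taken from the network in this thread.
SAMPLE = """\
TenGigabitEthernet1/5/1 is up, line protocol is up (connected)
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
TenGigabitEthernet1/5/2 is up, line protocol is up (connected)
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 4821733
"""

print(suspicious(output_drops(SAMPLE)))
```

Comparing two captures, one taken before and one during the copy, would show which counters actually grow while the problem is present.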

paul driver · 06-03-2020

Hello

@andrepeter wrote:

Hi support community,

Our campus network consists of 3 core switches (2 C6880-X-LE in a VSS network) and several access switches (mostly WS-C2960X-48FPD-L).
A new data center housing storage systems has now been connected to one of the access switches.
The storage systems are attached to a Nexus switch, which is connected to the access switch with a 2x1G port channel.
The entire rest of the network is connected by 2x10G port channels.
VLANs are used in the data center that are not used anywhere else on campus.
The "data center" VLANs are routed to the rest of the campus network on one of the cores.

Can you post a topology of this network and maybe attach a file with the running configuration of the VSS core?

Kind Regards
Paul