Re: Weird behaviour with new 9400 series switches

IntegraXP · ‎10-13-2020

I'm hoping someone can shine a light on a recent experience which I still don't understand.

I recently had to migrate a client from a pair of old Catalyst 5500 switches to a new stack of 9300-48S. I ran into a very weird issue, which I cannot explain, with virtual servers on the client's Proxmox virtualisation server.

The client also has a couple of 2500 series WLCs (running 7.0.116.0 and 7.6.120.0). This may be important. Each WLC has a few APs connected. The WLCs are connected to access switches via single Ethernet ports. The access switches are connected to the new core switch, the 9300, via two fibres in a port-channel config (LACP). The WLCs were not touched during this process. There is no roaming of devices between the APs attached to the two WLCs.

I migrated by first setting up and connecting the new stack to one of the 5500s. There were a few days between unboxing/racking-up/setting up the initial config and starting the migration of fibres. Nothing strange happened in this period. I migrated over all the access switch fibre connections without issue.

My problem came when I moved an Ethernet cable connecting the Proxmox server from an access switch to the 9300 switch. Several of the VMs connected to the default VLAN went offline: no ping in or out of the VMs, no network access. Only the ones on the default VLAN were affected. Proxmox did not register anything. The virtual servers themselves did not register anything in their event logs.

While this was occurring, the console of the 9300 would occasionally register an alert saying that a particular MAC address was flapping on port AP0/1. I had checked out the MAC address and, as far as I remember, it was not associated with any manufacturer, so I didn't read too much into it. I hadn't configured anything on the 'AP' interface.

Turns out I should have paid more attention. The MAC definitely didn't correspond to either the Proxmox host servers or the VMs running on the servers.

After several hours of tearing my hair out, all the while on the phone to a colleague who was more familiar with Proxmox, I decided to stop the **bleep** alerts by shutting down the AP0/1 interface.

Immediately upon shutting down the interface, the VMs all came back online.

Spanning tree cannot explain this, as the VM hosts have a single cable connecting them to the switch. No spanning tree alerts were generated on the console of the 9300. The problem only occurred once the VM host was disconnected from the access switch and reconnected to the 9300. Moving the Ethernet cable back to the access switch, as it had been before the problems, did not result in the VMs coming back online. Only the default VLAN was affected.

So, my question is, what on earth happened here? What I saw made no sense whatsoever. A non-configured virtual interface reported flapping of a MAC address which didn't exist, and that provoked some sort of behaviour on the switch that blocked the MAC addresses of VMs on the host, but without generating any alert messages regarding the blocking.

Georg Pauwen · ‎10-13-2020

Hello,

It is a bit hard to figure out from your post what your topology looks like, and what is connected to what. Can you post a schematic drawing?

The below piece of information seems crucial:

-->

I decided to stop the **bleep** alerts by shutting down the AP0/1 interface.

Immediately upon shutting down the interface, the VMs all came back online.

What does the configuration of that interface look like?

IntegraXP · ‎10-13-2020

Topology is fairly simple. There were two 5500 core switches, now removed and replaced by the single 9300 stack. There are quite a few access switches, but they're all connected to the 9300 via two fibres each, always in LACP mode. In all cases the fibres go to different units of the stack for redundancy, i.e., it's a star network.

The two WLCs are connected to access switches.

The only loops in the network are the internal one in the 9300 stack and each of the dual fibre connections to each switch.

The stack consists of 3 x 48-port SFP + 1 x 24-port RJ45 units, connected in a loop using normal stacking cables.

It hadn't occurred to me to post the configuration of the AP interface as it was blank, i.e., factory default config. All I added was 'shutdown'.

One important thing I forgot to mention was that 'show mac address-table' never showed the MAC indicated in the flapping alert, i.e., it wasn't showing up on any other interfaces.

pieterh · ‎10-14-2020

The AP0/1 interface on the new series switches belongs to an environment where application services can run.
Basically, this is a virtualization environment where VMs like a vWLC can run.
I know VMware internally uses dynamically created and assigned MAC addresses for its VMs.

I think the cause of this behavior lies here: some internal and external VMs or vSwitches use the same MAC addresses.
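
One way to test this theory: a MAC that doesn't map to any manufacturer is often a locally administered address, meaning bit 0x02 of the first octet is set. Software-generated MACs frequently (though not always) fall into that range. Below is a minimal Python sketch of that check. The function names and the example addresses are made up for illustration, since the MAC from the flapping alert was never posted in this thread.

```python
def parse_mac(mac: str) -> bytes:
    """Parse aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff or Cisco aabb.ccdd.eeff."""
    hex_digits = "".join(ch for ch in mac.strip() if ch not in ":-.")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # bytes.fromhex raises ValueError if any non-hex character remains
    return bytes.fromhex(hex_digits)


def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit (0x02 of the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


def is_multicast(mac: str) -> bool:
    """True if the I/G bit (0x01 of the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x01)


if __name__ == "__main__":
    # Hypothetical addresses, NOT the ones from the alert in this thread.
    for mac in ("0200.0000.0001", "00:50:56:aa:bb:cc"):
        print(mac,
              "locally administered:", is_locally_administered(mac),
              "multicast:", is_multicast(mac))
```

A result of False does not rule out a virtual machine: VMware, for example, assigns addresses from its own registered OUI 00:50:56, so those look like ordinary vendor MACs. A result of True would at least suggest the flapping address was generated by software rather than burned into a NIC.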