07-19-2024 10:32 AM
Hi everyone,
I have inherited an existing 2 host setup and have been troubleshooting the network performance since I arrived. I have noticed that not all the VLANs in use by the VMs have been assigned to the LAG group connecting the ESXi hosts to the SG Stacked switches. The question I have is this: is that a problem or not?
I am reading that ESXi vSwitches are VLAN agnostic, i.e. they don't handle VLANs internally or tag the packets, so VLANs only matter to the Cisco switch when it forwards packets to the ESXi host. In which case I am confused. The VLAN used by our wired and wireless clients is not assigned to the LAG group that feeds the hosts, and yet it seems to work most of the time. It is more a throughput performance issue that I am seeing, with very intermittent reports of drops.
I am a little out of my depth here. I know how to set up the LAGs and vSwitches as per the existing ones, and copying the current setup worked when I added the third host not so long ago. What I don't know is if this is optimal or not. I have found various internal network configuration issues on the virtual servers and inside the ESXi configuration that were causing problems, so I am a bit suspicious.
Any pointers to further information or suggestions for network configurations like this gratefully received.
07-19-2024 02:54 PM
"the VLANs in use by the VMs"
How have you determined that?
"I have noticed that not all the VLANs in use by the VMs have been assigned to the LAG group connecting the ESXi hosts to the SG Stacked switches. The question I have is this: is that a problem or not?"
It depends on the physical switch, VM applications and ESXi Port Groups. In ESXi, Port Groups are where you can set VLAN IDs. Port Groups are mapped to VM application NICs. If VLAN ID is set to anything from 1 to 4094, the ESXi host will tag outgoing frames with the given VLAN ID. If VLAN ID is set to 0, the ESXi host will send untagged frames so tagging for it needs to be done on the switch. If VLAN ID is set to 4095, it means that VM applications themselves do tagging and the ESXi host will let those tagged frames go through.
https://knowledge.broadcom.com/external/article/311764/vlan-configuration-on-virtual-switches-p.html
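The three VLAN ID cases described above can be summarised in a small Python sketch. This only models the rules as stated in this post; the function name `port_group_tagging` is made up for illustration and is not part of any ESXi API.

```python
def port_group_tagging(vlan_id: int) -> str:
    """Describe how a port group treats frames for a given VLAN ID setting."""
    if vlan_id == 0:
        # No tag added by the host; the physical switch port decides the VLAN.
        return "untagged: the physical switch port assigns the VLAN"
    if 1 <= vlan_id <= 4094:
        # The host adds an 802.1Q tag with this ID to outgoing frames.
        return f"host tags frames with VLAN {vlan_id}"
    if vlan_id == 4095:
        # The VM does its own tagging and the host passes the tags through.
        return "guest tagging: tagged frames from the VM pass through"
    raise ValueError(f"invalid VLAN ID: {vlan_id}")


for vid in (0, 10, 4095):
    print(vid, "->", port_group_tagging(vid))
```

With VLAN ID 0 on every port group, frames reach the switch untagged and end up in whichever VLAN is untagged on the LAG.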
07-22-2024 07:01 AM - edited 07-22-2024 07:06 AM
Hi Kris,
Thanks for your reply. Good question about how I have determined that. I always get confused with the terminology of Port Groups. They do not appear in the Configure menu nor in the Network menu of my version of ESXi.
So here goes: inside the host I have vSwitches to which I have assigned VMkernels whose VLAN ID is 0. I have three VMkernels: the standard management one and a pair of VMkernels I have used to assign a pair of custom TCP/IP stacks to the vSwitches, one for the VMs and the other for our backup solution. This, I believe, is doing what it should do: segregating traffic between two different LAGs so that the backup network does not impact production.
So if I follow what you are saying, my internal (ESXi) network does not tag, so I need to do that on the switch.
On the switch side there are roughly thirteen different VLAN IDs which, in IP Configuration (in the IPv4 Interface section), have been given a static IP, and all are showing in IPv4 Routes.
Virtual servers on the ESXi hosts need access to all of them bar one (guest WiFi), which the servers never need to touch. However, in the VLAN Management section under Port VLAN Membership, if I select interface type LAG I can see the LAG groups only have three of the thirteen VLAN IDs assigned, and one of them is untagged (the virtual server VM NIC VLAN ID). The other two tagged VLANs are the VoIP server and CNC machine VLANs. The wired and wireless PC VLAN ID, on which all our domain client computers connect, is not included. Neither are the other VLANs, all of which are in use on the switch and which one or more servers will interact with.
The bit I don't understand, and why I am asking, is how is this still working? It is working and has been for years, albeit with some random hiccups (that is what started me investigating the network).
So how are the packets from these other devices on other VLANs getting routed to and from the virtual servers if the VLAN they are on is not permitted on the LAG that connects the ESXi host to the switch stack? My understanding of VLANs is that a LAG or a port in trunk mode, with VLANs assigned to it as tagged, should be rejecting any VLAN not included in the list of assigned VLANs. I am not clear on what happens via an untagged VLAN assigned to a LAG.
As you can see I am rusty with regard to network switching and, despite doing a bit of reading, couldn't find an answer to the question. If this is working, how is it working, and is it best practice? i.e. can I improve the situation by adding all the required VLANs to the LAGs, or is it not necessary to add anything other than the server NIC VLAN? (Note: this is only defined in the switch; there are no VLAN IDs specified in either the VM NIC config or in the ESXi config.) The interesting thing is that in ESXi Networking, if I look at the IPv4 Routes on each physical NIC connecting the hosts to the switch stack, it can only see networks that have been added to the LAG.
Sorry if this seems a bit rambling please ask me for clarification of anything you need.
Thanks
Nick K
07-22-2024 06:45 PM
I have a little trouble following your message. I suspect that you look at Port Groups thinking that they are virtual switches. Anyway, all ESXi networking entities can be easily found by clicking on Networking in the Navigator pane. That opens a large pane to the right with tabs for each networking entity. Also, I'm not an ESXi expert, but I can tell you that there are VMkernel NICs, not just VMkernels. VMkernel NICs are used by the ESXi host itself, not VMs. VMs have their own NICs and they do not use any VMkernel NIC. Both VMkernel and VM NICs are mapped to Port Groups. When you look at a virtual switch, you have Port Groups on the left side and physical adapters on the right side.
I think the best practice is to add only VLANs that are necessary. You may have many VLANs in your network, but VMs may be located in only some of them. VMs can access hosts in the other VLANs by means of routing. It looks like your physical switch does inter-VLAN routing. If there aren't any access issues around those VMs, most likely there is no need to add the other VLANs to those LAGs and I would not add them.
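To illustrate why the client VLAN does not have to be a member of the LAG, here is a rough Python model of the LAG membership described earlier in the thread. The VLAN IDs are hypothetical (the real ones were not given), and the model assumes the switch routes between VLANs before forwarding towards the host.

```python
# Hypothetical VLAN IDs; the thread does not state the real ones.
SERVER_VLAN, CLIENT_VLAN, VOIP_VLAN, CNC_VLAN = 10, 20, 30, 40

# LAG membership as described: one untagged VLAN and two tagged VLANs.
LAG_MEMBERSHIP = {
    SERVER_VLAN: "untagged",
    VOIP_VLAN: "tagged",
    CNC_VLAN: "tagged",
}


def lag_egress(vlan: int) -> str:
    """Return how a frame in `vlan` leaves the LAG, or 'dropped' if not a member."""
    return LAG_MEMBERSHIP.get(vlan, "dropped")


# A frame that is still in the client VLAN would not be allowed onto the LAG.
print(lag_egress(CLIENT_VLAN))

# A packet from a client to a server is routed by the switch first, so it
# arrives at the LAG in the server VLAN and is sent to the host untagged.
print(lag_egress(SERVER_VLAN))
```

In this model only the VLANs that VMs actually sit in need to be LAG members, which matches the advice above.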
07-23-2024 05:42 AM
07-23-2024 06:44 AM
I would say yes, but there isn't really any routing done in ESXi. The ESXi virtual switch does not do any routing. It's just plain VLAN assignment and L2 traffic. I also think that your understanding of the default gateway is incorrect. First of all, it has no function in L2. Second, it is not even used in inter-VLAN routing. For inter-VLAN routing, all routes are defined for each VLAN. If your Cisco switch does inter-VLAN routing, you will find those routes there. They will be described as local, directly connected.
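The directly connected routes mentioned above can be pictured as a simple lookup table on the switch. The sketch below is only an illustration in Python: the VLAN IDs and subnets are invented, and `egress_vlan` is a hypothetical helper, not a switch command.

```python
import ipaddress

# Hypothetical VLAN interfaces on the switch: VLAN ID -> directly connected subnet.
CONNECTED_ROUTES = {
    10: ipaddress.ip_network("192.168.10.0/24"),  # servers (assumed)
    20: ipaddress.ip_network("192.168.20.0/24"),  # wired/wireless clients (assumed)
    30: ipaddress.ip_network("192.168.30.0/24"),  # VoIP (assumed)
}


def egress_vlan(dst_ip: str):
    """Return the VLAN whose connected subnet contains dst_ip, or None."""
    dst = ipaddress.ip_address(dst_ip)
    for vlan_id, network in CONNECTED_ROUTES.items():
        if dst in network:
            return vlan_id
    return None


# A packet addressed to a server is forwarded out in the server VLAN,
# whichever VLAN it was received in.
print(egress_vlan("192.168.10.25"))
```

A destination that matches none of the connected subnets returns `None` here; on a real switch that case would be handled by whatever other routes are configured.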
07-23-2024 09:44 AM