Community PVLAN proxy ARP problem?

m.o.andersson_2 · ‎03-18-2016

Hi,

I'm currently setting up a DMZ for servers with private VLANs, and I have a theoretical question about the community PVLAN and proxy ARP design.

Let's say I have an isolated secondary PVLAN and a community secondary PVLAN, with servers that connect to a firewall on a promiscuous port. The firewall runs proxy ARP for the whole subnet so that servers can communicate with each other on the private VLAN subnet through the firewall. In the community PVLAN I place servers that need to communicate over multicast or other very chatty cluster protocols that I don't want to pass through the firewall.

When a server in the isolated PVLAN wants to communicate with a server in the community PVLAN, it sends an ARP request, the firewall replies with its own MAC address, and traffic is forwarded through the firewall. But how does this work inside the community PVLAN? If a server in the community PVLAN wants to communicate with a server in the same community PVLAN, it sends an ARP request, and now both the target server and the firewall will reply with their own MAC addresses. So whichever ARP reply reaches the asking server first ends up in its ARP table. Is this something you have no control over? The server will probably be located closer and will send its ARP reply sooner, but let's say it has high CPU utilization and the ARP reply gets delayed. Then traffic within the same community PVLAN will be sent through the firewall.
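
A minimal sketch of the race described above, in Python. It assumes a very simplified host that either keeps the first ARP reply it receives or overwrites its entry with the last one; real operating systems differ in how they treat a second reply, so both policies are modeled. The names `ArpReply` and `resolve`, the MAC addresses, and the delays are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpReply:
    sender: str      # who answered the request (hypothetical label)
    mac: str         # MAC address carried in the reply
    delay_ms: float  # time until the reply reaches the asking server


def resolve(replies, policy="first"):
    """Return the reply that ends up in the ARP table.

    policy="first": the host keeps the earliest reply (assumption).
    policy="last":  the host overwrites with the latest reply (assumption).
    """
    ordered = sorted(replies, key=lambda r: r.delay_ms)
    return ordered[0] if policy == "first" else ordered[-1]


# Proxy-ARP reply from the firewall on the promiscuous port.
firewall = ArpReply("firewall", "00:00:5e:00:53:01", 0.5)

# Normal case: the community server answers quickly.
normal = [ArpReply("server-b", "00:00:5e:00:53:02", 0.2), firewall]

# Loaded case: the community server's reply is delayed.
delayed = [ArpReply("server-b", "00:00:5e:00:53:02", 3.0), firewall]

for label, replies in (("normal", normal), ("delayed", delayed)):
    for policy in ("first", "last"):
        winner = resolve(replies, policy)
        print(f"{label:8s} policy={policy:5s} -> {winner.sender}")
```

Under the "first reply wins" assumption the delayed case resolves to the firewall, which is the hairpin behaviour the question is about. Under "last reply wins" the roles flip, so the outcome depends on the host's ARP implementation as much as on timing.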

Is there some mechanism in private VLANs that I don't understand, or is there a design issue that I haven't considered?

Cheers! // Mattias

Shaunak · ‎03-21-2016

Hi Mattias,

In this scenario, can we program the ARP cache on the servers and install static entries for all the local servers in the same community VLAN, so that no ARP request/reply exchange takes place?

I'm not sure what kind of servers you're running, but I guess you can think along these lines and see if it fulfils your requirement.

Thanks,

Shaunak

m.o.andersson_2 · ‎03-21-2016

Thank you Shaunak, that would definitely work. But that solution raises a lot of "what if"s. What if we replace one of the servers, what if we add an additional server, what if VMware changes the MAC address on the vNIC... I wish there were a more robust solution for this design that solved the issue more automatically. We do not have the processes in place in our production environment for this kind of manual routine of adding static MAC entries on the servers. But if there is no alternative, then this might be our only option.

I just thought that Cisco, who developed private VLANs, would have thought of this kind of scenario and had a clever solution for it.

Shaunak · ‎03-21-2016

Your questions are valid, as this might not scale in a mid to large deployment since every server will have n-1 static ARP entries.
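
To put rough numbers on that, here is a small Python sketch. It assumes every server in the community needs one static entry for every other server; the function name `static_arp_entries` is only illustrative.

```python
def static_arp_entries(n_servers):
    """Return (entries per server, entries in total) for a full mesh
    of static ARP entries among n_servers hosts."""
    per_server = max(n_servers - 1, 0)
    return per_server, n_servers * per_server


for n in (5, 20, 100):
    per_server, total = static_arp_entries(n)
    print(f"{n} servers: {per_server} entries each, {total} to maintain")
```

Adding or replacing a single server means touching the ARP table on every other server in the community, which is where the maintenance burden comes from.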

Isn't there a way to bind the MACs to the vNICs and always have the platform assign the same MAC to the VM? I'm not a VMware expert, just brainstorming here.

Well, this is really a host-to-host issue rather than a flaw in how the PVLAN architecture is designed. It is specific to how your network is set up and what is expected of it, but it is valid nonetheless.

I'm not able to think of anything other than this. Let's see if someone chimes in with a more elegant solution.

Thanks,

Shaunak

m.o.andersson_2 · ‎03-21-2016

Well, this might not even be a problem. In our lab setup we have not yet seen the behaviour we describe. The servers in the community PVLAN never get the firewall's MAC address in their ARP table for other servers in the same community PVLAN. That's why I said it is a theoretical question: we do not want to proceed with this solution if there is a possibility that traffic will be sent to the firewall instead of directly to the server in the same community PVLAN. We have only tried with Windows 2012 servers; Linux might behave differently.

So I think it's a valid question what happens in theory if the firewall replies to an ARP request before the server does. Working with static ARP tables wouldn't scale for us, as you mentioned. So we might need to solve the community PVLAN in some other way.

Shaunak · ‎03-21-2016

I guess if the servers that need to communicate with each other have vNICs on the same ESXi host, then the internal switch will handle that communication and the data will not hit the physical devices. But if the servers are spread across multiple ESXi hosts, then this might pose a problem. Again, I'm not a VMware expert, and I may be wrong about how data flows in the vDS switches etc.

Thanks,

Shaunak

m.o.andersson_2 · ‎04-19-2016

Yes, the problem will probably occur when the VMs are spread across multiple hosts. So we are going with a workaround where we use two primary PVLANs. The first contains only an isolated secondary PVLAN, with proxy ARP on the firewall. The second contains only community secondary PVLANs, without proxy ARP. This will solve the problem, but it also limits traffic between the community VLANs. We don't think this is a big issue in our environment.
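
A rough Python sketch of the layer 2 reachability this workaround gives, based on the usual PVLAN forwarding rules (promiscuous ports talk to everything in their primary VLAN, isolated ports only to promiscuous ports, community ports to their own community and to promiscuous ports). The VLAN IDs, port names, and the helper `can_l2_forward` are hypothetical, and having a firewall promiscuous port in both primary VLANs is an assumption. It models layer 2 forwarding only, not routing or proxy ARP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    name: str
    primary: int                     # primary VLAN ID (hypothetical)
    kind: str                        # "promiscuous", "isolated" or "community"
    secondary: Optional[int] = None  # secondary VLAN ID, None if promiscuous


def can_l2_forward(src, dst):
    """True if a frame from src may be delivered to dst at layer 2."""
    if src.primary != dst.primary:
        return False                 # different primary VLANs never bridge
    if "promiscuous" in (src.kind, dst.kind):
        return True                  # promiscuous ports reach every port
    if src.kind == dst.kind == "community":
        return src.secondary == dst.secondary
    return False                     # isolated ports reach promiscuous only


# Primary 100: isolated servers behind the proxy-ARP firewall interface.
fw_100 = Port("fw-inside-100", 100, "promiscuous")
iso_a = Port("iso-a", 100, "isolated", 101)
iso_b = Port("iso-b", 100, "isolated", 101)

# Primary 200: community servers, firewall interface without proxy ARP.
fw_200 = Port("fw-inside-200", 200, "promiscuous")
com1_a = Port("com1-a", 200, "community", 201)
com1_b = Port("com1-b", 200, "community", 201)
com2_a = Port("com2-a", 200, "community", 202)

for src, dst in [(iso_a, iso_b), (iso_a, fw_100), (com1_a, com1_b),
                 (com1_a, com2_a), (com1_a, fw_200), (iso_a, com1_a)]:
    print(f"{src.name} -> {dst.name}: {can_l2_forward(src, dst)}")
```

In this model, servers in the same community reach each other directly and only the target server answers their ARP requests, while servers in different communities have no direct layer 2 path, which is the limitation mentioned above.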

Still, I think it's a bit strange that no one has run into this issue before. PVLAN with a firewall must be implemented in quite a few places, and not being able to run a community VLAN in that setup is bad...