For the last day I've been wondering about dual-DC design for a large-scale enterprise, and how decisions about layering affect the infrastructure's usability, security and availability. Servers need connectivity not only to production; they all (just like the network equipment) need connectivity to backup, management and perhaps storage as well. They might even need a heartbeat network to sync information between nodes, etc. Many server platforms require this connection to be L2. At the same time it is (in my opinion) important to separate production into the two DCs at L3, to protect availability, ease troubleshooting and so on. But then again: with production on separate L3 networks, how do you load balance most efficiently, so that the capacity of both DCs is used equally? One design I had in mind divides the infrastructure into six layers:
Backend (BE) - where heartbeat, synchronization and out-of-band management traffic would connect. (Separated according to the same rules as the production layer: one VLAN per compartment.)
Backup (BKP) - backup network that lets servers reach the backup services. (Here I use private VLANs, with an isolated secondary VLAN per server or compartment.)
Production (PR) - the core, where all production server interfaces connect (one VLAN/subnet per compartment)
- FIREWALL - separates the compartments within the production layer with access rules
Front end (FE) - provides L3 connectivity between the two DCs' production networks
Load Balancing (LB) - separated by L3, the LB layer presents a single virtual IP per service to clients and distributes the traffic across the two real addresses, one in each DC
Backbone Edge (BBE) - where the DCs connect to the backbone network
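To make the backup layer's private-VLAN idea concrete, here is an IOS-style sketch of one access switch; the VLAN numbers and port assignments are made-up examples, not part of the design above:

```
! BKP layer: primary VLAN 200 with an isolated secondary VLAN 201.
! Server ports are isolated hosts - they can reach the backup service
! on the promiscuous port, but not each other.
vlan 200
 private-vlan primary
 private-vlan association 201
vlan 201
 private-vlan isolated
!
interface GigabitEthernet1/0/1
 description Server backup NIC (isolated)
 switchport mode private-vlan host
 switchport private-vlan host-association 200 201
!
interface GigabitEthernet1/0/48
 description Uplink toward the backup service (promiscuous)
 switchport mode private-vlan promiscuous
 switchport private-vlan mapping 200 201
```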
This would result in something like:
  DC1         DC2
  BBE         BBE     (SP router)
  LB          LB      (load balancer)
  FE ---L3--- FE      (edge router)
  FW          FW      (firewall)
  PR          PR      (DC switch)
  BKP --L2--- BKP     (access switch)
  BE ---L2--- BE      (access switch)
... something like that...
So traffic coming into the DCs via the backbone would first hit the LB layer and get load balanced across the DCs; the servers could keep their information in sync thanks to the backend L2 connection, while production stays separated at L3.
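As a minimal sketch of what the LB layer's site selection could do, here is a global-site-selector style picker in Python that spreads clients across one VIP per DC and fails over when a site's health check fails. The class, addresses and weights are all hypothetical illustrations, not part of the design above:

```python
# Sketch of GSLB-style distribution across two DC virtual IPs.
# VIPs and weights below are made-up examples.
from itertools import cycle

class SiteSelector:
    """Weighted round-robin over per-DC VIPs, skipping unhealthy sites."""

    def __init__(self, sites):
        # sites: list of (vip, weight) tuples
        self.health = {vip: True for vip, _ in sites}
        # Repeat each VIP by its weight so heavier sites are picked more often
        self._ring = cycle([vip for vip, w in sites for _ in range(w)])

    def mark_down(self, vip):
        self.health[vip] = False

    def pick(self):
        # Bounded scan, so a fully failed ring raises instead of spinning forever
        for _ in range(len(self.health) * 10):
            vip = next(self._ring)
            if self.health[vip]:
                return vip
        raise RuntimeError("no healthy site")

selector = SiteSelector([("10.1.0.10", 1), ("10.2.0.10", 1)])  # DC1 / DC2 VIPs
answers = [selector.pick() for _ in range(4)]  # alternates between the two DCs
selector.mark_down("10.2.0.10")               # simulate DC2 failing its probe
failover = selector.pick()                    # all traffic now goes to DC1
```

A real GSLB would of course answer DNS queries or announce anycast routes rather than hand back strings, but the selection logic is essentially this.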
However, having come this far in my reasoning, I can't help thinking that even though I separate production at L3, I still have L2 in the backend, which could cause issues for both DCs if something went wrong there.
Of course there is also the option of running, for example, VRFs with a trunk between the DCs and an SVI per VRF, but I'll leave that out of the discussion for now.
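For completeness, that VRF option could look roughly like this on one DC's side, in IOS-style syntax; the VRF name, VLAN ID and addressing are made-up examples:

```
! One VRF per compartment, one SVI per VRF riding the inter-DC trunk
vrf definition COMPARTMENT-A
 address-family ipv4
 exit-address-family
!
interface TenGigabitEthernet1/1
 description Inter-DC trunk
 switchport mode trunk
 switchport trunk allowed vlan 101
!
interface Vlan101
 description COMPARTMENT-A inter-DC routing
 vrf forwarding COMPARTMENT-A
 ip address 10.255.101.1 255.255.255.252
```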
I can see you've put plenty of thought into this. This is something we as network designers are struggling with: a distributable load with high availability across diverse locations. My personal opinion here... I'm not a fan of overlays, i.e. extending L2 over L3 links. I understand the need for it, but it just creates new and different problems. Bigger and uglier ones! I'm really hoping something like LISP will make the network more transparent. When applications can start making calls like "I need email, connect me to the closest email server" instead of "I need email, connect me to smtp://mail.mycorp.com", our networks will become more scalable. I sure hope we're moving in that direction, but only time will tell.
From a high-level point of view:
Active/active DC is complicated but achievable.
The complication comes from how the applications work in the DC and how your routing is designed.
Using a global load balancer, a.k.a. global site selector, can also help you distribute traffic over the different data centers.
Whether to use L2 or L3 is open and depends on the servers' needs.
Hope this helps