Lately, I've had a number of requests for isolated VLANs, non-routed subnets, etc. Call them what you will, they are all pretty much the same concept. Historically, we've honored such requests by simply creating a new VLAN and, if requested, providing a subnet / IP range for the requester to use that will not overlap, should they ever change their mind and decide they want it routed. Not that it's ever happened, of course. :D
With ACI, we haven't changed the approach much, but my gut tells me there's a better way to do it. What we do today is create a new bridge domain, flood in the BD, and disable unicast routing. Then we create a new EPG for that BD, exclude it from the preferred group, provide and consume no contracts, and call it a day.
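For what it's worth, the BD side of that recipe is easy to sanity-check with a quick moquery; the BD name here is hypothetical:
# Expect unicastRoute to be "no" and unkMacUcastAct to be "flood" on one of these isolated BDs
moquery -c fvBD -x query-target-filter='eq(fvBD.name,"ISOLATED-BD1")' | egrep 'name |unicastRoute|unkMacUcastAct'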
As I stare at two requests in our queue right now, both asking for a new isolated VLAN, my first thought is to, at the very least, create a separate VRF for these bridge domains. Then I question my own sanity, as there actually won't be any L3 interfaces configured, thus making a VRF a moot point.
Another thought I had was that, for some use cases at least, I could reuse an "Isolated" bridge domain and just create the new EPGs, again with no contracts. We have all 2nd-gen switches, so I could flood in the encap instead of the bridge domain. In most cases, these are untagged (802.1p), so the encap isn't relevant anyway.
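If I go that route, my recollection is that flood-in-encap is set per EPG in the object model, though I'd verify the attribute name in the lab before trusting my memory. I believe the check looks roughly like this (the EPG name is made up, and floodOnEncap is an assumption on my part):
# floodOnEncap is the attribute name as I remember it -- treat it as an assumption and verify
moquery -c fvAEPg -x query-target-filter='eq(fvAEPg.name,"ISOLATED-EPG1")' | egrep 'name |floodOnEncap'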
You see where I'm going with this. Any thoughts? How do you do it?
Thanks, Chris. Your response is very much appreciated. I think it showed me that I need to spend some time in the lab futzing with useg just to get a better understanding. But I have the freedom of doing so at my leisure, as I now have a definitive answer for the server folks, who are no longer waiting on me for an answer.
Add a theoretical second beer to my tab. The first theoretical beer I owe you was from a few months ago, when I was asked to come up with a solution to ensure traffic from a VDI would traverse an inline IPS before hitting its default gateway. Your blog post / tutorial on per-port VLANs helped solidify in my mind the nature of PPV and how it is configured in ACI, and it was, with very little modification, the basis for the VDI / IPS solution. So double kudos.
(Edit: Updated info and answered some of my own questions after hitting the lab)
Hello all. I had an interesting request come across my messy desk yesterday, and I wanted to get some thoughts on this.
One of our server teams has chosen a hyperconverged VxRail appliance, and they are doing the right thing: adhering to the vendor-recommended design, planning the network design, and trying to think ahead, under the assumption that there will be additional appliances in the future.
The specific question revolves around the vSAN network. The appliance has two or more network connections (active/standby, NOT LACP) specifically for the purpose of communication among nodes within the vSAN cluster. The typical number of nodes is apparently four, with a hard cap of 64. In reality, you would add additional clusters before you would add that many nodes to a cluster.
The vSAN network is completely isolated. No routing, nothing special at all. The only network requirement is that it be a trunk port; in other words, the VID is carried in an 802.1Q tag. No problem.
Here's how the request was proposed to me. I like the way he worded his request.
Let's assume that we have a cluster of four nodes. We'll use VID 10 and we have a subnet of 10.10.10.0/24 to KISS and to allow for future expansion. Non routed, so no biggie either way.
Node vSAN1-n1 has 10.10.10.1
Node vSAN1-n2 has 10.10.10.2
Node vSAN1-n3 has 10.10.10.3
Node vSAN1-n4 has 10.10.10.4
Assume now that we have our own switch dedicated to this network. All ports are configured as switchport mode trunk and switchport trunk allowed vlan 10. No uplinks, no mess. Everybody is happy and sings Kumbaya. The best part is that nobody else has to hear it.
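On that hypothetical dedicated switch, the whole config boils down to something like this (NX-OS-style syntax, with made-up port numbering for a four-node cluster):
vlan 10
  name vSAN-Cluster1
interface Ethernet1/1-4
  description vSAN-Cluster1 node-facing ports
  switchport mode trunk
  switchport trunk allowed vlan 10
  no shutdown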
A year goes by and they purchase another appliance. Assume that they again buy a switch specifically for vSAN. All ports have vlan 10 trunked to them, and the nodes look like this:
Node vSAN2-n1 has 10.10.10.1
Node vSAN2-n2 has 10.10.10.2
Node vSAN2-n3 has 10.10.10.3
Node vSAN2-n4 has 10.10.10.4
No problem. As time goes on, they have a drop-in solution for vSAN. And I have to operate under the assumption that they are correct when they say that there is no chance that these will ever need to route.
That works great if you have a dedicated switch for every vSAN cluster. But that's not realistic.
The request was that they be able to do exactly that with our production data center switches.
My first thought was to use port-local VLANs, but that pretty much means a 1:1:1:1 VLAN pool : domain : EPG : BD, which feels... wasteful. However, it does, as some quick lab work just proved out, satisfy the requirements outlined in the request. (See below.)
I was also thinking that useg may be an option? I don't know enough about how useg works, but if I understand it correctly, I can keep it simple with one bridge domain and a separate community useg EPG per cluster? I figured I would get some thoughts before I hit the lab with this.
Thanks in advance for your thoughts.
Details (Per-Port VLAN Lab)
In my lab, I created three static VLAN pools, each with a single encap: VID 100. I created three physical domains and tied each to its respective VLAN pool. I tied all three to a common AAEP. I then configured interface policy groups with port-local VLAN scope and created interface profiles for them. I labeled the ports as follows:
L201# show int e1/17-18,e1/20-21,e1/27-28 descr
Port Type Speed Description
Eth1/17 eth inherit vSAN-Cluster1_n1
Eth1/18 eth inherit vSAN-Cluster1_n2
Eth1/20 eth inherit vSAN-Cluster2_n1
Eth1/21 eth inherit vSAN-Cluster2_n2
Eth1/27 eth inherit vSAN-Cluster3_n1
Eth1/28 eth inherit vSAN-Cluster3_n2
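For anyone following along, the per-port behavior hinges on the L2 interface policy referenced by those policy groups having its VLAN scope set to port-local rather than the default global scope. A quick way to double-check that (policy names will obviously vary):
# Expect vlanScope to read "portlocal" on the policy tied to these interface policy groups
moquery -c l2IfPol | egrep 'name |vlanScope'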
In my tenant, I created three bridge domains with unicast routing disabled. They have the following VNIDs:
moquery -c fvBD -x query-target-filter='and(wcard(fvBD.name,"vSAN-Cluster"))' | grep 'name \|seg '
name : vSAN-Cluster1
seg : 16482210
name : vSAN-Cluster2
seg : 16383903
name : vSAN-Cluster3
seg : 16121798
I created three EPGs, one per bridge domain, associated the respective physical domain with each EPG, and statically assigned each EPG to its respective ports:
L201# show int e1/17-18,e1/20-21,e1/27-28 trunk | grep -A7 -B1 Allowed
Port Vlans Allowed on Trunk
And note the vlan-100 encap on all ports, plus the VNIDs of the bridge domains (see above):
L201# show vlan extended | grep 17[1-7]\ *enet
171 enet CE vxlan-16482210
172 enet CE vlan-100
173 enet CE vxlan-16383903
175 enet CE vlan-100
176 enet CE vxlan-16121798
177 enet CE vlan-100
Just to follow up, I did the same test with 3.2(2o), again using 93180YC-EX switches. Same results for me as on 3.0(2k). I was really starting to suspect the version more than anything else, but maybe it is, in fact, something with 1st gen vs 2nd gen.
Still trying to get my hands on a 1st gen switch. I really want to try to duplicate your results.
I am running 3.0(2k) not 3.0(1k). Sorry about that.
Anyway, I tried to duplicate your problem in my lab today, but as it turns out, someone put the 1st-gen leaf switches that were in the lab into production! So I was unable to test the 1st-gen theory. For now, anyway.
Good news, I suppose: Running 3.0(2k), using 93180YC-EX switches, I did NOT see the same behavior out to a directly-attached host. I tried all three modes (trunk, dot1p, and untagged). I also cleared endpoints after each change, just to be sure. In my case, I used encap vlan-500. The cvid on the leaf switch was not 500. I forget exactly what it was, but what's important is that it wasn't 500.
In all three situations, I only saw the ARP requests from the default gateway being broadcast, as expected. I did not see any unicast ARP messages, nor did I see anything from any address other than the gateway.
When trunked, I saw the above frames tagged with the expected VID: 500. I did not see anything tagged with the cvid, nor did I see anything untagged.
In 802.1p and in untagged modes, I saw only untagged frames. Nothing unexpected.
So again, I suppose that is good news. I would like to test on this version with a 1st-gen switch, but I may not be able to do so.
We're staring down the barrel of a fabric upgrade and were asked to upgrade the lab to 3.2. Hopefully that will happen next week. Once that is completed, I will repeat the test, again on the 93180YC-EX switches at least, and will let you know what I see.
I would always encourage starting with RFC-1925.
RFC-1149 is always a good read, and RFC-2549 is worth a subsequent glance, as it defines QoS for 1149. RFC-6214 is the IPv6 adaptation.
Okay, joking aside, I agree with Seb, but I would expand on the thought. IP, OSPF, TCP, SNMP, etc... These fundamental protocols put food on your table and a roof over your head. They are your livelihood. Understanding as much as you possibly can about the protocols and how they work is key to your long-term success. I would strongly encourage you to not just read the specifications for the protocols you work with when you happen to have the time; make the time to do so.
I didn't have this realization until probably 15 or 20 years into my career. When I did, I just started reading. One day I was working on an OSPF problem, so that night, I dug into RFC-2328, which naturally led me into RFC-5340. Wash, rinse, repeat.
Just my 2c + my sad attempt at comedy.
Thank you. I think you just answered some questions and confirmed some suspicions I had. Every time I start down the road of thinking through this, I realize that, ultimately, I am probably looking at using GOLF anyway and I would probably be doing my future self more favors now if I invested my limited lab / research time in GOLF instead.
I would like to know as well. I have been desperately trying to find some time to look into this. I have a common L3Out that is applied to both pods. The prefixes are redistributed into OSPF as E2 routes. Hot potato routing. Just get it to the fabric as quickly as possible, and if I have to traverse the IPN, so be it.
That's fine for the most part. But I am seeing a trend that I like. Server teams are starting to ask for a separate subnet at each DC. I would rather see those prefixes as E1 routes where I can adjust costs accordingly. But I don't think that is possible.
What I was thinking is that it may be possible, using a route map, to advertise those prefixes only out of pod 1 or only out of pod 2, except when the OSPF adjacency in said pod is lost. I just haven't been able to find the time to dig in and do some testing and / or research.
Not an answer, I know, but something to consider maybe?
Are you REALLY sure? I'm running 3.0(1k) and literally had a change this last Tuesday where I had to change unknown unicast from hardware proxy to flood. I tested this first in my lab. I had an EPG in this BD statically mapped to two ports. One was trunk, the other was 802.1p.
Both ports FLAPPED. Line proto down, link down; link up, line proto up. It was painful to watch.
So, before performing the change on Tuesday, I had to delete all static mappings where I couldn't have the port bounce, then reapply them. It was "fun" :)
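If anyone else has to do the same dance, one way to inventory the static bindings before deleting them is a quick moquery; the tenant / AP / EPG names here are hypothetical:
# List every static path binding under the EPG so it can be re-created after the BD change
moquery -c fvRsPathAtt -x query-target-filter='wcard(fvRsPathAtt.dn,"tn-PROD/ap-APP1/epg-WEB")' | egrep 'tDn |encap |mode '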
Wow! First off, great post, as always. It caught my attention, as I have had a to-do item for some time to get some packet captures and gain a little more insight into the ARP gleaning process before I start making any recommendations one way or another re: ARP flooding. Basically, to make sure that I understood it before others start asking about it. :)
At first I was thinking that you had a specific corner case, but you identified some behaviors that simply shouldn't be. The kicker at the end was most disturbing, re: the additional cvid-encapsulated ARP request.
My gut tells me that it might be platform-specific. In particular, re: forwarding behavior in 3.2 on first-generation vs. second-generation leaf switches. I'm pretty sure it was BRKACI-3545 at Cisco Live Orlando 2018 where those behavior changes and differences were noted.
I'm curious, do you have a second-gen leaf switch you could use, but leave everything else the same? And / or maybe try this with an older release? 3.0 maybe?
My lab is running 3.0(2k). I have a mix of first- and second-gen leaf switches. I have a 9372PX pair, a 9372PX-E pair, and a 93180YC-EX pair. Unfortunately, I just don't have the time during the week, and this is not a good weekend for me to try to get deep into the muck. I can probably duplicate your setup maybe some time next weekend, though.
Edit: Fixed a typo of my own; Fixed my uh-oh: I'm running 3.0(2k) not (1k).
Thanks to CSCvg71263, I am looking at a fabric upgrade. I see that 3.2 is now the long-lived release branch, so I would have to have a compelling reason to upgrade to 3.1. However, I would be remiss if I did not do my due diligence.
I've reviewed the 3.2 release notes for the APIC and the 13.2 release notes for the switches and have not found anything that jumps out at me, even in the open caveats. My 40-ish leaf switches, across both pods (multipod), are almost all 93180YC-EX, with a 9372 pair or two that are soon to be replaced with 93180YC-EX switches. My four spines are all 9508 chassis.
Has anyone found any gotchas, in general, or maybe specifically regarding my setup? Obviously at a high level, that is.
Hello. I've picked up a lot from some of you who are ACI Kung Fu Masters and some of you who are very creative and have even more bizarre use cases than I. :) So, I figured I would post here asking for best practice suggestions.
We have two data centers, several miles apart. We have them set up using multipod on 3.0(2k). The IPN is a pair of standalone N9Ks at each DC, each with a 10G Metro-E between them. Along with the IPN VRF, we are also routing our core area 0 on these switches. This is also where our common L3Out terminates (core VRF, not IPN VRF). Why does the common L3Out terminate here? Read on and, if you are experienced with OSPF, you will probably pick up on why.
Our WAN circuits out to our various locations come into yet another pair of standalone N9Ks at each DC, so that traffic can traverse our IPS before hitting the ASR1K at each DC.
For "Reasons", some of our DC SVIs remain on our ASR1K routers. They run VRRP across DCs. Currently, this is via the fabric Each ASR has a port-channel to a leaf pair with the various VLANs trunked. Our data centers are, by policy, hot standby. So the ASR at DC1 is the VRRP master. It works out okay unless the need arises to have any given site use their backup WAN circuit as their primary. This use case has arisen.
This is an inherent problem with OSPF, so there is a standard fix. And let's skip the debate over design; I inherited it. What I need to do is enable OSPF multi-area adjacency between the two ASR1Ks. The requirement to do so is a point-to-point adjacency between the two routers.
What I'm thinking is that I could simply use the existing port-channel and create a new subinterface on each ASR. Let's call it Po100.666 (the necessary evil). ip ospf 1 area 0. ip ospf network point-to-point. ip ospf multi-area 1234. Drop the mic. Enjoy life as women everywhere want to be with me and men everywhere want to be me.
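Spelled out on each ASR1K, that's roughly the following; the addressing, VLAN ID, and area numbers are placeholders, and the multi-area syntax is from memory, so verify it on your code train:
interface Port-channel100.666
 description P2P link to peer ASR1K for OSPF multi-area adjacency
 encapsulation dot1Q 666
 ip address 192.0.2.1 255.255.255.252
 ip ospf network point-to-point
 ip ospf 1 area 0
 ip ospf multi-area 1234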
The philosophical question: is that the right way to do this? The ACI fabric is for servers, not transit. But I am already doing this anyway. In theory, it would go away when Po100 goes away anyway.
Or will it? It's nice to have that multi area adjacency.
So if I don't use the fabric for transport, what do I do? QinQ tunnel through the IPN switches between ASRs? GRE? An EVPN through them? But I already have an EVPN setup with VXLAN encapsulation. So why not use it?
I think I have my answer but I wanted your opinions. What would you do?
Also, if you were to use the existing port-channel for this purpose, given that only these two endpoints would exist in the new non-routed bridge domain, how would you configure the BD and EPG?
[Edit: ASR1K not 9k.]