01-11-2023 08:24 AM - edited 01-17-2023 12:36 AM
Dear,
I took ownership of an existing SDA Fabric and need to add redundancy at various levels. Unfortunately the implementation was not documented, so I have to reverse-engineer what is configured and why before deciding what to configure to bring this redundancy. I have no lab to test in, and dCloud is not sufficient here.
I'll try to describe the current implementation of the Fabric as best I can.
The SDA Fabric is configured with a single Fabric site, 2 Border/Control Plane nodes and 2 Border-only nodes in total. All 4 Borders are Anywhere borders.
The Fabric site is actually split in 2 geographical sites (close enough to respect the latency requirements). On each geographical site, there is 1 B/CP and 1 Border. Both of them are connected to a single site-local Fusion with eBGP.
Fabric Edges have 1 uplink to each of the site-local Border nodes (so 2 uplinks in total).
There is 1 physical link between BCP1 and BCP2, and 1 physical link between B1 and B2. There is no direct physical link between Fusion switches.
iBGP is configured between both BCP nodes for all VNs, with the route-map "tag_local_eids" applied in the outgoing direction as documented in CSCvm77399 - SDA - Prefixes stay in LISP map-cache even after Source of prefix is lost. I don't know whether, at the time of the implementation (2019), this was configured manually or automated via DNAC; either way, the config is there and it's OK.
Fusion redundancy:
At the moment:
However, if Fusion1 fails, endpoints in the fabric (whichever geographical site they are located in) have no way to reach the legacy network and the outside world. This is obviously a big concern.
Configuring routing between Fusion2 and legacy2 is not a very big deal. With iBGP being configured between BCP nodes, I know that BCP nodes have redundant paths towards the outside.
Now my concern/reverse-engineering is about how the 4 Borders are configured and how Border-only nodes can route traffic to the outside.
> BCP1 and BCP2 are route-reflectors. B1 and B2 are route-reflector clients of both BCP1 and BCP2.
>>> Q1: First, can you please confirm that this configuration is entirely manual (I assume that DNAC templating is equivalent to manual config) and not automated by DNAC?
> B1 and B2 have the following route-map applied under BGP IPv4 and VPNv4 address-families in the ingress direction:
B1#
 address-family ipv4
  bgp redistribute-internal
  redistribute isis level-1-2
  neighbor BCP2.122.133 activate
  neighbor BCP2.122.133 route-map deny_0.0.0.0 in
  neighbor BCP1.124.129 activate
  neighbor BCP1.124.129 route-map deny_0.0.0.0 in
 exit-address-family
 !
 address-family vpnv4
  neighbor BCP2.122.133 activate
  neighbor BCP2.122.133 send-community both
  neighbor BCP2.122.133 route-map deny_0.0.0.0 in
  neighbor BCP2.122.133 route-map tag_local_eids out
  neighbor BCP1.124.129 activate
  neighbor BCP1.124.129 send-community both
  neighbor BCP1.124.129 route-map deny_0.0.0.0 in
  neighbor BCP1.124.129 route-map tag_local_eids out
 exit-address-family
!
route-map deny_0.0.0.0 deny 25
 match ip address prefix-list deny_0.0.0.0
route-map deny_0.0.0.0 permit 30
!
ip prefix-list deny_0.0.0.0 seq 10 permit 0.0.0.0/0
!
// *same config on B2.
This route-map says "Drop the advertisement of the 0.0.0.0/0 route coming from BCP nodes" leaving the Border-only switches with the following BGP table (example with B1):
B1#show ip bgp vpnv4 vrf CAMPUS
BGP table version is 168555, local router ID is B1.124.130
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:4102 (default for vrf CAMPUS)
 *>  0.0.0.0          Fusion1.126.74                      65535 65000 i
[...]
B1#
while BCP1 has 4 different possible paths:
BCP1#show ip bgp vpnv4 vrf CAMPUS
BGP table version is 3811592, local router ID is BCP1.124.129
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:4102 (default for vrf CAMPUS)
 * i 0.0.0.0          B2.122.132               0    100  45000 65000 i
 * i                  BCP2.127.237             0    100      0 65000 i
 * i                  B1.124.130               0    100  45000 65000 i
 *>                   Fusion1.126.62                      65535 65000 i
[...]
BCP1#
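To double-check that the inbound filter is what keeps the iBGP-learned default routes out of B1's table, commands along these lines could be run on B1. This is only a sketch: it reuses the route-map, prefix-list and VRF names shown above, and no output is shown here because none was captured for this post.

```
! On B1: inspect the filter and the prefix-list it references
show route-map deny_0.0.0.0
show ip prefix-list detail deny_0.0.0.0
! Re-check which 0.0.0.0 paths are present in the VRF table
show ip bgp vpnv4 vrf CAMPUS
```

If the prefix-list hit count increases while only the Fusion1 path appears for 0.0.0.0, that would be consistent with the filter dropping the default routes reflected by the BCP nodes.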
>>> Q2: Why is such a route-map applied on Border-only switches? Right now, if Fusion1 fails, B1 has no exit path, which would negatively impact SDA endpoints.
>>> Q3: Is the application of this route-map manual or automated? I know this route-map is also used under the LISP routing process, but I didn't find any documentation about applying it under the BGP routing process.
> Another configuration applied on all 4 Borders which is not symmetric is the redistribution of the underlay prefixes in BGP routing process (AF IPv4).
On BCP1: both "redistribute isis level-1-2" and "redistribute lisp metric 10" commands are present.
On B1: only "redistribute isis level-1-2" command is present.
On BCP2: only "redistribute lisp metric 10" command is present.
On B2: no redistribute command configured.
On both Fusion1 and Fusion2, I can see /31 and /32 underlay prefixes learned via eBGP from the site-local Border nodes.
>>> Q4: Since the configuration is not symmetric, I believe the commands were pushed manually. Can you confirm?
>>> Q5: In the end, which command should be applied? ISIS only? LISP only? Both? In any case, I will configure the aggregate-address command so that the Fusion switches do not learn every single Fabric prefix.
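For illustration, the aggregation mentioned in Q5 might look like the sketch below on a Border node. The summary prefix 192.0.2.0/24 and the LOCAL_AS placeholder are hypothetical and are not values from this fabric; the real underlay range and the Border AS number would go there.

```
! Hypothetical values: LOCAL_AS and 192.0.2.0/24 are placeholders
router bgp LOCAL_AS
 address-family ipv4
  ! Advertise one summary and suppress the more-specific /31s and /32s
  aggregate-address 192.0.2.0 255.255.255.0 summary-only
 exit-address-family
```

The aggregate is only originated while at least one more-specific prefix inside it is present in the BGP table, so it still depends on the redistribution (ISIS and/or LISP) discussed above.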
Thanks in advance.
Additional questions might arise...
Regards,
Sylvain.
01-17-2023 07:41 AM
>>> Q2: Why is such a route-map applied on Border-only switches? Right now, if Fusion1 fails, B1 has no exit path, which would negatively impact SDA endpoints.
>>> Q3: Is the application of this route-map manual or automated? I know this route-map is also used under the LISP routing process, but I didn't find any documentation about applying it under the BGP routing process.
Q2: From my research, it looks like this route-map "deny_0.0.0.0" is meant to be applied under the BGP routing process to eBGP neighbors, and only on Internal-only Borders, since this Border type should not advertise a default route into the SDA Fabric.
Removing the route-map from the BCP neighbors doesn't break anything, and it gives B1 and B2 default-route redundancy.
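As a sketch of that removal on B1 (same idea on B2), something like the following could be used. LOCAL_AS is a placeholder, since the Border AS number does not appear in the excerpts above, and the neighbor names are the anonymized ones from the B1 config.

```
! B1: LOCAL_AS is a placeholder, not a value from this fabric
router bgp LOCAL_AS
 address-family ipv4
  no neighbor BCP2.122.133 route-map deny_0.0.0.0 in
  no neighbor BCP1.124.129 route-map deny_0.0.0.0 in
 exit-address-family
 !
 address-family vpnv4
  no neighbor BCP2.122.133 route-map deny_0.0.0.0 in
  no neighbor BCP1.124.129 route-map deny_0.0.0.0 in
 exit-address-family
```

An inbound soft reset or route refresh toward the BCP neighbors is typically needed before an inbound policy change takes effect. After that, "show ip bgp vpnv4 vrf CAMPUS" on B1 would be expected to list iBGP-learned 0.0.0.0 paths next to the one from Fusion1.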
Q3: I believe it was manually applied but cannot confirm.
Still investigating on the other questions...