SDA Fusion redundancy constructs

Sylvain_Che
Level 1

Dear,

I took ownership of an existing SDA Fabric and need to add redundancy at various levels. Unfortunately the implementation was not documented, so I have to reverse-engineer what is configured and why before deciding what to change to bring this redundancy. I have no lab to test in, and dCloud is not sufficient here.

I'll try to describe the current implementation of the Fabric as best I can.

The SDA Fabric is configured with a single Fabric site, 2 Border/Control Plane nodes and 2 Border-only nodes in total. All 4 Borders are Anywhere borders.

The Fabric site is actually split across 2 geographical sites (close enough to meet the latency requirements). On each geographical site, there is 1 B/CP and 1 Border. Both of them are connected to a single site-local Fusion with eBGP.

Fabric Edges have 1 uplink to each site-local Border node (so 2 uplinks in total).

There is 1 physical link between BCP1 and BCP2, and 1 physical link between B1 and B2. There is no direct physical link between Fusion switches.

iBGP is configured between both BCP nodes for all VNs, with the route-map "tag_local_eids" applied in the outgoing direction as documented in CSCvm77399 - SDA - Prefixes stay in LISP map-cache even after Source of prefix is lost. I don't know whether this was configured manually or automated via DNAC at the time of the implementation (2019), but either way the config is there and it's OK.

Fusion redundancy:

At the moment:

  • The site-1 endpoints exit the Fabric through Fusion1 which then forwards to final destination.
  • The site-2 endpoints exit the Fabric through Fusion2. Traffic then goes from Fusion2 to Fusion1 because Fusion1 sends a default route via OSPF to Fusion2. This is because routing between Fusion2 and legacy2 has never been configured.

However, if Fusion1 fails, endpoints in the fabric (whichever geographical site they are located in) have no way to reach the legacy network and the outside world. This is obviously a big concern.

Configuring routing between Fusion2 and legacy2 is not a big deal. With iBGP configured between the BCP nodes, I know that the BCP nodes have redundant paths towards the outside.

Now my concern/reverse-engineering is about how the 4 Borders are configured and how Border-only nodes can route traffic to the outside.

> BCP1 and BCP2 are route-reflectors. B1 and B2 are route-reflector clients of both BCP1 and BCP2. >>> Q1: First, can you please confirm that this configuration is entirely manual (I assume that DNAC templating counts as manual config) and not automated by DNAC?
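
For anyone comparing against their own Borders, the route-reflector relationship described above would show up on the BCP side roughly as follows. This is only a sketch: `<local-asn>` is a placeholder, and using B1.124.130 and B2.122.132 (the redacted addresses seen in the outputs further down) as the neighbor addresses is an assumption.

```
! On BCP1 (sketch only; BCP2 would mirror it)
router bgp <local-asn>
 address-family vpnv4
  ! B1 and B2 are reflector clients, so BCP1 reflects iBGP-learned routes to them
  neighbor B1.124.130 route-reflector-client
  neighbor B2.122.132 route-reflector-client
 exit-address-family
```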

> B1 and B2 have the following route-map applied under BGP IPv4 and VPNv4 address-families in the ingress direction:

B1#
address-family ipv4
bgp redistribute-internal
redistribute isis level-1-2
neighbor BCP2.122.133 activate
neighbor BCP2.122.133 route-map deny_0.0.0.0 in
neighbor BCP1.124.129 activate
neighbor BCP1.124.129 route-map deny_0.0.0.0 in
exit-address-family
!
address-family vpnv4
  neighbor BCP2.122.133 activate
  neighbor BCP2.122.133 send-community both
  neighbor BCP2.122.133 route-map deny_0.0.0.0 in
  neighbor BCP2.122.133 route-map tag_local_eids out
  neighbor BCP1.124.129 activate
  neighbor BCP1.124.129 send-community both
  neighbor BCP1.124.129 route-map deny_0.0.0.0 in
  neighbor BCP1.124.129 route-map tag_local_eids out
 exit-address-family
!
route-map deny_0.0.0.0 deny 25
 match ip address prefix-list deny_0.0.0.0
route-map deny_0.0.0.0 permit 30
!
ip prefix-list deny_0.0.0.0 seq 10 permit 0.0.0.0/0
!
// *same config on B2.

This route-map says "drop the 0.0.0.0/0 route advertised by the BCP nodes", which leaves the Border-only switches with the following BGP table (example with B1):

B1#show ip bgp vpnv4 vrf CAMPUS
BGP table version is 168555, local router ID is B1.124.130
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:4102 (default for vrf CAMPUS)
 *>   0.0.0.0          Fusion1.126.74                      65535 65000 i
[...]
B1#

while BCP1 has 4 different possible paths:

BCP1#show ip bgp vpnv4 vrf CAMPUS
BGP table version is 3811592, local router ID is BCP1.124.129
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:4102 (default for vrf CAMPUS)
 * i  0.0.0.0          B2.122.132           0    100  45000 65000 i
 * i                   BCP2.127.237           0    100      0 65000 i
 * i                   B1.124.130           0    100  45000 65000 i
 *>                    Fusion1.126.62                      65535 65000 i
[...]
BCP1#

>>> Q2: Why is this route-map applied on Border-only switches? Right now, if Fusion1 fails, B1 has no exit path, which would impact SDA endpoints.

>>> Q3: Is this route-map applied manually or by automation? I know it is also used under the LISP routing process, but I didn't find any documentation about applying it under the BGP routing process.

 

> Another configuration that is not symmetric across the 4 Borders is the redistribution of the underlay prefixes into the BGP routing process (AF IPv4).

On BCP1: both "redistribute isis level-1-2" and "redistribute lisp metric 10" commands are present.

On B1: only "redistribute isis level-1-2" command is present.

On BCP2: only "redistribute lisp metric 10" command is present.

On B2: no redistribute command configured.

On both Fusion1 and Fusion2, I can see /31 and /32 underlay prefixes learned via eBGP from the site-local Border nodes.

>>> Q4: Since the configuration is not symmetric, I believe the commands were pushed manually. Can you confirm?

>>> Q5: In the end, which command should be applied? ISIS only? LISP only? Both? In any case, I will configure the aggregate-address command so that the Fusion switches do not learn every single Fabric prefix.
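
A minimal sketch of the aggregation mentioned above, assuming the underlay /31s and /32s all fall inside one summarizable block. The 192.0.2.0/24 range is a documentation prefix used purely as a placeholder, and `<local-asn>` stands for the Border's own AS number.

```
router bgp <local-asn>
 address-family ipv4
  ! Placeholder range: advertise one summary and suppress the more-specific /31s and /32s
  aggregate-address 192.0.2.0 255.255.255.0 summary-only
 exit-address-family
```

The aggregate is only originated while at least one more-specific prefix from that range is present in the BGP table, so the redistribution question still has to be settled first.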

 

Thanks in advance.

Additional questions might arise...

 

Regards,

Sylvain.

1 Reply 1

Sylvain_Che
Level 1
>>> Q2: Why is this route-map applied on Border-only switches? Right now, if Fusion1 fails, B1 has no exit path, which would impact SDA endpoints.

>>> Q3: Is this route-map applied manually or by automation? I know it is also used under the LISP routing process, but I didn't find any documentation about applying it under the BGP routing process.


Q2: From my research, it looks like the route-map "deny_0.0.0.0" is only useful under the BGP routing process when applied to eBGP neighbors on Internal-only Borders, since this Border type should not advertise a default route into the SDA Fabric.

Removing the route-map from the neighbors doesn't break anything, and it gives B1 and B2 default-route redundancy.
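
For reference, removing it on B1 would look roughly like this, reusing the redacted neighbor addresses from the original post (`<local-asn>` is a placeholder for the Border's AS number):

```
router bgp <local-asn>
 address-family ipv4
  ! Stop filtering the default route received from the BCP nodes
  no neighbor BCP2.122.133 route-map deny_0.0.0.0 in
  no neighbor BCP1.124.129 route-map deny_0.0.0.0 in
 exit-address-family
 address-family vpnv4
  no neighbor BCP2.122.133 route-map deny_0.0.0.0 in
  no neighbor BCP1.124.129 route-map deny_0.0.0.0 in
 exit-address-family
```

An inbound policy change normally needs a soft inbound refresh of the affected sessions before it takes effect. Afterwards, "show ip bgp vpnv4 vrf CAMPUS" on B1 would be expected to list iBGP paths for 0.0.0.0 in addition to the eBGP path via Fusion1.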

Q3: I believe it was manually applied but cannot confirm.

 

Still investigating the other questions...
