Re: Ask the Expert - SD-WAN fundamentals and implementation

Cisco Moderador · ‎08-19-2019

This topic is a chance to discuss SD-WAN, its foundations and inner mechanisms, as well as its correct design and implementation to achieve desired business outcomes. Software-Defined WAN (SD-WAN) is a popular technology, and this event aims to help engineers, customers, and partners understand the benefits and possible advantages that its implementation can bring.

To participate in this event, please use the button below to ask your questions.

Ask questions from Monday 19th to Friday 30th of August, 2019

Featured expert

David Samuel Peñaloza Seijas works as a Senior Network Consulting Engineer at Verizon Enterprise Solutions in the Czech Republic. Previously, he worked as a Network Support Specialist in the IBM Client Innovation Center in the Czech Republic. David is an expert interested in all topics related to networks. However, he focuses mainly on data centers, enterprise networks, and network design, including software-defined networking (SDN). David has a long relationship with Cisco. He has been a Cisco Instructor for the Cisco Academy and was recognized as a Cisco Champion and a Cisco Designated VIP for 2017, 2018 and 2019. David holds a CCNP R&S, CCDP, CCNA Security, CCNA CyberOps and a CCNA SP certification. Currently, he is preparing for a CCDE.

David might not be able to answer each question due to the volume expected during this event. Remember that you can continue the conversation on the SD-WAN community.

Find other events https://community.cisco.com/t5/custom/page/page-id/Events?categoryId=technology-support

**Helpful votes encourage participation!**
Please be sure to rate the answers to questions.

kenneth.meyers · ‎08-26-2019

Question around control machine limits and ztp:

We’re looking at a fairly large SD-WAN rollout and I was wondering about the limits of the controllers around BFD sessions, control connections, etc. We’ll most likely have a hub-and-spoke configuration, as “branches” do not need connectivity between each other. What I’m trying to find information on is how many BFD sessions a “hub vEdge” device can accommodate, in addition to the capacity of vSmarts around control connections, so we can begin sizing things appropriately (including failover of one “Hub” or vSmart device and how this plays into the overall design of the overlay control plane).

With respect to ztp, if we would like to deploy our own certificates (in house CA) would we need to “touch” each vEdge before shipping to remote site (or have on-site personnel install a certificate on the device) before the vEdge contacts vBond? Is ztp possible with self-signed certificate requirement?

Thanks,

David Samuel Penaloza Seijas · ‎08-27-2019

Hello @kenneth.meyers

Effectively, as you have mentioned, one of the ways to scale the solution is to rely on a hierarchical model to restrict the tunnels between sites. By default the solution works in an any-to-any fashion, which taxes scalability because the state is held in the network even for tunnels that are not needed.

Quoting a previous reply in this thread:

The main drawback of it being scalability: as each vSmart controller supports a limit of around 5400 control connections (and those are shared when deployed in multitenancy mode), please note that each TLOC will establish a control connection. Furthermore, doing the math by increasing the number of TLOCs in each vEdge will cut down that limit substantially:

One TLOC - 5400 vEdges
Two TLOCs - 2700 vEdges
Three TLOCs - 1800 vEdges
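The arithmetic behind that list can be sketched in a few lines. This is only an illustration of the division described above, taking the approximate 5400 figure from the quoted reply as given; the function name is hypothetical.

```python
# Approximate per-vSmart control connection limit quoted in the thread.
VSMART_CONTROL_CONNECTION_LIMIT = 5400


def max_vedges_per_vsmart(tlocs_per_vedge, limit=VSMART_CONTROL_CONNECTION_LIMIT):
    """Each TLOC opens one control connection, so divide the limit by TLOCs per vEdge."""
    if tlocs_per_vedge < 1:
        raise ValueError("a vEdge needs at least one TLOC")
    return limit // tlocs_per_vedge


for tlocs in (1, 2, 3):
    print(f"{tlocs} TLOC(s) per vEdge: {max_vedges_per_vsmart(tlocs)} vEdges")
```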

Regarding the vEdge BFD session limits: [image: slide from a Cisco Live session showing an estimate of tunnels per device]

As far as I know, the ZTP process relies on certificates signed by a CA, either Symantec or your enterprise root CA chain, which is then installed in vManage (all vEdges would need to have the root certificate as well, which means touching them). I have not seen this accomplished with a self-signed certificate.

Hope that helps.

David

kenneth.meyers · ‎08-27-2019

Thanks David,

Would it be safe to assume that a "HUB" type vEdge device would have the same scaling limitations as the vSmart controllers previously mentioned?

One TLOC - 5400 vEdges
Two TLOCs - 2700 vEdges
Three TLOCs - 1800 vEdges

In the hierarchical model we're wondering how many "spoke vEdges" can connect to the "HUB vEdge" before we start taxing the capabilities of the Hub device with respect to BFD and IPSEC sessions.

Thanks again.

Regards

daniel.dib · ‎08-27-2019

Kenneth,

The scaling for the Edge devices is different from that of the controllers. For the controllers, the scaling factor is mainly the number of control sessions. For the Edge devices, it's about the number of IPsec tunnels, and hence the number of BFD sessions. How many sites are you planning to deploy? What kind of device were you thinking of using at the Hub location(s)?
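A rough way to see why tunnel count is what loads a hub: the sketch below assumes (for illustration only, this is not a Cisco-published formula) that a hub builds one IPsec tunnel with one BFD session per pair of local and remote TLOCs, and that spokes only build tunnels to the hub. The function name and sample numbers are hypothetical.

```python
def hub_tunnel_count(spokes, hub_tlocs, spoke_tlocs):
    """Estimate IPsec tunnels (and BFD sessions) terminating on one hub.

    Assumption: one tunnel per (hub TLOC, spoke TLOC) pair, no spoke-to-spoke tunnels.
    """
    return spokes * hub_tlocs * spoke_tlocs


# Hypothetical example: 1000 spokes, two transports on each side.
print(hub_tunnel_count(spokes=1000, hub_tlocs=2, spoke_tlocs=2))
```

Under these assumptions the tunnel count grows with the product of the TLOC counts, so adding a transport on the hub multiplies the load rather than adding to it.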

Daniel Dib
CCIE #37149
CCDE #20160011

Please rate helpful posts.

kenneth.meyers · ‎08-27-2019

Hi Daniel,

How many sites are you planning to deploy?

We're looking at between 7,000 and 8,000 sites.

What kind of device were you thinking of using at the Hub location(s)?

We're trying to find a datasheet or some other document that outlines the capability of each hardware vEdge device. I have not been able to find a datasheet that identifies the number of IPsec tunnels (or BFD sessions). What I've found is throughput information, number of interfaces, etc., but I have yet to find information on the IPsec tunnel capabilities of each device. Once we have that information we'll be in a better position to figure out the design aspects, given we'll know how many sites can peer with each HUB, how to ensure successful failover in case a HUB goes offline, etc.
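Once a per-device tunnel figure is known, the sizing question could be approached roughly as follows. This is a minimal sketch: the capacity of 4000 tunnels is a placeholder and not a datasheet value, each site is assumed to home to a single hub, and `spare_hubs` models simple N+1 failover. All names are hypothetical.

```python
import math


def hubs_needed(sites, tunnels_per_site, hub_tunnel_capacity, spare_hubs=1):
    """Hubs required to terminate all site tunnels, plus spares for failover."""
    total_tunnels = sites * tunnels_per_site
    return math.ceil(total_tunnels / hub_tunnel_capacity) + spare_hubs


# Placeholder capacity of 4000 tunnels per hub; replace with a verified figure.
for sites in (7000, 8000):
    print(sites, "sites:", hubs_needed(sites, tunnels_per_site=2, hub_tunnel_capacity=4000), "hubs")
```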

Thanks Daniel!

Ken

David Samuel Penaloza Seijas · ‎08-27-2019

The number of BFD sessions is indeed convoluted and hard to find. I suppose it could be due to the variety of deployment options (e.g. HW vEdge, cEdge, vEdge Cloud) - I have not seen official numbers yet, only throughput and interfaces.

The previous slide I have shared is from a Cisco Live session providing some overview of the solution. Hope that helps!

David Samuel Penaloza Seijas · ‎08-27-2019

Just for the sake of accuracy - the picture I uploaded in my comment wasn't visible for some reason, so I have just re-uploaded it. It shows an estimation of tunnels per device.

@kenneth.meyers - keep in mind you would need a high performance device to accommodate the number of tunnels a hub would entail in your design.

Seth Beauchamp · ‎08-26-2019

We are an SP looking to use a compute cluster where we deploy one vEdge Cloud per customer. We would like to put all customers in a shared underlay for the transport interface, that is, a single VLAN with a /24 where each vEdge gets an IP in that subnet. We also would like our customers to have access to their own vManage to make changes. A danger I see here is a customer changing their vEdge Cloud transport IP to an IP that overlaps with another customer, allowing customer A to bring down customer B. What could I do to prevent that?

David Samuel Penaloza Seijas · ‎08-27-2019

Hello @Seth Beauchamp

We would like to put all customers in a shared underlay for the transport interface, that is a single vlan with a /24 and each vedge gets an IP in that subnet. We also would like our customers to have access to their own vManage to make changes.

Being an MSP where your business is about offering transport and sharing the same infrastructure with all your customers, this is always a risk. That being said, there are techniques (mostly relying on virtualization) to segment your customers so their failure domain is contained and separated, hence not affecting other customers sharing the same infrastructure.

Are you trying to save IP addresses? Allowing the customers to share the same broadcast domain is dangerous and involves fate sharing. You could enforce separation somewhere else in the infrastructure (many access lists or similar tools), but that can only be cumbersome and poses a highly complex operational model. Is there a hard constraint? Is there any other reason behind this request? Can't you simply segment them through subnetting? Even PVLANs come to mind if you must go down this road; alas, they would not prevent a customer from using an unauthorized IP address and affecting some other customer's operation. The best is always to keep them "together but not scrambled" - with their own playground.

David

Seth Beauchamp · ‎08-27-2019

@David Samuel Penaloza Seijas wrote:
Hello @Seth Beauchamp
 
Are you trying to save IP addresses? Allowing the customers to share the same broadcast domain is dangerous and involves fate sharing. You could enforce separation somewhere else in the infrastructure (many access lists or similar tools), but that can only be cumbersome and poses a highly complex operational model. Is there a hard constraint? Is there any other reason behind this request? Can't you simply segment them through subnetting? Even PVLANs come to mind if you must go down this road; alas, they would not prevent a customer from using an unauthorized IP address and affecting some other customer's operation. The best is always to keep them "together but not scrambled" - with their own playground.

David

We were thinking of putting the whole subnet behind a single public NAT address so we aren't burning tons of public IPs. Of course we can split them into /31s per customer, but that takes a bit more effort: burning another VLAN, setting up a subinterface, etc. I think that's likely what we will do, but I was searching for any other option. Sounds like it's that or take the risk.
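For reference, carving a /24 into per-customer /31 point-to-point links can be sketched with Python's standard `ipaddress` module. The 10.0.0.0/24 prefix is a placeholder (the thread does not state which subnet is used), and the function name is hypothetical.

```python
import ipaddress


def per_customer_links(shared_cidr):
    """Split a shared transport subnet into /31 links, one per customer."""
    return list(ipaddress.ip_network(shared_cidr).subnets(new_prefix=31))


links = per_customer_links("10.0.0.0/24")  # placeholder prefix
print(len(links), "customer links available")
# Each /31 holds two addresses, e.g. one for the gateway subinterface and one for the vEdge.
gateway_ip, vedge_ip = links[0]
print(links[0], gateway_ip, vedge_ip)
```

Which address goes to the gateway and which to the vEdge is a local convention; the split itself only guarantees that customers no longer share a broadcast domain.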

daniel.dib · ‎08-27-2019

Seth,

I want to highlight that even though the solution is able to work behind NAT, there are still considerations for not turning this into an operational nightmare.

The vEdge routers form control sessions with the controllers using DTLS/TLS. If using DTLS, this is done by using UDP in the port range of 12346 to 12446, where 12346 is the base port. This means that you will have several devices trying to communicate through the NAT device with the same source port. When the NAT device tries to translate the source IP, it will not be able to maintain the source port for all of the Edge devices. Now, there are methods to select a different base port and to do port hopping, but it is more complex than if they were behind no NAT or 1:1 NAT.
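The source-port collision described above can be illustrated with a toy model of a port-preserving PAT device. This is a simplified sketch and not how any particular NAT product allocates ports: here the first flow keeps its source port and later flows get the next free one. Addresses and names are hypothetical.

```python
def pat_translate(flows, public_ip):
    """Toy PAT: keep the original source port if free, otherwise take the next free port."""
    used_ports = set()
    table = {}
    for inside_ip, src_port in flows:
        port = src_port
        while port in used_ports:
            port += 1
        used_ports.add(port)
        table[(inside_ip, src_port)] = (public_ip, port)
    return table


# Three vEdges behind one public address, all sourcing from base port 12346.
flows = [("10.0.0.1", 12346), ("10.0.0.2", 12346), ("10.0.0.3", 12346)]
nat_table = pat_translate(flows, "203.0.113.10")
for inside, outside in nat_table.items():
    print(inside, "->", outside)
```

Only one of the three devices keeps port 12346 on the outside; the others are rewritten, which is the situation the paragraph above warns about.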

Also be very careful with symmetric NAT, that is, NAT devices that translate the source port to something other than the original port and may choose a different port per destination. If the original port was 12346 and it got translated to 23800, this can cause issues with the data plane, because the port that the symmetric NAT presents to the vBond may differ from the port actually used between two vEdges. This can cause issues in forming tunnels between vEdge devices, depending on whether the other side is behind NAT or not.
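To make the symmetric NAT problem concrete, here is a minimal sketch of a NAT that allocates a new outside port for every (source, destination) pair. It is a toy model under that assumption; the class name, the starting port 23800 (borrowed from the example above), and the destination labels are illustrative only.

```python
import itertools


class SymmetricNat:
    """Toy symmetric NAT: the outside port depends on the destination as well as the source."""

    def __init__(self, first_port=23800):
        self._next_port = itertools.count(first_port)
        self._mappings = {}

    def outside_port(self, src_ip, src_port, dst):
        key = (src_ip, src_port, dst)
        if key not in self._mappings:
            self._mappings[key] = next(self._next_port)
        return self._mappings[key]


nat = SymmetricNat()
port_seen_by_vbond = nat.outside_port("10.0.0.1", 12346, "vbond")
port_seen_by_peer = nat.outside_port("10.0.0.1", 12346, "peer-vedge")
print(port_seen_by_vbond, port_seen_by_peer)
```

The port the vBond learns is not the port the peer vEdge will see, so a peer that relies on the vBond-learned port would be sending to a mapping that does not exist for it.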

Daniel Dib
CCIE #37149
CCDE #20160011


Seth Beauchamp · ‎08-27-2019

Thanks Daniel, this is good information. If you don't mind, let me clarify one thing so I'm sure I understand... The vEdges will communicate with the controllers using source port 12346, as opposed to sourcing from a high-numbered port with a destination of 12346? The controllers will be hosted in AWS and will each get their own public IP.

daniel.dib · ‎08-27-2019

Yes. Both the source port and the destination port are in the range 12346 to 12446. It's not an ephemeral port. I'm guessing they designed it this way to make it more deterministic and easier to write firewall rules etc.

https://sdwan-docs.cisco.com/Product_Documentation/Getting_Started/Viptela_Overlay_Network_Bringup/01Bringup_Sequence_of_Events/Firewall_Ports_for_Viptela_Deployments
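As an illustration of why a fixed range on both sides simplifies filtering, a rule for DTLS control traffic could be expressed as the check below. This is a sketch of the matching logic only, not a vendor firewall syntax, and it assumes the traffic has not had its source port rewritten by NAT. The function name is hypothetical.

```python
CONTROL_PORT_MIN = 12346  # base port mentioned in the thread
CONTROL_PORT_MAX = 12446  # top of the range mentioned in the thread


def is_dtls_control_flow(protocol, src_port, dst_port):
    """Match UDP flows whose source and destination ports both fall in the control range."""
    def in_range(port):
        return CONTROL_PORT_MIN <= port <= CONTROL_PORT_MAX

    return protocol == "udp" and in_range(src_port) and in_range(dst_port)


print(is_dtls_control_flow("udp", 12346, 12346))
print(is_dtls_control_flow("udp", 50000, 12346))
```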

Daniel Dib
CCIE #37149
CCDE #20160011


Hilda Arteaga · ‎08-27-2019

Dear @daniel.dib

Thanks for joining this session and sharing your knowledge. Your contributions are valuable and help many to solve their issues and doubts.

David Samuel Penaloza Seijas · ‎08-27-2019

He has been a fantastic sparring partner - always supporting around! Kudos to @daniel.dib !