Re: SDWAN control connections are up; no BFD sessions

maxnpj · ‎03-24-2023

So this is driving me crazy. I have a vEdge 100M behind a firewall that has control connections but no BFD sessions. "sh tunnel statistics" shows the correct SRC IP and a SRC port of 12366; the DST IP is correct as well and the DST port is 12346; there are "tx-pkts" and "tx-octets" but zero "rx-pkts" and of course zero "rx-octets".

The "sh tunnel statistics bfd" output shows the same SRC and DST IPs and the same SRC / DST ports; there are lots of "BFD echo TX pkts" but zero "BFD echo RX pkts", and lots of "BFD echo TX octets" but of course zero "BFD echo RX octets".

The DST end in this case has HUNDREDS of other remotes connected, so I am quite sure the problem doesn't exist on the DST end. It seems that the firewall is stopping something from getting out, but I'm not sure what to look at. The rules for this vEdge are basically any/any right now just to get it to work. One other thing of note: there are two IPSec tunnels (to Zscaler) using port 4500 (both SRC and DST) and those tunnels are up/up, so the firewall isn't blocking any IPsec ports.

Any and all suggestions are welcome.

svemulap@cisco.com · ‎03-24-2023

Try the 'port-offset' option to see if it fixes the issue.
We have seen caveats with NAT / FW devices where they are not able to maintain unique flows.
https://www.cisco.com/c/en/us/td/docs/routers/sdwan/command/sdwan-cr-book/config-cmd.html?dtid=osscdc000283#r_port_offset_3051.xml

Also, check port-hop, in the same document.

HTH

Kanan Huseynli · ‎03-24-2023

Hi,

Enable port-offset and port-hopping. BFD runs over IPSec, so this is definitely an IPSec issue with NAT.

HTH,
Please rate and mark as an accepted solution if you have found any of the information provided useful.

svemulap@cisco.com · ‎03-24-2023

Good to hear, Kanan.
It is an issue with the upstream NAT device.
We have seen this in deployments.
Hence, these knobs were introduced as a workaround.

Functionality-wise, the difference between these two commands:

port-offset: the base port is shifted by the configured number, i.e., if port-offset is set to 1, the base port moves from 12346 to 12347, and port-hop then uses ports 12367, 12387 and so on.
port-hopping: the device hops across ports until control is formed, i.e., if the base port is 12346 and control is not formed within a reasonable amount of time, the device will hop to 12366, 12386 and so on.
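
As a rough illustration of the arithmetic described above, here is a minimal Python sketch. The base port (12346) and the hop step (20) are the values quoted in this thread; the function name `candidate_ports` and the `hops` parameter are made up for the example and are not part of any Cisco tool.

```python
BASE_PORT = 12346  # default base port quoted in the thread
HOP_STEP = 20      # increment applied on each port-hop, per the thread


def candidate_ports(port_offset=0, hops=3):
    """Return the first `hops` ports tried for a given port-offset.

    Hypothetical helper: it only reproduces the arithmetic from the posts
    and does not model how many hops a real device makes before wrapping.
    """
    start = BASE_PORT + port_offset
    return [start + i * HOP_STEP for i in range(hops)]


print(candidate_ports())               # [12346, 12366, 12386]
print(candidate_ports(port_offset=1))  # [12347, 12367, 12387]
```

Under these assumptions, the SRC port of 12366 reported in the first post is what the sequence gives after one hop with no offset.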

There is a little more to both of these. The design guide goes into detail on this. Link below: [check the Firewall Port Considerations section]
https://www.cisco.com/c/en/us/td/docs/solutions/CVD/SDWAN/cisco-sdwan-design-guide.html?dtid=osscdc000283#FirewallPortConsiderations

HTH

maxnpj · ‎04-05-2023

svemulap;

So I'm still a bit confused here. I read the "Cisco SD-WAN Design Guide; Firewall Port Considerations" section. This states:
**The secure sessions between the WAN Edge routers and the controllers (and between controllers), by default are DTLS, which is User Datagram Protocol (UDP)-based. The default base source port is 12346**
So, control connections are DTLS/UDP; starting @ port 12346. Got it.

Then I read:
IPsec tunnel encapsulation from a WAN Edge router to another WAN Edge router uses UDP with similar ports as defined by DTLS. I believe this is the BFD sessions between WAN edge devices.
So, this reads that the BFD sessions are also UDP; starting @ port 12346. Even in Table 2 in the "Firewall Port Considerations" section, the very last table entry; shows the source as WAN Edge (IPSec) and destination as WAN Edge, and both the source and dest protocol is UDP and the ports start @ 12346 and increment by 20.

This brings me to my confusion. If the control connections are UDP 12346, and the BFD sessions are UDP 12346, why are the control connections up and the BFD sessions down?

I've attached a screen capture of a device having this issue. Looking at this device I can see the control connections are using a "Peer Private Port" of 12346 and a "Peer Pub Port" of 12346. So according to Figure 38 in the documentation, those control connections are using DTLS/UDP, but looking at the screen cap I can see that the "DEST PORT" is not 12346 or anything similar. It's 25604. Does this mean that the BFD sessions are somehow using TCP instead of UDP?

I know this is in-depth and I'm asking for detail, but we get this situation often and as I said, it makes me bananas based on my understanding.

Kanan Huseynli · ‎04-05-2023

Hi,

You seem really confused.

The control connection with vBond is always DTLS.

The control connection with vManage/vSmart is either DTLS or TLS (depending on the controller config).

All of this can be verified with show control connections (note that the control connection with vBond is transient, so it may not show up).

"It's 25604. Does this mean that the BFD sessions are somehow using TCP instead of UDP?"

Based on your output, your device has a TLS connection to vSmart/vManage, judging by the 2xxxx port number (again, re-verify with the command above).

"Which brings me to my confusion. If the control connections are UDP 12346, and the BFD sessions are UDP 12346, why are the control connections up and the BFD sessions down?"

Between edge devices (routers) there is no control connection; only the data plane runs. IPSec uses the ports indicated in the last line of the table. That port number is for IPSec; BFD runs inside IPSec (BFD over IPSec) and has its own BFD header and UDP header (UDP 3478 by default). Intermediate devices don't see these headers, since they are encapsulated inside IPSec.

Even if the control plane is up, BFD/IPSec can be down, because the traffic is between different devices. Control is between the local router and a remote controller; BFD/IPSec is between the local router and a remote router. So, check the firewall or any device that can block the traffic.

The destination port number (12346) can be the same, but the destination IP address is different.
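
To make the point above concrete, here is a small Python sketch that compares two flows field by field. All addresses are hypothetical documentation addresses (RFC 5737) and the source port is only an example; the `Flow` tuple and `differing_fields` helper are invented for illustration.

```python
from collections import namedtuple

# Hypothetical 5-tuple; field names are made up for this example.
Flow = namedtuple("Flow", "proto src_ip src_port dst_ip dst_port")

# Same local router, same protocol and ports, different remote device.
control = Flow("udp", "192.0.2.10", 12366, "198.51.100.1", 12346)  # to a controller
data = Flow("udp", "192.0.2.10", 12366, "203.0.113.7", 12346)      # to a remote edge


def differing_fields(a, b):
    """List the 5-tuple fields in which two flows differ."""
    return [name for name in Flow._fields if getattr(a, name) != getattr(b, name)]


print(differing_fields(control, data))  # ['dst_ip']
```

A firewall or NAT device keys its state on the whole tuple, so it treats these as two separate flows even though protocol and ports match.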

HTH,

maxnpj · ‎04-06-2023

I do see in the "sh control connections" output that the "PEER PROT" is DTLS and the control connections are using port 12346, which lines up with the Figure 38 DTLS example in the "Cisco SD-WAN Design Guide; Firewall Port Considerations". I'm good with all of that.

I remain lost on some differences between the control connections and the BFD sessions. I know that there are no control connections between edge devices, but I'm still hung up on the fact that the control connections are DTLS/UDP and are able to get to the controllers, while according to the documentation the IPSec tunnels used for edge<>edge BFD sessions are using UDP as well, yet those CANNOT get from one edge device to the other. In my particular case, I am sure this is a firewall problem, and that is part of my confusion. If the DTLS/UDP control connections can get out, then why are the IPSec UDP BFD connections UNABLE to get out? According to the documentation the port range is the same and the protocol is the same.

Kanan Huseynli · ‎04-06-2023

Hi,

I've already answered this question. See below:

Even if the control plane is up, BFD/IPSec can be down, because the traffic is between different devices.
Control is between the local router and a remote controller; BFD/IPSec is between the local router and a remote router.
So, check the firewall or any device that can block the traffic.

The destination port number (12346) can be the same, but the destination IP address is different.

Plus, controllers normally have a public IP address or work through 1:1 NAT. In your case, if both routers are behind dynamic NAT / PAT, then it will not work. Below is the NAT section from the CVD:

https://www.cisco.com/c/en/us/td/docs/solutions/CVD/SDWAN/cisco-sdwan-design-guide.html#NAT

I don't know your firewall and NAT setup. But in networking, if UDP 12346 works with remote system_A, it does not mean that it will work with remote system_B.
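
One way to picture this is a NAT that picks a new public port for every destination. The toy model below is only an assumption for illustration, not a description of any particular firewall; the class name, the addresses and the starting port 40000 are all made up.

```python
import itertools


class EndpointDependentNat:
    """Toy NAT that allocates a new public port per (source, destination) pair."""

    def __init__(self, first_port=40000):
        self._next_port = itertools.count(first_port)
        self._mappings = {}

    def translate(self, src, dst):
        """Return the public source port used for traffic from src to dst."""
        key = (src, dst)
        if key not in self._mappings:
            self._mappings[key] = next(self._next_port)
        return self._mappings[key]


nat = EndpointDependentNat()
edge = ("10.0.0.2", 12346)  # hypothetical private address of the local router

to_controller = nat.translate(edge, ("198.51.100.1", 12346))
to_remote_edge = nat.translate(edge, ("203.0.113.7", 12346))

print(to_controller, to_remote_edge)  # 40000 40001
```

In this model the public port seen by the controller (system_A) is not the one the NAT uses toward the remote router (system_B), so a peer that sends to the port learned through the controller would hit no matching mapping. Whether your firewall behaves this way is something to verify on the device itself.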

HTH,

svemulap@cisco.com · ‎04-05-2023

hi maxnpj:

Just saw a response from Kanan Huseynli to your questions.
It is a good summary of the difference between the control plane and the data plane.
Let me/us know if you still have any questions.

maxnpj · ‎03-27-2023

Kanan;

I'm going to try the port-offset and port-hopping; but I have another question....I know that BFD runs over IPSec ports, but which ports? 4500 only? 500 only? both 4500 and 500? I'm asking because as I mentioned I have two Zscaler IPSec tunnels that are up/up using port 4500 so I'm not sure what would cause the BFD IPSec tunnels to not come up if they are using the same ports. I'm not questioning your answer, I believe you and Svemulap are spot on with your answers, I want to better understand the difference between the two IPsec tunnels (the Zscaler tunnels and the tunnels used for BFD)

svemulap@cisco.com · ‎03-27-2023

hi maxnpj,

In the Cisco/Viptela solution, BFD runs inside the data path and is encrypted.
We differ from the standards-based approach in order to optimize the data-plane key exchange.
We eliminate IKE altogether and leverage vSmart to reflect the keys, which is more scalable.
This is the high-level concept. Take a look at:
https://www.cisco.com/c/en/us/td/docs/routers/sdwan/configuration/security/vedge/security-book/security-overview.html

The IPSec ports that you are referring to are standards-based:
IKE uses UDP port 500 and IPSec (NAT traversal) uses UDP port 4500.
We use these ports when connecting to third-party IPSec tunnels.
That could be Zscaler, Palo Alto, or anyone else.

HTH

Kanan Huseynli · ‎03-28-2023

Hi @maxnpj ,

UDP 500 (ISAKMP) and UDP 4500 (NAT traversal) are used in standard IPSec solutions where the ISAKMP protocol is needed (Zscaler, Umbrella, 3rd party S2S VPN, Cloud S2S etc.).

In SD-WAN, the security architecture for the overlay is IKE-less, i.e., there is no phase 1 step; devices directly begin sending each other encrypted traffic (they rely on information received from the authenticated controller, vSmart, which distributes the IPSec keys). Details can be found here:

Data Plane Security Overview

https://www.cisco.com/c/en/us/td/docs/routers/sdwan/configuration/security/ios-xe-17/security-book-xe/security-overview.html#c_Data_Plane_Security_Overview_12213.xml

Regarding port numbers, UDP 12346 is the base transport port for IPSec. If port-offset is configured, its value is added to this base port number, and port-hopping adds 20 each time a hop happens. See "Firewall Port Considerations" in the SD-WAN CVD:

https://www.cisco.com/c/en/us/td/docs/solutions/CVD/SDWAN/cisco-sdwan-design-guide.html

HTH,