
cEdge ZBFW and control connections / SD-WAN BFD sessions best practice

csco10260962
Level 1

What are the best practices when using ZBFW in SD-WAN on a cEdge with regard to the self zone, the control connections to vSmart/vManage, and the SD-WAN BFD sessions?

For the last couple of weeks we have been seeing control connections to the vSmarts get dropped on all TLOCs and SD-WAN BFD sessions time out.

The vManage control connections stay up. The only way to resolve this is to SSH to the cEdge via the vManage GUI and reload the router, or bounce the VPN 0 interfaces, to get the control connections back up and re-establish the SD-WAN BFD sessions.

ZBFW is also in place with an any-to-any policy toward the datacenter, with the service VPN IDs and the local service VPN interfaces on the router in zone pairs. There is also a local self-zone policy for ping to the router and for bootp/bootps, with the self zone and the service VPN interfaces in zone pairs.

Should there also be a policy for control connections (any/any) with the self zone and the VPN 0 interfaces in the zone pairs, given that the overlay does get established at the moment? In local logging I sometimes see packets like these getting dropped:

src: 10.14.1.6/12346 dst: 10.14.247.242/12346 proto: 17 tos: 192 inbound-acl, Implicit-ACL, Result: denyPkt SDWAN_SERV_UDP count: 1 bytes: 58 Ingress-Intf: GigabitEthernet0/0/0 Egress-intf: GigabitEthernet0/0/0
src: 10.14.1.7/12346 dst: 10.14.247.242/12346 proto: 17 tos: 192 inbound-acl, Implicit-ACL, Result: denyPkt SDWAN_SERV_UDP count: 1 bytes: 58 Ingress-Intf: GigabitEthernet0/0/0 Egress-intf: cpu
src: 10.14.1.2/12746 dst: 10.14.247.242/12346 proto: 17 tos: 192 inbound-acl, Implicit-ACL, Result: denyPkt SDWAN_SERV_UDP count: 1 bytes: 58 Ingress-Intf: GigabitEthernet0/0/0 Egress-intf: GigabitEthernet0/0/0
src: 10.14.0.40/12346 dst: 10.14.247.242/12346 proto: 17 tos: 0 inbound-acl, Implicit-ACL, Result: denyPkt SDWAN_SERV_UDP count: 15 bytes: 2724 Ingress-Intf: GigabitEthernet0/0/0 Egress-intf: cpu

10.14.1.7 is a vSmart, for example.
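For concreteness, and without claiming that ZBFW is what is dropping the packets above, a self/VPN 0 zone pair of the kind being asked about could look roughly like this in plain IOS-XE ZBFW syntax. The zone, ACL, and policy names here are made up, only UDP 12346 from the logs above is matched (the other base and hopped ports would need the same treatment), and on a vManage-managed cEdge this would normally be generated from the security policy rather than configured by hand:

! Illustrative names only (ZONE_VPN0, SDWAN_CONTROL_*); "self" is the built-in router zone.
ip access-list extended SDWAN_CONTROL_ACL
 permit udp any any eq 12346
 permit udp any eq 12346 any
!
class-map type inspect match-any SDWAN_CONTROL_CM
 match access-group name SDWAN_CONTROL_ACL
!
policy-map type inspect SDWAN_CONTROL_PM
 class type inspect SDWAN_CONTROL_CM
  pass
 class class-default
  drop
!
! With "pass" (stateless), both directions need their own zone pair.
zone security ZONE_VPN0
zone-pair security ZP_SELF_VPN0 source self destination ZONE_VPN0
 service-policy type inspect SDWAN_CONTROL_PM
zone-pair security ZP_VPN0_SELF source ZONE_VPN0 destination self
 service-policy type inspect SDWAN_CONTROL_PM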


4 Replies

Hi friend,

Can you elaborate a bit more?

Did all vEdges/cEdges lose their connection to the vSmarts?

Can I see the output of:

show sdwan peer <- when the issue appears

MHM

Yes, when it occurs the cEdge loses its connections to the vSmarts. I'll give you the output when it happens again, but the disconnect reason is always a timeout to the vSmarts. It could be a port-hop issue. We use on-demand dynamic tunnels to the datacenter, which consists of two separate sites interconnected via the datacenter core over DCIs. Branch sites can also connect directly to other branch sites, using the datacenter hub for the initial tunnel setup before they connect directly.
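For anyone reading along, the on-demand tunnel behaviour mentioned above is typically driven by a couple of system-level settings on the spoke routers; a minimal sketch, where the idle-timeout value is just an example:

system
 ! enable on-demand tunnels on the spoke; 10-minute idle timeout is only an example value
 on-demand enable
 on-demand idle-timeout 10
!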

csco10260962
Level 1

One possible solution is to turn off port hopping under the VPN 0 tunnel-interface feature template if the TLOC is not behind a NAT device. In our case it is probably caused by using an FQDN for the vBond controllers on the vSmarts and cEdges. Even when the VPN 0 interfaces can reach the DNS servers, resolution of the vBond FQDN can still fail, due to asymmetric routing, firewalling, or the internal DNS resolver of the vEdges and cEdges sometimes not even trying to resolve it (the client resolver sometimes just fails to resolve anything even though it can reach the DNS servers). The best way to prevent the intermittent drops of control connections and BFD sessions is to add a static host-to-IP mapping under the VPN 0 configuration on vManage, the vSmarts, the cEdges, and the vEdges for the FQDN used in the vBond entry of the system configuration. That should resolve the issue. We adjusted this in the VPN 0 feature template used for all cEdges, vEdges, vSmarts, and vManage, and everything has looked stable for the last few days.
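As a rough illustration, in Viptela (vEdge) CLI the two changes look something like the sketch below. The FQDN, IP address, and interface name are placeholders; on cEdges the same settings are pushed from the VPN 0 (transport) feature template rather than typed in by hand:

system
 ! vBond referenced by FQDN, as in our setup
 vbond vbond.example.com
!
vpn 0
 ! static host-to-IP mapping for the vBond FQDN (placeholder address)
 host vbond.example.com ip 203.0.113.10
 interface ge0/0
  tunnel-interface
   ! only safe when this TLOC is not behind NAT, as noted above
   no port-hop
  !
 !
!

With the static host mapping in place, re-establishing control connections no longer depends on the on-box DNS resolver being able to resolve the vBond FQDN at that moment.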

This was confirmed by TAC as a best practice.