cEdge stuck in state "connect", DCONFAIL

rudimocnik
Level 1

Hi

 

I've got a cEdge (C1111-4PLTEEA) running IOS-XE SDWAN 16.12.3. All controllers are running 19.2.2. I am using Cisco automated certificates on the controllers and the onbox certificate option for the hardware.

 

I am struggling to make this router talk to vSmart and vManage. I've checked several things:

  • the clock matches the controllers
  • the whitelist on vManage and vSmart has the correct serial number and chassis number; the org name is also correct
  • the local properties on the cEdge are fine
  • the certificate is installed
  • the root certificate is installed
  • I can ping all public IPs of the controllers
  • the color is public on all controllers and on the cEdge

 

PEER                      PEER                                                                            
PEER     PEER     PEER             SITE        DOMAIN PEER             PRIVATE  PEER             PUBLIC                                   LOCAL      REMOTE     REPEAT               
TYPE     PROTOCOL SYSTEM IP        ID          ID     PRIVATE IP       PORT     PUBLIC IP        PORT    LOCAL COLOR      STATE           ERROR      ERROR      COUNT DOWNTIME       
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vmanage  dtls     1.1.1.1          1000        0      172.29.28.10     12446    193.xx.xx.100   36060   public-internet  connect         DCONFAIL   NOERR      1     2020-05-11T15:46:09+0200
vsmart   dtls     1.1.1.3          1000        1      172.29.28.11     12446    193.xx.xx.102   9899    public-internet  connect         DCONFAIL   NOERR      1     2020-05-11T15:46:09+0200
PEER    PEER PEER            SITE       DOMAIN PEER                                    PRIV  PEER                                    PUB                                           GROUP      
TYPE    PROT SYSTEM IP       ID         ID     PRIVATE IP                              PORT  PUBLIC IP                               PORT  LOCAL COLOR     PROXY STATE UPTIME      ID         
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vsmart  dtls 1.1.1.3         1000       1      172.29.28.11                            12446 193.xx.xx.102                          9899  public-internet No    connect            0           
vbond   dtls 0.0.0.0         0          0      193.xx.xx.101                          12346 193.xx.xx.101                          12346 public-internet -     up     0:00:10:54  0           
vmanage dtls 1.1.1.1         1000       0      172.29.28.10                            12446 193.xx.xx.100                          36060 public-internet No    connect            0 

I am not sure what I am missing. If the vBond connection comes up, why are vManage and vSmart not working?

Any ideas on how to troubleshoot this? Is there a way to run tcpdump on a cEdge?

 

Rudi
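For comparing the two tables above, a small parser helps. This is a minimal Python sketch that assumes the connection-history column order shown in the header (15 whitespace-separated fields per row, none containing spaces); `parse_history_row` and `port_translated` are hypothetical helper names. Comparing PRIVATE PORT with PUBLIC PORT is only a heuristic: both controllers send from 12446 but are advertised on 36060 and 9899, which fits the source-port translation identified as the root cause later in this thread.

```python
from typing import NamedTuple


class HistoryRow(NamedTuple):
    """One data row of the connection-history table (column order assumed from its header)."""
    peer_type: str
    protocol: str
    system_ip: str
    site_id: int
    domain_id: int
    private_ip: str
    private_port: int
    public_ip: str
    public_port: int
    local_color: str
    state: str
    local_error: str
    remote_error: str
    repeat_count: int
    downtime: str


def parse_history_row(line: str) -> HistoryRow:
    # No value contains a space, so split() yields exactly one token per column.
    f = line.split()
    if len(f) != 15:
        raise ValueError(f"expected 15 columns, got {len(f)}")
    return HistoryRow(f[0], f[1], f[2], int(f[3]), int(f[4]), f[5], int(f[6]),
                      f[7], int(f[8]), f[9], f[10], f[11], f[12], int(f[13]), f[14])


def port_translated(row: HistoryRow) -> bool:
    # Heuristic: if NAT in front of the controller preserves the source port,
    # the advertised public port should equal the private port.
    return row.private_port != row.public_port


rows = [
    "vmanage dtls 1.1.1.1 1000 0 172.29.28.10 12446 193.xx.xx.100 36060 "
    "public-internet connect DCONFAIL NOERR 1 2020-05-11T15:46:09+0200",
    "vsmart dtls 1.1.1.3 1000 1 172.29.28.11 12446 193.xx.xx.102 9899 "
    "public-internet connect DCONFAIL NOERR 1 2020-05-11T15:46:09+0200",
]
for row in map(parse_history_row, rows):
    flag = "  <- port translated?" if port_translated(row) else ""
    print(f"{row.peer_type}: {row.state}/{row.local_error}, "
          f"private port {row.private_port}, public port {row.public_port}{flag}")
```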


ekhabaro
Cisco Employee

Hi there. DCONFAIL is always about connectivity, e.g. incorrect routing, NAT not configured, firewall filtering, and so on. The troubleshooting approach is exactly the same as for vEdge routers. More here: https://www.cisco.com/c/en/us/support/docs/routers/sd-wan/214509-troubleshoot-control-connections.html

Hi

I've already gone through all those tips and could not make it work. The router uses a Cellular interface, and I can't capture on it with EPC because Cellular interfaces are not supported (see bug CSCvm14612). I am not sure what else I can do to troubleshoot on the client side.

It looks like the ISP is performing some kind of NAT, since the IP on the Cellular interface is not the same as the one hitting my controllers. 1) Could that be an issue if I am not running a STUN server on vBond?

 

2) Can you confirm whether vManage and vSmart initiate the connection to the router, or whether the router initiates the connection to vManage and vSmart? Please see the tcpdump output from vManage and vSmart. Note that vBond is fine and its connection is in the up state.

 

vSmart-1# tcpdump vpn 0 interface eth1 | i internet
18:38:04.314216 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:04.314252 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:09.324816 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:14.339803 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:14.940539 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16

 

vManage# tcpdump vpn 0 interface eth1 | i internet 
18:35:28.503389 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:30.845409 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:30.845435 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:35.860776 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:40.872419 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:43.649331 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16

I am assuming this is the challenge packet, and the router never replies to it because it gets "lost" in the ISP network.
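Whether a capture shows traffic in only one direction can be checked mechanically. A minimal sketch, assuming tcpdump's default one-line UDP summary format as in the captures above (only the first three vSmart lines are reproduced); `flow_summary` and `one_way` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed format: tcpdump's default one-line summary for UDP packets.
LINE_RE = re.compile(r"^\S+ IP (\S+)\.(\d+) > (\S+)\.(\d+): UDP, length (\d+)$")


def flow_summary(lines):
    """Count packets per (source, destination, length) triple."""
    flows = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip prompts and anything that is not a UDP summary line
        src, sport, dst, dport, length = m.groups()
        flows[(f"{src}:{sport}", f"{dst}:{dport}", int(length))] += 1
    return flows


def one_way(flows):
    """True if no packet in the capture travels in the reverse direction."""
    pairs = {(src, dst) for src, dst, _ in flows}
    return all((dst, src) not in pairs for src, dst in pairs)


capture = """\
vSmart-1# tcpdump vpn 0 interface eth1 | i internet
18:38:04.314216 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:04.314252 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:09.324816 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
"""
flows = flow_summary(capture.splitlines())
for (src, dst, length), count in flows.items():
    print(f"{src} -> {dst}  length={length}  packets={count}")
print("one-way traffic only:", one_way(flows))
```

Run against a healthy exchange (such as a capture where the edge answers with longer packets), `one_way` returns False.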

Also check this out: the IP above is 188.XX.XX.85; however, the router's local properties show this:

...
...
 PUBLIC          PUBLIC PRIVATE         PRIVATE                                 PRIVATE                              MAX   RESTRICT/           LAST         SPI TIME    NAT  VM
INTERFACE                IPv4            PORT   IPv4            IPv6                                    PORT    VS/VM COLOR            STATE CNTRL CONTROL/     LR/LB  CONNECTION   REMAINING   TYPE CON
                                                                                                                                                   STUN                                              PRF
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Cellular0/2/0            188.XX.XX.85   1502   100.XX.XX.85    ::                                      12406    0/0  public-internet  up     2      no/yes/no   No/No  0:00:00:02   0:10:41:51  N    5  

Any ideas?

 

Rudi
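The local-properties row shows a private IPv4 of 100.XX.XX.85 and a public IPv4 of 188.XX.XX.85, so the address is translated somewhere upstream. A minimal sketch with Python's standard `ipaddress` module classifies such a pair; `describe_nat` is a hypothetical helper, and the addresses in the example are hypothetical stand-ins, because the real ones are masked and the post does not reveal whether 100.XX.XX.85 falls inside the 100.64.0.0/10 shared address space (RFC 6598) that carriers commonly use for carrier-grade NAT.

```python
import ipaddress

# RFC 6598 shared address space, commonly used behind carrier-grade NAT.
SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")


def describe_nat(interface_ip: str, seen_ip: str) -> str:
    """Classify the interface address against the address the controllers see."""
    local = ipaddress.ip_address(interface_ip)
    seen = ipaddress.ip_address(seen_ip)
    if local == seen:
        return "no NAT"
    if local in SHARED_SPACE:
        return "NAT from shared address space (likely carrier-grade NAT)"
    if local.is_private:
        return "NAT from a private address"
    return "NAT between two public addresses"


# Hypothetical addresses: the real ones are masked in the post.
print(describe_nat("100.64.12.85", "188.1.2.85"))
print(describe_nat("188.1.2.85", "188.1.2.85"))
```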

 

1) Yes, it can be, but usually it causes issues with data-plane tunnels rather than control connections.
2) I just did a capture to confirm. Here is my lab output, where .217 is an edge router that was just reloaded:

09:25:03.960722 IP 192.168.20.213.12346 > 192.168.109.217.30252: UDP, length 16
09:25:03.963853 IP 192.168.109.217.30252 > 192.168.20.213.12346: UDP, length 154
09:25:03.964012 IP 192.168.20.213.12346 > 192.168.109.217.30252: UDP, length 48
09:25:04.020408 IP 192.168.109.217.30252 > 192.168.20.213.12346: UDP, length 174

You can configure an ACL on the Cellular interface to count packets and make sure they are reaching your router. Can you check a very basic thing: can you ping the router's public IP from the vSmart? Do you also use NAT on the edge router (for DIA, for example)?

So I reconfigured the router to use a wired connection. That let me use EPC and get a result: I found that the firewall was blocking DTLS Hello packets destined to vManage and vSmart. @ekhabaro, can you elaborate on the ports these routers use? Mine used these:

 

A DTLS Client Hello packet going from router --> vManage has source UDP port 12406 and destination port 33888.

A DTLS Client Hello packet going from router --> vSmart has source UDP port 12406 and destination port 1894.

I am a bit confused about which ports I should open on the firewall. I thought I only needed UDP 12346. Can you please advise which ports to limit the rules to?

 

Rudi
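On the port question: the ports seen in this thread (controllers sending from 12446, the router using 12406, and 12426 in a later post) fit the pattern below. This is a sketch based on my understanding of Cisco's firewall-port guidance, not an authoritative list; verify it against the documentation for your release. The assumptions are that vBond listens on UDP 12346, each vManage/vSmart core listens on its own port (12346 plus 100 per core), and edge routers hop between five ports 20 apart. The helper names are hypothetical. Destination ports like 33888 and 1894 do not fit this pattern, which is consistent with the NAT problem found later in the thread.

```python
# Assumptions (verify against Cisco's firewall-ports guide for your release):
#   - vBond listens on UDP 12346;
#   - each vManage/vSmart core listens on its own port, 12346 + 100 per core;
#   - edge routers hop between five ports, 20 apart, starting at 12346.
BASE_PORT = 12346


def vbond_ports() -> list[int]:
    return [BASE_PORT]


def controller_ports(cores: int) -> list[int]:
    """UDP control ports for a vManage or vSmart with the given number of cores."""
    return [BASE_PORT + 100 * core for core in range(cores)]


def edge_ports() -> list[int]:
    """Candidate UDP source ports for an edge router's control connections."""
    return [BASE_PORT + 20 * step for step in range(5)]


print("vBond:", vbond_ports())
print("vManage/vSmart, 2 cores:", controller_ports(2))
print("edge routers:", edge_ports())
```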

@ekhabaro, to answer your questions:

Yes, I can ping every public IP on every controller. Note, though, that they are all behind a firewall with 1:1 public/private IP mapping. Currently, there are no firewall rules for the source IP I am connecting from. To answer your second question: no, I am not using NAT on my router. I simply plugged it into my home modem and it got a public IP. The IP hitting the controllers is the same as the one on the interface, so there is no NAT on the ISP side.

 

Note that I can successfully build control connections with vBond, so I am puzzled about why vManage and vSmart have issues. See the attached vbond_handshake.png. No issues there.

 

vBond is .101

vManage is .100

vSmart is .102

router is .12

 

Then the router tries to connect to vManage/vSmart (see the attached captures). I can see the Client Hello message sent by the router. However, nothing comes back. The controllers are indeed receiving those Client Hello messages; they are just not responding (see below). They keep sending these short "Len=16" UDP messages. I am not sure what these are, but they contain almost no data.

 

Here is a tcpdump from vManage showing that the firewall is not blocking any traffic and the messages are reaching the controllers. Additionally, I checked the FW console and cannot find any blocked traffic from or to the router's IP.

 

 

vManage# tcpdump vpn 0 interface eth1 | i abc
15:14:39.484312 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:44.495975 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:48.068752 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:48.081467 IP 93-xx-xx-12.abc.12406 > 172.29.28.10.36496: UDP, length 144
15:14:49.504772 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:49.504826 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:51.128740 IP 93-xx-xx-12.abc.12406 > 172.29.28.10.36496: UDP, length 144
15:14:54.514592 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:57.142641 IP 93-xx-xx-12.abc.12406 > 172.29.28.10.36496: UDP, length 144
15:14:59.521085 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:15:04.527726 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16

I assume these 144-byte UDP packets are Client Hello messages sent by the router. Note that .12 is the router and .10 is the vManage private IP. It seems that vManage just doesn't give a **bleep** about the Client Hello messages.
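The vManage capture above contains the key clue: vManage sends from UDP 12446, but the router's 144-byte packets arrive on UDP 36496, a port vManage never sends from. A minimal sketch of that comparison, with the packets hand-copied from the capture as (src, sport, dst, dport) tuples; `port_mismatch` is a hypothetical helper, and the router host name is kept masked as tcpdump printed it.

```python
def port_mismatch(controller_ip, packets):
    """Compare the ports a controller sends from with the ports peers send to.

    packets: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.
    Returns (ports the controller sends from,
             ports peers target that the controller never sends from).
    """
    packets = list(packets)  # iterated twice below
    sent_from = {sport for src, sport, dst, dport in packets if src == controller_ip}
    targeted = {dport for src, sport, dst, dport in packets if dst == controller_ip}
    return sent_from, targeted - sent_from


VMANAGE = "172.29.28.10"
ROUTER = "93-xx-xx-12.abc"  # masked host name as printed by tcpdump
packets = [
    (VMANAGE, 12446, ROUTER, 12406),  # the 16-byte packets from vManage
    (ROUTER, 12406, VMANAGE, 36496),  # the 144-byte packets from the router
]
print(port_mismatch(VMANAGE, packets))
```

A non-empty second set means peers are targeting a port the controller does not appear to be using, which points at how the controller's address was learned rather than at filtering.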

 

Am I hitting a bug here? 

I am running: 

vManage 19.2.2

vBond, vSmart 19.2.1 (also tried 19.2.2, same result)

router IOS-XE SDWAN 16.12.3 (also tried with a lower version, no luck)

 

*All controllers and the router have a public color assigned (no restrict).

 

The compatibility matrix shows no issues with these versions:

https://www.cisco.com/c/en/us/solutions/enterprise-networks/sd-wan/compatibility-matrix.html

 

What versions are you running? Do you have any more ideas?

 

Rudi

 

 

Just to follow up on the issue.

I managed to solve the problem with some help, and it turned out to be a NAT problem.

Traffic from vManage --> vBond and vSmart --> vBond must be NAT-ed so that the source IP of vManage/vSmart is translated to a public IP AND the source port is preserved. Don't do PAT. The latter is very important, and it is what broke my connections with vManage and vSmart. This is why you can see odd port numbers hitting the controllers in the captures above.

 

vBond learned the wrong port numbers for vManage and vSmart and advertised those to the routers. The routers then connected to the correct public IP but hit the wrong UDP port, so the connections didn't come up.
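The failure mode can be modeled in a few lines. A hedged sketch, assuming the controller sits behind a firewall that does static 1:1 NAT inbound (rewriting only the IP) and, outbound, either preserves the source port or applies PAT; `learned_endpoint`, `edge_reaches_controller`, and the public address 198.51.100.100 are hypothetical, and 36060 is borrowed from the PUBLIC PORT that the router's connection history showed for vManage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


def learned_endpoint(controller: Endpoint, public_ip: str, preserve_port: bool,
                     pat_port: int) -> Endpoint:
    """What vBond sees (and later advertises) for a controller behind source NAT.

    pat_port stands in for whatever port the firewall picks when it does PAT.
    """
    return Endpoint(public_ip, controller.port if preserve_port else pat_port)


def edge_reaches_controller(controller: Endpoint, advertised: Endpoint) -> bool:
    # Assumed inbound 1:1 static NAT: only the destination IP is rewritten back
    # to the private address; the destination port stays what the edge used.
    arrives_at = Endpoint(controller.ip, advertised.port)
    return arrives_at == controller


vmanage = Endpoint("172.29.28.10", 12446)
for preserve in (True, False):
    adv = learned_endpoint(vmanage, "198.51.100.100", preserve, pat_port=36060)
    print(f"preserve_port={preserve}: advertised {adv.ip}:{adv.port}, "
          f"reaches vManage: {edge_reaches_controller(vmanage, adv)}")
```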

 


imortada
Cisco Employee

My cEdge router is not able to establish a connection due to this error:

*Apr 7 14:15:12.842: %Cisco-SDWAN-RP_0-vdaemon-6-INFO-1400002: Notification: 4/7/2023 14:15:12 control-connection-auth-fail severity-level:major host-name:"Router" system-ip:100.0.0.1 personality:vedge peer-type:vbond peer-system-ip::: local-system-ip:100.0.0.1 local-color:biz-internet reason:"ERR_CERT_VER_FAIL"

 

How can I fix it? I saw many replies about vEdge, which is different from cEdge. I appreciate your help!

Hi @imortada,

ERR_CERT_VER_FAIL means the received certificate is not valid. Make sure the certificate passes validation (same organization name, time in sync, root certificate installed).

HTH,
Please rate and mark as an accepted solution if you have found any of the information provided useful.
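The three conditions listed above can be read as a checklist. A minimal sketch, assuming a hypothetical `PeerCert` record that holds only the fields those checks need; it compares names rather than verifying signatures, so it illustrates the checklist and is not real X.509 validation. The root CA name is made up; the org name is the one shown in the vManage output later in the thread, and the date matches the log above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PeerCert:
    """Hypothetical summary of the certificate fields the checks look at."""
    organization: str
    issuer: str
    not_before: datetime
    not_after: datetime


def validation_problems(cert: PeerCert, expected_org: str,
                        installed_roots: set, now: datetime) -> list:
    """Checklist of the three conditions; names are compared, signatures are not verified."""
    problems = []
    if cert.organization != expected_org:
        problems.append("organization name does not match")
    if not cert.not_before <= now <= cert.not_after:
        problems.append("not valid at the current time (check clock sync)")
    if cert.issuer not in installed_roots:
        problems.append("issuing root CA is not installed")
    return problems


cert = PeerCert("Cisco-systems", "Example Root CA",  # hypothetical values
                datetime(2023, 1, 1, tzinfo=timezone.utc),
                datetime(2024, 1, 1, tzinfo=timezone.utc))
now = datetime(2023, 4, 7, tzinfo=timezone.utc)
print(validation_problems(cert, "Cisco-systems", set(), now))
print(validation_problems(cert, "Cisco-systems", {"Example Root CA"}, now))
```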

Router#show sdwan control local-properties | include chassis-num|serial-num
chassis-num/unique-id IR1101-K9-FCW2449PBST
serial-num 080760531384706592BC
subject-serial-num FCW2449PBST
enterprise-serial-num No certificate installed

It shows that a certificate is not installed. Does that mean the root cert, or which other certificate should it be?

 

Router#request platform software sd root-cert-chain install bootflash:sdwan/RootCA.crt
Uploading root-ca-cert-chain via VPN 0
Changing ownership of vedge_certs to binos...
Changing ownership of /usr/share/viptela/backup_certs to binos.
Copying /bootflash/sdwan/RootCA.crt to /tmp/vconfd/root-ca.crt.tmp via VPN 0
Changing ownership of /usr/share/viptela/backup_certs to binos.
Moving /tmp/vconfd/root-ca.crt.tmp to /usr/share/viptela/root-ca.crt via VPN 0
Updating the root certificate chain..
send_install_root_ca_crt_chain_notification
Successfully installed the root certificate chain
Successfully installed the root certificate chain Warn: Use /bootflash/sdwan - any other path support will be deprecated.
Filename /flash1/sdwan/RootCA.crt will be required to be located in /bootflash/sdwan directory. PLATFORM_TYPE "cedge"
Use /bootflash/sdwan - any other path support will be deprecated.
Filename /flash1/sdwan/RootCA.crt will be required to be located in /bootflash/sdwan directory. PLATFORM_TYPE "cedge"

 

imortada
Cisco Employee

It's working now after changing the root CA on the router. Thanks, Kanan Huseynli.

 

Hi,

Router#show sdwan control local-properties | include chassis-num|serial-num
chassis-num/unique-id IR1101-K9-FCW2449PBST
serial-num 080760531384706592BC
subject-serial-num FCW2449PBST
enterprise-serial-num No certificate installed

Here, the enterprise-serial-num line refers to the enterprise certificate (besides the router's on-board certificate, there can also be an enterprise certificate). The root certificate is reported on a separate line, and as I see, everything works after installing it. Glad my comment helped.
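The filtered local-properties output is easy to check in a script. A minimal sketch, assuming each line is a key, a space, and then its value, as in the output above; `parse_local_properties` and `enterprise_cert_installed` are hypothetical names. As explained above, "No certificate installed" on the enterprise-serial-num line concerns only the enterprise certificate, so it is expected unless the enterprise CA option is in use.

```python
def parse_local_properties(text: str) -> dict[str, str]:
    """Split each "key value..." line of the filtered output into a dict entry."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(" ")
        if sep:  # ignore blank lines and lines without a value
            props[key] = value.strip()
    return props


def enterprise_cert_installed(props: dict[str, str]) -> bool:
    # The output above prints this literal text when no enterprise cert is present.
    return props.get("enterprise-serial-num") not in (None, "No certificate installed")


output = """\
chassis-num/unique-id IR1101-K9-FCW2449PBST
serial-num 080760531384706592BC
subject-serial-num FCW2449PBST
enterprise-serial-num No certificate installed
"""
props = parse_local_properties(output)
print(props["subject-serial-num"], enterprise_cert_installed(props))
```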

HTH,

imortada
Cisco Employee

Do I have to install the enterprise cert on the router as well, after exporting it from vManage?

imortada
Cisco Employee

In addition, I don't see the connection to the cEdge router, even though the device shows up in vManage.

vsmart# show control connections
                                                        PEER                 PEER
      PEER    PEER PEER      SITE DOMAIN PEER           PRIV  PEER           PUB
INDEX TYPE    PROT SYSTEM IP ID   ID     PRIVATE IP     PORT  PUBLIC IP      PORT  ORGANIZATION  REMOTE COLOR  STATE UPTIME
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0     vbond   dtls 1.1.1.3   0    0      172.27.167.86  12346 172.27.167.86  12346 Cisco-systems default       up    0:10:51:44
0     vmanage dtls 1.1.1.1   100  0      172.27.167.69  12346 172.27.167.69  12346 Cisco-systems default       up    0:10:51:00
1     vbond   dtls 1.1.1.3   0    0      172.27.167.86  12346 172.27.167.86  12346 Cisco-systems default       up    0:10:51:41

vsmart#

 

 

Nothing shows up on vBond:

 

vbond# show control connections

vbond#
vbond#
vbond#

 

vmanage# show control connections
                             PEER                                  PEER                 PEER
      PEER    PEER PEER      CONFIGURED SITE DOMAIN PEER           PRIV  PEER           PUB
INDEX TYPE    PROT SYSTEM IP SYSTEM IP  ID   ID     PRIVATE IP     PORT  PUBLIC IP      PORT  ORGANIZATION  REMOTE COLOR    STATE UPTIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0     vedge   dtls 100.0.0.1 100.0.0.1  100  1      172.27.167.66  12426 172.27.167.66  12426 Cisco-systems public-internet up    0:05:34:40
0     vsmart  dtls 1.1.1.2   1.1.1.2    100  1      172.27.167.87  12346 172.27.167.87  12346 Cisco-systems default         up    0:10:53:11
0     vbond   dtls 1.1.1.3   1.1.1.3    0    0      172.27.167.86  12346 172.27.167.86  12346 Cisco-systems default         up    0:10:53:11
1     vbond   dtls 0.0.0.0   -          0    0      172.27.167.86  12346 172.27.167.86  12346 Cisco-systems default         up    0:10:53:12

 

 

Hi,

 

Do I have to install the enterprise cert on the router as well, after exporting it from vManage?

The enterprise cert is needed if, in the vManage settings for hardware boxes, you want to use the enterprise CA option. I don't know your configuration.

vbond# show control connections

On vBond, replace the keyword "control" with "orchestrator". The correct command is: show orchestrator connections

 

HTH,