cEdge stuck in state "connect", DCONFAIL

Hi

 

I've got a cEdge (C1111-4PLTEEA) running IOS-XE SDWAN version 16.12.3. All controllers are running 19.2.2. I am using Cisco automated certificates on the controllers and the on-box certificate option for hardware.

 

I am struggling to make this router talk to vSmart and vManage. I've checked several things:

  • clock matches with controllers
  • whitelist on vManage and vSmart has the correct serial number and chassis number. Org name is also correct.
  • local properties on cEdge are fine 
  • certificate is installed
  • root certificate is installed
  • I can ping all public IPs of controllers
  • color is public on all controllers and cEdge

 

PEER                      PEER                                                                            
PEER     PEER     PEER             SITE        DOMAIN PEER             PRIVATE  PEER             PUBLIC                                   LOCAL      REMOTE     REPEAT               
TYPE     PROTOCOL SYSTEM IP        ID          ID     PRIVATE IP       PORT     PUBLIC IP        PORT    LOCAL COLOR      STATE           ERROR      ERROR      COUNT DOWNTIME       
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vmanage  dtls     1.1.1.1          1000        0      172.29.28.10     12446    193.xx.xx.100   36060   public-internet  connect         DCONFAIL   NOERR      1     2020-05-11T15:46:09+0200
vsmart   dtls     1.1.1.3          1000        1      172.29.28.11     12446    193.xx.xx.102   9899    public-internet  connect         DCONFAIL   NOERR      1     2020-05-11T15:46:09+0200
PEER    PEER PEER            SITE       DOMAIN PEER                                    PRIV  PEER                                    PUB                                           GROUP      
TYPE    PROT SYSTEM IP       ID         ID     PRIVATE IP                              PORT  PUBLIC IP                               PORT  LOCAL COLOR     PROXY STATE UPTIME      ID         
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vsmart  dtls 1.1.1.3         1000       1      172.29.28.11                            12446 193.xx.xx.102                          9899  public-internet No    connect            0           
vbond   dtls 0.0.0.0         0          0      193.xx.xx.101                          12346 193.xx.xx.101                          12346 public-internet -     up     0:00:10:54  0           
vmanage dtls 1.1.1.1         1000       0      172.29.28.10                            12446 193.xx.xx.100                          36060 public-internet No    connect            0 

I am not sure what I am missing. If vBond could establish a connection, why are vManage and vSmart not working?

Any ideas on how to troubleshoot this? Is there a way to run tcpdump on a cEdge?

 

Rudi

6 REPLIES 6
Highlighted
Cisco Employee

Hi there. DCONFAIL always points to a connectivity problem, e.g. incorrect routing, missing NAT configuration, firewall filtering and so on. The troubleshooting approach is exactly the same as for vEdge routers. More here: https://www.cisco.com/c/en/us/support/docs/routers/sd-wan/214509-troubleshoot-control-connections.html

Highlighted

Hi

I've already checked all those tips and could not make it work. The router uses a Cellular interface, and I cannot capture on it with EPC because the Cellular interface is not supported (bug opened: CSCvm14612). I am not sure what else I can do to troubleshoot on the client side.

It looks like the ISP is performing some kind of NAT, since the IP on the Cellular interface is not the same as the one hitting my controllers. 1) Could that be an issue if I am not running a STUN server on vBond?

 

2) Can you confirm that vManage and vSmart initiate the connection to the router, rather than the router initiating the connection to vManage and vSmart? Please see the tcpdump output from vManage and vSmart below. Note that vBond is fine and its connection is in the up state.

 

vSmart-1# tcpdump vpn 0 interface eth1 | i internet
18:38:04.314216 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:04.314252 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:09.324816 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:14.339803 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:38:14.940539 IP 172.29.28.11.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16

 

vManage# tcpdump vpn 0 interface eth1 | i internet 
18:35:28.503389 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:30.845409 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:30.845435 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:35.860776 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:40.872419 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16
18:35:43.649331 IP 172.29.28.10.12446 > internet-188-XX-XX-85.abcd.si.8872: UDP, length 16

I am assuming this is the challenge packet that the router never replies to because it gets "lost" in the ISP network.
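One way to put a number on that impression is to count packets per direction in the capture. The following is only a minimal Python sketch, not a Cisco tool: `summarize_directions` is a hypothetical helper, `edge.example` is a placeholder hostname, and it assumes the default one-line tcpdump text format shown above.

```python
import re
from collections import Counter

# Matches default tcpdump text lines such as:
# 18:38:04.314216 IP 172.29.28.11.12446 > edge.example.8872: UDP, length 16
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): UDP, length (?P<length>\d+)"
)

def summarize_directions(lines, local_ip):
    """Count UDP packets sent by local_ip and packets addressed to it."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a UDP packet line
        if m["src"] == local_ip:
            counts["outbound"] += 1
        elif m["dst"] == local_ip:
            counts["inbound"] += 1
    return counts

# Placeholder sample modelled on the vSmart capture (hostname is made up).
sample = [
    "18:38:04.314216 IP 172.29.28.11.12446 > edge.example.8872: UDP, length 16",
    "18:38:09.324816 IP 172.29.28.11.12446 > edge.example.8872: UDP, length 16",
]
print(summarize_directions(sample, "172.29.28.11"))
```

For a capture like the ones above, the counter contains only outbound packets, which is consistent with the controller sending and nothing coming back on that interface.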

Also check this out: the IP above is 188.XX.XX.85; however, the router's local properties show this:

...
...
 PUBLIC          PUBLIC PRIVATE         PRIVATE                                 PRIVATE                              MAX   RESTRICT/           LAST         SPI TIME    NAT  VM
INTERFACE                IPv4            PORT   IPv4            IPv6                                    PORT    VS/VM COLOR            STATE CNTRL CONTROL/     LR/LB  CONNECTION   REMAINING   TYPE CON
                                                                                                                                                   STUN                                              PRF
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Cellular0/2/0            188.XX.XX.85   1502   100.XX.XX.85    ::                                      12406    0/0  public-internet  up     2      no/yes/no   No/No  0:00:00:02   0:10:41:51  N    5  
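The public and private columns in that output differ, which is what you would expect when something between the router and the controllers rewrites the source address. As a rough illustration, the Python sketch below compares the two values and also checks for the carrier-grade NAT range 100.64.0.0/10 (RFC 6598). The addresses in the example are made up, because the real ones are masked in this thread, and `describe_nat` is a hypothetical helper.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_NET = ipaddress.ip_network("100.64.0.0/10")

def describe_nat(private_ip, public_ip):
    """Compare the interface address with the address the peers see."""
    private = ipaddress.ip_address(private_ip)
    public = ipaddress.ip_address(public_ip)
    if private == public:
        return "no address translation visible"
    if private in CGNAT_NET:
        return "translated; private address is in the carrier-grade NAT range"
    return "translated by some NAT device on the path"

# Hypothetical values; substitute the PRIVATE IPv4 and PUBLIC IPv4 columns.
print(describe_nat("100.64.10.85", "198.51.100.85"))
print(describe_nat("203.0.113.12", "203.0.113.12"))
```

The same comparison applies to the port columns: a public port that differs from the private port suggests the NAT device is also translating ports.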

Any ideas?

 

Rudi

 

Highlighted

1) Yes, it can be, but it usually causes issues with data plane tunnels rather than control connections.
2) I just did a capture to confirm. Here is my lab output, where .217 is an edge router that was just reloaded:

09:25:03.960722 IP 192.168.20.213.12346 > 192.168.109.217.30252: UDP, length 16
09:25:03.963853 IP 192.168.109.217.30252 > 192.168.20.213.12346: UDP, length 154
09:25:03.964012 IP 192.168.20.213.12346 > 192.168.109.217.30252: UDP, length 48
09:25:04.020408 IP 192.168.109.217.30252 > 192.168.20.213.12346: UDP, length 174

You can configure an ACL on the Cellular interface to count packets and make sure they are reaching your router. Can you also do a very basic check: can you ping the router's public IP from the vSmart? Do you use NAT on the edge router as well (for DIA, for example)?
Highlighted

So I reconfigured the router to use a wired connection. I was then able to use EPC and got a result: the FW was blocking DTLS hello packets destined for vManage and vSmart. @ekhabaro, can you elaborate on the ports these routers use? Mine used these:

 

A DTLS Client hello packet going from router --> vmanage has source UDP port 12406 and destination port 33888

A DTLS Client hello packet going from router --> vsmart has source UDP port 12406 and destination port 1894

 

I am a bit confused about which ports I should open on the firewall. I thought I only needed UDP 12346. Can you please advise which ports I should limit the rules to?

 

Rudi

Highlighted

@ekhabaro, to answer your questions:

Yes, I can ping any public IP on any controller. Note, though, that they are all behind a firewall with 1:1 public/private IP mapping. Currently, there are no firewall rules for the source IP I am trying to connect from. To answer your second question: no, I am not using NAT on my router. I simply plugged it into my home modem and it got a public IP. I can see the same IP as on the interface hitting the controllers, so there is no NAT on the ISP side.

 

Note that I can successfully build control connections with vBond, so I am puzzled as to why vManage and vSmart have issues. See the attached vbond_handshake.png. No issues there.

 

vBond is .101

vManage is .100

vSmart .102

router .12

 

Then the router tries to connect to vManage/vSmart (see the attached captures). I can see the client hello message sent by the router. However, nothing comes back. The controllers are indeed receiving those client hello messages; they are just not responding (see below). They keep sending these short "Len=16" UDP messages. I am not sure what these are, but they contain almost no data.

 

Here is a tcpdump from vManage to prove the point that the firewall is not blocking any traffic and the messages are hitting the controllers. Additionally, I confirmed this in the FW console, where I cannot find any blocked traffic from or to the router's IP.

 

 

vManage# tcpdump vpn 0 interface eth1 | i abc
15:14:39.484312 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:44.495975 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:48.068752 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:48.081467 IP 93-xx-xx-12.abc.12406 > 172.29.28.10.36496: UDP, length 144
15:14:49.504772 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:49.504826 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:51.128740 IP 93-xx-xx-12.abc.12406 > 172.29.28.10.36496: UDP, length 144
15:14:54.514592 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:14:57.142641 IP 93-xx-xx-12.abc.12406 > 172.29.28.10.36496: UDP, length 144
15:14:59.521085 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16
15:15:04.527726 IP 172.29.28.10.12446 > 93-xx-xx-12.abc.12406: UDP, length 16

I assume these 144-byte UDP packets are the client hello messages sent by the router. Note that .12 is the router and .10 is the vManage private IP. It seems that vManage simply ignores the client hello messages.
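One detail worth checking in a capture like this is whether the inbound packets arrive on the same local port the controller is sending from. In the output above, vManage sends from port 12446 while the 144-byte packets arrive on port 36496. The Python sketch below flags that kind of mismatch; `find_port_mismatch` is a hypothetical helper, `edge.example` is a placeholder hostname, and the regex assumes the default tcpdump text format.

```python
import re

# Matches lines such as:
# 15:14:48.081467 IP edge.example.12406 > 172.29.28.10.36496: UDP, length 144
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): UDP"
)

def find_port_mismatch(lines, local_ip):
    """Return (ports local_ip sends from, inbound ports it never sends from)."""
    sent_from, received_on = set(), set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        if m["src"] == local_ip:
            sent_from.add(int(m["sport"]))
        elif m["dst"] == local_ip:
            received_on.add(int(m["dport"]))
    return sent_from, received_on - sent_from

# Placeholder sample modelled on the vManage capture (hostname is made up).
sample = [
    "15:14:48.068752 IP 172.29.28.10.12446 > edge.example.12406: UDP, length 16",
    "15:14:48.081467 IP edge.example.12406 > 172.29.28.10.36496: UDP, length 144",
]
sent_from, unexpected = find_port_mismatch(sample, "172.29.28.10")
print("sends from:", sent_from, "unexpected inbound ports:", unexpected)
```

A non-empty second set means the peer is targeting a port that the local process does not use as a source port in this capture, which is a hint to look at port translation somewhere on the path.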

 

Am I hitting a bug here? 

I am running: 

vManage 19.2.2

vBond, vSmart 19.2.1 (tried with 19.2.2 same thing, did not work)

router IOS-XE SDWAN 16.12.3 (also tried with a lower version, no luck)

 

*all controllers and the router have public color assigned (no restrict)

 

The compatibility matrix shows no issues with these versions:

https://www.cisco.com/c/en/us/solutions/enterprise-networks/sd-wan/compatibility-matrix.html

 

What versions are you running? Do you have any more ideas?

 

Rudi

 

 

Highlighted

Just to follow up on the issue.

I've managed to solve the problem with some help, and it turned out to be a NAT problem.

Communication from vManage --> vBond and vSmart --> vBond must be NAT-ed so that the source IP of the vManage/vSmart is changed to a public IP AND the source port is preserved. Don't do PAT. The latter is very important, and it is what broke my connections with vManage and vSmart. This is why you can see weird port numbers hitting the controllers below.

 

The vBond learned the wrong port numbers for vManage and vSmart and advertised those to the routers. The routers then connected to the correct public IP but hit the wrong UDP port, so the connection didn't come up.
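To make the mechanism concrete, here is a small Python model of the two NAT behaviours. It is only a sketch under simplifying assumptions: the names (`Endpoint`, `static_nat`, `pat`, `reachable`) are made up, the public address is a documentation placeholder, and inbound traffic is assumed to be mapped 1:1 so the destination port is never rewritten. The ports 12446 and 36060 are the private and public vManage ports from the connection history earlier in the thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int

def static_nat(src, public_ip):
    """1:1 NAT: rewrite the source address, keep the source port."""
    return Endpoint(public_ip, src.port)

def pat(src, public_ip, mapped_port):
    """PAT: rewrite both the source address and the source port."""
    return Endpoint(public_ip, mapped_port)

def reachable(advertised, controller):
    """True if a router using the advertised endpoint reaches the controller.

    Assumes inbound traffic is mapped 1:1, so the destination port arrives
    unchanged and must equal the port the controller listens on.
    """
    return advertised.port == controller.port

vmanage = Endpoint("172.29.28.10", 12446)                 # private endpoint
seen_by_vbond_ok = static_nat(vmanage, "198.51.100.100")  # port preserved
seen_by_vbond_pat = pat(vmanage, "198.51.100.100", 36060) # port rewritten

print(reachable(seen_by_vbond_ok, vmanage))   # True
print(reachable(seen_by_vbond_pat, vmanage))  # False
```

In the PAT case the router ends up sending its hello to a port the controller is not listening on, which matches the symptom described above.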

 

09:25:03.960722 IP 192.168.20.213.12346 > 192.168.109.217.30252: UDP, length 16
09:25:03.963853 IP 192.168.109.217.30252 > 192.168.20.213.12346: UDP, length 154
09:25:03.964012 IP 192.168.20.213.12346 > 192.168.109.217.30252: UDP, length 48
09:25:04.020408 IP 192.168.109.217.30252 > 192.168.20.213.12346: UDP, length 17