svemulap@cisco.com
Cisco Employee

This article describes control connection problems that might arise between SD-WAN routers (cEdge / vEdge) and vSmart controllers, and between SD-WAN routers (cEdge / vEdge) and vManage NMSs, when you are bringing up these devices. All the troubleshooting steps below apply to both vEdge and cEdge routers, but all the captures are taken from a vEdge router.

 

Check Control Connection Status

To check the status of the control connections of all SD-WAN routers, in the vManage Dashboard, view the Control Status pane. Click any row to display a table with device details.

To check the status of a single vEdge router's control connections, in vManage NMS, select Monitor ► Network, locate the desired vEdge router, and click its hostname. In the left pane, click Control Connections.

To display active control connections from the CLI, issue the show control connections [vEdge] or show sdwan control connections [cEdge] command. If a control connection is not listed in the command output, that connection is not operational.

If the vManage NMS screens or the command output indicates connection problems between a vEdge router and a vSmart controller or vManage NMS, see the sections below to troubleshoot the problem.

General checks

Before starting to troubleshoot, confirm that the SD-WAN router in question has been configured properly.

This includes:

  • A valid certificate has been installed.
  • The following settings are configured under the “system” block:
    • System-IP
    • Site-ID
    • Organization-Name
    • vBond address
  • The VPN 0 transport interface is configured with the tunnel option and an IP address.
  • The system clock on the vEdge is set correctly and matches the other devices/controllers:
    • show clock confirms the current time.
    • Use clock set to set the right time on the device.

If the portion of the vEdge router's configuration that establishes control connections is correct such that control connections are up, the show control local-properties command output looks similar to this example, in Releases 16.3 and later:

 

vEdge# show control local-properties
personality                  vedge
sp-organization-name         Viptela, Inc.
organization-name            Viptela, Inc.
certificate-status           Installed
root-ca-chain-status         Installed
certificate-validity         Valid
certificate-not-valid-before Sep 06 22:39:01 2016 GMT
certificate-not-valid-after  Sep 06 22:39:01 2017 GMT
dns-name                     trainingvbond.viptela.com
site-id                      10
domain-id                    1
protocol                     dtls
tls-port                     0
system-ip                    172.1.10.1
chassis-num/unique-id        66cb2a8b-2eeb-479b-83d0-0682b64d8190
serial-num                   12345718
vsmart-list-version          0
keygen-interval              1:00:00:00
retry-interval               0:00:00:17
no-activity-exp-interval     0:00:00:12
dns-cache-ttl                0:00:02:00
port-hopped                  TRUE
time-since-last-port-hop     20:16:24:43
number-vbond-peers           0
number-active-wan-interfaces 1

NAT TYPE: E -- indicates End-point independent mapping
          A -- indicates Address-port dependent mapping
          N -- indicates Not learned
          Note: Requires minimum two vbonds to learn the NAT type

           PUBLIC         PUBLIC  PRIVATE       PRIVATE                 PRIVATE                                 MAX    RESTRICT/          LAST        SPI TIME    NAT   VM
INTERFACE  IPv4           PORT    IPv4          IPv6                    PORT     VS/VM  COLOR            STATE  CNTRL  CONTROL/   LR/LB   CONNECTION  REMAINING   TYPE  CON
                                                                                                                       STUN                                             PRF
--------------------------------------------------------------------------------------------------------------------------------------------------------------
ge0/4      73.241.233.20  12386   192.168.0.20  2601:647:4380:ca75::c2  12386    2/1    public-internet  up     2      no/yes/no  No/Yes  0:10:34:16  0:03:03:26  E     5
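
The general checks can be partly automated once the output has been saved as text. The following is a minimal sketch, not a Cisco tool: it assumes the key/value layout shown in the capture above, and the function names and the choice of which keys to check are illustrative. It also assumes that dns-name carries the configured vBond address, as it appears to in the capture.

```python
# Values that the capture above shows for a healthy router.
REQUIRED_VALUES = {
    "certificate-status": "Installed",
    "root-ca-chain-status": "Installed",
    "certificate-validity": "Valid",
}
# Settings that only need to be present (assumed list, based on the general checks).
REQUIRED_PRESENT = ("organization-name", "system-ip", "site-id", "dns-name")


def parse_local_properties(text):
    """Parse 'key   value' lines from the top of the output into a dict."""
    props = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # first word is the key, the rest is the value
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def find_problems(props):
    """Return a list of human-readable findings; an empty list means no findings."""
    problems = []
    for key, expected in REQUIRED_VALUES.items():
        if props.get(key) != expected:
            problems.append(f"{key} is {props.get(key)!r}, expected {expected!r}")
    for key in REQUIRED_PRESENT:
        if not props.get(key):
            problems.append(f"{key} is missing")
    return problems


sample = """\
personality                  vedge
organization-name            Viptela, Inc.
certificate-status           Installed
root-ca-chain-status         Installed
certificate-validity         Valid
dns-name                     trainingvbond.viptela.com
site-id                      10
system-ip                    172.1.10.1
"""

print(find_problems(parse_local_properties(sample)))  # prints []
```

Only the key/value portion of the output should be passed in; the per-interface table at the bottom has a different layout and is not handled by this sketch.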

 

If the control connections are down, debug the vEdge router's configuration as discussed in the sections below.

Check that a Valid Certificate Is Installed on the vEdge Router

A valid certificate must be installed on the vEdge router.

To view the status of the router's certificate, in vManage NMS, select the Configuration ► Certificates screen, and select the vEdge List tab. In the table, the Validate column should be green, to indicate that the certificate is valid, and the State column should show a green icon. From the CLI, use the show control local-properties command. 

Check the vEdge Router Configuration

The vEdge router configuration must include a system IP address, a site ID, an organization name, and a vBond orchestrator IP address or DNS name.

To view the device's running configuration in vManage NMS:

  1. Select the Configuration ► Devices screen.
  2. Select the vEdge List tab.
  3. For the desired vEdge router, click the More Actions icon to the right of the row.
  4. Click Running Configuration.

To view the device's running configuration from the CLI, use the show running-config [vEdge] or show sdwan running-config [cEdge] command.

Check the Clock Time

The vEdge router's clock must have the same time configured as other devices in the overlay network.

To view the system time on the device, use the CLI command show clock.
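
One practical reason the clock matters is that certificates carry a validity window, reported as certificate-not-valid-before and certificate-not-valid-after in the show control local-properties output. The sketch below checks whether a given clock reading falls inside that window. It assumes the timestamp format seen in that output; the function names and the example clock values are hypothetical.

```python
from datetime import datetime, timezone

# Format of the validity timestamps as they appear in show control local-properties.
CERT_TIME_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_cert_time(value):
    """Convert a validity timestamp such as 'Sep 06 22:39:01 2016 GMT' to UTC."""
    return datetime.strptime(value, CERT_TIME_FORMAT).replace(tzinfo=timezone.utc)


def clock_within_validity(now, not_before, not_after):
    """Return True if the clock reading lies inside the certificate validity window."""
    return parse_cert_time(not_before) <= now <= parse_cert_time(not_after)


not_before = "Sep 06 22:39:01 2016 GMT"
not_after = "Sep 06 22:39:01 2017 GMT"

# Hypothetical clock readings, expressed in UTC.
good_clock = datetime(2016, 10, 14, 18, 19, 39, tzinfo=timezone.utc)
reset_clock = datetime(2000, 1, 1, tzinfo=timezone.utc)  # e.g. a clock that was never set

print(clock_within_validity(good_clock, not_before, not_after))   # True
print(clock_within_validity(reset_clock, not_before, not_after))  # False
```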

Check for Routing Issues

Control connections might not come up if the overlay network has routing issues. If this is the case, the State column in the show control connections command output has a value of "connect", which indicates that a connection attempt is in progress. If the control connection is up, the State column shows a value of "up".

                                                                        PEER                    PEER                                                 CONTROLLER
PEER       PEER       PEER         SITE      DOMAIN      PEER           PRIVATE    PEER         PUBLIC                                               GROUP
TYPE       PROTOCOL   SYSTEM IP    ID        ID          PRIVATE IP     PORT       PUBLIC IP    PORT         LOCAL COLOR      STATE       UPTIME     ID
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond      dtls       -             0          0         1.3.25.25      12346      1.3.25.25    12346         gold            connect                0
vbond      dtls       -             0          0         1.3.25.25      12346      1.3.25.25    12346         silver          connect                0

Check for routing issues as discussed in the sections below.
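
When the output covers many connections, it can help to filter for rows that are not yet up. The following sketch assumes the column order shown in the capture above (local color in the tenth column, state in the eleventh) and that the system IP column always holds a value or a dash. The function name is illustrative.

```python
def connections_not_up(output):
    """Return (peer_type, local_color, state) for rows whose STATE is not 'up'."""
    peer_types = {"vbond", "vsmart", "vmanage"}
    pending = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a peer type; header and separator lines do not.
        if len(fields) < 11 or fields[0] not in peer_types:
            continue
        color, state = fields[9], fields[10]
        if state != "up":
            pending.append((fields[0], color, state))
    return pending


sample = """\
TYPE       PROTOCOL   SYSTEM IP    ID        ID          PRIVATE IP     PORT       PUBLIC IP    PORT         LOCAL COLOR      STATE       UPTIME     ID
vbond      dtls       -             0          0         1.3.25.25      12346      1.3.25.25    12346         gold            connect                0
vbond      dtls       -             0          0         1.3.25.25      12346      1.3.25.25    12346         silver          connect                0
"""

print(connections_not_up(sample))
# [('vbond', 'gold', 'connect'), ('vbond', 'silver', 'connect')]
```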

Check the Route Table for a Valid Route to the Next Hop

The route table (RIB) might not contain a valid route to the correct next hop.

To verify the entries in the route table in vManage NMS:

  1. Select the Monitor ► Network screen.
  2. Select the vEdge router by clicking its IP address.
  3. In the screen that opens, select Real Time in the left pane.
  4. In the Device Options drop-down, select IP Routes.

To verify the entries in the route table from the CLI, use the command show ip routes vpn 0 or show ip routes vpn 0 prefix/length.

Check for a Leaked TLOC IP Address

The TLOC IP address might be leaked between upstream ISPs.

To verify connectivity, ping the default gateway. Check for correct distance values and protocols for the IP prefix.

Use Error Codes to Troubleshoot Control Connection Problems

If control connections are down, issue the show control connections-history [vEdge] or show sdwan control connection-history [cEdge] CLI command on the router to display information about control plane connection attempts initiated by the router. The Local Error and Remote Error columns in the output report any errors that occurred during the connection initialization attempts. The following errors relate to configuration problems and to establishing control tunnels:

 

  • BIDNTVRFD, CRTREJSER, SERNTPRES: Missing serial number
  • CTORGNMMIS: Organization name mismatch in the certificate
  • CRTVERFL: Certificate verification failure
  • DCONFAIL: DTLS connection failure
  • DISCVBD or SYSIPCHNG: Transient error conditions
  • DISTLOC: TLOC is disabled
  • LISFD: Socket errors
  • NOVMCFG: vEdge router template is not attached in vManage NMS
  • RDSIGFBD, TXCHTOBD: For a hardware vEdge router, the board identifier did not initialize
  • VB_TMO, VM_TMO, VP_TMO, VS_TMO: Peer timeout
  • VECRTREV, VSCRTREV: Revoked or invalidated certificate
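
For quick lookups while reading command output, the error codes covered in this article can be kept in a small table. This is only a convenience sketch built from the headings in this article; it is not an exhaustive list of error codes, and the names ERROR_MEANINGS and explain are made up for illustration.

```python
# Error codes and meanings as described in this article (not an exhaustive list).
ERROR_MEANINGS = {
    "BIDNTVRFD": "Missing serial number",
    "CRTREJSER": "Missing serial number",
    "SERNTPRES": "Missing serial number",
    "CTORGNMMIS": "Organization name mismatch",
    "CRTVERFL": "Certificate verification failure",
    "DCONFAIL": "DTLS connection failure",
    "DISCVBD": "Transient condition",
    "SYSIPCHNG": "Transient condition",
    "DISTLOC": "TLOC is disabled",
    "LISFD": "Socket error",
    "NOVMCFG": "vEdge router template not attached in vManage NMS",
    "RDSIGFBD": "Board ID not initialized",
    "TXCHTOBD": "Board ID not initialized",
    "VB_TMO": "Peer timeout",
    "VM_TMO": "Peer timeout",
    "VP_TMO": "Peer timeout",
    "VS_TMO": "Peer timeout",
    "VECRTREV": "Revoked certificate",
    "VSCRTREV": "Revoked certificate",
}


def explain(code):
    """Return the meaning of an error code, or a fallback for unknown codes."""
    return ERROR_MEANINGS.get(code, "Not covered in this article")


print(explain("DCONFAIL"))  # DTLS connection failure
print(explain("VM_TMO"))    # Peer timeout
```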

 

BIDNTVRFD, CRTREJSER, SERNTPRES: Missing Serial Number

Problem Statement

A device's serial number is missing from the vSmart controllers.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the values BIDNTVRFD, CRTREJSER, and SERNTPRES indicate a missing serial number. BIDNTVRFD indicates a missing serial number for vBond orchestrators. CRTREJSER indicates a missing serial number for vEdge routers and vSmart controllers. SERNTPRES on a vBond orchestrator indicates a serial number mismatch between vSmart controllers.

                                                                    PEER                    PEER                                                                              
PEER     PEER       PEER          SITE      DOMAIN     PEER         PRIVATE   PEER          PUBLIC                                                      REPEAT               
TYPE     PROTOCOL   SYSTEM IP     ID        ID         PRIVATE IP   PORT      PUBLIC IP     PORT       LOCAL  COLOR    STATE        LOCAL/REMOTE        COUNT            DOWNTIME       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls       -             0          0         1.4.30.30    12346     1.4.30.30     12346      mpls            tear_down    CRTREJSER  NOERR    161   2016-10-14T11:19:39-0700
vbond    dtls       -             0          0         1.4.30.30    12346     1.4.30.30     12346      silver          tear_down    CRTREJSER  NOERR    161   2016-10-14T11:19:38-0700
vbond    dtls       -             0          0         1.3.25.25    12346     1.3.25.25     12346      mpls            tear_down    CRTREJSER  NOERR    160   2016-10-14T11:19:22-0700
vbond    dtls       -             0          0         1.3.25.25    12346     1.3.25.25     12346      silver          tear_down    CRTREJSER  NOERR    160   2016-10-14T11:19:22-0700
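
The captures in this article show the error columns in two layouts: separate Local Error and Remote Error columns, or a single LOCAL/REMOTE column with values such as DISCVBD/NOERR. The sketch below extracts the error pair from either layout by counting fields from the end of each row. It assumes that the last two fields are the repeat count and the downtime, and that data rows start with the peer type; the function name is illustrative.

```python
from collections import Counter


def error_pairs(output):
    """Return (peer_type, local_error, remote_error) for each history row."""
    peer_types = {"vbond", "vsmart", "vmanage"}
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        # Skip header, separator, and blank lines.
        if len(fields) < 4 or fields[0] not in peer_types:
            continue
        # Counting from the end: DOWNTIME, REPEAT COUNT, then the error column(s).
        if "/" in fields[-3]:
            local, remote = fields[-3].split("/", 1)
        else:
            local, remote = fields[-4], fields[-3]
        pairs.append((fields[0], local, remote))
    return pairs


sample = """\
vbond    dtls       -             0          0         1.4.30.30    12346     1.4.30.30     12346      mpls            tear_down    CRTREJSER  NOERR    161   2016-10-14T11:19:39-0700
vbond    dtls       -             0          0         1.3.25.25    12346     1.3.25.25     12346      lte             tear_down    DISCVBD/NOERR       0     2016-09-27T19:06:16+0000
"""

pairs = error_pairs(sample)
print(pairs)
# [('vbond', 'CRTREJSER', 'NOERR'), ('vbond', 'DISCVBD', 'NOERR')]

# Tally local errors to see which one dominates.
print(dict(Counter(local for _, local, _ in pairs)))
# {'CRTREJSER': 1, 'DISCVBD': 1}
```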

Resolve the Problem

Send the device's serial number to the controllers:

  1. In vManage NMS, select the Configuration ► Certificates screen.
  2. In the vEdge List tab, select the device whose serial number is missing.
  3. Click Send to Controllers.

CTORGNMMIS: Organization Name Mismatch

Problem Statement

The organization name is not identical among all devices in the overlay network.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the value CTORGNMMIS indicates an organization name mismatch in the overlay network.

                                                             PEER                   PEER                                                                             
PEER     PEER      PEER        SITE    DOMAIN   PEER         PRIVATE   PEER         PUBLIC                            LOCAL        REMOTE    REPEAT               
TYPE     PROTOCOL  SYSTEM IP   ID      ID       PRIVATE IP   PORT      PUBLIC IP    PORT    LOCAL COLOR   STATE       ERROR        ERROR     COUNT      DOWNTIME       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls      -           0       0        1.3.25.25    12346     1.3.25.25    12346   mpls          tear_down   CTORGNMMIS   NOERR      19        2016-10-06T00:39:37+0000
vbond    dtls      -           0       0        1.3.25.25    12346     1.3.25.25    12346   gold          tear_down   CTORGNMMIS   NOERR      28        2016-10-06T10:39:20-0000

Resolve the Problem

To configure the correct organization name on every device in the overlay network, use the organization-name command.
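
Because the name must be identical everywhere, small differences such as letter case or a stray space are easy to miss by eye. The sketch below compares names collected from each device against a reference using exact string comparison, which is the strictest reading of "identical". The device names and the way the names are collected are hypothetical.

```python
def org_name_mismatches(reference, names_by_device):
    """Return {device: (name, reason)} for every device whose name differs."""
    mismatches = {}
    for device, name in names_by_device.items():
        if name == reference:
            continue
        if name.strip().lower() == reference.strip().lower():
            reason = "differs only in case or surrounding whitespace"
        else:
            reason = "different name"
        mismatches[device] = (name, reason)
    return mismatches


# Hypothetical values gathered from each device's running configuration.
collected = {
    "vEdge1": "Viptela, Inc.",
    "vEdge2": "viptela, inc.",
    "vEdge3": "Viptela",
}

for device, (name, reason) in org_name_mismatches("Viptela, Inc.", collected).items():
    print(f"{device}: {name!r} ({reason})")
# vEdge2: 'viptela, inc.' (differs only in case or surrounding whitespace)
# vEdge3: 'Viptela' (different name)
```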

 

CRTVERFL: Certificate Verification Failure

Problem Statement

Verification of the vEdge router's certificate failed.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the value CRTVERFL indicates certificate verification failure.

                                                               PEER                  PEER                                                                             
PEER     PEER      PEER         SITE    DOMAIN   PEER          PRIVATE   PEER        PUBLIC                             LOCAL      REMOTE    REPEAT               
TYPE     PROTOCOL  SYSTEM IP    ID      ID       PRIVATE IP    PORT      PUBLIC IP   PORT    LOCAL COLOR    STATE       ERROR      ERROR     COUNT      DOWNTIME       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls      -            0       0        1.3.25.25     12346     1.3.25.25   12346   mpls           tear_down   CRTVERFL   NOERR     142        2016-10-03T00:39:37+0000
vbond    dtls      -            0       0        1.3.25.25     12346     1.3.25.25   12346   gold           tear_down   CRTVERFL   NOERR     213        2016-10-03T10:39:20-0000

Resolve the Problem

 

DCONFAIL: DTLS Connection Failure

Problem Statement

The vEdge router does not establish DTLS connections to controllers in the overlay network.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the value DCONFAIL indicates DTLS connection failure.

                                                                      PEER                   PEER                                                                             
PEER     PEER      PEER            SITE       DOMAIN   PEER           PRIVATE   PEER         PUBLIC                               LOCAL       REMOTE    REPEAT               
TYPE     PROTOCOL  SYSTEM IP       ID         ID       PRIVATE IP     PORT      PUBLIC IP    PORT    LOCAL COLOR      STATE       ERROR       ERROR     COUNT      DOWNTIME       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls      -               0          0        1.3.25.25      12346     1.3.25.25    12346   mpls             connect     DCONFAIL    NOERR     1          2016-09-22T10:49:04-0700
vbond    dtls      -               0          0        1.3.25.25      12346     1.3.25.25    12346   gold             connect     DCONFAIL    NOERR     1          2016-09-22T10:49:03-0700

Resolve the Problem

  1. Verify that the next hop is reachable:
    1. Issue the show ip routes vpn 0 command. In the Next Hop Address column of the output, make sure the next-hop address is present.
    2. If the next-hop address is missing, configure a static route to it in VPN 0.
    3. If the next-hop address is present, ping it. If the ping fails, troubleshoot connectivity to the next hop.
  2. Verify that a default gateway is installed in the route table:
    1. Issue the show ip routes vpn 0 command. In the Prefix column of the output, make sure the default gateway, 0.0.0.0/0, is present.
    2. If the default gateway is missing, configure a static route for it in VPN 0.
  3. Verify that the DTLS ports are open in any firewall you use. For more information, see Firewall Ports for Viptela Deployments.
  4. To verify that the default gateway's IP address is correctly mapped to its MAC address, use the show arp command.
  5. Verify reachability:
    1. Ping the default gateway.
    2. Ping a Google public DNS server (8.8.8.8 or 8.8.4.4).
    3. If the ICMP service is allowed on the vEdge router, ping the vBond orchestrator.
    4. Run a traceroute command to the vBond orchestrator's DNS address.
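
The route checks in steps 1 and 2 amount to a longest-prefix match of the controller address against the VPN 0 route table. The sketch below shows that logic with Python's ipaddress module. The route entries are hypothetical values copied by hand from show ip routes vpn 0; the sketch does not parse the command output, and the function names are illustrative.

```python
import ipaddress


def has_default_route(routes):
    """Return True if 0.0.0.0/0 is among the (prefix, next_hop) entries."""
    default = ipaddress.ip_network("0.0.0.0/0")
    return any(ipaddress.ip_network(prefix) == default for prefix, _ in routes)


def best_route(destination, routes):
    """Return the (prefix, next_hop) entry that wins the longest-prefix match, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network.prefixlen, prefix, next_hop))
    if not matches:
        return None
    _, prefix, next_hop = max(matches)  # highest prefix length wins
    return prefix, next_hop


# Hypothetical VPN 0 routes; the next hop 10.0.0.1 is an assumed gateway address.
routes = [
    ("10.0.0.0/24", None),        # connected network, no next hop
    ("0.0.0.0/0", "10.0.0.1"),    # default route
]

print(has_default_route(routes))          # True
print(best_route("1.3.25.25", routes))    # ('0.0.0.0/0', '10.0.0.1')
print(best_route("1.3.25.25", routes[:1]))  # None
```

If best_route returns None for the vBond address, there is no route to it at all. If it returns a next hop, that is the address to ping in step 1.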

 

DISCVBD or SYSIPCHNG: Transient Conditions

Problem Statement

The vEdge router experiences transient control connection errors.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the value DISCVBD indicates that the vEdge router's connection to the vBond orchestrator has been taken down. This is normal behavior. The value SYSIPCHNG indicates a change in the vEdge router's system IP address.

                                                                PEER                  PEER                                                                              
PEER     PEER       PEER         SITE     DOMAIN   PEER         PRIVATE   PEER        PUBLIC                                                REPEAT               
TYPE     PROTOCOL   SYSTEM IP    ID       ID       PRIVATE IP   PORT      PUBLIC IP   PORT    LOCAL COLOR   STATE       LOCAL/REMOTE        COUNT   DOWNTIME       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls       -            0        0        1.3.25.25    12346     1.3.25.25   12346   lte           tear_down   DISCVBD/NOERR       0       2016-09-27T19:06:16+0000
vmanage  tls        172.1.0.18   200      0        1.4.28.28    12346     1.3.25.25   12346   lte           tear_down   SYSIPCHNG/NOERR     0       2016-09-27T19:05:30+0000
vsmart   tls        172.1.0.19   103      1        1.3.26.26    12346     1.3.25.25   12346   lte           tear_down   SYSIPCHNG/NOERR     0       2016-09-27T19:05:30+0000
vsmart   tls        172.1.0.16   104      1        1.3.29.29    12346     1.3.25.25   12346   lte           tear_down   SYSIPCHNG/NOERR     0       2016-09-27T19:05:30+0000
vbond    dtls       -            0        0        1.4.30.30    12346     1.3.25.25   12346   lte           tear_down   DISCVBD/NOERR       0       2016-09-27T17:56:30+0000

Resolve the Problem

These issues are part of normal operation of the overlay network. They have no impact on production traffic, and they resolve by themselves, with no action required.

DISTLOC: Disabled TLOC

Problem Statement

A TLOC, or transport location, is disabled on the vEdge router. A TLOC identifies the physical interface where a vEdge router connects to the WAN transport network or to a NAT gateway. 

 

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the value DISTLOC indicates that a TLOC is disabled.

                                                                 PEER                       PEER
PEER     PEER      PEER         SITE      DOMAIN     PEER        PRIVATE     PEER           PUBLIC                                  LOCAL      REMOTE     REPEAT
TYPE     PROTOCOL  SYSTEM IP    ID        ID         PRIVATE IP  PORT        PUBLIC IP      PORT    LOCAL COLOR      STATE          ERROR      ERROR      COUNT      DOWNTIME
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vmanage  dtls      172.1.0.18   1001      0          1.4.28.28   12346       1.4.28.28      12346    gold           tear_down       DISTLOC    NOERR        0        2016-09-25T18:00:41-0700
vsmart   dtls      172.1.0.19   1013      1          1.3.29.29   12346       1.3.29.29      12346    gold           tear_down       DISTLOC    NOERR        0        2016-09-25T18:00:41-0700
vsmart   dtls      172.1.0.16   1011      1          1.3.26.26   12346       1.3.26.26      12346    gold           tear_down       DISTLOC    NOERR        0        2016-09-25T18:00:41-0700

Resolve the Problem

A TLOC might be disabled if:

  • Control connections have been cleared.
  • The TLOC color has changed.
  • The system IP has changed.

Use the show running-config command to check the configuration. Reconfigure any attributes necessary.

 

LISFD: Socket Errors

Problem Statement

Socket error messages occur.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the value LISFD indicates a socket error.

                                                                    PEER                   PEER
PEER     PEER      PEER         SITE        DOMAIN     PEER         PRIVATE   PEER         PUBLIC                                                    LOCAL     REMOTE     REPEAT
TYPE     PROTOCOL  SYSTEM IP    ID          ID         PRIVATE IP   PORT      PUBLIC IP    PORT      LOCAL COLOR    STATE       ERROR      ERROR     COUNT     DOWNTIME
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vsmart   dtls      1.1.1.8      7            0         1.4.28.28    12447     1.4.28.28    12447     gold           up          LISFD      NOERR     45        2016-07-26T23:53:32-0700
vmanage  dtls      1.1.1.7      7            1         1.3.29.29    12647     1.3.29.29    12647     gold           up          LISFD      NOERR     70        2016-07-26T23:53:32-0700
vmanage  dtls      1.1.1.7      7            1         1.3.26.26    12747     1.3.26.26    12867     gold           up          LISFD      NOERR     69        2016-07-26T23:53:32-0700

Resolve the Problem

Socket errors might occur if:

  • The overlay network contains duplicate IP addresses, especially duplicate transport addresses.
  • Packets have been corrupted.
  • The vEdge router receives a reset request from the remote device.
  • The vEdge router and the vSmart controller are not using the same protocol: one device is using TLS while the other is using DTLS.
  • Forwarding ports are not open.
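
The first cause in the list, duplicate addresses, can be checked offline from an inventory of transport addresses. The following sketch groups devices by address and reports any address used more than once. The inventory is hypothetical, and the sketch assumes all listed addresses belong to the same transport network, since the same private address can legitimately appear behind different NAT gateways.

```python
from collections import defaultdict


def duplicate_addresses(inventory):
    """Map each IP address used by more than one device to the devices using it."""
    devices_by_ip = defaultdict(list)
    for device, address in inventory:
        devices_by_ip[address].append(device)
    return {ip: devs for ip, devs in devices_by_ip.items() if len(devs) > 1}


# Hypothetical (device, transport address) pairs.
inventory = [
    ("vEdge1", "192.168.0.20"),
    ("vEdge2", "192.168.0.21"),
    ("vEdge3", "192.168.0.20"),
]

print(duplicate_addresses(inventory))
# {'192.168.0.20': ['vEdge1', 'vEdge3']}
```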

 

NOVMCFG: vEdge Router Template Not Attached in vManage NMS

Problem Statement

The vEdge router's configuration template was not attached to the router during the bringup of the router.

Identify the Problem

Issue the show control connections-history command. In the Remote Error column of the output, the value NOVMCFG indicates that the vEdge router's template was not attached during bringup.

                                                               PEER                  PEER                                                                              
PEER     PEER       PEER          SITE   DOMAIN   PEER         PRIVATE   PEER        PUBLIC                                                REPEAT               
TYPE     PROTOCOL   SYSTEM IP     ID     ID       PRIVATE IP   PORT      PUBLIC IP   PORT    LOCAL COLOR    STATE        LOCAL/REMOTE      COUNT      DOWNTIME       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vmanage  tls        172.1.0.18    200    0        1.4.28.28    12346     1.3.25.25   12346   lte            tear_down    RXTRDWN/NOVMCFG   78         2016-10-06T08:10:06+0000 

Resolve the Problem

During bringup from ZTP, if no template is attached to the device in vManage, you see “No Config. in vManage for device”. Make sure that a template is assigned to the device in question in vManage.

RDSIGFBD, TXCHTOBD: Board ID Not Initialized

Problem Statement

In an unstable network, when connections are frequently going down and coming up, the Trusted Board ID chip on a hardware vEdge router might not initialize.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the values RDSIGFBD and TXCHTOBD indicate a problem with initializing the Trusted Board ID chip. RDSIGFBD indicates that the network failed to read the signature from board ID. TXCHTOBD indicates that the network failed to send a challenge to board ID.

                                                                PEER                     PEER
PEER     PEER      PEER         SITE     DOMAIN    PEER         PRIVATE    PEER          PUBLIC                                                    LOCAL      REMOTE     REPEAT
TYPE     PROTOCOL  SYSTEM IP    ID       ID        PRIVATE IP   PORT       PUBLIC IP     PORT    LOCAL COLOR      STATE       ERROR      ERROR     COUNT      DOWNTIME
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls      -            0         0        172.16.1.1    12346     172.16.1.1    12346   publicinternet   challenge   TXCHTOBD   NOERR     0         2016-07-01T15:47:40+0000
vbond    dtls      -            0         0        172.16.1.1    12346     172.16.1.1    12346   publicinternet   challenge   TXCHTOBD   NOERR     0         2016-07-01T15:47:40+0000

Resolve the Problem

Sometimes, because of locking issues, sending the challenge to the board ID fails. When that happens, the router resets the board ID and tries again. This should not happen often, but when it does, it delays the bringup of control connections. This should be fixed in newer versions.

VB_TMO, VM_TMO, VP_TMO, VS_TMO: Peer Timeout

Problem Statement

A peer timeout occurs if the vEdge router loses reachability to a controller in the overlay network.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the values VB_TMO, VM_TMO, VP_TMO, and VS_TMO indicate a peer timeout error for vBond orchestrator, vManage NMS, peer vEdge routers, and vSmart controllers, respectively.

                                                                       PEER                    PEER                                                                              
PEER     PEER       PEER            SITE      DOMAIN    PEER           PRIVATE   PEER          PUBLIC                                             REPEAT               
TYPE     PROTOCOL   SYSTEM IP       ID        ID        PRIVATE IP     PORT      PUBLIC IP     PORT    LOCAL COLOR    STATE       LOCAL/REMOTE    COUNT       DOWNTIME       
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vmanage    tls      172.1.0.18      200       0         1.4.28.28      12346     1.3.25.25     12346   default        tear_down   VM_TMO/NOERR     0          2016-10-01T10:54:20-0700

Issue the show control connections-history detail command to check the hello counters. A discrepancy between the transmit (Tx) and receive (Rx) hello packet counters indicates packet loss between the vEdge router and the controller.

Tx Statistics-
--------------
  hello                   1467659
  connects                0
  registers               0
  register-replies        0
  challenge               0
  challenge-response      1
  challenge-ack           0
  teardown                1
  teardown-all            0
  vmanage-to-peer         0
  register-to-vmanage     0
  
Rx Statistics-
--------------
  hello                   1467279
  connects                0
  registers               0
  register-replies        0
  challenge               1
  challenge-response      0
  challenge-ack           1
  teardown                0
  vmanage-to-peer         0
  register-to-vmanage     0
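
The comparison of the two hello counters can be scripted when many captures need to be reviewed. The sketch below assumes the section headings and line layout shown in the capture above; the function name is illustrative.

```python
def hello_counters(detail_output):
    """Return {'tx': n, 'rx': m} from the Tx and Rx Statistics sections."""
    counters = {}
    section = None
    for line in detail_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Tx Statistics"):
            section = "tx"
        elif stripped.startswith("Rx Statistics"):
            section = "rx"
        elif section and stripped.startswith("hello"):
            counters[section] = int(stripped.split()[1])
    return counters


sample = """\
Tx Statistics-
--------------
  hello                   1467659
  connects                0

Rx Statistics-
--------------
  hello                   1467279
  connects                0
"""

counters = hello_counters(sample)
print(counters)                        # {'tx': 1467659, 'rx': 1467279}
print(counters["tx"] - counters["rx"])  # 380
```

A small, stable difference is less worrying than one that keeps growing between two captures, so it is worth comparing the difference over time rather than reading a single value.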

Resolve the Problem

  • Troubleshoot reachability to the controller using ping, traceroute, and rapid ping.
  • Increase the hello-interval and hello-tolerance values on the interface so that the control connection tolerates packet loss better.

Sometimes the problem is caused by the underlay, where devices might be rate-limiting the TLS/DTLS packets. It has been observed that if the packets are rate-limited to below 1 Mbps, control connections might not form and you see "VM_TMO" errors. Check the underlay for any potential bandwidth or throughput issues.

 

VECRTREV, VSCRTREV: Revoked Certificate

Problem Statement

The certificate for the vEdge router or vSmart controller has been revoked.

Identify the Problem

Issue the show control connections-history command. In the Local Error column of the output, the value VECRTREV indicates a revoked certificate on a vEdge router. The value VSCRTREV indicates a revoked certificate on a remote vSmart controller. If a certificate is revoked on a local vSmart controller, the value VSCRTREV displays in the Remote Error column.

                                                                              PEER                    PEER                                                                                                                                                                                                                          
          PEER     PEER      PEER           SITE     DOMAIN     PEER          PRIVATE    PEER         PUBLIC                                LOCAL      REMOTE    REPEAT               
INSTANCE  TYPE     PROTOCOL  SYSTEM IP      ID       ID         PRIVATE IP    PORT       PUBLIC IP    PORT     REMOTE COLOR    STATE        ERROR      ERROR     COUNT        DOWNTIME       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0         vsmart   dtls      172.1.0.16     1011     1          1.3.26.26     12346      1.3.26.26    12346    default         tear_down    VSCRTREV   NOERR     0       2016-10-14T11:59:13-0700

Resolve the Problem

Certificate verification fails when the certificate cannot be verified against the installed root certificate:

1) Check the time. It should at least be within the validity range of the vBond orchestrator's certificate.

show clock

 

2) The failure can also be caused by root certificate corruption on the vEdge router.

Open a Cisco / Viptela Support case to resolve the issue.

Comments
rudimocnik
Level 1

Thank you for this excellent post. 

 

I have another problem not mentioned in the post.

 

I built a test lab on ESXi and I am using my own CA (tinyCA) for signing CSRs and issuing certificates. I was able to onboard vManage, vBond and vSmart with signed certificates from tinyCA. Next I performed the initial config on the vEdge and installed the root cert from my CA. It was successfully authorized by vBond; however, in vManage it shows with a "Certificate installation failed" status. I am aware that there is an option "WAN Edge Cloud Certificate Authorization" which is currently set to Automated.

 

My question: Is the Automated option supported when using a private CA? Can you describe what goes on in vManage when the Automated option is enabled, or point me to an article?

 

Note that I was able to onboard the vEdge when switching "WAN Edge Cloud Certificate Authorization" to Manual and signing the CSR on my CA, but I want it to happen automatically.

 

 

Rudi

 

svemulap@cisco.com
Cisco Employee

Hi Rudi -

 

Thanks for the comment on the post.

On your question:

Your understanding is correct. The Automated option is not supported for a private (enterprise) CA. For a private (enterprise) CA,
manual intervention is involved: you need the CSR signed by your CA server, then install the signed PEM on vManage.

 

- SV 

rudimocnik
Level 1
OK, great. Thanks for the confirmation. I've successfully joined vEdges the manual way.


rudimocnik
Level 1

I have another connection issue, but this time I am trying to join vEdge Cloud 19.1.0 to controllers running 18.4.1. According to the compatibility matrix, that should not be an issue.

 

I've done the following that worked on 18.4.1 vedges:

  1. Did the initial config on the vEdge (org, system-ip, vbond, site-id), set the IP on VPN 0 and VPN 512, and set the next hop for VPN 0
  2. Installed the Enterprise CA root cert on the vEdge
  3. Activated the vEdge with a chassis-id and token already imported on vManage
  4. At this point it showed up as "Authorized by vBond" and moved on to making connections to vManage
  5. Signed the CSR on my CA and installed the cert
  6. At this point the vEdge went back to vBond and is stuck there with REMOTE ERROR: BIDNTVRFD

I went onto vManage and sent serial number to controllers manually multiple times and the proccess succeded but the vEdge was still stuck. I've also made sure the time on all devices is synced. Moreover I went through deleteing all certs (root and vedge') and reinstalled them with no luck of getting any further.

 

I went on and did the debug on vedge:

local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: create_ssl_conn_to_peer[5721]: %VDAEMON_DBG_MISC-3: SSL : Connecting to peer from ge0_0 to 10.0.0.3:12346
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: create_ssl_conn_to_peer[5727]: %VDAEMON_DBG_MISC-3: SSL_connect ERR_WANT (server 10.0.0.3:12346) ... retrying
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_verify_callback[409]: %VDAEMON_DBG_MISC-1: Verify failed: self signed certificate! No need to panic!! 
local7.debug: Jul 30 09:28:24 vEdge30 last message repeated 2 times
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: ssl_connect_timer_cb[408]: %VDAEMON_DBG_MISC-3: SSL_connect succeeded from ge0_0 to 10.0.0.3:12346
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vbond_parse_msg[2146]: %VDAEMON_DBG_MISC-3: Received a Challenge request from the Server
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vbond_parse_challenge[6480]: %VDAEMON_DBG_MISC-3: Parsing CHALLENGE .. 
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vbond_proc_challenge[4298]: %VDAEMON_DBG_MISC-3: Received CHALLENGE on ge0_0..
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_dtls_verify_vbond_cert[798]: %VDAEMON_DBG_MISC-3: Certificate validation Successful 
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_dtls_verify_vbond_cert[812]: %VDAEMON_DBG_MISC-3: O-name vIPtela Inc, OU name SRC Internal (in cfg SRC Internal) 
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_mark_n_sweep_vmanage_serial_numbers[4227]: %VDAEMON_DBG_MISC-3: vManage serial file DB does not exist yet. Add all.
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_send_challenge_ack[4624]: %VDAEMON_DBG_MISC-3: Send Challenge ACK ... (board_id_present No)
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vbond_parse_msg[2166]: %VDAEMON_DBG_MISC-3: Received a TEAR DOWN from the peer
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vbond_parse_tear_down[6912]: %VDAEMON_DBG_MISC-3: Parsing TEAR DOWN.. 
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vbond_proc_msg[5801]: %VDAEMON_DBG_MISC-3: Received a TEAR DOWN (dev type 4) (teardown Just this peer) on ge0_0
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_find_next_active_wan_intf[1465]: %VDAEMON_DBG_MISC-1: Next wan interface to connect to vmanage = none
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_cleanup_peer_frag[1777]: %VDAEMON_DBG_MISC-3: Cleanup Fragments ..
local7.debug: Jul 30 09:28:24 vEdge30 VDAEMON[880]: vdaemon_ftm_send_ctrl_tun[242]: %VDAEMON_DBG_MISC-3: Local-TLOC 10.0.0.30:12346 Remote-TLOC 10.0.0.3:12346 msg-type Delete
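
When a capture like this runs to hundreds of lines, a small script can help pull out the lines worth a closer look. The following is only a minimal Python sketch, not a Cisco tool: the regular expression is fitted to the line layout shown in this capture, and the `MARKERS` strings and the `flag_lines` name are assumptions picked by hand from this one log, not an official list of failure indicators.

```python
import re

# Fitted to the vdaemon debug lines in the capture above:
#   ... VDAEMON[pid]: function[line]: %TAG: message
LINE_RE = re.compile(
    r"VDAEMON\[\d+\]: (?P<func>\w+)\[\d+\]: %(?P<tag>[\w-]+): (?P<msg>.*)$"
)

# Assumption: substrings chosen by hand from this one capture.
MARKERS = ("Verify failed", "board_id_present No", "TEAR DOWN")


def flag_lines(log_lines):
    """Return (function, message) pairs for lines that contain a marker."""
    hits = []
    for raw in log_lines:
        match = LINE_RE.search(raw.rstrip())
        if match is None:
            continue  # e.g. "last message repeated 2 times"
        msg = match.group("msg").strip()
        if any(marker in msg for marker in MARKERS):
            hits.append((match.group("func"), msg))
    return hits
```

Feeding it the lines of a saved debug file (for example `flag_lines(open(path))`, where `path` is wherever you saved the capture) returns the flagged messages in log order, which makes it easier to see what happened just before the teardown.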

I found some clues (marked in red) but I am not sure exactly what they mean. vBond is supposed to authenticate the vEdge via the valid-vedge list and tell the vEdge the IP of a valid vManage, right? So I went onto the vBond and did this:

image.png

As you can see, there are valid vSmarts and vEdges (including the stuck one, serial number 12), but the vManage list is empty. Is that expected? Note that I have other vEdges joined, but the fact that the list is empty is a bit strange.

Why is the vEdge not sending its board_id (is this the chassis-id?)? I assume that because of this, vBond doesn't know who is trying to authenticate and tears down the connection.

Any hint on what I should do next?

svemulap@cisco.com
Cisco Employee

Hi -

- Starting with 17.1, vManage signs the vEdge Cloud certificates

- On the vManage GUI, you need to add the serial numbers generated from PnP

- Do Send to Controllers, then generate the bootstrap configuration for the vEdge Cloud on vManage

- On the vEdge, make sure there is reachability to the controllers

- Then execute this command on the vEdge Cloud:
    request vedge-cloud activate chassis-number xxxxx token xxxxx

- The vEdge Cloud should then be onboarded onto the overlay

Note: vManage signs the certificate for vEdge Cloud devices. Based on your notes above, step 5 is not needed, assuming you are generating the CSR on your vEdge Cloud and signing it with your enterprise CA.

garrettc134
Level 1

I'm in the same situation as mocnikr above:
"At this point the vedge went back to vBond and is stuck there with REMOTE ERROR: BIDNTVRFD"

Luis Rueda
Cisco Employee

My vEdge is stuck at *CRTVERFL*

 

NFVIS-FCH2035V0ZN-vEDGE# show control connections-history
Legend for Errors
ACSRREJ    - Challenge rejected by peer.               NOVMCFG   - No cfg in vmanage for device.
BDSGVERFL  - Board ID Signature Verify Failure.        NOZTPEN   - No/Bad chassis-number entry in ZTP.
BIDNTPR    - Board ID not Initialized.                 OPERDOWN  - Interface went oper down.
BIDNTVRFD  - Peer Board ID Cert not verified.          ORPTMO    - Server's peer timed out.
BIDSIG    - Board ID signing failure.                  RMGSPR    - Remove Global saved peer.
CERTEXPRD  - Certificate Expired                       RXTRDWN   - Received Teardown.
CRTREJSER  - Challenge response rejected by peer.      RDSIGFBD  - Read Signature from Board ID failed.
CRTVERFL   - Fail to verify Peer Certificate.          SERNTPRES - Serial Number not present.
CTORGNMMIS - Certificate Org name mismatch.            SSLNFAIL  - Failure to create new SSL context.
DCONFAIL   - DTLS connection failure.                  STNMODETD - Teardown extra vBond in STUN server mode.
DEVALC     - Device memory Alloc failures.             SYSIPCHNG - System-IP changed.
DHSTMO     - DTLS HandShake Timeout.                   SYSPRCH   - System property changed
DISCVBD    - Disconnect vBond after register reply.    TMRALC    - Timer Object Memory Failure.
DISTLOC    - TLOC Disabled.                            TUNALC    - Tunnel Object Memory Failure.
DUPCLHELO  - Recd a Dup Client Hello, Reset Gl Peer.   TXCHTOBD  - Failed to send challenge to BoardID.
DUPSER     - Duplicate Serial Number.                  UNMSGBDRG - Unknown Message type or Bad Register msg.
DUPSYSIPDEL- Duplicate System IP.                      UNAUTHEL  - Recd Hello from Unauthenticated peer.
HAFAIL     - SSL Handshake failure.                    VBDEST    - vDaemon process terminated.
IP_TOS     - Socket Options failure.                   VECRTREV  - vEdge Certification revoked.
LISFD      - Listener Socket FD Error.                 VSCRTREV  - vSmart Certificate revoked.
MGRTBLCKD  - Migration blocked. Wait for local TMO.    VB_TMO    - Peer vBond Timed out.
MEMALCFL   - Memory Allocation Failure.                VM_TMO    - Peer vManage Timed out.
NOACTVB    - No Active vBond found to connect.         VP_TMO    - Peer vEdge Timed out.
NOERR      - No Error.                                 VS_TMO    - Peer vSmart Timed out.
NOSLPRCRT  - Unable to get peer's certificate.         XTVMTRDN  - Teardown extra vManage.
NEWVBNOVMNG- New vBond with no vMng connections.       XTVSTRDN  - Teardown extra vSmart.
NTPRVMINT  - Not preferred interface to vManage.       STENTRY    - Delete same tloc stale entry.
EMBARGOFAIL - Embargo check failed

                                                                       PEER                      PEER
PEER     PEER     PEER             SITE        DOMAIN PEER             PRIVATE  PEER             PUBLIC                                   LOCAL      REMOTE     REPEAT
TYPE     PROTOCOL SYSTEM IP        ID          ID     PRIVATE IP       PORT     PUBLIC IP        PORT    LOCAL COLOR      STATE           ERROR      ERROR      COUNT DOWNTIME
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls     0.0.0.0          0           0      10.201.146.159   12346    10.201.146.159   12346   default          tear_down       CRTVERFL   NOERR      39    2019-09-24T16:39:51+0000
vbond    dtls     0.0.0.0          0           0      10.201.146.159   12346    10.201.146.159   12346   default          tear_down       SYSIPCHNG  NOERR      0     2019-09-24T16:27:34+0000

NFVIS-FCH2035V0ZN-vEDGE#
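
To read many history rows at a glance, the error codes can be matched against the legend the command prints. Below is a minimal Python sketch under stated assumptions: the `LEGEND` dictionary is only a hand-copied subset of the legend above, the field names in `FIELDS` are made up for illustration, and the parser assumes each row splits on whitespace into exactly the 15 columns shown in this output.

```python
# Hand-copied subset of the "Legend for Errors" printed above.
LEGEND = {
    "BIDNTVRFD": "Peer Board ID Cert not verified.",
    "CRTVERFL": "Fail to verify Peer Certificate.",
    "DCONFAIL": "DTLS connection failure.",
    "DISTLOC": "TLOC Disabled.",
    "NOERR": "No Error.",
    "SYSIPCHNG": "System-IP changed.",
}

# Illustrative names for the 15 columns of a connections-history row.
FIELDS = (
    "peer_type", "protocol", "system_ip", "site_id", "domain_id",
    "private_ip", "private_port", "public_ip", "public_port",
    "local_color", "state", "local_error", "remote_error",
    "repeat_count", "downtime",
)


def parse_history_row(row):
    """Split one data row into a dict keyed by FIELDS."""
    parts = row.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["repeat_count"] = int(record["repeat_count"])
    return record


def describe(record):
    """Render the local and remote error codes with their legend text."""
    unknown = "not in this sketch's legend subset"
    local = LEGEND.get(record["local_error"], unknown)
    remote = LEGEND.get(record["remote_error"], unknown)
    return (
        f"{record['peer_type']} {record['state']}: "
        f"local {record['local_error']} ({local}); "
        f"remote {record['remote_error']} ({remote}); "
        f"repeated {record['repeat_count']} times"
    )
```

Passing the first data row above through `describe(parse_history_row(row))` spells out that the local side failed to verify the peer certificate, while the remote side reported no error.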

What else can I do for remediation, and are there other troubleshooting commands I can use?

I had a similar issue to mocnikr and garrettc134.
My vEdge Cloud did not connect to the fabric, with REMOTE ERROR: BIDNTVRFD.
In my case it was bug CSCvp75927, and the provided workaround helped me.

yingyuwang
Level 1

svemulap@cisco.com wrote:

Hi Rudi -

Thanks for the comment on the post.

On your question:

Your understanding is correct. The Automated option is not supported for a private (enterprise) CA. With a private (enterprise) CA, manual intervention is required: you need to get the CSR signed by your CA server, then install the signed PEM on vManage.

- SV

Thank you for the clarification on using a private/enterprise CA. For onboarding vEdge/cEdge nodes, are the following steps correct? For my lab, I use vManage as the enterprise CA.

--> Install the enterprise root cert (PEM) on the vEdge/cEdge

--> Generate a CSR from the vEdge/cEdge CLI and have it signed by the enterprise CA

--> Import the signed vEdge/cEdge.crt into vManage in the GUI

Thank you in advance for the reply.

rudimocnik
Level 1

Hi svemulap@cisco.com

I have a C1111-4PLTEEA router running 16.10.4 code. The controllers are using Cisco-signed certificates, and for the SD-WAN routers I have the Onbox certificate option checked.

I prepared templates in vManage and assigned them to the router. Then I used a USB drive to provision the initial config.

The router boots up and successfully grabs the config from the USB drive. I can ping the internet and also the public IP of my on-prem vBond.

When I check the control connections on the router, I see this familiar error:

PEER     PEER     PEER             SITE        DOMAIN PEER             PRIVATE  PEER             PUBLIC                                   LOCAL      REMOTE     REPEAT               
TYPE     PROTOCOL SYSTEM IP        ID          ID     PRIVATE IP       PORT     PUBLIC IP        PORT    LOCAL COLOR      STATE           ERROR      ERROR      COUNT DOWNTIME       
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vbond    dtls     -                0           0      193.XX.XX.101   12346    193.XX.XX.101   12346   lte              tear_down       CRTVERFL   NOERR      88    2020-05-07T09:47:24+0200
vbond    dtls     -                0           0      193.XX.XX.101   12346    193.XX.XX.101   12346   lte              connect         DCONFAIL   NOERR      24    2020-05-07T09:38:38+0200
vbond    dtls     -                0           0      193.XX.XX.101   12346    193.XX.XX.101   12346   lte              connect         DCONFAIL   NOERR      0     2020-05-07T09:13:47+0200

I've checked the org-name and other parameters and they look fine. Furthermore, I've checked the SUDI number and the serial number that the vBond has, and they match the router. How can I troubleshoot this error more deeply? I did not find any debug commands on the box.

Note: I have all controllers behind a firewall with private IPs and 1:1 static mappings to a public IP for each controller.

Any ideas?

Rudi

Tonai1
Level 1

Hello all,

My problem is shown below:

HN_THD_WAN_2#sh sdwan control connection-history | in silver
vmanage dtls 1.1.1.6 4294946755 0 10.0.2.229 12946 3.1.66.91 12946 silver tear_down DISTLOC NOERR 0 2020-05-20T16:00:03+0700
vsmart dtls 1.1.1.4 4294946754 1 10.0.2.116 12346 3.0.25.255 12346 silver tear_down DISTLOC NOERR 0 2020-05-20T16:00:03+0700
vsmart dtls 1.1.1.5 4294946753 1 10.0.5.13 12346 52.64.3.185 12346 silver tear_down DISTLOC NOERR 0 2020-05-20T16:00:03+0700
vbond dtls 0.0.0.0 0 0 52.64.213.149 12346 52.64.213.149 12346 silver tear_down DISCVBD NOERR 1 2020-05-19T15:11:49+0700
vbond dtls 0.0.0.0 0 0 13.251.153.180 12346 13.251.153.180 12346 silver tear_down DISCVBD NOERR 1 2020-05-19T15:11:44+0700
vsmart dtls 1.1.1.4 4294946754 1 10.0.2.116 12346 3.0.25.255 12346 silver up RXTRDWN VP_TMO 0 2020-05-19T15:10:49+0700
HN_THD_WAN_2#sh sdwan control connections
                                                        PEER                 PEER                                     CONTROLLER
PEER    PEER PEER      SITE       DOMAIN PEER           PRIV  PEER           PUB                                      GROUP
TYPE    PROT SYSTEM IP ID         ID     PRIVATE IP     PORT  PUBLIC IP      PORT  LOCAL COLOR PROXY STATE UPTIME     ID
--------------------------------------------------------------------------------------------------------------------------------
vsmart  dtls 1.1.1.5   4294946753 1      10.0.5.13      12346 52.64.3.185    12346 silver      No    up    0:02:29:23 0
vsmart  dtls 1.1.1.4   4294946754 1      10.0.2.116     12346 3.0.25.255     12346 silver      No    up    0:02:29:27 0
vbond   dtls 0.0.0.0   0          0      13.251.153.180 12346 13.251.153.180 12346 silver      -     up    0:02:29:35 0


Please let me know how to resolve it!
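
As the article notes, a control connection that is not listed in the `show sdwan control connections` output is not operational. A quick way to spot which peer type is missing is sketched below in Python. It is only an illustration: the function names are made up, the default `expected` set is an assumption about what a healthy router should show, and the state column is assumed to be the 12th whitespace-separated field, as in the cEdge output above (which includes a PROXY column).

```python
KNOWN_PEER_TYPES = {"vmanage", "vsmart", "vbond"}

# Assumption: in the cEdge layout shown above, STATE is the 12th column.
STATE_INDEX = 11


def up_peer_types(output_lines):
    """Collect the peer types that have at least one row in state 'up'."""
    up = set()
    for line in output_lines:
        parts = line.split()
        # Skip the prompt, header and separator lines.
        if len(parts) <= STATE_INDEX or parts[0] not in KNOWN_PEER_TYPES:
            continue
        if parts[STATE_INDEX] == "up":
            up.add(parts[0])
    return up


def missing_peer_types(output_lines, expected=("vmanage", "vsmart")):
    """Return the expected peer types with no 'up' row, sorted by name."""
    return sorted(set(expected) - up_peer_types(output_lines))
```

Run against the output in the post above, this reports `vmanage` as the missing peer type, which lines up with the vmanage row that appears only in the connection history with DISTLOC.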
