Anna Komarovska
Cisco Employee

When ZTP fails, how to troubleshoot it?

When ZTP fails, the reason is always listed on the Plug and Play Connect Portal (software.cisco.com). Click the "Show Log..." drop-down for the device to see the latest error messages, as shown in the screenshot.

PnP Connect CG.png

If the logs suggest the issue originates from the device, or are not sufficiently detailed, the only other debugging option is to access the root shell of the device and open the log file /var/log/pnp/pnp.log. This log shows failures from the device's PnP agent context instead of the PnP server context. In this case, you must contact the Cisco Technical Assistance Center (TAC), the technical support team for Cisco customers.

Is there any situation in which ZTP will not be possible?

Yes. If the device cannot reliably ping devicehelper.cisco.com (the PnP Connect Server), ZTP is not possible. Failure to connect can have various modem-related causes, such as an invalid SIM card, bad signal strength, or incorrect modem settings that prevent the modem from attaching to the network. Note that even if ping to devicehelper.cisco.com works, the connection may still not be reliable enough (bad signal quality). Therefore, it is best to check the cellular signal strength and quality using the 'show cellular 1 radio' CLI.

Can one bootstrap the device?

You can manually configure the device at any time instead of going through the PnP/ZTP process. The PnP/ZTP process exists primarily for convenient, automated deployment; skipping it does not prevent the device from being used as advertised.

If ZTP is not possible, how does one manually configure all the control parameters?

If you can't use ZTP for some reason, you can manually configure all the control connection parameters as shown below. Please note that the organization name, system IP, vBond, and site ID information must correspond to the deployment. For example, change the organization-name to match the deployed organization name, make the system-ip unique on the network, set the vbond IP address to the actual vBond address, and make the site-id unique as well.

Below is just a sample that you should replace with the appropriate values.

config terminal
viptela-system:system
sp-organization-name "CG-LTE-Test"
organization-name "CG-LTE-Test"
system-ip 1.1.1.120
vbond 13.50.70.148 port 12346
site-id 320
domain-id 1
commit
gw-system:interface Cellular1
 tunnel-interface
  no border
  color private1
  no last-resort-circuit
  no low-bandwidth-link
  max-control-connections       2
  no vbond-as-stun-server
  vmanage-connection-preference 8
  port-hop
  carrier                       default
  nat-refresh-interval          5
  hello-interval                1000
  hello-tolerance               12
  allow-service all
  no allow-service bgp
  allow-service dhcp
  allow-service dns
  allow-service icmp
  no allow-service sshd
  no allow-service netconf
  no allow-service ntp
  no allow-service ospf
  no allow-service stun
  allow-service https
  no allow-service snmp
  no allow-service bfd
  encapsulation ipsec weight 1
 !
!
commit
end
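
As a rough illustration of which values in the sample are deployment-specific, the system-level lines could be generated from parameters. This is only a sketch: the function name `render_system_config` is hypothetical, it assumes the vBond is given as an IP address and that sp-organization-name equals organization-name (as in the sample above), and it only produces text to paste into the CLI.

```python
import ipaddress


def render_system_config(org_name, system_ip, vbond_ip, site_id,
                         vbond_port=12346, domain_id=1):
    """Return the system-level CLI lines for a manual (non-ZTP) setup.

    Hypothetical helper: it only builds text and does not talk to a device.
    """
    # ip_address() raises ValueError on malformed input, so typos are caught early.
    ipaddress.ip_address(system_ip)
    ipaddress.ip_address(vbond_ip)
    if not org_name:
        raise ValueError("organization name must not be empty")
    lines = [
        "config terminal",
        "viptela-system:system",
        # Assumption: both organization names are identical, as in the sample.
        f'sp-organization-name "{org_name}"',
        f'organization-name "{org_name}"',
        f"system-ip {system_ip}",
        f"vbond {vbond_ip} port {vbond_port}",
        f"site-id {site_id}",
        f"domain-id {domain_id}",
        "commit",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_system_config("CG-LTE-Test", "1.1.1.120", "13.50.70.148", 320))
```

The tunnel-interface section is left out on purpose, since it can usually be copied unchanged from the sample.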

Furthermore, you need to install the root certificate:

CellularGateway# request root-cert-chain install /flash/root-ca.crt

The file root-ca.crt (the name does not matter; any other name works) is the Enterprise Root CA certificate file that should already have been generated for the deployment. For example, it should match the file on the vBond/vManage controllers. Without ZTP, you need to copy it manually to the device's /flash storage, so you will need to set up a TFTP server that the device can reach and download the file to the CG418-E/CG522-E.

CellularGateway# gw-action:request file download tftpip <tftp ip address> filename root-ca.crt

This will store the root-ca.crt from the TFTP server to the device's /flash partition. 

How to look at control connections/local properties/history?

The show control connections CLI will show the control connections to both vBond and vManage at any point in time.

When the control connection first begins to establish (as soon as all the correct configs and certificates are in place on the device, either through PnP or manually), the device will first reach out to the vBond orchestrator.

This means the output of this CLI will show only a vBond peer entry along with a STATE such as connect or challenge.

The final, correct output when the device is connected to vManage is just a single peer entry for vManage:

CellularGateway# show control connections
                                                          PEER                  PEER                                    CONTROLLER
PEER    PEER PEER         SITE       DOMAIN PEER          PRIV  PEER            PUB                                     GROUP     
TYPE    PROT SYSTEM IP    ID         ID     PRIVATE IP    PORT  PUBLIC IP       PORT  ORGANIZATION      LOCAL COLOR     PROXY STATE UPTIME      ID        
------------------------------------------------------------------------------------------------------
vmanage dtls 1.1.1.4      4294964286 0      10.0.2.60     12346 13.50.168.216   12346 spaal-LTE-Test    private1        No    up     0:01:20:45  0          

If a vBond entry remains for more than approximately 5 minutes, the device is likely still in the process of connecting and has not yet disconnected from the vBond. Depending on the STATE (connect, challenge), this can mean the control connection has not yet finished for one of many potential reasons, such as:

  • poor cellular signal strength
  • invalid configuration

If the output is completely empty, then either the control connection configs/certificate are not present (the most common reason) or, in rare cases, the device requires a restart.
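
The three cases described above (empty output, a lingering vBond entry, a single vManage entry) can be told apart from captured CLI text. The following is a minimal sketch, assuming the output has been copied into a string and that the STATE column is the third field from the end of each row, as in the sample output; the function name and the returned labels are made up for illustration.

```python
def classify_control_connections(output):
    """Classify captured 'show control connections' text.

    Returns one of three illustrative labels:
      "empty"      - no peer rows at all
      "connecting" - a vBond row is still present, or vManage is not up
      "connected"  - only vManage rows, with at least one in state "up"
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the peer type; header/separator rows do not.
        if len(fields) >= 3 and fields[0] in ("vbond", "vmanage"):
            # Assumption: STATE is followed by UPTIME and the group ID.
            rows.append((fields[0], fields[-3]))
    if not rows:
        return "empty"
    if any(peer == "vbond" for peer, _ in rows):
        return "connecting"
    if any(peer == "vmanage" and state == "up" for peer, state in rows):
        return "connected"
    return "connecting"


if __name__ == "__main__":
    sample = ("vmanage dtls 1.1.1.4      4294964286 0      10.0.2.60     12346 "
              "13.50.168.216   12346 spaal-LTE-Test    private1        No    "
              "up     0:01:20:45  0")
    print(classify_control_connections(sample))
```

A "connecting" result that persists for more than about 5 minutes is the situation in which signal strength and configuration are worth checking.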

The show control local-properties CLI presents the current relevant information needed for the control connection, such as the configured system parameters and certificate validity/installation status. It has a lot of overlap with show running-config except it also shows additional Root CA certificate information.

Note that the root-ca-chain-status should say Installed, the certificate-not-valid-after should be a valid date and match the Root CA validity, and the configured system properties should be listed here as well. Below is a valid working output:

CellularGateway# show control local-properties
personality                       vedge
sp-organization-name              CG-LTE-Test
organization-name                 CG-LTE-Test
root-ca-chain-status              Installed
certificate-status                Installed
certificate-validity              Valid
certificate-not-valid-before      Mar 21 08:17:09 2020 GMT
certificate-not-valid-after       Aug 09 20:58:26 2099 GMT
enterprise-cert-status            Not-Applicable
enterprise-cert-validity          Not Applicable
enterprise-cert-not-valid-before  Not Applicable
enterprise-cert-not-valid-after   Not Applicable
dns-name                          vbond-dev-xxxxxx.viptela.info
site-id                           320
domain-id                         1
protocol                          dtls
tls-port                          0
system-ip                         1.1.1.120
chassis-num/unique-id             CG418-E-XXXXXXXXXXX
serial-num                        XXXXXXXX
subject-serial-num                XXXXXXXXXXX
enterprise-serial-num             No certificate installed
token                             Invalid
keygen-interval                   1:00:00:00
retry-interval                    0:00:00:15
no-activity-exp-interval          0:00:00:20
dns-cache-ttl                     0:00:02:00
port-hopped                       FALSE
time-since-last-port-hop          0:00:00:00
embargo-check                     success
number-vbond-peers                0
number-active-wan-interfaces      1
 NAT TYPE: E -- indicates End-point independent mapping
           A -- indicates Address-port dependent mapping
           N -- indicates Not learned
           Note: Requires minimum two vbonds to learn the NAT type
INTERFACE PUBLIC           PUBLIC PRIVATE      PRIVATE PRIVATE VS/VM COLOR
          IPv4             PORT   IPv4         IPv6    PORT
-------------------------------------------------------------------------------------
srcr2     174.194.132.195 4602   192.168.1.1 ::     12346    0/1  private1
STATE    MAX     RESTRICT/  LR/LB   LAST         SPI TIME   NAT  VM
         CNTRL   CONTROL/           CONNECTION   REMAINING  TYPE CON
                 STUN                                            PRF
-------------------------------------------------------------------------------------
up        2      no/yes/no   No/No  0:01:29:15   0:00:00:00  N    5
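
The checks mentioned in the note (root-ca-chain-status is Installed, certificate-not-valid-after is a valid date) can be applied to captured output. The following is a sketch only: the helper names are hypothetical, and it assumes the key/value lines and the date format look exactly like the sample output above.

```python
from datetime import datetime, timezone


def parse_local_properties(output):
    """Collect 'key   value' lines from captured local-properties text."""
    props = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        # Skip indented legend lines and lines without a value.
        if len(parts) == 2 and not line.startswith(" "):
            props.setdefault(parts[0], parts[1].strip())
    return props


def check_certificates(props, now=None):
    """Return a list of problems found; an empty list means the checks passed."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if props.get("root-ca-chain-status") != "Installed":
        problems.append("root CA chain is not installed")
    raw = props.get("certificate-not-valid-after", "")
    try:
        # Assumed format, taken from the sample: 'Aug 09 20:58:26 2099 GMT'
        expiry = datetime.strptime(raw, "%b %d %H:%M:%S %Y GMT")
        expiry = expiry.replace(tzinfo=timezone.utc)
    except ValueError:
        problems.append("certificate-not-valid-after is not a valid date")
    else:
        if expiry <= now:
            problems.append("certificate has expired")
    return problems


if __name__ == "__main__":
    text = (
        "root-ca-chain-status              Installed\n"
        "certificate-not-valid-after       Aug 09 20:58:26 2099 GMT\n"
    )
    print(check_certificates(parse_local_properties(text)))
```

Comparing the expiry against the current time on a workstation does not detect a wrong clock on the device itself, which is a separate cause of certificate failures.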

The show control connections-history CLI shows the latest status of establishing the control connection, if any. The output also comes with a legend that explains what the error/status codes in each entry mean. For example, the output below shows a single entry indicating that the vBond was disconnected, which is the correct behavior after establishing the control connection to vManage.

Common error messages and solutions

CRTVERFL - Root cert invalid. Either it is not installed, or the device system time is incorrect.

DCONFAIL - Cannot ping the vBond. Check the cellular connection quality/reachability.

CellularGateway# show control connections-history
PEER   PEER     PEER    SITE DOMAIN  PEER           PEER     PEER
TYPE   PROTOCOL SYSTEM  ID   ID      PRIVATE IP     PRIVATE  PUBLIC
                IP                                  PORT     IP
-------------------------------------------------------------------------
vbond  dtls     0.0.0.0  0    0       13.50.70.148   12346    13.50.70.148
PEER   LOCAL    STATE     LOCAL    REMOTE  REPEAT        DOWNTIME  
PUBLIC COLOR              ERROR    ERROR   COUNT
                                           ORGANIZATION
-------------------------------------------------------------------------
12346  private1 tear_down DISCVBD   NOERR   0    2021-11-29T23:01:39+0000
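
To make the error columns easier to read, a history row can be split into named fields and matched against known codes. The following is a sketch under stated assumptions: the column order is inferred from the sample row above (public port, color, state, local error, remote error, repeat count, downtime), the function name is hypothetical, and the hints cover only the codes discussed in this article.

```python
# Hints taken from this article; any other code is left to the legend.
HINTS = {
    "CRTVERFL": "Root cert invalid: not installed, or device system time is incorrect.",
    "DCONFAIL": "Cannot reach the vBond: check cellular connection quality/reachability.",
    "DISCVBD": "vBond disconnected after register reply (expected once vManage is connected).",
    "NOERR": "No error.",
}


def explain_history_row(row):
    """Split the second-table row of 'show control connections-history'."""
    fields = row.split()
    if len(fields) != 7:
        raise ValueError("unexpected number of columns: %d" % len(fields))
    # Assumed column order, inferred from the sample output.
    port, color, state, local_err, remote_err, repeat, downtime = fields
    return {
        "public_port": int(port),
        "color": color,
        "state": state,
        "local_error": local_err,
        "remote_error": remote_err,
        "repeat_count": int(repeat),
        "downtime": downtime,
        "hint": HINTS.get(local_err, "See the legend for this code."),
    }


if __name__ == "__main__":
    row = "12346  private1 tear_down DISCVBD   NOERR   0    2021-11-29T23:01:39+0000"
    info = explain_history_row(row)
    print(info["local_error"], "-", info["hint"])
```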

Legend for Errors

BDSGVERFL - Board ID Signature Verify Failure
BIDNTPR - Board ID not Initialized
BIDNTVRFD - Peer Board ID Cert not verified
BIDSIG - Board ID signing failure
CERTEXPRD - Certificate Expired
CRTREJSER - Challenge response rejected by peer
CRTVERFL - Fail to verify Peer Certificate
CTORGNMMIS - Certificate Org name mismatch
DCONFAIL - DTLS connection failure
DEVALC - Device memory Alloc failures
DHSTMO - DTLS HandShake Timeout
DISCVBD - Disconnect vBond after register reply
DISTLOC - TLOC Disabled
DUPCLHELO - Recd a Dup Client Hello, Reset Gl Peer
DUPSER - Duplicate Serial Number
DUPSYSIPDEL - Duplicate System IP
HAFAIL - SSL Handshake failure
IP_TOS - Socket Options failure
LISFD - Listener Socket FD Error
MGRTBLCKD - Migration blocked Wait for local TMO
MEMALCFL - Memory Allocation Failure
NOACTVB - No Active vBond found to connect
NOERR - No Error
NOSLPRCRT - Unable to get peer's certificate
NEWVBNOVMNG - New vBond with no vMng connections
NTPRVMINT - Not preferred interface to vManage
HWCERTREN - Hardware vEdge Enterprise Cert Renewed
NOZTPEN - No/Bad chassis
OPERDOWN - Interface went oper down.
ORPTMO - Server's peer timed out.
RMGSPR - Remove Global saved peer.
RXTRDWN - Received Teardown.
RDSIGFBD - Read Signature from Board ID failed.
SERNTPRES - Serial Number not present.
SSLNFAIL - Failure to create new SSL context.
STNMODETD - Teardown extra vBond in STUN server mode.
SYSIPCHNG - System
SYSPRCH - System property changed
TMRALC - Timer Object Memory Failure.
TUNALC - Tunnel Object Memory Failure.
TXCHTOBD - Failed to send challenge to BoardID.
UNMSGBDRG - Unknown Message type or Bad Register msg.
UNAUTHEL - Recd Hello from Unauthenticated peer.
VBDEST - vDaemon process terminated.
VECRTREV - vEdge Certification revoked.
VSCRTREV - vSmart Certificate revoked.
VB_TMO - Peer vBond Timed out.
VM_TMO - Peer vManage Timed out.
VP_TMO - Peer vEdge Timed out.
VS_TMO - Peer vSmart Timed out.
XTVMTRDN - Teardown extra vManage.
XTVSTRDN - Teardown extra vSmart.
STENTRY - Delete same tloc stale entry.
HWCERTREV - Hardware vEdge Enterprise Cert Revoked.

How to update the root certificate if using enterprise root?

The update process is the same as the initial manual installation process.

CellularGateway# request root-cert-chain install /flash/root-ca.cr

How to debug vDaemon?

Usually, the best way to debug vDaemon is through these commands:

show control affinity config/status
show control connections
show control connections-history
show control connections-info
show control local-properties
show control statistics
show control summary
show control valid-vmanage-id
show control valid-vsmarts

These outputs are a very reliable indicator of vDaemon's current status and the data it is currently operating on.

For more advanced debugging in rare situations, it can be helpful to turn on the following debugs.

For certificates, challenge verification during hello packet exchange, and other miscellaneous information:

CellularGateway# debug vdaemon misc high

For confd-related information surrounding the vdaemon subscription/callbacks (such as system configs):

CellularGateway# debug vdaemon confd high

Technically, it is possible to set maximum debugs using:

CellularGateway# debug vdaemon all high

But since this will output a flood of hello control messages, please do not do so without also setting:

CellularGateway# debug vdaemon hello low

Is it possible to do a packet capture/TCPDump to troubleshoot control issues if needed?

No. There is no packet capture capability on the Cellular Gateway devices currently.

If a template push fails, what logs to look at to investigate which commands it rejected?

If a template push fails and the vManage GUI itself does not already indicate the offending config, collect an admin-tech and upload it to a TAC case. Navigate to Tools -> Operational Commands. Click on Generate Admin Tech for vManage.

AnnaKomarovska_1-1667254935391.png

Include all three available options on the screen and click Generate.

AnnaKomarovska_2-1667255152034.png

Click 'Show Admin Tech List' as shown in the image. It will take a couple of minutes to generate the logs.

AnnaKomarovska_3-1667255904727.png

Click the 'Download' icon.

AnnaKomarovska_4-1667256205255.png

Download the admin-tech logs to the local system and upload them to a Service Request (SR).

The device logs may not be helpful in this scenario because the configs may not have reached the device yet due to the template push failure.

Please also check with vManage experts for advanced template push debugging.
