
SD-WAN Security - Troubleshooting Guide


Router Upgrade

Unable to upgrade image - stuck in rommon

System Bootstrap, Version 16.7(3r), RELEASE SOFTWARE
Copyright (c) 1994-2017  by cisco Systems, Inc.
Current image running: *Upgrade in progress* Boot ROM0
Last reset cause: BootRomUpgrade
ISR4331/K9 platform with 16777216 Kbytes of main memory

rommon 1 > 

Upgrade to the IOS-XE CCO image 16.10.1, which upgrades the rommon automatically. If you are upgrading to the 16.10.1 IOS-XE SD-WAN image instead, the rommon upgrade has to be done manually.

Watch the video here: https://youtu.be/qugfIlEmSEM

Template Error

Unable to attach template with security

[5-Jan-2019 7:34:40 UTC] Configuring device with feature template: Single_Site_Dual_Homed_CSR1000v_Device_Template_Sec_Policy_For_Compliance
[5-Jan-2019 7:34:40 UTC] Generating configuration from template
[5-Jan-2019 7:34:47 UTC] Checking and creating device in vManage
[5-Jan-2019 7:34:53 UTC] Device is online
[5-Jan-2019 7:34:53 UTC] Updating device configuration in vManage
[5-Jan-2019 7:34:59 UTC] Pushing configuration to device
[5-Jan-2019 7:36:27 UTC] Pre config validation failed. Device is not configured to accept new configuration.
Error while doing pre config validation of type utd for CSR-6FEA589A-A60F-4445-87A8-1F65F9B5CFC2-100.100.30.1.
Error Reason: Failed to get cpu share, memory and disk specs

Solution:

It appears that the router previously had app-hosting configured, and the template you are attaching requires app-hosting again. The router cannot get the CPU and memory it needs because the existing container is already running and holding them.
Use the CLI to stop and uninstall the container with the following commands, then make sure no service is running when you issue "show app-hosting list":

SD-WAN-Router#app-hosting stop appid utd
SD-WAN-Router#app-hosting uninstall appid utd

Then attach the template again. Make sure the matching app-hosting image file has been uploaded to the software repository before attaching the template.
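To confirm the container is really gone before reattaching the template, the output of "show app-hosting list" can be checked in a script. The Python sketch below assumes the output is a two-column "App id / State" table whose rows follow a dashed separator line; that layout is an assumption, not something this guide shows, and the sample output is hypothetical.

```python
from __future__ import annotations

import re


def listed_apps(show_output: str) -> dict[str, str]:
    """Map app ID -> state for each row of `show app-hosting list` output.

    Assumption: rows follow a dashed separator line, with the app ID in the
    first column and the state in the last. Verify against your own router.
    """
    apps: dict[str, str] = {}
    in_table = False
    for line in show_output.splitlines():
        if re.fullmatch(r"\s*-{5,}\s*", line):
            in_table = True  # data rows start after the separator
            continue
        if in_table and line.strip():
            fields = line.split()
            apps[fields[0]] = fields[-1]  # first column: app ID, last: state
    return apps


# Hypothetical output captured before running the stop/uninstall commands.
before = """App id                                   State
---------------------------------------------------------
utd                                      RUNNING
"""
print(listed_apps(before))  # {'utd': 'RUNNING'}
```

After the stop and uninstall commands, calling listed_apps() on fresh output should return an empty dict before the template is attached again.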

Unable to attach template

Attaching the template fails, even before any security policies are configured.

Error on line 22: invalid value for: address in /vmanage-cfs:templates/vmanage-cfs:template
[vmanage-cfs:template-name='vip_internal_temp_device_CSR-c3d127e1-a15c-4d5d-b687-741670093000']
/vmanage-cfs:vpn/vmanage-cfs:vpn-instance[vmanage-cfs:vpn-id='1']/vmanage-cfs:interface[vmanage-cfs:
if-name='GigabitEthernet1']/vmanage-cfs:ip/vmanage-cfs:address: "192.168.30.1" is an invalid value.

The configured IP address does not include a mask. Configure the address with a mask, for example 192.168.30.1/24, and the template should attach.
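A quick way to catch this before pushing the template is to check that each interface address carries an explicit mask. The Python sketch below uses the standard-library ipaddress module; has_prefix_length is a hypothetical helper name. Note that it also accepts dotted netmask notation such as 192.168.30.1/255.255.255.0, which vManage may or may not accept.

```python
import ipaddress


def has_prefix_length(value: str) -> bool:
    """True if `value` is an IP address with an explicit mask, e.g. 192.168.30.1/24."""
    if "/" not in value:
        return False  # a bare address such as 192.168.30.1 is what the template rejected
    try:
        ipaddress.ip_interface(value)
    except ValueError:
        return False  # malformed address or out-of-range prefix such as /33
    return True


for candidate in ("192.168.30.1", "192.168.30.1/24", "192.168.30.1/33"):
    print(candidate, has_prefix_length(candidate))
```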

Controller Issues

vManage certificate shows an error


This is expected and not an issue: the vManage web certificate is always self-signed, so the browser flags it as untrusted.

Linking smart account in vManage results in credential error


Failed to connect to server

Issuing "show sdwan certificate serial" returns the following error:

Router#show sdwan certificate serial   
Failed to connect to server

The "Failed to connect to server" message appears because the PnP process is running, which blocks access to the Viptela service running on the ISR. Run "pnpa service discovery stop", and the show command will then work. Next, run "show crypto pki certificates CISCO_IDEVID_SUDI". Using the SUDI serial number, PID, and serial number, add the device to the PnP portal following the guide outlined in the video (https://www.youtube.com/watch?v=qugfIlEmSEM).
Then export the provisioning file from Plug and Play Connect under Controller Profiles.

Control Connections drop with NAT enabled for DIA

When NAT is configured on the cEdge interface for internet access, the control connections drop after a few minutes and the configuration rolls back. The NAT configuration appears to create NAT translations for the control connections as well.

ip nat inside source list nat-dia-vpn-hop-access-list interface GigabitEthernet0/0/0 overload
ip nat translation tcp-timeout 60
ip nat translation udp-timeout 1

interface GigabitEthernet0/0/0
 description Internet Circuit
 no shutdown
 arp timeout 1200
 ip address dhcp client-id GigabitEthernet0/0/0
 ip dhcp client default-router distance 1
 ip mtu 1500
 ip nat outside 
 mtu 1500
 negotiation auto
 service-policy output shape_GigabitEthernet0/0/0
exit
  • Make sure vManage is running at least 18.3.1; earlier releases have a configuration push issue with cEdge devices.
  • Remove NAT and verify whether the control connections come up.
  • If the control connections come up without NAT, push the NAT configuration again. Using the feature template or the CLI, make sure a default route pointing to VPN0 is added under the service VPN.
  • If the configuration push succeeds and the device does not roll back, verify all control connections:
show sdwan control local-properties
show sdwan control connections
show ip nat translations
  • The device may still show the control connection to vManage receiving NAT translations and staying down, while the control connections to vSmart and vBond are up.
  • A TCP dump on the ISR shows the connection receiving a RST from either vManage or the modem in the path.
  • A capture on vManage confirms that vManage is resetting the connection.
  • The cEdge selects a random high-numbered port for the control connection to vManage. This by itself is not necessarily a problem, because the connection can still form with vManage using that port.
  • The second, and main, problem: packet captures on the cEdge confirmed that the initial SYN towards vManage was NATed, but the ACK and subsequent packets were not. Because this traffic is sourced by the CPU on the cEdge, it should not be NATed at all.
  • Check whether a NAT ACL matches on the WAN IP and forces translation even for CPU-generated control connections:
show ip access-list dynamic
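The symptom described above (SYN translated, ACK not) can be checked against packet captures. The Python sketch below models each captured packet with a hypothetical Packet type and flags a connection whose SYN arrives from a different source IP/port than the packets that follow it. Parsing the capture file itself is out of scope, and the addresses and ports are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Packet:
    """One packet of a control connection as seen in a capture on vManage (hypothetical)."""
    flags: str     # "S" = SYN, "A" = ACK, "PA" = PSH+ACK, ...
    src_ip: str
    src_port: int


def syn_translated_differently(packets: list[Packet]) -> bool:
    """True if the SYN's source IP/port differs from any later packet's.

    This matches the symptom above: the SYN is NATed while the ACK and later
    packets are not, so vManage sees the ACK arrive from a different source
    than the SYN and resets the connection.
    """
    syn = next((p for p in packets if p.flags == "S"), None)
    if syn is None:
        return False
    return any(
        (p.src_ip, p.src_port) != (syn.src_ip, syn.src_port)
        for p in packets
        if p is not syn
    )


# Hypothetical capture: translated SYN, untranslated ACK.
capture = [
    Packet("S", "203.0.113.10", 5062),
    Packet("A", "10.1.5.3", 41234),
]
print(syn_translated_differently(capture))  # True
```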

Solution:

  • The bug causes some control packets to be treated as data-path traffic. As a result, some of the traffic is NATed unexpectedly; in this case, the TCP SYN sent when building the control connection.
  • The ACK is correctly treated as control traffic and is not NATed, which is why the vManage capture shows different source ports before vManage resets the connection.
  • To work around this, convert vManage to use DTLS for now. This bug will be fixed in the GA release, but the fix was not picked up in time for the EFT build.
  • This needs to be done for all controllers, not just vManage.
  • vBond already uses DTLS.
  • There is no preference between the two. With DTLS, UDP connections will be torn down sooner than TCP connections by any intermediate NAT device.
  • To change the control connection protocol in the CLI, go to Security -> Control -> Protocol and select DTLS.

vManage doesn’t show images in the repository

This is a known issue in which vManage does not show the images in the software repository. A fix is committed for the next release.

Error when trying to apply the security policy


  • The security policy will not work unless the virtual app file has been copied to the software repository and the factory default template has been selected for UTD. This is a mandatory step for the IPS and UTD features before attaching the template.
  • Also verify that the VPNs are listed correctly in the CLI policy. If VPN2 is used in the security policy but is not defined in the device template, the security policy will fail.
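The VPN check above amounts to a set difference. This Python sketch assumes you have already collected the VPN IDs used in the security policy and those defined in the device template (for example, by reading them from vManage); undefined_vpns and the VPN numbers are hypothetical.

```python
from __future__ import annotations


def undefined_vpns(policy_vpns: set[int], template_vpns: set[int]) -> set[int]:
    """VPN IDs referenced by the security policy but not defined in the device template."""
    return policy_vpns - template_vpns


# Hypothetical values: the policy targets VPN 1 and VPN 2,
# but the device template only defines VPN 0, VPN 1, and VPN 512.
missing = undefined_vpns({1, 2}, {0, 1, 512})
if missing:
    print(f"Security policy references undefined VPNs: {sorted(missing)}")
```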

Firewall

Config push may fail with ZBFW and IPS policy


  • Try to configure the template without the preview; the push may still fail with an error.
  • If the ZBFW policy and the IPS policy have the same name, the config push may fail. To fix the issue, give them unique names. This behavior will be fixed in upcoming releases.
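Name collisions between policy types can be spotted before pushing with a simple count. The Python sketch below is an illustration only: the policy names and the dict layout are hypothetical, and it compares names case-sensitively, which may not match how vManage compares them.

```python
from __future__ import annotations

from collections import Counter


def shared_policy_names(policies: dict[str, list[str]]) -> set[str]:
    """Names that appear under more than one policy type.

    `policies` maps a policy type (e.g. "zbfw", "ips") to the names defined for it.
    """
    counts = Counter(name for names in policies.values() for name in set(names))
    return {name for name, n in counts.items() if n > 1}


# Hypothetical policy names; "Branch-Sec" is reused for ZBFW and IPS.
policies = {"zbfw": ["Branch-Sec"], "ips": ["Branch-Sec"], "url-f": ["Branch-URLF"]}
print(shared_policy_names(policies))  # {'Branch-Sec'}
```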

Firewall dashboard does not populate data

Double-check that the firewall is building sessions.

Connect to the router via the CLI, issue the following command, and make sure sessions are being built and show up in the "SIS_OPEN" state.

Raleigh-CSR#sh policy-map type inspect zone-pair sessions 
  Zone-pair: ZP_INSIDE_OUTSIDE_FIREWALL-TEST
  Service-policy inspect : FIREWALL-TEST

    Class-map: FIREWALL-TEST-seq-1-cm_ (match-all)  
      Match: access-group name FIREWALL-TEST-seq-1-acl_
      Inspect
        Established Sessions
         Session ID 0x0005454C (192.168.10.10:33284)=>(52.11.201.245:443) https SIS_OPEN
          Created 00:07:05, Last heard 00:07:05
          Bytes sent (initiator:responder) [1164:3826]


      Service-policy inspect avc : My-drop-app-list-pm_

        Class-map: My-drop-app-list-cm0_ (match-any)  
          0 packets, 0 bytes
          30 second offered rate 0000 bps, drop rate 0000 bps
          Match: protocol attribute application-family peer-to-peer
            0 packets, 0 bytes
            30 second rate 0 bps
          Match: protocol attribute application-family webmail
            0 packets, 0 bytes
            30 second rate 0 bps
          Match: protocol gtalk
            0 packets, 0 bytes
            30 second rate 0 bps
          Match: protocol gtalk-chat
            0 packets, 0 bytes
            30 second rate 0 bps
          Match: protocol youtube
            0 packets, 0 bytes
            30 second rate 0 bps
          Deny

        Class-map: class-default (match-any)  
          4847090 packets, 3040398370 bytes
          30 second offered rate 160000 bps, drop rate 0000 bps
          Match: any
          Allow

    Class-map: FIREWALL-TEST-seq-11-cm_ (match-all)  
      Match: access-group name FIREWALL-TEST-seq-11-acl_
      Pass
        15 packets, 1470 bytes

    Class-map: FIREWALL-TEST-seq-21-cm_ (match-all)  
      Match: access-group name FIREWALL-TEST-seq-21-acl_
      Inspect
        Established Sessions
         Session ID 0x00054724 (192.168.10.10:53865)=>(208.67.222.222:53) dns SIS_OPEN
          Created 00:00:09, Last heard 00:00:04
          Bytes sent (initiator:responder) [154:0]
        
        Half-open Sessions
         Session ID 0x00054726 (192.168.10.10:35795)=>(208.67.220.220:53) dns SIS_OPENING
          Created 00:00:01, Last heard 00:00:01
          Bytes sent (initiator:responder) [81:0]


    Class-map: FIREWALL-TEST-seq-31-cm_ (match-all)  
      Match: class-map match-any FIREWALL-TEST-s31-l4-cm_
        Match: protocol https
          0 packets, 0 bytes
      Inspect


    Class-map: FIREWALL-TEST-seq-41-cm_ (match-all)  
      Match: class-map match-any FIREWALL-TEST-s41-l4-cm_
        Match: protocol http
          0 packets, 0 bytes
      Inspect


    Class-map: class-default (match-any)  
      Match: any
      Drop
        672 packets, 60480 bytes
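When the session table is long, counting sessions by state is easier than reading it line by line. The Python sketch below applies a regular expression to session lines in the format shown above; the pattern is derived only from that sample, so other releases may print sessions differently.

```python
import re
from collections import Counter

# Matches session lines such as:
#   Session ID 0x0005454C (192.168.10.10:33284)=>(52.11.201.245:443) https SIS_OPEN
SESSION_RE = re.compile(
    r"Session ID (0x[0-9A-Fa-f]+) \(([^)]+)\)=>\(([^)]+)\) (\S+) (SIS_\w+)"
)


def session_states(show_output: str) -> Counter:
    """Count firewall sessions by state (SIS_OPEN, SIS_OPENING, ...)."""
    return Counter(m.group(5) for m in SESSION_RE.finditer(show_output))


excerpt = """
 Session ID 0x0005454C (192.168.10.10:33284)=>(52.11.201.245:443) https SIS_OPEN
 Session ID 0x00054724 (192.168.10.10:53865)=>(208.67.222.222:53) dns SIS_OPEN
 Session ID 0x00054726 (192.168.10.10:35795)=>(208.67.220.220:53) dns SIS_OPENING
"""
states = session_states(excerpt)
print(states["SIS_OPEN"], states["SIS_OPENING"])  # 2 1
```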

IPS

How to verify IPS is functioning?

Issue the command "show utd engine standard status" and make sure the engine status is Green. Also check that the signature package version (for example, "29111.8.s") ends in ".s", which indicates the subscriber signature set, rather than ".c", which indicates the community signature set that ships by default with the virtual app-hosting image.
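The suffix check above can be expressed as a small Python helper. signature_set is a hypothetical name; it only inspects the last dot-separated field of the version string, following the ".s"/".c" convention described in this section.

```python
def signature_set(version: str) -> str:
    """Classify a UTD signature package version by its last dot-separated field.

    ".s" = subscriber signature set, ".c" = community signature set (the
    default that ships with the virtual app-hosting image).
    """
    suffix = version.strip().rsplit(".", 1)[-1].lower()
    return {"s": "subscriber", "c": "community"}.get(suffix, "unknown")


print(signature_set("29111.8.s"))  # subscriber
```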


Dashboard does not populate any data for IPS

Neither the overall dashboard nor the device-specific dashboards show IPS events, even though the router sees traffic that should trigger signatures.

1. Make sure the subscriber signature set is running. The default is the free community signature set, which has only about 600 signatures. Use IPS Signature Update (under Settings), provide your CEC credentials, and download the subscriber signature set.


2. Make sure the subscriber signature package has been downloaded by going to Monitor >> Network >> choose the router >> Intrusion Prevention.


OR

3. Connect to the router via the CLI, issue the following command, and make sure the subscriber signature package has been downloaded.

Cary-CSR#sh utd engine standard threat-inspection signature update status
Current signature package version: 29111.35.s
Current signature package name: UTD-STD-SIGNATURE-29111-35-S.pkg ========> S indicates subscriber sig
Previous signature package version: 29111.34.s
---------------------------------------
Last update status: Successful
---------------------------------------
Last successful update time: Fri Jan 4 03:30:16 2019 UTC
Last successful update method: Manual
Last successful update server: None
Last successful update speed: 3917236 bytes in 20 secs
---------------------------------------
Last failed update time: None
Last failed update method: None
Last failed update server: None
Last failed update reason: None
---------------------------------------
Last attempted update time: Fri Jan 4 03:30:16 2019 UTC
Last attempted update method: Manual
Last attempted update server: None
---------------------------------------
Total num of updates successful: 5
Num of attempts successful: 5
Num of attempts failed: 0
Total num of attempts: 5
---------------------------------------
Next update scheduled at: None
---------------------------------------
Current status: Idle
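To check the update status from a script, the "Key: value" layout of the output above can be parsed into a dictionary. This Python sketch relies only on that layout; which fields count as "healthy" is an assumption chosen for illustration.

```python
from __future__ import annotations


def parse_update_status(show_output: str) -> dict[str, str]:
    """Turn the "Key: value" lines of the update-status output into a dict.

    Only the first colon splits a line, so timestamps such as 03:30:16 stay
    intact. Separator lines and the command line have no colon and are skipped.
    """
    status: dict[str, str] = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def update_looks_healthy(status: dict[str, str]) -> bool:
    """Assumed health rule: last update succeeded and no attempt has failed."""
    return (
        status.get("Last update status") == "Successful"
        and status.get("Num of attempts failed") == "0"
    )


excerpt = """Current signature package version: 29111.35.s
Last update status: Successful
Last successful update time: Fri Jan 4 03:30:16 2019 UTC
Num of attempts failed: 0
"""
status = parse_update_status(excerpt)
print(status["Current signature package version"], update_looks_healthy(status))
```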

URL-F

Error when updating the device template with the Security Policy


  • Confirm that the virtual images for IPS and URL-F have been uploaded.


  • For the protocol/destination port, specify the protocol number; usually this is TCP or UDP. If no protocol is defined, both TCP and UDP are used.
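The rule described above can be modelled in Python as follows. This is a sketch of the behavior this section describes, not vManage's own logic; it uses the IANA protocol numbers 6 (TCP) and 17 (UDP), and effective_protocols is a hypothetical helper name.

```python
from __future__ import annotations

# IANA-assigned protocol numbers.
PROTOCOL_NUMBERS = {"tcp": 6, "udp": 17}


def effective_protocols(protocol: str | None) -> set[int]:
    """Protocol numbers a port-based rule applies to.

    No protocol defined -> both TCP and UDP, as described above.
    Raises KeyError for anything other than TCP or UDP.
    """
    if not protocol:
        return set(PROTOCOL_NUMBERS.values())
    return {PROTOCOL_NUMBERS[protocol.lower()]}


print(effective_protocols("TCP"))  # {6}
print(sorted(effective_protocols(None)))  # [6, 17]
```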

Performance Issues with URL-F and IPS

The policy is applied to VPN1, and performance is much worse when DIA is disabled and traffic is sent through the DC.
Case 1) With DIA, URL filtering, and IPS all turned on, everything works normally.
Case 2) With DIA disabled and URL filtering and IPS still turned on, responses are quite slow.
Case 3) With DIA, URL filtering, and IPS all turned off, everything works normally and requests from clients behind the router are relatively fast.

-  It can take up to 10 minutes for IPS to work normally after the policy is applied. This is expected from the container perspective, and the DT team confirmed this behavior. After 10 minutes, things work normally.
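Because IPS can need up to 10 minutes after the policy is applied, an automated check should poll rather than test once. This Python sketch assumes a hypothetical get_status callable that returns the UTD engine status as a string (for example, "Green"); how it obtains the status is left open. The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def wait_for_green(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,   # the "up to 10 minutes" noted above
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `get_status` until it returns "Green" or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status().strip().lower() == "green":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Hypothetical status source that turns green on the third poll.
readings = iter(["Yellow", "Yellow", "Green"])
print(wait_for_green(lambda: next(readings), sleep=lambda _: None))  # True
```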

DNS/web-layer Security

DNS Request not resolving

  • If there are two default routes in the underlay and one of them (for example, MPLS) has no internet reachability, DNS resolution can fail because of ECMP.
  • Implement static routes for the RFC1918 subnets pointing to the MPLS next hop, and a default route pointing to the internet next hop.
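With these static routes in place, longest-prefix match sends RFC1918 destinations to MPLS (the /8, /12, and /16 routes are more specific than 0.0.0.0/0) and everything else, including public DNS resolvers, to the internet next hop. The Python sketch below reproduces that decision, ignoring connected routes; the next-hop addresses are taken from the CEF output in the Umbrella section of this guide (10.1.5.1 on the INET interface, 10.2.5.1 on MPLS), and next_hop_for is a hypothetical helper.

```python
import ipaddress

# RFC 1918 private ranges, each more specific than the 0.0.0.0/0 default route.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

INET_NEXT_HOP = "10.1.5.1"  # GigabitEthernet0/0/0 in the CEF output
MPLS_NEXT_HOP = "10.2.5.1"  # GigabitEthernet0/0/1 in the CEF output


def next_hop_for(destination: str) -> str:
    """Next hop once the RFC1918 static routes and internet default exist (connected routes ignored)."""
    addr = ipaddress.ip_address(destination)
    if any(addr in net for net in RFC1918):
        return MPLS_NEXT_HOP  # a longer-prefix RFC1918 route beats the default
    return INET_NEXT_HOP      # only the internet default route matches


print(next_hop_for("208.67.222.222"))  # 10.1.5.1 (public DNS resolver)
print(next_hop_for("192.168.10.10"))   # 10.2.5.1 (private subnet)
```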

Umbrella Registration failing


ISR4331-1#sh sdwan umbrella device-registration
umbrella-ios-oper-data umbrella-dev-reg-data 1
status UNKNOWN
tag vpn1
device-id ""
description "Domain lookup failed for 'api.opendns.com',retrying"


  • Check whether DNS is working. Pinging Google returns the following:
ISR4331-1#ping www.google.com
% Unrecognized host or address, or protocol not running.
  • The device must be configured with a DNS server, and domain lookup must be enabled from vManage. In the VPN0 template, add a DNS server, then verify the running configuration:
sh sdwan running-config | include name-server
ip name-server 8.8.8.8
  • There are two interfaces, INET and MPLS. Traffic using the MPLS next hop is sourced from the INET interface IP, and traffic using the INET next hop is sourced from the MPLS interface IP.
  • The FIB shows the correct interface mapping, but traffic flows appear to ignore the source IPs. GigabitEthernet0/0/0 is INET and GigabitEthernet0/0/1 is MPLS:
ISR4331-1#sh ip cef
Prefix               Next Hop             Interface
0.0.0.0/0            10.1.5.1             GigabitEthernet0/0/0
                     10.2.5.1             GigabitEthernet0/0/1
0.0.0.0/8            drop
0.0.0.0/32           receive
10.1.5.0/28          attached             GigabitEthernet0/0/0
10.1.5.0/32          receive              GigabitEthernet0/0/0
10.1.5.1/32          attached             GigabitEthernet0/0/0
10.1.5.3/32          receive              GigabitEthernet0/0/0
10.1.5.15/32         receive              GigabitEthernet0/0/0
10.2.5.0/28          attached             GigabitEthernet0/0/1
10.2.5.0/32          receive              GigabitEthernet0/0/1
10.2.5.1/32          attached             GigabitEthernet0/0/1
10.2.5.2/32          receive              GigabitEthernet0/0/1
10.2.5.15/32         receive              GigabitEthernet0/0/1
127.0.0.0/8          drop
192.0.2.0/30         attached             VirtualPortGroup1
192.0.2.0/32         receive              VirtualPortGroup1
192.0.2.1/32         receive              VirtualPortGroup1
192.0.2.2/32         attached             VirtualPortGroup1
192.0.2.3/32         receive              VirtualPortGroup1
224.0.0.0/4          drop
224.0.0.0/24         receive
240.0.0.0/4          drop
255.255.255.255/32   receive
  • The DNS traffic is taking the MPLS path because of ECMP. Implement static routes for the RFC1918 subnets pointing to the MPLS next hop, and a default route pointing to the internet next hop.
  • Verify the Umbrella registration again:
ISR4331-1#sh sdwan umbrella device-registration
umbrella-ios-oper-data umbrella-dev-reg-data 1
status UNKNOWN
tag vpn1
device-id ""
description "Socket connection failed for'api.opendns.com',retrying"
  • Verify that DNS is resolving by pinging opendns.com from the transport VPN.
  • Make sure the token from Umbrella matches the one configured in vManage.
  • Verify that “allow-services all” is configured under the interface facing the internet.
  • Now verify the Umbrella registration with “sh umbrella config”; it should show the device as registered, with a valid org ID.
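The registration output shown in this section is a list of "key value" lines, so it is easy to check in a script. In this Python sketch, treating an empty device-id as "not registered" is an assumption based on the failing outputs above; the guide does not show the output of a successful registration.

```python
from __future__ import annotations

FIELDS = ("status", "tag", "device-id", "description")


def parse_registration(show_output: str) -> dict[str, str]:
    """Extract key fields from `sh sdwan umbrella device-registration` output."""
    fields: dict[str, str] = {}
    for line in show_output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key in FIELDS:
            fields[key] = value.strip().strip('"')  # drop surrounding quotes
    return fields


def looks_registered(fields: dict[str, str]) -> bool:
    # Assumption: an unregistered device reports an empty device-id,
    # as in the failing outputs above; confirm against a working device.
    return bool(fields.get("device-id"))


failing = '''umbrella-ios-oper-data umbrella-dev-reg-data 1
status UNKNOWN
tag vpn1
device-id ""
description "Socket connection failed for'api.opendns.com',retrying"
'''
fields = parse_registration(failing)
print(fields["status"], looks_registered(fields))  # UNKNOWN False
```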

The Security push fails after Umbrella is detached from the policy

[11-Oct-2018 21:34:15 UTC] Configuring device with feature template: 4331-Branch-v1-GreatWALL
[11-Oct-2018 21:34:15 UTC] Generating configuration from template
[11-Oct-2018 21:34:24 UTC] Checking and creating device in vManage
[11-Oct-2018 21:34:32 UTC] Device is online
[11-Oct-2018 21:34:32 UTC] Updating device configuration in vManage
[11-Oct-2018 21:34:44 UTC] Pushing configuration to device
[11-Oct-2018 21:34:55 UTC] Failed to process device request. Error response : rpc-reply error: <rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="10">
  <rpc-error>
    <error-type>application</error-type>
    <error-tag>invalid-value</error-tag>
    <error-severity>error</error-severity>
    <error-message unknown:lang="en">inconsistent value: Device refused one or more commands</error-message>
    <error-info>
      <severity xmlns="http://cisco.com/yang/cisco-ia">error_cli</severity>
      <detail xmlns="http://cisco.com/yang/cisco-ia">
        <bad-cli>
          <bad-command>utd engine standard multi-tenancy</bad-command>
          <error-location>0</error-location>
          <parser-response>%Config update in progress; please wait and retry</parser-response>
        </bad-cli>
        <bad-cli>
          <bad-command> no web-filter block page profile block-GreatWall_URL</bad-command>
          <error-location>5</error-location>
          <parser-response/>        </bad-cli>
      </detail>
    </error-info>
  </rpc-error>
</rpc-reply>
  • Umbrella can be detached by going to Config > Security, opening one of the tabs (for example, Firewall), clicking "..." and selecting "Detach".
  • Wait for the UTD engine status to turn green (Monitor -> Network -> device -> Real Time -> "UTD Engine Status"), then push the policy again.
  • The issue was fixed by (a) allowing enough time to avoid the "Config update in progress" error, and (b) detaching the security policy from the device template, modifying it, and reattaching it.
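When a push fails like this, the refused commands and the router's responses are buried in the NETCONF rpc-reply. The Python sketch below extracts them with the standard-library xml.etree.ElementTree module, using the cisco-ia namespace that appears in the reply above. The logged reply contains an attribute with an undeclared "unknown:" prefix, which ElementTree rejects, so the sketch strips such attributes first; refused_commands is a hypothetical helper name.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

CISCO_IA = "{http://cisco.com/yang/cisco-ia}"


def refused_commands(log_text: str) -> list[tuple[str, str]]:
    """Return (bad-command, parser-response) pairs from a logged rpc-reply."""
    start = log_text.find("<rpc-reply")
    if start == -1:
        raise ValueError("no <rpc-reply> element found")
    xml_text = log_text[start:]
    # Drop attributes with the undeclared "unknown:" prefix so the XML parses.
    xml_text = re.sub(r'\s+unknown:[\w-]+="[^"]*"', "", xml_text)
    root = ET.fromstring(xml_text)
    pairs = []
    for bad in root.iter(f"{CISCO_IA}bad-cli"):
        command = bad.findtext(f"{CISCO_IA}bad-command", default="").strip()
        response = bad.findtext(f"{CISCO_IA}parser-response", default="").strip()
        pairs.append((command, response))
    return pairs


# Trimmed copy of the reply above.
log = """Error response : rpc-reply error: <rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="10">
  <rpc-error>
    <error-message unknown:lang="en">inconsistent value: Device refused one or more commands</error-message>
    <error-info>
      <detail xmlns="http://cisco.com/yang/cisco-ia">
        <bad-cli>
          <bad-command>utd engine standard multi-tenancy</bad-command>
          <parser-response>%Config update in progress; please wait and retry</parser-response>
        </bad-cli>
        <bad-cli>
          <bad-command> no web-filter block page profile block-GreatWall_URL</bad-command>
          <parser-response/>        </bad-cli>
      </detail>
    </error-info>
  </rpc-error>
</rpc-reply>"""

for command, response in refused_commands(log):
    print(f"{command!r}: {response or '(no parser response)'}")
```

A "%Config update in progress" response is the cue to wait for the UTD engine to turn green before pushing again.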
