
NSO touching non-service config on service re-deploy - please explain why

I would appreciate it if someone could explain the NSO behavior described below.

I am deploying a service instance that enables an interface SVI on a Cisco Nexus switch, which causes the device to also enable the 'Vlan 1' SVI. That SVI, however, is not explicitly configured in the service templates. So when we configure the service, we end up with this config on the device:

NX_LEAF111# show run int Vlan 1

interface Vlan1
  no ip redirects
  no ipv6 redirects

However, NSO does not know anything about it yet, because the change was made by the device itself and not by NSO:

rslaski@ncs# show running-config devices device NX_LEAF111 config nx:interface Vlan 1
------------------------------------------------------------------^
syntax error: unknown argument

A sync-from is needed to make the config visible in NSO, and of course it is not part of any service (note that there are no backpointers):

rslaski@ncs# devices device NX_LEAF111 sync-from 
result true
rslaski@ncs# show running-config devices device NX_LEAF111 config nx:interface Vlan 1 | display service-meta-data 
devices device NX_LEAF111
 config
  nx:interface Vlan1
   shutdown
   no ip redirects
   no ipv6 redirects
  exit
 !
!

Now for the unexpected part: when re-deploying the service, NSO tries to remove that config, as if it were part of the service:

rslaski@ncs# TMPL EVPN EVPN_L3 RSL_L3_20200515a re-deploy dry-run                          
cli {
    local-node {
        data  devices {
                   device NX_LEAF111 {
                       config {
                           nx:interface {
              -                Vlan 1 {
              -                    shutdown;
              -                    ip {
              -                        redirects false;
              -                    }
              -                    ipv6 {
              -                        redirects false;
              -                    }
              -                }

Why is NSO touching this non-service config? Is there something I am missing?

Cisco Employee

This might be related to tagging on your service XML template.

Before redeploying, can you try re-deploy reconcile with the option keep-non-service-config (the default)?

(You can dry-run this as well.)

If this is indeed related to tagging, then re-deploying the way you did before should no longer remove the config.

Let us know here if this is the case, and then we can look into the tagging in the template.

In any case, if there are commands that you know will be implicitly added on the device when you configure other commands, then you should include those commands in your template.


Thanks for your reply!

Yes, what I have already learnt is that when coding services one should check for default config applied by the device. So basically the service deployment process should include 'sync-from' + 're-deploy dry-run', and the service template is correct only if the dry-run reports no changes.
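The check described above (sync-from, then re-deploy dry-run, and accept the template only if nothing changes) can be automated by scanning the dry-run text. A minimal sketch, assuming the 'cli' outformat shown earlier in this thread, where a changed line carries a leading '+' or '-' marker; `dry_run_changes` and `template_is_complete` are hypothetical helpers, not part of the NSO API:

```python
def dry_run_changes(dry_run_text: str) -> list:
    """Collect the '+' (added) and '-' (removed) lines of a CLI-format dry-run.

    Hypothetical helper: assumes the 'cli' outformat shown in this thread,
    where a changed line carries its +/- marker before the indentation.
    """
    changes = []
    for line in dry_run_text.splitlines():
        stripped = line.lstrip()
        if stripped[:1] in ("+", "-"):
            # Normalize to "<marker> <config text>" so the result is easy to read.
            changes.append(stripped[0] + " " + stripped[1:].strip())
    return changes


def template_is_complete(dry_run_text: str) -> bool:
    """True when a re-deploy dry-run (run right after sync-from) changes nothing."""
    return not dry_run_changes(dry_run_text)
```

An empty result corresponds to the "no changes reported by dry-run" condition; any '-' line points at device config the template does not account for.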

 

However, in my case it has nothing to do with templates or tags. We ran tests in a customer brownfield environment where, for example, the NSO dry-run reported that a VRF would be configured for a new service, but that at the same time another brownfield VRF (one that was never part of any service) would be removed. After the final commit it was not actually removed, but its config had been marked with a backpointer.

That's still strange and I can't explain it, which is why I am asking for advice.

robert,


Hi,

I think it would be useful if you posted your service template. It could well be that NSO believes it created the Vlan interface and therefore needs to delete it.

Also, the NX-OS NED has a number of settings to handle some bizarre NX device behaviors. You can adjust a device to the right behavior with:

admin@ncs(config)# devices device nx0 ned-settings cisco-nx behaviours ?

The behaviors are described in the README file. One that may be of interest is:

show-interface-all    Enable this to use 'show running-config interface all' to get full configuration for all interfaces to avoid problems with hidden defaults.
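The idea behind show-interface-all can also be checked by hand: comparing the plain output with the 'all' output reveals which defaults the device hides. A minimal sketch, assuming both outputs were captured as text for the same interface; `hidden_defaults` is a hypothetical helper that does a plain line-by-line set difference and ignores ordering and indentation:

```python
def hidden_defaults(plain_output: str, all_output: str) -> list:
    """Return lines that appear only in the 'show running-config interface all'
    output, i.e. defaults the device omits from the plain output.

    Hypothetical helper: compares whitespace-stripped lines as a set, so it
    should be run on the output of a single interface at a time.
    """
    plain = {line.strip() for line in plain_output.splitlines() if line.strip()}
    return [
        line.strip()
        for line in all_output.splitlines()
        if line.strip() and line.strip() not in plain
    ]
```

Any line it returns is a candidate for inclusion in the service template, in line with the earlier advice to template commands the device adds implicitly.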


Roque,

In my case, the service templates comprise over a dozen files with 2k lines of XML code, so posting them here would probably crash the community page ;-)

I will check the NED behaviour knobs again, but I don't think that's the cause: the Vlan1 SVI was just an example, and that one could indeed be a default-config problem, which is easy to fix.

We tested service creation (which includes VRF creation) on brownfield switches, and the dry-run reported that the transaction would delete another brownfield VRF, 'poc_nsx-t' (it had always been on the switch and was never part of any service):

cli {
    local-node {
        data  devices {
                  device L201 {
                      config {
                          dcs:vrf {
             +                definition test-20200601-1 {
             +                }
                          }
                          dcs:ip {
                              routing {
             -                    vrf poc_nsx-t;
             +                    vrf test-20200601-1;
                              }
                          }

as well as other changes related to that brownfield configuration, which seem rather random to me:

                              Vxlan 1 {
                                  vxlan {
                                      vrf {
             -                            name poc_nsx-t;
             +                            name test-20200601-1;
             -                            vni 10999;
             +                            vni 85000;

On another switch, the service commit told us it would delete the management VRF (sic!).

The service config did not specify that VRF, of course:

              TMPL {
                  EVPN {
             +        EVPN_L3 test-20200601-1 {
             +            customer-name CUST1;
             +            vnid 85000;
             +            vrf test-20200601-1;
             +            anycast-ipv4 10.10.234.254/24;
             +            dc 65034 {
             +                l2-service test-20200601-1;
             +            }
             +        }
                  }
              }

Interestingly, the full commit did not in fact delete the config, but the suspicious config lines (like VRF 'poc_nsx-t') were marked with backpointers.

I tried various debugs and traces to track this down, but found nothing helpful.

robert,


One more thing: for the VRF name, the templates just use a variable that is set straight from the service model, so there is no manipulation, lookup, reservation, etc. involved:

vars.add('VAR_VRF', service.vrf)

robert,


OK, I think I've found the issue, and it has nothing to do with the service code. The problem is a faulty NED YANG model for the device: it defines the VRF inside 'Vxlan1' as a container:

 container interface {
	 
    // interface Vxlan *
    list Vxlan {
	
      // interface Vxlan * / vxlan
      container vxlan {
	  
        // interface Vxlan * / vxlan vrf
        container vrf {
          tailf:info "Vrf which is mapped to a vni";
          tailf:cli-compact-syntax;
          tailf:cli-sequence-commands {
            tailf:cli-reset-siblings;
          }

          leaf name {
            tailf:cli-drop-node-name;
            tailf:non-strict-leafref {
              path "/dcs:vrf/definition/name";
            }
            type string {
              tailf:info "WORD;;VRF name";
            }
          }
          leaf vni {
            tailf:info "VXLAN Network Identifier configuration";
            type uint32 {
              tailf:info "<1-16777215>;;VXLAN Network Identifier";
            }
          }

whereas it should be a list, since the device allows multiple VRFs to be assigned under the interface:

AR-LEAF133#show run int vxlan 1
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 1201 vni 80111
   vxlan vrf TENANT77 vni 10077
   vxlan vrf poc_nsx-t vni 10999
   

So the NED code processes the device configuration properly (no significant lines are skipped), but the result is then lost when NSO fits it into the NED YANG model. From NSO's perspective, all but one of the VXLAN VRF entries under the Vxlan1 interface disappear:

rslaski@ncs# show running-config devices device AR_LEAF133 config dcs:interface Vxlan 1 
devices device AR_LEAF133
 config
  dcs:interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 1201 vni 80111
   vxlan vrf poc_nsx-t vni 10999
  !
 !
!
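Why a container loses entries can be illustrated with a simplified model. This is a sketch in Python and not how NSO or the NED actually parse CLI; it only mirrors the outcome seen above, where each 'vxlan vrf' line overwrites the single container instance so the last one wins, while a list keyed by VRF name keeps every entry. `parse_as_container` and `parse_as_list` are hypothetical names:

```python
import re

# Matches lines such as "vxlan vrf poc_nsx-t vni 10999" (format taken from the
# 'show run int vxlan 1' output above).
VRF_LINE = re.compile(r"^\s*vxlan vrf (\S+) vni (\d+)\s*$")


def parse_as_container(config: str) -> dict:
    """Model 'vxlan vrf' as a single container: each new line overwrites the
    previous one, so only the last VRF survives."""
    vrf = {}
    for line in config.splitlines():
        m = VRF_LINE.match(line)
        if m:
            vrf = {"name": m.group(1), "vni": int(m.group(2))}
    return vrf


def parse_as_list(config: str) -> dict:
    """Model 'vxlan vrf' as a list keyed by VRF name: every entry is kept."""
    vrfs = {}
    for line in config.splitlines():
        m = VRF_LINE.match(line)
        if m:
            vrfs[m.group(1)] = int(m.group(2))
    return vrfs


device_output = """interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 1201 vni 80111
   vxlan vrf TENANT77 vni 10077
   vxlan vrf poc_nsx-t vni 10999
"""
print(parse_as_container(device_output))  # {'name': 'poc_nsx-t', 'vni': 10999}
print(parse_as_list(device_output))       # {'TENANT77': 10077, 'poc_nsx-t': 10999}
```

The container result matches what NSO shows after sync-from; the list result matches what the device actually has.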

So with the service applied, NSO, following the faulty model, does indeed try to replace that single line with a new one, as can be seen in the service debug output:

rslaski@ncs(config-EVPN_L3-RSL_L3_t2)# commit dry-run | debug service

Service: /TMPL/EVPN/EVPN_L3[name='RSL_L3_t2']
shared_create /devices/device[name='AR_LEAF133']/config/dcs:interface/Vxlan[id='1'], refcount: 3
shared_set /devices/device[name='AR_LEAF133']/config/dcs:interface/Vxlan[id='1']/vxlan/vrf/vni: 85777, refcount: 2, original: 10999
shared_set /devices/device[name='AR_LEAF133']/config/dcs:interface/Vxlan[id='1']/vxlan/vrf/name: TEST888, refcount: 2, original: poc_nsx-t
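Because the debug output marks each overwrite of a pre-existing value with 'original:', scanning for those lines is one way to spot brownfield config that a service would replace. A sketch in Python, assuming the exact line format shown above (other NSO versions may print it differently); `overwritten_values` is a hypothetical helper:

```python
import re

# Matches 'shared_set <path>: <new>, refcount: <n>, original: <old>' lines as
# printed by 'commit dry-run | debug service' in this thread.
SHARED_SET = re.compile(
    r"^shared_set (?P<path>\S+): (?P<new>.*?), refcount: (?P<refcount>\d+), "
    r"original: (?P<original>.*)$"
)


def overwritten_values(debug_output: str) -> list:
    """Return (path, original, new) for every shared_set that replaces a value
    which existed on the device before the service touched it."""
    result = []
    for line in debug_output.splitlines():
        m = SHARED_SET.match(line.strip())
        if m and m.group("original") != m.group("new"):
            result.append((m.group("path"), m.group("original"), m.group("new")))
    return result
```

Run against the output above, it would report the vni (10999 to 85777) and the name (poc_nsx-t to TEST888) under Vxlan 1.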

and in the final config with backpointers/refcounts (look at 'Originalvalue'):

devices device AR_LEAF133
 config
  ! Refcount: 3
  ! Backpointer: [ /TMPL:TMPL/TMPL:EVPN/TMPL:EVPN_L2[TMPL:name='RSL_L2_1'] /TMPL:TMPL/TMPL:EVPN/TMPL:EVPN_L3[TMPL:name='RSL_L3_t2'] ]
  dcs:interface Vxlan1
   ! Refcount: 2
   ! Originalvalue: Loopback1
   vxlan source-interface Loopback1
   ! Refcount: 2
   ! Originalvalue: 4789
   vxlan udp-port 4789
   ! Refcount: 1
   ! Refcount: 1 (/devices/device{AR_LEAF133}/config/dcs:interface/Vxlan{1}/vxlan/vlan{1201})
   ! Backpointer: [ /TMPL:TMPL/TMPL:EVPN/TMPL:EVPN_L2[TMPL:name='RSL_L2_1'] ] (/devices/device{AR_LEAF133}/config/dcs:interface/Vxlan{1}/vxlan/vlan{1201})
   vxlan vlan 1201 vni 80111
   ! Refcount: 2
   ! Originalvalue: poc_nsx-t
   vxlan vrf TEST888 vni 85777
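The refcounts above also explain why the commit did not really delete 'poc_nsx-t': the leaf was overwritten and its brownfield value kept as Originalvalue. The following is a toy Python model, not NSO's implementation, built on an assumption consistent with the numbers shown (the vrf name leaf has Refcount 2 with a single service writing it): pre-existing config counts as one extra reference, and the value it had before the first service write is saved so it can be restored. `SharedLeaf` is a hypothetical name:

```python
class SharedLeaf:
    """Toy model of a device leaf written by services with shared_set.

    Assumption (consistent with the refcounts shown above, not taken from NSO
    source): a value that already existed on the device counts as one extra
    reference, and the first service to overwrite it saves it as the
    "original value" so it can be restored later.
    """

    def __init__(self, device_value=None):
        self.value = device_value
        self.preexisting = device_value is not None
        self.original = None
        self.services = set()

    @property
    def refcount(self) -> int:
        return len(self.services) + (1 if self.preexisting else 0)

    def shared_set(self, service: str, value) -> None:
        if self.preexisting and self.original is None:
            self.original = self.value  # remember the brownfield value
        self.services.add(service)
        self.value = value

    def remove_service(self, service: str) -> None:
        if service not in self.services:
            return
        self.services.remove(service)
        if not self.services:
            # Last service gone: restore the brownfield value, or delete the leaf.
            self.value = self.original if self.preexisting else None
            self.original = None


leaf = SharedLeaf("poc_nsx-t")           # vrf name learned by sync-from
leaf.shared_set("RSL_L3_t2", "TEST888")  # service overwrites it
print(leaf.value, leaf.refcount, leaf.original)  # TEST888 2 poc_nsx-t
```

In this model, deleting the service brings 'poc_nsx-t' back, but while the service exists the device holds only TEST888 under the container, which is exactly the replacement the dry-run reported.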

Similar errors probably affect other parts of the NED model.

Lessons learnt:
1) That was quite tricky
2) Automated testing by the NED developers would be appreciated
3) There's no easy way to find out which config parts did not fit into the device model
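Lesson 3 can be partly worked around by diffing the device's own output against what NSO shows after a sync-from. A rough Python sketch, assuming both outputs were captured as text for the same section (for example 'show run int vxlan 1' on the device and 'show running-config devices device ... config dcs:interface Vxlan 1' in NSO); `lines_lost_in_nso` is hypothetical and compares lines as plain strings, so it only catches lines the NED renders identically to the device:

```python
import re

# Strip a YANG module prefix such as "dcs:" from the first word, since NSO shows
# "dcs:interface Vxlan1" where the device shows "interface Vxlan1".
PREFIX = re.compile(r"^[\w-]+:")


def config_lines(text: str) -> list:
    """Non-empty, whitespace-stripped lines without '!' separators or prefixes."""
    lines = []
    for raw in text.splitlines():
        line = PREFIX.sub("", raw.strip())
        if line and line != "!":
            lines.append(line)
    return lines


def lines_lost_in_nso(device_output: str, nso_output: str) -> list:
    """Device config lines that have no identical counterpart in NSO's view."""
    nso = set(config_lines(nso_output))
    return [line for line in config_lines(device_output) if line not in nso]
```

With the Vxlan1 outputs from this thread, the only line it would flag is 'vxlan vrf TENANT77 vni 10077', the entry dropped by the container model.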

robert,


Did you submit a NED TAC ticket? If you send TAC your target device configuration, an automatic test case will be created. That will remove the need to check this in future versions of the NED.

Changing NED models from containers to lists is pretty common as CLIs evolve; the same goes for leafs becoming containers.


Yes, I did today. I can send TAC a complete sample configuration, as advised, but it will certainly not pass the test (I know there are other, less important issues in both NEDs I am using). That will in turn open a thread of questions like "what is the business impact", "we do not have to support all the features", "this needs to be escalated through your AM", etc.