sync-from performance

rainnomm56
Level 1

What is the expected performance of NSO when doing a sync-from?

I have 110 Alcatel "alu-sr" devices, and a sync-from across all of them takes quite some time, around 7 minutes. The last device to finish syncing was "kjj-sr1".

<INFO> 6-May-2018::18:23:57.519 nso ncs[19682]: ncs syncing from kjj-sr1...

<INFO> 6-May-2018::18:23:57.832 nso ncs[19682]: ncs connecting NED kjj-sr1

<INFO> 6-May-2018::18:30:49.383 nso ncs[19682]: ncs syncing from kjj-sr1 ok

If I do a single-device sync-from on "kjj-sr1", it takes about a minute and thirty seconds:

<INFO> 6-May-2018::21:05:49.870 nso ncs[19682]: ncs syncing from kjj-sr1...

<INFO> 6-May-2018::21:07:23.059 nso ncs[19682]: ncs syncing from kjj-sr1 ok
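
As a quick check on the two sets of log lines above, here is a small Python sketch that computes the elapsed time from the timestamps. The helper name `elapsed_seconds` is made up for illustration; the timestamps are copied from the logs.

```python
from datetime import datetime

# Timestamp layout used in the NSO log lines quoted above,
# e.g. "6-May-2018::18:23:57.519"
LOG_TS_FORMAT = "%d-%b-%Y::%H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two log timestamps (hypothetical helper)."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


# kjj-sr1 as part of the sync-from over all devices
bulk = elapsed_seconds("6-May-2018::18:23:57.519", "6-May-2018::18:30:49.383")
# kjj-sr1 synced on its own
single = elapsed_seconds("6-May-2018::21:05:49.870", "6-May-2018::21:07:23.059")

print(f"kjj-sr1 during bulk sync-from: {bulk:.1f} s")
print(f"kjj-sr1 single sync-from:      {single:.1f} s")
```

So the same device takes roughly 412 seconds inside the bulk run and roughly 93 seconds on its own.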

That is still a strangely long time. I can pull the config over telnet in 17 sec, and I have a PHP script that translates most of the config to JSON in ~1 sec. So why does NSO need a minute and 30 seconds for the same job? What am I missing?

1 Accepted Solution

The NSO parser may take a little longer than your script to consume the config, since it's a general parser, but I don't think much time is spent there with NSO either. Most of that time is likely spent validating the configuration. Your script is not doing anything with the result, while NSO has to ensure all the constraints in the model are upheld, compute the diff, store that diff in the database and communicate the diff to any subscribers, all in a transactional manner.

5 Replies

lmanor
Cisco Employee

Hello,

You can always turn on southbound tracing to the device; it will record what is happening and how long each step takes.

set devices device <dev-name> trace raw

request devices device <dev-name> disconnect

request devices device <dev-name> connect


-Larry

I did that, and in NSO:

- the config pull takes 25 sec

- then there is no output; I presume it is translating the config to XML for 43 sec?

- then for some reason it pulls the config a second time, again for 25 sec. There is no translation after the second pull.
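
Adding up the three phases seen in the trace gives a total that matches the roughly 93 seconds the single-device sync-from took in the log. A minimal Python sketch of that arithmetic; the phase labels are mine, and the 43-second gap being translation is the presumption stated above, not something the trace confirms:

```python
# Phase durations in seconds, as read from the raw trace described above.
phases = {
    "first config pull": 25,
    "silent gap (presumed translation)": 43,
    "second config pull": 25,
}

total = sum(phases.values())

# Print each phase with its share of the total sync-from time.
for name, secs in phases.items():
    print(f"{name:36s} {secs:3d} s  {secs / total:6.1%}")
print(f"{'total':36s} {total:3d} s")
```

On these numbers the two config pulls together account for more than half of the wall-clock time, and the silent gap for a bit under half.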

That particular device's config is quite big:

- the flattened config (like Juniper "display set") is 24 127 lines

- of that, services account for 23 618 lines

I have my own scripts that I've used to analyze and prepare mass changes on those devices:

- the config pull takes about the same 25 sec

- I do not have a full config parser, only one for services, but that's 97% of the config, and my PHP script with ~220 regex matches does the parsing in 0.7 seconds:

     rain@ohoo:~/project/parse$ php test.php kjj-sr1

     file load: 0.0027048587799072

     Total: 0.71830701828003

I do not understand how NSO takes 43 sec to do a job that should take about 1 sec. Granted, I do not do any database merge, I just write JSON output to a file, but that's still quite a time difference.

I have an hourly cron job running for those Alcatel devices. It does not pull the config; it uses configuration backup files. My configuration parser is not threaded, it parses one device at a time, and still the whole lot (110 devices) takes about 30 sec.

Setting my whining above aside: can anybody give some indication of the real performance of an NSO server?

My test NSO server is a Debian 9 virtual machine (2 CPUs, 16 GB RAM and 80 GB HDD).

Device count:

xxx@ncs> show devices list | count

Count: 113 lines

Config lines count:

xxx@ncs> show configuration devices | display set | count

Count: 341137 lines

It takes over 7 minutes to do a "sync-from" on those devices.
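
To put the figures above in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes 7 minutes flat for the bulk run (the post says "over 7 minutes", so the real rate is a bit lower) and borrows the kjj-sr1 numbers from earlier in the thread (24 127 flattened lines, about 93 seconds for a single sync-from). The two line counts come from different tools, so treat the comparison as indicative only.

```python
# Bulk sync-from over all devices
total_lines = 341_137          # "show configuration devices | display set | count"
bulk_seconds = 7 * 60          # assumption: exactly 7 minutes (a lower bound)

# Single sync-from of kjj-sr1
kjj_lines = 24_127             # flattened config lines reported for kjj-sr1
kjj_seconds = 93               # approximate duration from the log timestamps

bulk_rate = total_lines / bulk_seconds      # config lines per second, all devices
single_rate = kjj_lines / kjj_seconds       # config lines per second, one device

print(f"bulk sync-from:   ~{bulk_rate:.0f} lines/s")
print(f"single sync-from: ~{single_rate:.0f} lines/s")
```

If the counts are comparable, the higher aggregate rate would be consistent with NSO working on several devices at once, but that is an inference from the arithmetic, not something the logs confirm.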

Another question that complicates matters. On a live ISP network nobody can model all the services from day 1. Usually you go live with a couple of the most used services and leave some services in old/manual provisioning. That automatically means NSO will be out of sync with the device much of the time. So quite often I have to do a sync-from and reconcile services just to be sure that everything is in sync. That adds quite a lot of delay.

In my testing I went for a hierarchical service model for reusability (mostly in the Tier3 part):

Tier1 - Product from the sales system with all its parameters, key is product_id - its job is to provision the Tier2 technical solution, no device config

     Tier2 - Technical solution (like l3vpn, l2vpn, etc) - its job is to pull/allocate technical parameters (vlan, ip, etc) and create Tier3 services, no device config

          Tier3 - single network element config, this is where YANG data becomes device config

I chose to separate Tier2-technical from Tier1-product because not every Tier2-l3vpn has a single one-to-one mapped Tier1 counterpart. They might be older products, internally used solutions or some temporary stuff.

So my basic l3vpn looks like the structure below. Currently it covers only the service router + aggregation, no access device or CPE.

service - T1-"l3vpn-remote-office-no-cpe" {key: "product-id"

     service - T2-l3vpn-remote-office { key: "vprn-service-id interface"

          service T3-vrf-ip-interface {key: "device-name service-id interface"}

          service T3-mpls-l2pipe {key: "device-name service-id"}

     }

}
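
For readers who find nested data easier to follow, here is a minimal Python sketch of the same three-tier structure. It is not NSO code or YANG: the class names, field types and sample values are assumptions for illustration, and only the key names and the device "kjj-sr1" come from the outline above. The `bottom_up` helper simply lists the instances starting from Tier3 and ending at Tier1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class T3VrfIpInterface:          # key: device-name service-id interface
    device_name: str
    service_id: int
    interface: str


@dataclass
class T3MplsL2Pipe:              # key: device-name service-id
    device_name: str
    service_id: int


@dataclass
class T2L3vpnRemoteOffice:       # key: vprn-service-id interface
    vprn_service_id: int
    interface: str
    vrf_ip_interfaces: List[T3VrfIpInterface] = field(default_factory=list)
    l2pipes: List[T3MplsL2Pipe] = field(default_factory=list)


@dataclass
class T1L3vpnRemoteOfficeNoCpe:  # key: product-id
    product_id: str
    solution: T2L3vpnRemoteOffice


def bottom_up(product: T1L3vpnRemoteOfficeNoCpe) -> Iterator[Tuple[str, tuple]]:
    """Yield (service type, key) pairs from Tier3 up to Tier1."""
    t2 = product.solution
    for s in t2.vrf_ip_interfaces:
        yield ("T3-vrf-ip-interface", (s.device_name, s.service_id, s.interface))
    for s in t2.l2pipes:
        yield ("T3-mpls-l2pipe", (s.device_name, s.service_id))
    yield ("T2-l3vpn-remote-office", (t2.vprn_service_id, t2.interface))
    yield ("T1-l3vpn-remote-office-no-cpe", (product.product_id,))


# Hypothetical sample instance; only "kjj-sr1" is a name from the thread.
example = T1L3vpnRemoteOfficeNoCpe(
    product_id="P-1001",
    solution=T2L3vpnRemoteOffice(
        vprn_service_id=5001,
        interface="office-1",
        vrf_ip_interfaces=[T3VrfIpInterface("kjj-sr1", 5001, "office-1")],
        l2pipes=[T3MplsL2Pipe("agg-1", 5001)],
    ),
)

order = [name for name, _key in bottom_up(example)]
print(order)
```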

My issue is that reconciling "T3-vrf-ip-interface" on "kjj-sr1", my device with the most config lines, takes 10 minutes. After that I have to reconcile from T3->T2 and T2->T1, and only then do I have a full sync of services.

     - currently my busiest device would need at least 1.5 min (config sync) plus 10 min (T3 reconcile) = 11.5 min

     - what happens when the T3-mpls-l2pipe device is also out of sync? I also planned to add the access network and CPE to the product/service. I do not have live data to back this up, but how about a 15..30 min delay to deliver a new service or change a parameter on an existing one, from the automation side alone?


That does not sound very automated, more like cheap manual labor hidden inside a black box.

Sorry for posting before finalizing my testing. The initial reconcile of T3 took 10 min. Today 6 new services were configured on "kjj-sr1" and the T3 reconcile took ~45 sec. That is OK. Of course this is just one device in the service path, but in general this is acceptable performance.