Solved: Re: Is there a bug?: transport timeout

JM Montenot · ‎02-20-2018

Hello all,

I am facing an issue with NSO v4.5 and I would like your opinion about it.

I have configured services to dynamically configure a Cisco IOS XE 4331 router using NETCONF.

Here is what I am doing:

1- My router has an initial configuration, NSO is configured, and the sync with the router is OK (devices check-sync ==> true)

2- I ask for a service to be activated on the router using the REST API (a curl request)

3- After a given time, I receive a message saying *** ALARM abort-error: RTR: transport timeout; closing session. When I check the synchronization, I get a message saying that the device is locked in a commit operation.

After this error, my router is out of sync with NSO.

The error also occurs randomly (sometimes the request works).

I must confess that we have significant latency (500 ms) between NSO and the router because of a satellite link.

Could you please explain why this error occurs randomly (1 time out of 2 or 3)? Could it come from latency? XML encoding?

Thank you in advance.

Kind regards,

Alexandre

frjansso · ‎02-20-2018

If you suspect this is because of a slow network, you may want to try increasing the read timeout (and possibly later the write timeout) between NSO and the device.

It can be set per device (here for a device named foo):

/devices/device/foo/read-timeout

- or -

globally:

/devices/global-settings/read-timeout

JM Montenot · ‎02-22-2018

Hello Fredrik,

After two days of further testing (the bug is not easy to reproduce), I have concluded that this is not due to delay (the timeouts have been set properly and latency has been reduced to a few ms).

The bug is hard to reproduce because it happens on the first use after 2 or 3 hours of inactivity. Once a request has passed, the next three or four also pass.

In my view, this is an NSO issue: the router is reconfigured, but NSO doesn't add the service to the service list, so the two end up out of sync.

Furthermore, I get the following error message:

*** ALARM abort-error: RTR rtr_test: transport timeout; closing session.

If I check sync, I get the following message:

**** ALARM out of sync: device is locked in a commit operation by session 382.

Thanks in advance

davidmb · ‎02-22-2018

Hi

I have a very similar issue with NSO 4.5.3. My service uses some reactive-redeploys, and I have noticed that since upgrading to 4.5.3 the configurations to be deployed during later reactive-redeploys are not sent to the devices. Looking through the devel.log, I notice:

<DEBUG> 22-Feb-2018::01:49:27.951 macbook ncs[28484]: ncs Requestor {'sync-from',1308,<0.24276.1>} tries to acquire lock for device <<"xxxxxx-xxx02-4331">>

<DEBUG> 22-Feb-2018::01:49:30.795 macbook ncs[28484]: ncs Requestor {'sync-from',1315,<0.24496.1>} tries to acquire lock for device <<"xxxxxx-xxx02-2960">>

<DEBUG> 22-Feb-2018::01:50:28.496 macbook ncs[28484]: ncs Requestor {'sync-from',1343,<0.26041.1>} tries to acquire lock for device <<"xxxxxx-xxx01-4331">>

<DEBUG> 22-Feb-2018::01:51:18.515 macbook ncs[28484]: ncs Requestor {'sync-from',1392,<0.27137.1>} tries to acquire lock for device <<"xxxxxx-xxx02-4331">>

<DEBUG> 22-Feb-2018::01:51:23.869 macbook ncs[28484]: ncs Requestor {'sync-from',1400,<0.27398.1>} tries to acquire lock for device <<"xxxxxx-xxx01-4331">>

On the NSO CLI I get:

admin@ncs% *** ALARM service-activation-failure: xxxxxx-xxx02-4331: Device is locked in a sync-from operation by session 1392

This does not cause the service to abort; it just does not apply *some* of the configuration. The same service does not have this issue in earlier versions of NSO.
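For anyone triaging a similar alarm, the devel.log lines quoted above can be grouped by device to see which sessions asked for the lock and when. This is only a minimal Python sketch: it assumes the log layout matches the five lines quoted above, and the names `LOCK_RE` and `lock_requests_by_device` are illustrative, not part of NSO.

```python
import re
from collections import defaultdict

# Pattern for the "tries to acquire lock" lines quoted above. The layout is
# inferred from those five lines only; other NSO versions may log differently.
LOCK_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"Requestor \{'(?P<op>[^']+)',(?P<session>\d+),<[^>]+>\} "
    r'tries to acquire lock for device <<"(?P<device>[^"]+)">>'
)


def lock_requests_by_device(lines):
    """Return {device: [(time, operation, session_id), ...]} in log order."""
    by_device = defaultdict(list)
    for line in lines:
        match = LOCK_RE.search(line)
        if match:
            by_device[match["device"]].append(
                (match["time"], match["op"], int(match["session"]))
            )
    return dict(by_device)


# The devel.log lines quoted above, copied verbatim.
SAMPLE = """\
<DEBUG> 22-Feb-2018::01:49:27.951 macbook ncs[28484]: ncs Requestor {'sync-from',1308,<0.24276.1>} tries to acquire lock for device <<"xxxxxx-xxx02-4331">>
<DEBUG> 22-Feb-2018::01:49:30.795 macbook ncs[28484]: ncs Requestor {'sync-from',1315,<0.24496.1>} tries to acquire lock for device <<"xxxxxx-xxx02-2960">>
<DEBUG> 22-Feb-2018::01:50:28.496 macbook ncs[28484]: ncs Requestor {'sync-from',1343,<0.26041.1>} tries to acquire lock for device <<"xxxxxx-xxx01-4331">>
<DEBUG> 22-Feb-2018::01:51:18.515 macbook ncs[28484]: ncs Requestor {'sync-from',1392,<0.27137.1>} tries to acquire lock for device <<"xxxxxx-xxx02-4331">>
<DEBUG> 22-Feb-2018::01:51:23.869 macbook ncs[28484]: ncs Requestor {'sync-from',1400,<0.27398.1>} tries to acquire lock for device <<"xxxxxx-xxx01-4331">>
""".splitlines()

if __name__ == "__main__":
    for device, requests in lock_requests_by_device(SAMPLE).items():
        print(device)
        for time, op, session in requests:
            print(f"  {time}  {op}  session {session}")
```

In this excerpt, session 1392 is the second sync-from request logged for xxxxxx-xxx02-4331, which is the session named in the alarm above.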

Can someone explain this ALARM and why it is occurring?

Thanks

David

frjansso · ‎02-22-2018

I agree, this does look like a bug. I'd recommend that you open up a ticket for this.