
Streaming Telemetry Dropouts

j-a326
Level 1

Dear Community, we'd like to use streaming telemetry to acquire live data from our network, but on some devices we continually run into a state where most or all telemetry messages get lost/dropped. The affected devices mainly do L3/routing; devices that only do switching seem to be working fine so far (if that is the distinguishing criterion). Deleting and reapplying the subscriptions normally helps, but it would be great if it worked without constant reconfiguration.

Even a simple setup like

telemetry ietf subscription 1
encoding encode-kvgpb
filter xpath /interfaces-ios-xe-oper:interfaces/interface/statistics
stream yang-push
update-policy periodic 12000
receiver ip address 10.1.1.1 57001 protocol grpc-tcp
...
telemetry ietf subscription 2
filter xpath /process-cpu-ios-xe-oper:cpu-usage/cpu-utilization/five-seconds
update-policy periodic 60000
...
telemetry ietf subscription 3
filter xpath /transceiver-oper-data/transceiver/internal-temp
update-policy periodic 60000
...


starts dropping messages after a few minutes. Is there any way to figure out why this happens? I'd guess it has nothing to do with the receiving end, because I see no packets sent to the receiver after the drops start (and the receiver continues to receive updates from the unaffected devices).

I found two metrics that might reveal helpful insights, but no further information on whether they are relevant to this case or how they would contribute to a solution:


show telemetry internal performance subscription
Sub ID  D-Max  D-Min  D-Avg  S-Max  S-Min  S-Avg  R-Max  R-Min  R-Avg  Instances
---------------------------------------------------------------------------------
3       0.1    0.0    0.1    2.7    2.7    2.7    792.6  18.3   38.2   8
1       0.2    0.2    0.2    78.1   78.1   78.1   378.0  331.6  363.6  4
2       0.0    0.0    0.0    0.5    0.5    0.5    407.8  144.8  241.0  8

show telemetry internal subscription all stats
Telemetry subscription stats:
Subscription ID   Msgs Sent   Msgs Drop   Bytes Sent   Connection Info
----------------  ----------  ----------  -----------  -----------------
1                 10500       12985       6506058      (10.1.1.1:57001)
3                 1836        2061        287028       (10.1.1.1:57001)
2                 1242        1356        144693       (10.1.1.1:57001)

Has anyone experienced similar behaviour? Is there a way to debug the inner workings of streaming telemetry or tweak the settings? Am I maybe missing some crucial point? Any help or ideas are welcome.


1 Reply

Looking at your outputs, your device is dropping a huge number of messages: the drop count is actually higher than the sent count, over 50% by the looks of it. That tells me the issue is on the sending device. Your subscription 1 (interfaces/statistics) also has a far higher sample average (78.1 ms) than the others, which means encoding the interface statistics data is expensive and takes significantly longer than collecting the CPU or transceiver data.
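To make the "over 50%" point concrete, here is a quick sanity check on the numbers from your "show telemetry internal subscription all stats" output (the counts below are copied straight from your paste):

```python
# Sent and dropped message counts per subscription, taken from the
# "show telemetry internal subscription all stats" output above.
stats = {
    1: (10500, 12985),  # interfaces/statistics
    3: (1836, 2061),    # transceiver internal-temp
    2: (1242, 1356),    # five-second CPU utilization
}

for sub_id, (sent, dropped) in stats.items():
    # Drop rate as a percentage of all messages the device tried to send.
    rate = dropped / (sent + dropped) * 100
    print(f"subscription {sub_id}: {rate:.1f}% dropped")
```

Every subscription is losing more than half of its updates, so the bottleneck is systemic on the device, not specific to one data set.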

I would start by looking at your device CPU (sorted), then reduce the frequency of the most expensive collections and narrow the scope of the filters. For example, you are subscribing to all interfaces; can you limit that to only the ones you need? Also, you didn't say which version you are running?
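As a sketch of what narrowing the filter could look like: IOS XE xpath filters accept key predicates, so subscription 1 could target a single interface instead of all of them (the interface name below is just a placeholder, substitute the ports you actually care about):

telemetry ietf subscription 1
 encoding encode-kvgpb
 filter xpath /interfaces-ios-xe-oper:interfaces/interface[name="GigabitEthernet1"]/statistics
 stream yang-push
 update-policy periodic 12000
 receiver ip address 10.1.1.1 57001 protocol grpc-tcp

Fewer instances per collection cycle means less encoding work per period, which should directly reduce that 78.1 ms sample average.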

Hope this helps.

Please mark this as helpful or accept it as the solution to help others.
Connect with me https://bigevilbeard.github.io