
Packet loss measurements on ASR9k

Mery
Level 1

Hi everyone,

 

 

Is there any way to control the packet loss measurement for SLM/Y.1731 on the ASR9k? For example, I have created a probe that sends 60 packets, one every second, but if I lose a single packet the reported loss is 1/60 = 1.67%, which is a considerable percentage. So my question is whether I can calculate the packet loss based on only 55 packets. Even if I lost 5 packets, I would still see 0% packet loss, because I received at least 55 of the 60 packets.
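For reference, the probe described here corresponds to an Ethernet SLA profile along these lines (the complete configuration appears later in the thread):

profile SLM type cfm-synthetic-loss-measurement
 probe
  send burst every 1 minutes packet count 60 interval 1 seconds
  priority 7
  synthetic loss calculation packets 60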

 

 

Thank you

5 Replies

smilstea
Cisco Employee

Yes and no. You can change the sample size, how many buckets there are, whether or not the buckets are aggregated, and so on. Say, for instance, you have 10 samples, 1 of which has loss while the other 9 don't: those samples can be aggregated into the bucket or buckets, and the percentage loss reported for the bucket as a whole will be lower.

 

https://www.cisco.com/c/en/us/td/docs/routers/asr9000/software/asr9k_r4-3/interfaces/configuration/guide/hc43xasr9kbook/hc43eoam.html#45511

The seven sections there on configuring SLA profiles cover SLM, buckets, and the related configuration.
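As a rough sketch of where those knobs live (the profile name and values here are illustrative, not a recommendation), the bucket and aggregation settings sit under the statistics section of an Ethernet SLA profile:

ethernet sla
 profile EXAMPLE type cfm-synthetic-loss-measurement
  statistics
   measure one-way-loss-sd
    ! collect the results of 2 consecutive probes into each bucket
    buckets size 2 probes
    ! count results into 3 bins: 0-30%, 30-60%, 60-100%
    aggregate bins 3 width 30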

 

Below are some key points I took from the document that you should review, relating to how statistics are calculated and the limitations of SLM.

 

 

Limitations of Data Loss Measurement

1. Data loss measurement cannot be used in a multipoint service; it can only be used in a peer-to-peer service.

2. As a Loss Measurement Reply (LMR) contains no sequence IDs, the only field that can be used to determine which probe a given LMR corresponds to is the priority level. The priority level is also the only field that can determine whether the LMR is in response to an on-demand or proactive operation. This limits the number of Loss Measurement probes that can be active at a time for each local MEP to 16.

3. As loss measurements are made on a per-priority-class basis, QoS policies that alter the priority of packets processed by the network element, or that re-order packets, can affect the accuracy of the calculations. For the highest accuracy, packets must be counted after any QoS policies have been applied.

4. The accuracy of data loss measurement is highly dependent on the number of data packets that are sent. If the volume of data traffic is low, errors with the packet counts might be magnified. If there is no data traffic flowing, no loss measurement performance attributes can be calculated. If aggregate measurements are taken, then only 2 probes can be active at the same time: one proactive and one on-demand.

5. The accuracy of data loss measurement is highly dependent on the accuracy of platform-specific packet counters. Due to hardware limitations, it may not be possible to achieve completely accurate packet counters, especially if QoS policies are applied to the packets being counted.

6. Performing data loss measurement can have an impact on the forwarding performance of network elements because of the need to count, as well as forward, the packets.

7. Before starting any LMM probes, it is necessary to allocate packet counters for use with LMM on both ends (assuming both ends are running Cisco IOS XR Software). Attaching an SLA profile to a MEP is sketched just after this list.
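For completeness, whichever measurement type is used, the SLA profile is attached to a MEP under the interface. A minimal sketch, with the interface, domain, service, and MEP IDs as placeholder assumptions:

interface GigabitEthernet0/0/0/0.100 l2transport
 ethernet cfm
  mep domain MD1 service SVC1 mep-id 1
   sla operation profile SLM target mep-id 2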

 

 

 

 

Counters for packet loss, corruption and out-of-order packets are kept for each bucket, and in each case, a percentage of the total number of samples for that bucket is reported (for example, 4% packet corruption).

 

When aggregation is enabled using the aggregate command, bins are created to store a count of the samples that fall within a certain value range, which is set by the width keyword. Only a counter of the number of results that fall within the range for each bin is stored. This uses less memory than storing individual results. When aggregation is not used, each sample is stored separately, which can provide a more accurate statistics analysis for the operation, but it is highly memory-intensive due to the independent storage of each sample.
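As a concrete illustration, the aggregate bins 3 width 30 setting used later in this thread creates three bins covering 0 to 30%, 30 to 60%, and 60 to 100% (the last bin absorbs the remainder of the range). A per-probe loss result of 1.667% is folded into the 0 to 30% bin's counter and running mean rather than being stored as an individual value.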

 

A bucket represents a time period during which statistics are collected. All the results received during that time period are recorded in the corresponding bucket. If aggregation is enabled, each bucket has its own set of bins and counters, and only results relating to the measurements initiated during the time period represented by the bucket are included in those counters.

By default, there is a separate bucket for each probe. The time period is determined by how long the probe lasts (configured by the probe, send (SLA), and schedule (SLA) commands). You can modify the size of buckets so that you have more buckets per probe or fewer buckets per probe (fewer buckets allows the results from multiple probes to be included in the same bucket). Changing the size of the buckets for a given metric clears all stored data for that metric. All existing buckets are deleted and new buckets are created.
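As a concrete example using the configuration that appears later in this thread: the schedule every 1 minutes for 1 minutes makes each probe last one minute, so buckets size 2 probes produces buckets that each cover two minutes of results, which matches the "lasting 2min" bucket shown in the output below.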

 

 

Thanks,

Sam

Thank you for your response. So does this mean I can measure against a threshold? If I have above 30% packet loss, the ASR9k starts reporting it, and if I have less than 30% packet loss, it does not count that loss?

 


 

 

 

 

So you can affect how many buckets and bins there are, but the result will still be reported as a value from the sample or samples in a bucket, as an aggregate or average of the samples within that bucket.

 

As for whether you can simply exclude a certain value range, I am not sure; I would need to do some extensive testing to check that.

 

How are you collecting the sample values, SNMP? If so, you can set up a trigger in your monitoring tool that fires only when the SNMP poll result exceeds a certain value. At least, that is what I would do, and you can time your SNMP polls to line up with how often a bucket fills.
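On the router side, a minimal sketch of what enabling SNMP polling might look like (the community string and collector address are placeholders; the specific MIB/OID for Ethernet SLA statistics should be confirmed in the platform MIB documentation):

! read-only community for the poller
snmp-server community MONITOR RO
! optional: send traps to the monitoring station
snmp-server host 192.0.2.10 traps version 2c MONITOR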


To put it another way, the ASR9K will always report the actual loss, but adding more samples and buckets can make the data more accurate, and SNMP polling can be tuned and monitored so that you only send an alert above a certain threshold. Whether using aggregate bins and excluding a value range would actually exclude those samples is something I would need to test; I have not done it or seen it done before, and I am not sure whether such results would fall into a default bucket or not.

 

Sam

Thank you so much for your helpful support. By using two probes per bucket, the reported packet loss is now somewhat lower:

 

Bucket started at 17:50:47 EDT Fri 27 March 2020 lasting 2min
Pkts sent: 120; Lost: 1 (0.8%); Corrupt: 0 (0.0%);
Misordered: 0 (0.0%); Duplicates: 0 (0.0%)
Result count: 2
Min: 0.000%, occurred at 17:50:47 EDT Fri 27 March 2020
Max: 1.667%, occurred at 17:51:47 EDT Fri 27 March 2020
Mean: 0.833%; StdDev: 0.833%; Overall: 0.833%

Bins:
Range         Count         Cum. Count    Mean
-----------   -----------   -----------   ------
0 to 30%      2 (100.0%)    2 (100.0%)    0.833%
30 to 60%     0 (0.0%)      2 (100.0%)    -
60 to 100%    0 (0.0%)      2 (100.0%)    -
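To spell out the arithmetic in this bucket: two one-minute probes of 60 packets each were combined, so 120 packets were sent and 1 was lost, giving 1/120 ≈ 0.8% overall, while the mean of the two per-probe results (0.000% and 1.667%) is 0.833%. Both per-probe results fall in the 0 to 30% bin, which is why that bin shows a count of 2.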

 

This is my configuration for this:

 

ethernet sla
 profile SLM type cfm-synthetic-loss-measurement
  probe
   send burst every 1 minutes packet count 60 interval 1 seconds
   priority 7
   synthetic loss calculation packets 60
  !
  schedule
   every 1 minutes for 1 minutes
  !
  statistics
   measure one-way-loss-sd
    aggregate bins 3 width 30
    buckets size 2 probes
   !
   measure one-way-loss-ds
    aggregate bins 3 width 30
    buckets size 2 probes
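For reference, bucket output like the one above is typically pulled with a command along the lines of the following (keywords and interface/profile filters vary by release, so verify against the command reference for your version):

show ethernet sla statistics detail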



If I use more probes per bucket, will it impact ASR9k performance? Based on your experience, is it better to use more buckets or fewer buckets?

 

 

I honestly have not adjusted buckets to be large or to hold a large number of probes; I usually only have a handful of probes running.

 

So long as you stay within the scale limits for the number of probes, pps, and so on, I don't think it will really matter whether you choose more buckets or fewer buckets; it will only affect the statistics and results you get.

 

Sam
