NCS540 is not passing the benchmark 10G RFC test

The NCS540 is not passing the benchmark 10G RFC test. The benchmark test is failing at all frame sizes. However, if I bypass the NCS540 network, I get no issues. I posted the benchmark tests: one passes fine when I bypass the NCS540 devices, and the other fails when I run the test through the NCS540. Is there a bad configuration that could be causing this? Both devices are currently on version 7.5.2/ncs540/ws. I also had the same issue on 7.3.2. The only way I can get a good benchmark test is at 8.5G, which is bad. I have been reading the release notes, and they do not mention this issue. I have been troubleshooting this for over a month with no luck. Any ideas?

1 Accepted Solution

Accepted Solutions

Yep, it's as I described. On the maitland 540 you are seeing ingress drops due to the increased overhead in packet size. While your HundredGigE links handle it no problem, the system (disposition/maitland) counts the packets with +22B before the disposition process happens, and we can see the dropped packets on TC 0 for the interface. Can you try applying the workaround with a 10G shaper to the ingress interface and the user-defined 22 on maitland?
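
A minimal sketch of that workaround, assuming the same pattern as the 1 Gbps example later in the thread (the interface name and exact placement are placeholders, not taken from the actual configs):

policy-map SHAPE-10G
 class class-default
  shape average 10 gbps
 !
!
interface TenGigE0/0/0/3
 service-policy output SHAPE-10G account user-defined 22
!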

 

Sam

View solution in original post

19 Replies

smilstea
Cisco Employee

This looks like an RFC2544 test?

Are you using L2VPN over the 540s? It could be due to conditional L2VPN QoS EXP marking; if you do a 'set cos x' and 'rewrite ingress tag pop 1 symmetric' on ingress to the 540, that may solve your problem. I have seen that solve the issue before.
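
A minimal sketch of the kind of ingress configuration being described (interface, VLAN, and cos value are placeholders; support for marking at ingress can vary by platform and release):

policy-map SET-COS
 class class-default
  set cos 0
 !
!
interface TenGigE0/0/0/1.100 l2transport
 encapsulation dot1q 100
 rewrite ingress tag pop 1 symmetric
 service-policy input SET-COS
!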

What QoS is applied on the 540s?

 

 

In addition, there are a few other things that may be needed if you are trying for line rate, such as overhead accounting or modifying the queue-limit on the QoS policy-map.

https://www.cisco.com/c/en/us/td/docs/iosxr/ncs5xx/qos/74x/b-qos-cg-74x-ncs540/m-config-mod-qos-congestion-mngmnt-ncs5xx.html#concept_ibq_hzp_25b
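
A minimal sketch of adjusting the queue-limit on such a policy (the class and values are placeholders, not recommendations):

policy-map SHAPE-TEST
 class class-default
  shape average 10 gbps
  queue-limit 5 ms
 !
!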

 

TAC has seen a number of these cases. If none of these quick solutions help, I recommend opening a TAC SR with your 540 configs, software version, and test case (speed to be run at 100 Mbps or 10 Gbps, etc.), and they should be able to guide you on how to tune the box properly.

 

Thanks,

Sam

This looks like an RFC2544 test? Yes, it is an RFC 2544 test.

Are you using L2VPN over the 540s? Yes, I'm using L2VPN EVPN.

 

It could be due to conditional L2VPN qos exp marking, if you do a 'set cos x' and 'rewrite ingress tag pop 1 symmetric' on ingress to the 540 that may solve your problem. I have seen that solve the issue before.

What QoS is applied on the 540s? I have not applied any QoS settings yet, but this is just a lab, not in production yet. I would think it would pass if there isn't any traffic on these boxes. I forgot to mention that the RFC test passes if it's 8.5G or below.

 

Like I mentioned, there used to be an issue with conditional L2VPN QoS EXP marking; I am not positive whether that got fixed. Can you try the workaround to see if that resolves the issue? Otherwise I would advise opening a TAC case to get a Webex session to triage this live. Somewhere the overhead is not being accounted for properly, and it's leading to loss or apparent loss.

 

Sam

I opened a TAC case with Cisco. They are reviewing the configuration. One thing I noticed is that if I run this test from the same box using a bridge group, the RFC test passes fine. I also noticed that it is having a hard time processing frames from 64 to 1500 bytes, but not jumbo frames. I'm still reading a lot of articles about this issue, but any ideas you guys can provide will be appreciated.

You mentioned it working as a bridge domain; is that as opposed to an xconnect?
If I recall, in the times I've worked RFC and snake tests, xconnect works better because there is no MAC learning and flooding like there is with a bridge domain, even with only 2 ports. With an xconnect the traffic has one entry and one exit, and no issues with learn and flood.
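
For reference, a minimal sketch of a point-to-point xconnect between the two 540s (group/pw names, interface, neighbor address, and pw-id are placeholders):

l2vpn
 xconnect group RFC-TEST
  p2p PW100
   interface TenGigE0/0/0/1.100
   neighbor ipv4 10.0.0.2 pw-id 100
  !
 !
!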

I suggest pushing TAC for a Webex to try different configurations they have found. There are many examples of people opening TAC cases for this exact scenario. Unfortunately there are multiple solutions for this, so we need to design, test, and validate for your individual network.

Sam

Actually, according to the RFC test, the bridge domain works perfectly. Using xconnect gives me issues, as does L2VPN EVPN. It's a little bit better, but it still gives me issues. I already opened a TAC case with Cisco. Unfortunately they do not have a solution yet; they said they are looking at the configs. Look at the RFC test: jumbo frames don't give me any issues, only frame sizes from 64 to 1518.

I reviewed the case and it looks like it is progressing. I wanted to see if you could try the below; it explains why you can't send 10G and where the drops are happening. In the example 1G is sent, but with the overhead of EVPN, and with you wanting to send closer to 10G, that could be tweaked. Please see the attached topology screenshot.

 

On the NCS540 and NCS5500, we need to take the accounting overhead issue into account.

 

When we generate the traffic under 1 Gbps and inject it into the imposition PE NCS540-1, since it is an EVPN L2VPN solution, when traffic leaves the NCS540-1 TenGigE port each packet carries a 22-byte outer header (18 bytes of outer Ethernet header + 4 bytes of EVPN VPN label; NCS540-1 and NCS540-2 are connected back to back, so the LSP label is implicit null). The traffic rate therefore actually exceeds 1 Gbps, but the core-facing interface is 10GE, so we don't see drops on the imposition PE here.

 

When the traffic arrives at the NCS540-2 TenGigE port, the traffic rate also exceeds 1 Gbps (each packet carries 22 bytes of overhead). NCS540-2 checks the internal 1 Gbps shaping/policing before the packet disposition action, so each packet is still 1540 bytes (instead of 1518). The traffic rate therefore exceeds 1 Gbps, and traffic gets dropped due to VoQ tail drop.
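
To put rough numbers on that (assuming the internal shape/police simply counts each frame as its size plus 22 bytes):

1518 B + 22 B = 1540 B, so a 1 Gbps stream of 1518-byte frames is metered as about 1540/1518 ≈ 1.014 Gbps
  64 B + 22 B =   86 B, so the same 1 Gbps stream of 64-byte frames is metered as about 86/64 ≈ 1.34 Gbps

Either way the 1 Gbps internal shape/police is exceeded, and the smaller the frame, the larger the apparent excess, which is consistent with the earlier observation that frames from 64 to 1518 bytes fail while jumbo frames do not.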

 

After we enable overhead accounting with “account user-defined 22”, this 1 Gbps internal shape/police will not count those 22 bytes of overhead, and we can prevent such drops. On other platforms such as the ASR9k and NCS6K, this internal shape/police is done after the packet disposition, so the packet size has already changed back to 1518 bytes and we don’t see this kind of issue.

 

policy-map TEST
 class class-default
  shape average 1 gbps
 !
!
interface GigabitEthernet0/0/0/3
 service-policy output TEST account user-defined 22
!
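
If it helps, one way to confirm the policy is attached and to watch where drops are being counted (exact output varies by release):

show policy-map interface GigabitEthernet0/0/0/3 output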

 

 

Sam

Hi, I'm having a similar problem. I have, in a lab, two 540s connected with a 100G link running MPLS/OSPF and doing an L2VPN xconnect. My RFC test keeps failing with frame loss and only reaches 800 Mbps of bandwidth on the 1G port facing the test equipment. My question is: where do I apply the service policy? On the interface facing the test equipment or on the 100G MPLS port?

 

Thanks in advance 

Hi, if you take a look at the screenshot with the config (https://community.cisco.com/kxiwq67737/attachments/kxiwq67737/5996-discussions-xr-os-and-platforms/13347/1/Screen%20Shot%202019-07-24%20at%2011.06.58.png), the loss happens on the disposition router (egress PE) and the policy-map is applied on the CE-facing interface.

 

Sam

 

 

Hi Sam

I tried the workaround on a 1G test and it works perfectly. I tried it with untagged and tagged subinterfaces and the RFC test passes.

But when I try to do the same test on a 10G port with the policy-map shaping at 10 gbps, the RFC test fails. Is there something else I can do?

10G core and access ports? Have you tried without the user-defined 22B?

 

Sam

Hi Sam 

Thank you for your time and help.

I have the two 540s connected back to back with a 100G MPLS/OSPF link, and I'm using two pieces of test equipment acting as CEs (EXFO FTB-1).

As I mentioned before, the test on a 1G interface passes without problems on both tagged and untagged subinterfaces.

But when I try to do the same test using 10G, the test fails in both the untagged and tagged scenarios.

To answer your question, no, I haven't tried it without the user-defined 22. And I ran the test on both tagged and untagged interfaces.

The link speed change shouldn't be making a difference, as it's the same step-down. I recommend opening a TAC case to review your setup and see what might be missing or changed.

I agree with what you are saying here; however, I have a 100G transport link with both interfaces set to a 9600 MTU. In theory it should be able to handle the traffic through that link. I also replaced the QSFP and fibers just to rule out any layer 1 issues. Prior to running the RFC tests I clear the interface counters so I can look for any drops anywhere. I didn't see any drops; see the attached image. The device that is generating the traffic is sending frame sizes of 64, 128, 512, 1024, 1289, 1418, and 9000 bytes. Its MTU is set to 9000.

 

The RFC test consists of the following throughput settings:
Trial duration : 60 secs
Maximum rate : 10000 Mbps
Minimum rate : 1 Mbps
Step size : 5 Mbps
Frame loss : 0.0%
Fine stepping : False
Binary duration : 2 secs
Frame sizes : 64 128 256 512 1024 1280 1518 9000 Bytes