CSCuz48487 - Catalyst 3850 drops MacSec traffic: THIS BUG STILL AFFECTS OTHER CODE TRAINS

seraphi
Level 1

I have exactly replicated the problem described in CSCuz48487.  I have two ASR1001-X routers, with a 3850 in the middle.

[ASR1001-X(L3 te0/0/1.101 WAN-MACSEC)]-----[(trunk) 3850 (trunk)]-----[(L3 te0/0/1.101 WAN-MACSEC)ASR1001-X]

I am using WAN MACSEC with dot1q encapsulation and tag-in-the-clear between the ASRs.
 

When MACSEC is enabled, I get packet loss on the 3850.  I can see IGR_MISC_FATAL_ERROR increase as the packets are lost (show plat fwd drop exceptions), just as described in CSCuz48487. 

If I simply turn MACSEC off between the ASRs, the packet loss stops instantly. If I turn MACSEC back on, the packet loss immediately returns. The problem is very easy to reproduce.

 

Hardware and software:
cisco ASR1001-X : asr1001x-universalk9.16.06.04.SPA.bin
WS-C3850-48P-S, Multiple IOS versions listed below. 

I have spent the time to try all of these code versions on my 3850. Here is the report:

16.09.02 (Fuji - latest): Approximately 2%-8% packet loss occurs, just as described in CSCuz48487
16.06.05 (Everest - latest): Approximately 2%-8% packet loss occurs, just as described in CSCuz48487
16.03.07 (Denali - latest): Approximately 2%-8% packet loss occurs, just as described in CSCuz48487
3.7.5E(Catalyst - latest): Approximately 2%-8% packet loss occurs, just as described in CSCuz48487
3.6.9E(Catalyst - latest): NO PACKET LOSS OCCURS. The release notes specifically say that CSCuz48487 was fixed, and I can confirm that indeed it was. 

 

In my tests, the only thing I changed was the code version. Nothing else. 
 

The bug was identified in the 3.6.x train and fixed there. It is still present in 3.7.x and in every 16.x train I tested (Denali, Everest, and Fuji). Currently, the only workaround is to downgrade to 3.6.x. I really don't want to do this, but it is presently my only option.

 

Please expand the scope of CSCuz48487 and re-open it.  My account does not have the ability to report this through TAC, so I am reporting it here and hoping for the best. 

5 Replies

nreisele
Level 1

Can we get some eyes on this? I too have experienced this issue.

This really needs to be logged with TAC and raised with the BU so that they provide a fix in releases later than 3.6.6.

I've observed this same behaviour with MACsec over AToM pseudowires with Cat9500s running IOS XE 16.9.3. I've raised a TAC case and will report back on the fix (presumably a code fix to port the XE 3.6.x fix into 16.9.x).

I've just closed the TAC case. The advice I received was that my scenario is not the same as CSCuz48487 (although the symptoms seem to be identical). A new Bug ID, CSCvq85074, was created for me; it has not yet been updated with the case notes, but the conclusion was that Cleartag MACsec (DDTS id: CSCvg73574), a feature enhancement introduced in 16.10, resolves this behaviour. I requested that the feature be back-ported into 16.9.x, but this was deemed not feasible.

 

Resolution for me was to upgrade to recently released 16.12.1 and cross my fingers it's not too buggy :)

Priit Concrete
Level 1

Did you ever find a solution to this?

My current theory is that this packet loss is caused by the MACsec replay-protection window. If frames arrive out of order on the MACsec link while replay protection is enabled (it is on by default, with a window size of 0), they are dropped, and the IGR_MISC_FATAL_ERROR counter increases.

There could be other reasons as well, but from my testing, it seems this can be one of the causes.
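
To illustrate the replay-window theory, here is a rough Python sketch of a simplified replay check, loosely modelled on how MACsec replay protection is generally described. It is not taken from any Cisco implementation. The function name, the sample arrival order, and the window sizes are assumptions chosen for illustration only, and the link to IGR_MISC_FATAL_ERROR remains a theory.

```python
def count_replay_drops(packet_numbers, window_size):
    """Count frames a simplified replay-protection check would discard.

    packet_numbers: packet numbers (PNs) in the order they arrive.
    window_size:    how far behind the highest PN seen a frame may be
                    and still be accepted (0 means strict ordering).
    Hypothetical helper for illustration, not a Cisco API.
    """
    next_pn = 1      # one more than the highest PN accepted so far
    dropped = 0
    for pn in packet_numbers:
        lowest_acceptable = max(1, next_pn - window_size)
        if pn < lowest_acceptable:
            dropped += 1          # too old: treated as a replay
            continue
        if pn >= next_pn:
            next_pn = pn + 1      # advance the window
    return dropped


# Assumed arrival order: frames 3 and 6 each arrive one position late.
arrivals = [1, 2, 4, 3, 5, 7, 6, 8]

for window in (0, 2):
    drops = count_replay_drops(arrivals, window)
    print(f"window={window}: {drops} of {len(arrivals)} frames dropped")
```

With a window of 0, both late frames are discarded. With a window of 2, none are. If the theory holds, even light reordering somewhere in the path could account for loss of a few percent, and widening the replay window on the MACsec endpoints would be one thing to test.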
