I have replicated exactly the problem described in CSCuz48487. I have two ASR1001-X routers with a 3850 in the middle:
[ASR1001-X(L3 te0/0/1.101 WAN-MACSEC)]-----[(trunk) 3850 (trunk)]-----[(L3 te0/0/1.101 WAN-MACSEC)ASR1001-X]
I am using WAN MACSEC with dot1q encapsulation and tag-in-the-clear between the ASRs.
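For readers unfamiliar with tag-in-the-clear: the 802.1Q tag stays unencrypted in front of the MACsec SecTAG. The 3850 therefore sees an ordinary VLAN-tagged frame whose next EtherType is the MACsec value 0x88E5. The Python sketch that follows builds that header using the IEEE 802.1AE defaults: an 8-byte SecTAG, or 16 bytes with an explicit SCI, plus a 16-byte ICV. The function names are hypothetical. VLAN 101 is only assumed from the te0/0/1.101 subinterface name. Cisco-specific options such as the ICV indicator are not modeled. This illustrates the frame layout and does not reproduce the ASR configuration.

```python
import struct
from typing import Optional

MACSEC_ETHERTYPE = 0x88E5  # IEEE 802.1AE
DOT1Q_TPID = 0x8100        # IEEE 802.1Q
ICV_LEN = 16               # assumed default ICV length (GCM-AES cipher suites)


def dot1q_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """4-byte 802.1Q tag: TPID followed by PCP(3) | DEI(1) | VID(12)."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID must fit in 12 bits")
    tci = ((pcp & 0x7) << 13) | ((dei & 0x1) << 12) | vid
    return struct.pack("!HH", DOT1Q_TPID, tci)


def sectag(pn: int, an: int = 0, sci: Optional[bytes] = None,
           secure_data_len: int = 48) -> bytes:
    """802.1AE SecTAG: EtherType, TCI/AN, SL, PN, and an optional 8-byte SCI."""
    sc = 1 if sci is not None else 0
    # TCI/AN octet: V=0, ES=0, SC, SCB=0, E=1 and C=1 (encryption on), then AN.
    tci_an = (sc << 5) | (1 << 3) | (1 << 2) | (an & 0x3)
    # Short Length is non-zero only when the secure data is under 48 octets.
    sl = secure_data_len if secure_data_len < 48 else 0
    tag = struct.pack("!HBBI", MACSEC_ETHERTYPE, tci_an, sl, pn)
    if sci is not None:
        if len(sci) != 8:
            raise ValueError("SCI must be 8 bytes")
        tag += sci
    return tag


def clear_tag_header(dst: bytes, src: bytes, vid: int, pn: int,
                     sci: Optional[bytes] = None) -> bytes:
    """DA | SA | 802.1Q tag (in the clear) | SecTAG.

    The encrypted user data (including the original EtherType), the ICV,
    and the FCS follow this header on the wire.
    """
    return dst + src + dot1q_tag(vid) + sectag(pn, sci=sci)


def macsec_overhead(sci: bool) -> int:
    """Bytes MACsec adds to each frame: SecTAG plus ICV."""
    return (16 if sci else 8) + ICV_LEN


hdr = clear_tag_header(bytes(6), bytes.fromhex("001122334455"), vid=101, pn=1)
print(len(hdr), hdr[16:18].hex())  # 24 88e5
print(macsec_overhead(sci=True))   # 32
```

The 3850 can still switch these frames on their clear MAC addresses and VLAN tag. Everything after the SecTAG, however, is encrypted and opaque to it.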
When MACSEC is enabled, I get packet loss across the 3850. I can see the IGR_MISC_FATAL_ERROR counter increase as packets are lost (show plat fwd drop exceptions), just as described in CSCuz48487.
If I simply turn MACSEC off between the ASRs, the packet loss resolves instantly. If I turn MACSEC back on, the packet loss immediately returns. This problem is very easy to reproduce.
Hardware and software:
Cisco ASR1001-X: asr1001x-universalk9.16.06.04.SPA.bin
WS-C3850-48P-S: multiple IOS XE versions, listed below.
16.9.2 (Fuji, latest): approximately 2%-8% packet loss occurs, just as described in CSCuz48487
16.6.5 (Everest, latest): approximately 2%-8% packet loss occurs, just as described in CSCuz48487
16.3.7 (Denali, latest): approximately 2%-8% packet loss occurs, just as described in CSCuz48487
3.7.5E (Catalyst, latest): approximately 2%-8% packet loss occurs, just as described in CSCuz48487
3.6.9E (Catalyst, latest): NO PACKET LOSS OCCURS. The release notes specifically say that CSCuz48487 was fixed, and I can confirm that it was.
In my tests, the only thing I changed was the code version. Nothing else.
The bug was identified in the 3.6.x train and fixed there. It is still present in every 16.x train I tested (Denali, Everest and Fuji), as well as in 3.7.x. Currently, the only workaround is to downgrade to 3.6.x. I really don't want to do this, but it is presently my only option.
Please expand the scope of CSCuz48487 and re-open it. My account does not have the ability to report this through TAC, so I am reporting it here and hoping for the best.
I've observed this same behaviour with MACsec over AToM pseudowires with Cat9500s running IOS XE 16.9.3. I've raised a TAC case and will report back on the fix (presumably a code fix to port the XE 3.6.x fix into 16.9.x).
I've just closed the TAC case. The advice I received was that my scenario is not the same as CSCuz48487, although the symptoms seem identical. A new bug ID, CSCvq85074, was created for me; it has not yet been updated with the case notes. The conclusion was that Cleartag MACsec (DDTS ID: CSCvg73574), a feature enhancement introduced in 16.10, resolves this behaviour. I requested that the feature be back-ported to 16.9.x, but this was deemed not feasible.
My resolution was to upgrade to the recently released 16.12.1 and cross my fingers that it's not too buggy :)