02-27-2015 03:05 PM
Asking all our customers.
We are looking for further examples of issues on the a9k that exposed a gap in system resiliency: cases where a HW or SW issue occurred and the router did not handle it in the way you would have preferred, so that network redundancy could not take over. Please provide as much information about the situation as you can, along with the behavior you believe should take place. If you have SRs or bug IDs, please include them as well.
Here are some past examples that we have already addressed.
CSCuc04493 - Disable LC interfaces if online-diags reports datapath error
This allows us to shut down ports where datapath errors occur so that network redundancy can kick in.
CSCun00493 - Need recovery mechanism for Punt/FPGA CRC errors in RSP440
This has the RSP perform a failover or reload when it loses communication to the fabric.
Thanks,
Bryan Garland
Bryan Garland CCIE#1942
Technical Leader, Engineering
HERO BU- Deployment & Escalation
03-01-2015 03:28 AM
Hi Bryan,
I see NP performance as an area for improvement. There are certain NP lock conditions where automatic action is taken to recover, but there is no equivalent for an NP overload scenario. Certain NP counters indicate an NP performance overload, but it is cumbersome for customers to monitor these values. So it would be nice to at least have log entries when the NP is overloaded, or the ability to have actions taken. A typical example is netflow, which is intensive for the NP because the NP needs to create frame copies for it. There is a punt policer that protects the line card CPU, but with very low sampling rates and small packets at a high rate the NP might get overloaded, leading to intermittent packet loss (rare but possible).
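To illustrate the kind of logging described above, here is a minimal off-box sketch in Python. It assumes you already collect two snapshots of NP counter values (by whatever means you poll the router) as plain name-to-value mappings. The counter name in the usage note, the threshold, and the function name are all hypothetical and chosen only for illustration; this is not how the router itself implements anything.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("np_monitor")


def overloaded_counters(
    prev: Dict[str, int],
    curr: Dict[str, int],
    interval_s: float,
    threshold_pps: float,
) -> List[str]:
    """Return the counters whose per-second increase exceeds threshold_pps.

    prev and curr are two snapshots of the same counters taken
    interval_s seconds apart. Counter names are whatever the caller
    supplies; none are assumed to match real NP counter names.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    flagged = []
    for name, value in curr.items():
        if name not in prev:
            continue  # no baseline for this counter yet
        delta = value - prev[name]
        if delta < 0:
            continue  # counter was cleared or wrapped; skip this sample
        rate = delta / interval_s
        if rate > threshold_pps:
            logger.warning(
                "counter %s rising at %.1f/s (threshold %.1f/s)",
                name, rate, threshold_pps,
            )
            flagged.append(name)
    return flagged
```

For example, calling it with `prev={"HYPOTHETICAL_NP_DROP": 100}`, `curr={"HYPOTHETICAL_NP_DROP": 1100}`, an interval of 10 seconds, and a threshold of 50 flags that counter and emits one warning. Which counters actually indicate overload, and what threshold is sensible, depends on the platform and traffic mix.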
Cheers,
Florian
03-01-2015 11:50 AM
Florian,
Thanks for the feedback. This is indeed an area where we can probably do some work.
Thanks,
Bryan Garland
Bryan Garland CCIE#1942
Technical Leader, Engineering
HERO BU- Deployment & Escalation
03-04-2015 05:56 AM
Any other feedback?