05-26-2022 09:33 AM
Hi,
We have a pair of Catalyst 9500s running 16.12.05b in StackWise Virtual. The config is pretty standard, with SSM configured and multiple SVIs running IGMP version 3. The issue is a very high level of (legitimate) multicast traffic for which the 9500 pair is the IGMP querier.
That, I think, is where the problem lies: we see approx. 20% packet loss on SSH and pings to the 9500 because of the multicast traffic hitting control plane policing.
If I take the multicast off the network, everything is fine. When it returns, the switch drops approx. 20% of traffic matching the "forus" queue. I have tried limiting the multicast on the SVI (which did nothing), along with modifying the default MCAST policer rates (roughly as sketched below).
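For reference, modifying those rates on the Catalyst 9000 CoPP policy looks roughly like this (a sketch only; the class name is the multicast data class from the policer output below, the pps value is just an example, and the switch rounds rates to the closest hardware-supported value):

configure terminal
 policy-map system-cpp-policy
  class system-cpp-police-multicast
   police rate 1000 pps
 end
show policy-map control-plane    (verify what was actually programmed)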
Any ideas?
Regards Jon
Output below shows the queues and the rates.
C9500-VSS#show platform hardware fed switch active qos queue stats internal cpu policer

                               CPU Queue Statistics
============================================================================================
                                              (default)  (set)     Queue         Queue
QId PlcIdx  Queue Name                Enabled   Rate      Rate      Drop(Bytes)   Drop(Frames)
--------------------------------------------------------------------------------------------
0    11     DOT1X Auth                  Yes     1000      1000      0             0
1    1      L2 Control                  Yes     2000      2000      0             0
2    14     Forus traffic               Yes     4000      4000      0             0
3    0      ICMP GEN                    Yes     600       600       0             0
4    2      Routing Control             Yes     5400      5400      0             0
5    14     Forus Address resolution    Yes     4000      4000      0             0
6    0      ICMP Redirect               Yes     600       600       0             0
7    16     Inter FED Traffic           Yes     2000      2000      0             0
8    4      L2 LVX Cont Pack            Yes     1000      1000      0             0
9    19     EWLC Control                Yes     13000     13000     0             0
10   16     EWLC Data                   Yes     2000      2000      0             0
11   13     L2 LVX Data Pack            Yes     1000      1000      0             0
12   0      BROADCAST                   Yes     600       600       0             0
13   10     Openflow                    Yes     200       200       0             0
14   13     Sw forwarding               Yes     1000      1000      0             0
15   8      Topology Control            Yes     13000     13000     0             0
16   12     Proto Snooping              Yes     2000      2000      0             0
17   6      DHCP Snooping               Yes     400       400       0             0
18   13     Transit Traffic             Yes     1000      1000      0             0
19   10     RPF Failed                  Yes     200       200       0             0
20   15     MCAST END STATION           Yes     2000      2000      0             0
21   13     LOGGING                     Yes     1000      1000      0             0
22   7      Punt Webauth                Yes     1000      1000      0             0
23   18     High Rate App               Yes     13000     13000     0             0
24   10     Exception                   Yes     200       200       0             0
25   3      System Critical             Yes     1000      1000      0             0
26   10     NFL SAMPLED DATA            Yes     200       200       0             0
27   2      Low Latency                 Yes     5400      5400      0             0
28   10     EGR Exception               Yes     200       200       0             0
29   5      Stackwise Virtual OOB       Yes     8000      8000      0             0
30   9      MCAST Data                  Yes     400       400       292653487907  307048832
31   3      Gold Pkt                    Yes     1000      1000      0             0

* NOTE: CPU queue policer rates are configured to the closest hardware supported value

                     CPU Queue Policer Statistics
====================================================================
Policer   Policer Accept   Policer Accept   Policer Drop    Policer Drop
Index     Bytes            Frames           Bytes           Frames
-------------------------------------------------------------------
0         97780            739              0               0
1         320275           2187             0               0
2         36290            276              0               0
3         2057548          3850             0               0
4         0                0                0               0
5         11671943         62673            0               0
6         0                0                0               0
7         0                0                0               0
8         0                0                0               0
9         649682535        684964           292653502933    307048847
10        4533408          7042             0               0
11        0                0                0               0
12        0                0                0               0
13        17366            219              0               0
14        799091           9401             0               0
15        621566           7151             0               0
16        0                0                0               0
17        0                0                0               0
18        0                0                0               0
19        0                0                0               0

                   Second Level Policer Statistics
====================================================================
20        356565           2463             0               0
21        655686365        709447           0               0

Policer Index Mapping and Settings
--------------------------------------------------------------------
level-2 : level-1                              (default)   (set)
PlcIndex: PlcIndex                             rate        rate
--------------------------------------------------------------------
20      : 1 2 8                                13000       13000
21      : 0 4 7 9 10 11 12 13 14 15            6000        6000
====================================================================

                    Second Level Policer Config
====================================================================
     level-1   level-2
QId  PlcIdx    PlcIdx   Queue Name                 Enabled
--------------------------------------------------------------------
0    11        21       DOT1X Auth                 Yes
1    1         20       L2 Control                 Yes
2    14        21       Forus traffic              Yes
3    0         21       ICMP GEN                   Yes
4    2         20       Routing Control            Yes
5    14        21       Forus Address resolution   Yes
6    0         21       ICMP Redirect              Yes
7    16        -        Inter FED Traffic          No
8    4         21       L2 LVX Cont Pack           Yes
9    19        -        EWLC Control               No
10   16        -        EWLC Data                  No
11   13        21       L2 LVX Data Pack           Yes
12   0         21       BROADCAST                  Yes
13   10        21       Openflow                   Yes
14   13        21       Sw forwarding              Yes
15   8         20       Topology Control           Yes
16   12        21       Proto Snooping             Yes
17   6         -        DHCP Snooping              No
18   13        21       Transit Traffic            Yes
19   10        21       RPF Failed                 Yes
20   15        21       MCAST END STATION          Yes
21   13        21       LOGGING                    Yes
22   7         21       Punt Webauth               Yes
23   18        -        High Rate App              No
24   10        21       Exception                  Yes
25   3         -        System Critical            No
26   10        21       NFL SAMPLED DATA           Yes
27   2         20       Low Latency                Yes
28   10        21       EGR Exception              Yes
29   5         -        Stackwise Virtual OOB      No
30   9         21       MCAST Data                 Yes
31   3         -        Gold Pkt                   No

                       CPP Classes to queue map
======================================================================================
PlcIdx  CPP Class                                  : Queues
--------------------------------------------------------------------------------------
0       system-cpp-police-data                     : ICMP GEN/ BROADCAST/ ICMP Redirect/
10      system-cpp-police-sys-data                 : Openflow/ Exception/ EGR Exception/ NFL SAMPLED DATA/ RPF Failed/
13      system-cpp-police-sw-forward               : Sw forwarding/ LOGGING/ L2 LVX Data Pack/ Transit Traffic/
9       system-cpp-police-multicast                : MCAST Data/
15      system-cpp-police-multicast-end-station    : MCAST END STATION /
7       system-cpp-police-punt-webauth             : Punt Webauth/
1       system-cpp-police-l2-control               : L2 Control/
2       system-cpp-police-routing-control          : Routing Control/ Low Latency/
3       system-cpp-police-system-critical          : System Critical/ Gold Pkt/
4       system-cpp-police-l2lvx-control            : L2 LVX Cont Pack/
8       system-cpp-police-topology-control         : Topology Control/
11      system-cpp-police-dot1x-auth               : DOT1X Auth/
12      system-cpp-police-protocol-snooping        : Proto Snooping/
6       system-cpp-police-dhcp-snooping            : DHCP Snooping/
14      system-cpp-police-forus                    : Forus Address resolution/ Forus traffic/
5       system-cpp-police-stackwise-virt-control   : Stackwise Virtual OOB/
16      system-cpp-default                         : Inter FED Traffic/ EWLC Data/
18      system-cpp-police-high-rate-app            : High Rate App/
19      system-cpp-police-ewlc-control             : EWLC Control/
20      system-cpp-police-ios-routing              : L2 Control/ Topology Control/ Routing Control/ Low Latency/
21      system-cpp-police-ios-feature              : ICMP GEN/ BROADCAST/ ICMP Redirect/ L2 LVX Cont Pack/ Proto Snooping/ Punt Webauth/ MCAST Data/ Transit Traffic/ DOT1X Auth/ Sw forwarding/ LOGGING/ L2 LVX Data Pack/ Forus traffic/ Forus Address resolution/ MCAST END STATION / Openflow/ Exception/ EGR Exception/ NFL SAMPLED DATA/ RPF Failed/
05-26-2022 09:51 AM
Hello @joneaton ,
With IGMPv3 and SSM, multicast IGMP packets must be examined to find the group G and the include/exclude directive about the source. This deep packet inspection means that all IGMP reports are examined, i.e. punted to the main CPU.
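For reference, a minimal sketch of the kind of SSM + IGMPv3 configuration you describe (the VLAN number and addressing here are only illustrative, and 232.0.0.0/8 is the default SSM range):

ip multicast-routing
ip pim ssm default
!
interface Vlan100
 ip address 10.0.100.1 255.255.255.0
 ip pim sparse-mode
 ip igmp version 3
!
! every IGMPv3 membership report carries (S,G) include/exclude records,
! which is why the querier has to look inside each report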
The relevant CPP classes from your output are:

9       system-cpp-police-multicast                : MCAST Data/
15      system-cpp-police-multicast-end-station    : MCAST END STATION /
and:
* NOTE: CPU queue policer rates are configured to the closest hardware supported value

                     CPU Queue Policer Statistics
====================================================================
Policer   Policer Accept   Policer Accept   Policer Drop    Policer Drop
Index     Bytes            Frames           Bytes           Frames
-------------------------------------------------------------------
0         97780            739              0               0
1         320275           2187             0               0
2         36290            276              0               0
3         2057548          3850             0               0
4         0                0                0               0
5         11671943         62673            0               0
6         0                0                0               0
7         0                0                0               0
8         0                0                0               0
9         649682535        684964           292653502933    307048847
10        4533408          7042             0               0
But only queue 9 shows losses, and it is labelled multicast data.
Multicast forwarding should happen in hardware by programming the TCAM, so seeing such high rates on CoPP (which exists to protect the supervisor) makes me think of:
a possible bad interaction with the device-tracking system, or a software bug.
I would recommend opening a Cisco TAC service request and/or planning a software upgrade.
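Before (or alongside) the service request, output along these lines would help characterise what is actually being punted (a sketch; exact command availability can vary by release):

show ip mroute count                                            (mroute state and per-group forwarding counters)
show ip igmp groups                                             (current receiver membership seen by the querier)
show platform software fed switch active punt cpuq rates        (per-CPU-queue punt rates)
show platform software fed switch active punt cause summary     (reasons packets are being sent to the CPU)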
Hope to help
Giuseppe
05-26-2022 11:18 AM
Many thanks for the quick reply. I understand what you have highlighted, and will look to raise a TAC case. I've tried so many things to tweak or reduce the multicast hitting the processor that I'm out of other ideas.
Regards
Jon.
05-26-2022 11:39 AM - edited 05-26-2022 11:42 AM
Hello Jon,
IGMP traffic should be accounted on queue 15 (MCAST END STATION).
Queue 9 (MCAST Data) should only be used when needed, i.e. during the time it takes to program the hardware (FPGA).
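A quick way to see which of the two queues is actually incrementing over time, using the same command you already collected:

show platform hardware fed switch active qos queue stats internal cpu policer | include MCAST

Run it a few times and compare the drop counters for MCAST END STATION and MCAST Data.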
The platform might not be the best solution for multicast.
You should also consider:
Are your multicast flows short-lived? If they are very short-lived, the test might not reflect what happens to a longer stream.
How many changes happen in the population of receivers?
For special events like a CEO session, ASM can be a good fit.
For video conferencing or video surveillance, Bidirectional PIM.
You can use VideoLAN VLC player as a multicast test source.
Warning: the default TTL is usually 1, so you need to raise it to a value like 32. Then play a movie and go around watching it with a laptop or smartphone.
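For example (a sketch only; option names differ between VLC versions, and 239.255.12.42 is just an illustrative group address):

vlc -vvv movie.mp4 --sout udp:239.255.12.42 --ttl 32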
Take into account that Windows devices and Apple Bonjour are very noisy, creating a lot of multicast streams in the scoped range 239.255.255.x.
Finally, a divide-and-conquer strategy can help:
Your 2 x Cat9500 SVL pair is likely a collapsed core/distribution node.
In the core you can, and should, disable IGMP snooping,
i.e. keep IGMP snooping on the access-layer switches so your device performs only inter-VLAN multicast routing. This is a key point for scalability.
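A sketch of that split, assuming a hypothetical VLAN 100 (check the flooding behaviour in your own VLANs before disabling snooping on the core):

! on the access-layer switches: keep IGMP snooping enabled (it is on by default)
ip igmp snooping
!
! on the Cat9500 SVL collapsed core: disable snooping for the multicast VLAN(s)
! and let the pair do only inter-VLAN multicast routing
no ip igmp snooping vlan 100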
Hope to help
Giuseppe
05-27-2022 03:43 AM
Hi Giuseppe,
Some good points for me to work on there, many thanks.
The multicast is multiple sources of 4K video and audio streams between rooms, with receivers showing the content in lecture theatres. The streams are pretty much constant for extended durations, without much join/leave activity. Once receivers are up, they can stay like that for days as the equipment tends to be left running. I'm working on the users to manage this better, but that is very much a staffing-management issue rather than a technical one. Combined, the streams add up to approx. 7.5 Gbps of traffic, all of which of course hits the designated router.
I will organise a change window where I can try to divide and conquer; disabling IGMP snooping on the VSS in this collapsed-core environment is a very good point I'd not thought of.
Once I have the change window organised, I'll report back my progress which hopefully will be positive and may prove useful to others in the community.
Many Thanks, Jon.
05-26-2022 06:53 PM - last edited on 06-27-2022 10:25 AM by Translator
Can I see the complete output for the following commands:
sh platform resources
sh platform software status con brief
05-27-2022 03:32 AM
Hi Leo,
Output from commands as requested.
C9500-VSS#sh platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource                 Usage          Max       Warning   Critical   State
----------------------------------------------------------------------------------------------------
Control Processor        2.37%          100%      90%       95%        H
 DRAM                    2556MB(16%)    15634MB   90%       95%        H
 TMPFS                   171MB(1%)      15634MB   40%       50%        H
C9500-VSS#
C9500-VSS#sh platform software status control-processor brief
Load Average
 Slot   Status   1-Min   5-Min   15-Min
1-RP0   Healthy   0.04    0.10     0.09
2-RP0   Healthy   0.08    0.08     0.06

Memory (kB)
 Slot   Status     Total      Used (Pct)       Free (Pct)       Committed (Pct)
1-RP0   Healthy   16010152   2616544 (16%)   13393608 (84%)    3132472 (20%)
2-RP0   Healthy   16010152   2564152 (16%)   13446000 (84%)    3112688 (19%)

CPU Utilization
 Slot   CPU    User   System   Nice   Idle    IRQ    SIRQ   IOwait
1-RP0    0     3.00    0.50    0.00   96.50   0.00   0.00   0.00
         1     2.80    0.80    0.00   96.40   0.00   0.00   0.00
         2     3.59    0.69    0.00   95.70   0.00   0.00   0.00
         3     2.60    0.30    0.00   97.10   0.00   0.00   0.00
         4     0.59    0.99    0.00   98.40   0.00   0.00   0.00
         5     1.80    0.40    0.00   97.80   0.00   0.00   0.00
         6     1.70    0.50    0.00   97.80   0.00   0.00   0.00
         7     1.40    0.50    0.00   98.10   0.00   0.00   0.00
2-RP0    0     0.40    0.20    0.00   99.39   0.00   0.00   0.00
         1     0.50    0.30    0.00   99.19   0.00   0.00   0.00
         2     0.60    0.30    0.00   99.10   0.00   0.00   0.00
         3     0.40    0.10    0.00   99.49   0.00   0.00   0.00
         4     0.30    0.10    0.00   99.60   0.00   0.00   0.00
         5     0.59    0.00    0.00   99.40   0.00   0.00   0.00
         6     0.00    0.00    0.00   100.00  0.00   0.00   0.00
         7     0.09    0.00    0.00   99.90   0.00   0.00   0.00
C9500-VSS#
Many Thanks, Jon
06-27-2022 08:04 AM
Moving the IGMP snooping to the access layer helped reduce the CPU load and CoPP drops.
It still isn't ideal, and it appears the maintenance contract isn't in place, meaning no TAC case unfortunately.
Overall though, the device is much more responsive.
Many thanks.. Jon