2668 Views · 5 Helpful · 7 Replies

9500 high packet loss due to CPP and high volume of multicast traffic.

joneaton
Level 1

Hi,

 

We have a pair of Catalyst 9500s running 16.12.05b in StackWise Virtual. The config is pretty standard, with SSM configured and multiple SVIs running IGMP version 3. The issue we have is a very high level of (legitimate) multicast traffic for which the 9500 pair is the IGMP querying router.
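For reference, the "pretty standard" setup described above looks roughly like this (a simplified sketch; the VLAN number and addressing are illustrative, not the actual config):

```
ip multicast-routing
ip pim ssm default              ! SSM for the default 232.0.0.0/8 range

interface Vlan100               ! one of the SVIs (illustrative)
 ip address 10.1.100.1 255.255.255.0
 ip pim sparse-mode
 ip igmp version 3
```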

 

Therein lies the problem (I think): we have approx. 20% packet loss on SSH and pings to the 9500s, due to the level of multicast traffic hitting control plane policing.

 

If I take the multicast off the network, everything is fine. When it returns, the switch drops approx. 20% of traffic matching the "forus" queue. I have tried rate-limiting the multicast on the SVI (which did nothing), along with modifying the default MCAST policer rates.
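For anyone trying the same thing: on these platforms the per-class CPP rates are modified under the system-cpp-policy policy-map. A sketch (the 5000 pps value is purely illustrative):

```
configure terminal
 policy-map system-cpp-policy
  class system-cpp-police-multicast
   police rate 5000 pps
 end
! verify the (set) rate and drop counters:
show platform hardware fed switch active qos queue stats internal cpu policer
```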

 

Any ideas?

 

Regards Jon

 

Output below shows the queues and the rates.

  

C9500-VSS#$show platform hardware fed switch active qos queue stats internal cpu policer 

                         CPU Queue Statistics                  
============================================================================================
                                              (default) (set)     Queue        Queue
QId PlcIdx  Queue Name                Enabled   Rate     Rate      Drop(Bytes)  Drop(Frames)
--------------------------------------------------------------------------------------------
0    11     DOT1X Auth                  Yes     1000      1000     0            0          
1    1      L2 Control                  Yes     2000      2000     0            0          
2    14     Forus traffic               Yes     4000      4000     0            0          
3    0      ICMP GEN                    Yes     600       600      0            0          
4    2      Routing Control             Yes     5400      5400     0            0          
5    14     Forus Address resolution    Yes     4000      4000     0            0          
6    0      ICMP Redirect               Yes     600       600      0            0          
7    16     Inter FED Traffic           Yes     2000      2000     0            0          
8    4      L2 LVX Cont Pack            Yes     1000      1000     0            0          
9    19     EWLC Control                Yes     13000     13000    0            0          
10   16     EWLC Data                   Yes     2000      2000     0            0          
11   13     L2 LVX Data Pack            Yes     1000      1000     0            0          
12   0      BROADCAST                   Yes     600       600      0            0          
13   10     Openflow                    Yes     200       200      0            0          
14   13     Sw forwarding               Yes     1000      1000     0            0          
15   8      Topology Control            Yes     13000     13000    0            0          
16   12     Proto Snooping              Yes     2000      2000     0            0          
17   6      DHCP Snooping               Yes     400       400      0            0          
18   13     Transit Traffic             Yes     1000      1000     0            0          
19   10     RPF Failed                  Yes     200       200      0            0          
20   15     MCAST END STATION           Yes     2000      2000     0            0          
21   13     LOGGING                     Yes     1000      1000     0            0          
22   7      Punt Webauth                Yes     1000      1000     0            0          
23   18     High Rate App               Yes     13000     13000    0            0          
24   10     Exception                   Yes     200       200      0            0          
25   3      System Critical             Yes     1000      1000     0            0          
26   10     NFL SAMPLED DATA            Yes     200       200      0            0          
27   2      Low Latency                 Yes     5400      5400     0            0          
28   10     EGR Exception               Yes     200       200      0            0          
29   5      Stackwise Virtual OOB       Yes     8000      8000     0            0          
30   9      MCAST Data                  Yes     400       400      292653487907  307048832  
31   3      Gold Pkt                    Yes     1000      1000     0            0          

* NOTE: CPU queue policer rates are configured to the closest hardware supported value

                      CPU Queue Policer Statistics               
====================================================================
Policer    Policer Accept   Policer Accept  Policer Drop  Policer Drop
  Index         Bytes          Frames        Bytes          Frames
-------------------------------------------------------------------
0          97780            739             0             0          
1          320275           2187            0             0          
2          36290            276             0             0          
3          2057548          3850            0             0          
4          0                0               0             0          
5          11671943         62673           0             0          
6          0                0               0             0          
7          0                0               0             0          
8          0                0               0             0          
9          649682535        684964          292653502933   307048847  
10         4533408          7042            0             0          
11         0                0               0             0          
12         0                0               0             0          
13         17366            219             0             0          
14         799091           9401            0             0          
15         621566           7151            0             0          
16         0                0               0             0          
17         0                0               0             0          
18         0                0               0             0          
19         0                0               0             0          

                  Second Level Policer Statistics             
====================================================================
20         356565           2463            0             0          
21         655686365        709447          0             0          

Policer Index Mapping and Settings
--------------------------------------------------------------------
level-2   :   level-1                      (default)   (set) 
PlcIndex  :   PlcIndex                       rate      rate 
--------------------------------------------------------------------
20        :   1  2  8                        13000     13000
21        :   0 4 7 9 10 11 12 13 14 15      6000      6000
====================================================================

               Second Level Policer Config                        
====================================================================
    level-1 level-2                            level-2
QId PlcIdx  PlcIdx  Queue Name                 Enabled
--------------------------------------------------------------------
0    11      21      DOT1X Auth                  Yes
1    1       20      L2 Control                  Yes
2    14      21      Forus traffic               Yes
3    0       21      ICMP GEN                    Yes
4    2       20      Routing Control             Yes
5    14      21      Forus Address resolution    Yes
6    0       21      ICMP Redirect               Yes
7    16      -       Inter FED Traffic           No 
8    4       21      L2 LVX Cont Pack            Yes
9    19      -       EWLC Control                No 
10   16      -       EWLC Data                   No 
11   13      21      L2 LVX Data Pack            Yes
12   0       21      BROADCAST                   Yes
13   10      21      Openflow                    Yes
14   13      21      Sw forwarding               Yes
15   8       20      Topology Control            Yes
16   12      21      Proto Snooping              Yes
17   6       -       DHCP Snooping               No 
18   13      21      Transit Traffic             Yes
19   10      21      RPF Failed                  Yes
20   15      21      MCAST END STATION           Yes
21   13      21      LOGGING                     Yes
22   7       21      Punt Webauth                Yes
23   18      -       High Rate App               No 
24   10      21      Exception                   Yes
25   3       -       System Critical             No 
26   10      21      NFL SAMPLED DATA            Yes
27   2       20      Low Latency                 Yes
28   10      21      EGR Exception               Yes
29   5       -       Stackwise Virtual OOB       No 
30   9       21      MCAST Data                  Yes
31   3       -       Gold Pkt                    No 

                        CPP Classes to queue map 
======================================================================================
PlcIdx CPP Class                                :  Queues
--------------------------------------------------------------------------------------
0      system-cpp-police-data                   :  ICMP GEN/ BROADCAST/ ICMP Redirect/ 
10     system-cpp-police-sys-data               :  Openflow/ Exception/ EGR Exception/ NFL SAMPLED DATA/ RPF Failed/ 
13     system-cpp-police-sw-forward             :  Sw forwarding/ LOGGING/ L2 LVX Data Pack/ Transit Traffic/ 
9      system-cpp-police-multicast              :  MCAST Data/ 
15     system-cpp-police-multicast-end-station  :  MCAST END STATION / 
7      system-cpp-police-punt-webauth           :  Punt Webauth/ 
1      system-cpp-police-l2-control             :  L2 Control/ 
2      system-cpp-police-routing-control        :  Routing Control/ Low Latency/ 
3      system-cpp-police-system-critical        :  System Critical/ Gold Pkt/ 
4      system-cpp-police-l2lvx-control          :  L2 LVX Cont Pack/ 
8      system-cpp-police-topology-control       :  Topology Control/ 
11     system-cpp-police-dot1x-auth             :  DOT1X Auth/ 
12     system-cpp-police-protocol-snooping      :  Proto Snooping/ 
6      system-cpp-police-dhcp-snooping          :  DHCP Snooping/ 
14     system-cpp-police-forus                  :  Forus Address resolution/ Forus traffic/ 
5      system-cpp-police-stackwise-virt-control :  Stackwise Virtual OOB/ 
16     system-cpp-default                       :  Inter FED Traffic/ EWLC Data/ 
18     system-cpp-police-high-rate-app          :  High Rate App/ 
19     system-cpp-police-ewlc-control           :  EWLC Control/ 
20     system-cpp-police-ios-routing            :  L2 Control/ Topology Control/ Routing Control/ Low Latency/ 
21     system-cpp-police-ios-feature            :  ICMP GEN/ BROADCAST/ ICMP Redirect/ L2 LVX Cont Pack/ Proto Snooping/ Punt Webauth/ MCAST Data/ Transit Traffic/ DOT1X Auth/ Sw forwarding/ LOGGING/ L2 LVX Data Pack/ Forus traffic/ Forus Address resolution/ MCAST END STATION / Openflow/ Exception/ EGR Exception/ NFL SAMPLED DATA/ RPF Failed/ 

 

Accepted Solution

Hello Jon,

IGMP traffic should be accounted under queue 15 (MCAST END STATION);

queue 9 should be hit only when needed: during the time taken to program the hardware (FPGA) for a new flow.

The platform might not be the best solution for multicast.

 

You should also consider:

Are your multicast flows short-lived? If they are very short-lived, a test might not reflect what happens to a real stream.

How often does the population of receivers change?

 

For special events like a CEO session, ASM can be a good fit.

For video conferencing or video surveillance, use Bidirectional PIM.

You can use the VideoLAN VLC player as a multicast source.

Warning: the default TTL should be 1, so you need to raise it to a value like 32. Then play a movie and go around watching it with a laptop or smartphone.
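A command-line sketch of that test setup (the group address, port, and filename are examples; ttl=32 raises the default TTL as described):

```
cvlc sample.mp4 --loop --sout '#rtp{dst=239.1.2.3,port=5004,mux=ts,ttl=32}'
```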

Take into account that Windows devices and Apple Bonjour are very noisy, creating a lot of multicast streams in the scoped range 239.255.255.x.

 

Finally, a divide-and-conquer strategy can help:

Your 2 x Cat9500 SVL is likely a collapsed core/distribution node.

In the core you can, and should, disable IGMP snooping;

i.e. keep IGMP snooping configured on the access-layer switches, so this device performs only inter-VLAN multicast routing. This is a key point for scalability.
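A sketch of that split, assuming an illustrative VLAN 100:

```
! On the C9500 SVL (collapsed core): disable snooping, route only
configure terminal
 no ip igmp snooping vlan 100

! On the access-layer switches: snooping stays enabled (the default)
 ip igmp snooping
```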

 

Hope to help

Giuseppe

 


7 Replies

Giuseppe Larosa
Hall of Fame

Hello @joneaton ,

with IGMPv3 and SSM, multicast IGMP packets must be examined to find the group G and the include/exclude directive about the source. This deep packet inspection means that all IGMP reports are examined, i.e. punted to the main CPU.

 

9      system-cpp-police-multicast              :  MCAST Data/ 
15     system-cpp-police-multicast-end-station  :  MCAST END STATION /

and:

* NOTE: CPU queue policer rates are configured to the closest hardware supported value

                      CPU Queue Policer Statistics               
====================================================================
Policer    Policer Accept   Policer Accept  Policer Drop  Policer Drop
  Index         Bytes          Frames        Bytes          Frames
-------------------------------------------------------------------
0          97780            739             0             0          
1          320275           2187            0             0          
2          36290            276             0             0          
3          2057548          3850            0             0          
4          0                0               0             0          
5          11671943         62673           0             0          
6          0                0               0             0          
7          0                0               0             0          
8          0                0               0             0          
9          649682535        684964          292653502933   307048847  
10         4533408          7042            0             0          

 

But only queue 9 has losses on it, and it is labelled MCAST Data: roughly 307 million frames dropped against about 685,000 accepted, so nearly all punted multicast data is being dropped.

 

Multicast forwarding happens in hardware by programming the TCAM, so such high rates of CoPP drops (CoPP protects the supervisor) make me think of:

a possible bad interaction with the device-tracking system, or a software bug.
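Before opening the case, it can help to see exactly what is being punted; on a Cat9k these views are useful (assuming a comparable 16.x release; command availability varies by version):

```
show platform software fed switch active punt cause summary
show platform software fed switch active punt cpuq rates
```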

 

I would recommend opening a Cisco TAC service request and/or planning a software upgrade.

 

Hope to help

Giuseppe

 

Many thanks for the quick reply. I understand what you have highlighted, and will look to raise a TAC case. I've tried so many things to tweak or reduce the multicast hitting the processor that I'm out of other ideas.

Regards

Jon.

 

Hi Giuseppe,

 

Some good points for me to work on there, many thanks.

The multicast is multiple sources of 4K video and audio streams between multiple rooms, with receivers showing the content in lecture theatres. The streams are pretty much constant for extended durations, without much join/leave activity. Once receivers are up, they can stay like that for days, as the equipment tends to be left running. I'm working on the users to manage this better, but that is very much a staff-management issue rather than a technical one. Combined, the streams add up to approx. 7.5 Gbps of traffic, which of course all hits the Designated Router.

 

I will organise a change window where I can try to divide and conquer; I think disabling IGMP snooping on the VSS in this collapsed-core environment is a very good point I'd not thought of.

 

Once I have the change window organised, I'll report back my progress which hopefully will be positive and may prove useful to others in the community.

 

Many Thanks, Jon.

Leo Laohoo
Hall of Fame

Can I see the complete output for the following commands: 

  • sh platform resources
  • sh platform software status con brief

Hi Leo,

Output from commands as requested.

C9500-VSS#sh platform resources 
**State Acronym: H - Healthy, W - Warning, C - Critical                                             
Resource                 Usage                 Max             Warning         Critical        State
----------------------------------------------------------------------------------------------------
 Control Processor       2.37%                 100%            90%             95%             H    
  DRAM                   2556MB(16%)           15634MB         90%             95%             H    
  TMPFS                  171MB(1%)             15634MB         40%             50%             H    

C9500-VSS#
C9500-VSS#sh platform software status control-processor brief 
Load Average
 Slot  Status  1-Min  5-Min 15-Min
1-RP0 Healthy   0.04   0.10   0.09
2-RP0 Healthy   0.08   0.08   0.06

Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
1-RP0 Healthy 16010152  2616544 (16%) 13393608 (84%)   3132472 (20%)
2-RP0 Healthy 16010152  2564152 (16%) 13446000 (84%)   3112688 (19%)

CPU Utilization
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
1-RP0    0   3.00   0.50   0.00  96.50   0.00   0.00   0.00
         1   2.80   0.80   0.00  96.40   0.00   0.00   0.00
         2   3.59   0.69   0.00  95.70   0.00   0.00   0.00
         3   2.60   0.30   0.00  97.10   0.00   0.00   0.00
         4   0.59   0.99   0.00  98.40   0.00   0.00   0.00
         5   1.80   0.40   0.00  97.80   0.00   0.00   0.00
         6   1.70   0.50   0.00  97.80   0.00   0.00   0.00
         7   1.40   0.50   0.00  98.10   0.00   0.00   0.00
2-RP0    0   0.40   0.20   0.00  99.39   0.00   0.00   0.00
         1   0.50   0.30   0.00  99.19   0.00   0.00   0.00
         2   0.60   0.30   0.00  99.10   0.00   0.00   0.00
         3   0.40   0.10   0.00  99.49   0.00   0.00   0.00
         4   0.30   0.10   0.00  99.60   0.00   0.00   0.00
         5   0.59   0.00   0.00  99.40   0.00   0.00   0.00
         6   0.00   0.00   0.00 100.00   0.00   0.00   0.00
         7   0.09   0.00   0.00  99.90   0.00   0.00   0.00

C9500-VSS#

Many Thanks, Jon

Moving IGMP snooping to the access layer helped reduce the CPU load and CoPP drops.

 

It still isn't ideal, and it appears the maintenance contract isn't in place, meaning no TAC case unfortunately.

 

Overall though, the device is much more responsive. 

 

Many thanks, Jon
