
High CPU in Catalyst 3850 (03.02.03.SE)

emfusion1
Level 1

Hey everyone,

I have been looking all over the internet for answers but still haven't found one, so I decided to post my problem here. Hopefully it will be of help to others too. I'll try to keep things to the point:

I have a lot of unicast flooding on my network (one particular vlan) due to the Windows NLB solution. 

The machines that are doing all the unicast flooding are on our 3850 stack switch.

The 3850 stack is connected to a 4500 switch (LACP bundle).

I followed a high CPU troubleshooting guide and found this : 

------------------------------------------------------------START-------------------------------------------------

#sh processes cpu sorted

Core 0: CPU utilization for five seconds: 16%; one minute: 35%;  five minutes: 35%
Core 1: CPU utilization for five seconds: 12%; one minute: 34%;  five minutes: 33%
Core 2: CPU utilization for five seconds: 95%; one minute: 55%;  five minutes: 57%
Core 3: CPU utilization for five seconds: 99%; one minute: 98%;  five minutes: 98%
PID    Runtime(ms) Invoked  uSecs  5Sec     1Min     5Min     TTY   Process
5627   3036796     16724644 395    51.33    50.83    50.89    1088  fed
9252   3232367     32054903 190    2.91     3.39     3.43     0     iosd
6135   981562      30123953 50     0.63     0.86     0.87     0     pdsd

 

-------

# sh platform punt client

 tag      buffer        jumbo    fallback     packets   received   failures
                                            alloc   free  bytes    conv  buf
 27       0/1024/2048     0/5       0/5        0     0          0     0     0
 65536    0/1024/1600     0/0       0/512  125547205 125547205 3671151438     0     0
 65537    0/ 512/1600     0/0       0/512  383788449 542440234 2501960978     0     0
 65538    0/   5/5        0/0       0/5        0     0          0     0     0
 65539    1/2048/1600     0/16      0/512  60754141 60754140 3989497879     0     0
 65540    0/ 128/1600     0/8       0/0        0     0          0     0     0
 65541    2/ 128/1600     0/16      0/32   608389383 608389600 2017679793     0    41
 65542    0/ 768/1600     0/4       0/0    2139682 25973549  187090182     0  2594
 65544    0/  96/1600     0/4       0/0        0     0          0     0     0
 65545    0/  96/1600     0/8       0/32       0     0          0     0     0
 65546    0/ 512/1600     0/32      0/512  157373665 157373665  339147508     0     0
 65547    0/  96/1600     0/8       0/32       0     0          0     0     0
s65548  512/ 512/1600     0/32    165/256  136689752 136689075 3882833072     0     1
 65551    0/ 512/1600     0/0       0/256      0     0          0     0     0

------

#show pds tag all | in Active|Tags|65548
   Active   Client Client
     Tags   Handle Name                                         TDA             SDA             FDA           TBufD                TBytD
    65548  6681472 Punt Rx Forus Addr Reso Ctrl           136690009       136690009               0       136690009              9413424

------------------------------------------------------------END-------------------------------------------------

The s on tag 65548 is permanent, meaning that the queue is "stuck permanently" (I got this from the Cisco guide).

My question is as follows :

Is there a way to forcibly empty this queue without rebooting the switch?

We are also working on replacing the Windows NLB and upgrading the switch, but until then I'd like to have more options on the table.

Sorry about the long post. I tried to keep it as simple as possible, but if you need more info, I'll be more than glad to reply.

Many many thanks

 

Tony

 

 

5 Replies

Leo Laohoo
Hall of Fame

Can you please check whether your downlink ports are experiencing output drops?

 

Can you post the output of the command "sh interface count err"?

I entered the command but got all zeros, except for these interfaces:

3850#sh interface count err

Port        Align-Err     FCS-Err    Xmit-Err     Rcv-Err  UnderSize  OutDiscards
Gi1/0/1             0           0           0           0          0            0
Gi1/0/2             0           0           0           0          0            0
Gi1/0/3             0           0           0           0          0            0
...

...

Gi6/0/43            0           0           0           0          0            0
Gi6/0/44            0           0           0           0          0            0
Gi6/0/45            0           0           0           0          0            0

Port        Align-Err     FCS-Err    Xmit-Err     Rcv-Err  UnderSize  OutDiscards
Gi6/0/46            0           0           0           0          0            0
Gi6/0/47            0           0           0           0          0            0
Gi6/0/48            0           0           0           0          0            0
Po1                 0          88           0          90          0            0
Po5                 0           0           0           0          0            0
Po6                 0           0           0           0          0            0

Port      Single-Col  Multi-Col   Late-Col  Excess-Col  Carri-Sen      Runts
Gi1/0/1            0          0          0           0          0          0
Gi1/0/2            0          0          0           0          0          0
Gi1/0/3            0          0          0           0          0          0

...

...


3850#sh inter Po1
Port-channel1 is up, line protocol is up (connected)
  Hardware is EtherChannel, address is c472.95cf.09b7 (bia c472.95cf.09b7)
  Description: LACP vers c4506
  MTU 1500 bytes, BW 20000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 3/255, rxload 2/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 10Gb/s, link type is auto, media type is
  input flow-control is off, output flow-control is unsupported
  Members in this channel: Te1/1/3 Te3/1/3
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output never, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 164675000 bits/sec, 32164 packets/sec
  5 minute output rate 298613000 bits/sec, 43274 packets/sec
     2704668126 packets input, 2264269697 bytes, 0 no buffer
     Received 1504866730 broadcasts (1076897602 multicasts)
     0 runts, 0 giants, 0 throttles
     90 input errors, 88 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 1076897602 multicast, 0 pause input
     0 input packets with dribble condition detected
     722603732 packets output, 3697525479 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

 

 

Hitesh Vinzoda
Level 4

This looks like an issue with ARP. Please use the walkthrough below as an illustration to find out which interface is the culprit; you can then shut it down to check whether the issue is resolved.

If it is resolved, you will then want to find out what traffic is going through that interface and what caused the issue, for example by running packet captures with SPAN.

NOTE: Also ensure that when you run a debug, its output is sent to the logging buffer and not to the console, so that you don't lose access if the debug output is overwhelming.

3850-2#show platform punt client
  tag      buffer        jumbo    fallback     packets   received   failures
                                            alloc   free  bytes    conv  buf
 27       0/1024/2048     0/5       0/5        0     0          0     0     0
 65536    0/1024/1600     0/0       0/512      0     0          0     0     0
 65537    0/ 512/1600     0/0       0/512   1530  1530     244061     0     0
 65538    0/   5/5        0/0       0/5        0     0          0     0     0
 65539    0/2048/1600     0/16      0/512      0     0          0     0     0
 65540    0/ 128/1600     0/8       0/0        0     0          0     0     0
 65541    0/ 128/1600     0/16      0/32       0     0          0     0     0
 65542    0/ 768/1600     0/4       0/0        0     0          0     0     0
 65544    0/  96/1600     0/4       0/0        0     0          0     0     0
 65545    0/  96/1600     0/8       0/32       0     0          0     0     0
 65546    0/ 512/1600     0/32      0/512      0     0          0     0     0
 65547    0/  96/1600     0/8       0/32       0     0          0     0     0
 65548    0/ 512/1600     0/32      0/256      0     0          0     0     0
 65551    0/ 512/1600     0/0       0/256      0     0          0     0     0
 65556    0/  16/1600     0/4       0/0        0     0          0     0     0
 65557    0/  16/1600     0/4       0/0        0     0          0     0     0
 65558    0/  16/1600     0/4       0/0        0     0          0     0     0
 65559    0/  16/1600     0/4       0/0        0     0          0     0     0
 65560    0/  16/1600     0/4       0/0        0     0          0     0     0
s65561  421/ 512/1600     0/0       0/128  79565859 131644697  478984244     0 37467
 65563    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65564    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65565    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65566    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65581    0/   1/1        0/0       0/0        0     0          0     0     0
 131071   0/  96/1600     0/4       0/0        0     0          0     0     0
fallback pool: 98/1500/1600
jumbo pool:    0/128/9300

Determine the tag for which the most packets have been allocated. In this example, it is 65561.

Then, enter this command:

3850-2#show pds tag all | in Active|Tags|65561
   Active   Client Client
     Tags   Handle Name                 TDA       SDA         FDA   TBufD       TBytD
    65561  7296672 Punt Rx Proto Snoop  79821397  79821397    0  79821397   494316524

This output shows that the queue is Rx Proto Snoop.

The s before the 65561 in the output of the show platform punt client command means that the FED handle is suspended and overwhelmed by the number of incoming packets. If the s does not vanish, it means the queue is stuck permanently.
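For anyone who wants to script this step, here is a minimal Python sketch of the check described above: it reads pasted `show platform punt client` rows, flags any tag marked with a leading s, and picks the tag with the most allocated packets. The regular expression, the field names (`alloc`, `free`, `buf_fail`, and so on) and the `parse_punt_clients` helper are my own assumptions based only on the column layout shown in this thread; they are not part of any Cisco tool, and the layout may differ on other releases.

```python
import re

# Sample rows copied from the "show platform punt client" output above.
SAMPLE = """\
 65560    0/  16/1600     0/4       0/0        0     0          0     0     0
s65561  421/ 512/1600     0/0       0/128  79565859 131644697  478984244     0 37467
 65563    0/ 512/1600     0/16      0/256      0     0          0     0     0
"""

# Assumed row layout: [s]tag  buffer(a/b/c)  jumbo(a/b)  fallback(a/b)
# followed by five counters (alloc, free, bytes, conv failures, buf failures).
ROW = re.compile(
    r"^(?P<flag>s?)\s*(?P<tag>\d+)\s+"
    r"(?P<buf_used>\d+)/\s*(?P<buf_max>\d+)/\s*(?P<buf_size>\d+)\s+"
    r"\d+/\s*\d+\s+\d+/\s*\d+\s+"
    r"(?P<alloc>\d+)\s+(?P<free>\d+)\s+(?P<bytes>\d+)\s+"
    r"(?P<conv_fail>\d+)\s+(?P<buf_fail>\d+)\s*$"
)


def parse_punt_clients(text):
    """Return one dict per data row; header and pool lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        row = {k: int(v) for k, v in match.groupdict().items() if k != "flag"}
        row["suspended"] = match.group("flag") == "s"
        rows.append(row)
    return rows


rows = parse_punt_clients(SAMPLE)
suspended_tags = [r["tag"] for r in rows if r["suspended"]]
busiest = max(rows, key=lambda r: r["alloc"])
print("suspended tags:", suspended_tags)
print("most packets allocated:", busiest["tag"])
```

Running it twice a few seconds apart against fresh output would show whether the s flag clears or stays, which is the distinction between an overwhelmed queue and a stuck one.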

Step 3: Dump the Packet Sent to the CPU

In the results of the show pds tag all command, notice a handle, 7296672, is reported next to the Punt Rx Proto Snoop.

Use this handle in the show pds client <handle> packet last sink command. Note that you must enable debug pds pktbuf-last before you use the command. Otherwise you encounter this error:

3850-2#show pds client 7296672 packet last sink
% switch-2:pdsd:This command works in debug mode only. Enable debug using
"debug pds pktbuf-last" command

With the debug enabled, you see this output:

3850-2#show pds client 7296672 packet last sink
Dumping Packet(54528) # 0 of Length 60
-----------------------------------------
Meta-data
0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0010 00 00 16 1d 00 00 00 00 00 00 00 00 55 5a 57 f0  ............UZW.
0020 00 00 00 00 fd 01 10 df 00 5b 70 00 00 10 43 00  .........[p...C.
0030 00 10 43 00 00 41 fd 00 00 41 fd 00 00 00 00 00  ..C..A...A......
0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0050 00 00 00 3c 00 00 00 00 00 01 00 19 00 00 00 00  ...<............
0060 01 01 b6 80 00 00 00 4f 00 00 00 00 00 00 00 00  .......O........
0070 01 04 d8 80 00 00 00 33 00 00 00 00 00 00 00 00  .......3........
0080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0090 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00a0 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00  ................
Data
0000 ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01  ................
0010 08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a  ................
0020 ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05  ................
0030 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11              ............

This command dumps the last packet received by the sink, which is IOSd in this example. The dump includes the packet header, which can be decoded with terminal-based Wireshark (TShark). The Meta-data is for internal use by the system, while the Data output provides the actual packet contents. The Meta-data nevertheless remains extremely useful.
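If TShark is not at hand, the Data section of this particular dump is small enough to decode by hand. The sketch below is only an illustration under the assumption that the Data bytes are a plain untagged Ethernet frame carrying ARP, which is what this example shows; the helper names (`dump_to_bytes`, `decode_arp`) are hypothetical and the field layout follows the standard Ethernet and ARP headers.

```python
import struct

# Data section copied from the "show pds client ... packet last sink" dump above.
DATA_DUMP = """\
0000 ff ff ff ff ff ff aa bb cc dd 00 00 08 06 00 01  ................
0010 08 00 06 04 00 01 aa bb cc dd 00 00 c0 a8 01 0a  ................
0020 ff ff ff ff ff ff c0 a8 01 14 00 01 02 03 04 05  ................
0030 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11              ............
"""


def dump_to_bytes(dump):
    """Strip the offset and ASCII columns and return the raw bytes."""
    out = bytearray()
    for line in dump.strip().splitlines():
        # The hex column is separated from the ASCII column by two or more spaces.
        hex_part = line.strip().split("  ")[0]
        out += bytes.fromhex("".join(hex_part.split()[1:]))
    return bytes(out)


def mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def ipv4(raw):
    return ".".join(str(b) for b in raw)


def decode_arp(frame):
    """Decode an untagged Ethernet frame that carries an IPv4 ARP packet."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError(f"not ARP, ethertype is 0x{ethertype:04x}")
    (_htype, _ptype, _hlen, _plen, oper,
     sha, spa, tha, tpa) = struct.unpack("!HHBBH6s4s6s4s", frame[14:42])
    return {
        "dst_mac": mac(dst),
        "src_mac": mac(src),
        "operation": oper,  # 1 = request, 2 = reply
        "sender_mac": mac(sha),
        "sender_ip": ipv4(spa),
        "target_mac": mac(tha),
        "target_ip": ipv4(tpa),
    }


frame = dump_to_bytes(DATA_DUMP)
arp = decode_arp(frame)
print(len(frame), arp)
```

For this dump the frame decodes as a broadcast ARP request from aa:bb:cc:dd:00:00 (192.168.1.10) asking for 192.168.1.20, which fits the ARP diagnosis in this reply.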

Notice the line that starts with 0070. Use the first 8 bytes (16 hex digits) after that offset as the IIF-ID, as shown here:

3850-2#show platform port-asic ifm iif-id 0x0104d88000000033
Interface Table
Interface IIF-ID        : 0x0104d88000000033
Interface Name          : Gi2/0/20
Interface Block Pointer : 0x514d2f70
Interface State         : READY
Interface Stauts        : IFM-ADD-RCVD, FFM-ADD-RCVD
Interface Ref-Cnt       : 6
Interface Epoch         : 0
Interface Type          : ETHER
        Port Type         : SWITCH PORT
        Port Location     : LOCAL
        Slot              : 2
        Unit              : 20
        Slot Unit         : 20
        Acitve            : Y
        SNMP IF Index     : 22
        GPN               : 84
        EC Channel        : 0
        EC Index          : 0
        ASIC              : 0
        ASIC Port         : 14
        Port LE Handle    : 0x514cd990
Non Zero Feature Ref Counts
        FID : 48(AL_FID_L2_PM), Ref Count : 1
        FID : 77(AL_FID_STATS), Ref Count : 1
        FID : 51(AL_FID_L2_MATM), Ref Count : 1
        FID : 13(AL_FID_SC), Ref Count : 1
        FID : 26(AL_FID_QOS), Ref Count : 1
Sub block information
        FID : 48(AL_FID_L2_PM), Private Data : 0x54072618
        FID : 26(AL_FID_QOS), Private Data : 0x514d31b8

The culprit interface is identified here: Gi2/0/20 is where a traffic generator is pumping ARP traffic. Shutting this port down should resolve the problem and bring the CPU usage back down.
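The IIF-ID lookup in this walkthrough is easy to get wrong by hand, so here is a small Python sketch that builds the command from the Meta-data line. It assumes, as in this example only, that the IIF-ID is the first 8 bytes of the row at offset 0070; the `iif_id_from_meta_line` helper is hypothetical, and the offset may not hold on other software releases.

```python
# Meta-data row at offset 0070, copied from the packet dump above.
META_LINE = "0070 01 04 d8 80 00 00 00 33 00 00 00 00 00 00 00 00  .......3........"


def iif_id_from_meta_line(line):
    """Join the first 8 bytes after the 0070 offset into a 0x... IIF-ID."""
    tokens = line.split()
    if tokens[0] != "0070":
        raise ValueError("expected the Meta-data row that starts with 0070")
    return "0x" + "".join(tokens[1:9])


iif_id = iif_id_from_meta_line(META_LINE)
command = f"show platform port-asic ifm iif-id {iif_id}"
print(command)
```

The printed command matches the one used above, and the resulting output gives the interface name (Gi2/0/20 in this example).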

 

HTH

Hitesh

@Hitesh Vinzoda 

Yes, I have tried that method, but since the queue is "stuck", I'm not capturing any packets:

-----------------------START------------

3850#show pds tag all | in Active|Tags|65548
   Active   Client Client
     Tags   Handle Name                                         TDA             SDA             FDA           TBufD                TBytD
    65548  6681472 Punt Rx Forus Addr Reso Ctrl           136690009       136690009               0       136690009              9413424
3850#show pds client 6681472  packet last sink
% switch-1:pdsd:No packets in client 6681472 Queue

-----------------------END---------------

I also noticed that the switch is no longer learning new ARP entries. I'm getting a few "Incomplete" entries when I run the 'show arp' command.

 

petetritt
Level 1

Hello emfusion1,
Did you ever resolve this problem?
I'm seeing the same thing on my network:
38_a8zbL2#
38_a8zbL2#show pds tag all | in Active|Tags|65548
   Active   Client Client
     Tags   Handle Name                                         TDA             SDA             FDA           TBufD                TBytD
    65548  7296608 Punt Rx Forus Addr Reso Ctrl            22647608        22647608               0        22647608           1450122596
38_a8zbL2#                    
38_a8zbL2#
38_a8zbL2#


and would be glad to receive some advice.

 

regards,
Pete
