DHCP / Broadcast storm isolation

rtjensen4
Level 4

I'm stumped...

Yesterday afternoon, the CPUs on four of my Cat 4506 switches (Sup-II+) spiked to over 80%.

Running show proc cpu sort | e 0.00 reveals the following:

sh proc cpu sort | e 0.00
CPU utilization for five seconds: 85%/2%; one minute: 88%; five minutes: 92%
 PID Runtime(ms)    Invoked  uSecs   5Sec   1Min   5Min TTY Process
  52   440279804  322571476   1364 30.95% 29.32% 30.67%   0 Cat4k Mgmt LoPri
 147    28747668   96973101    296 28.31% 31.23% 33.73%   0 DHCPD Receive
 109    40035412  135991272    294 14.15% 15.25% 16.89%   0 IP Input
  51   458641964 1101973901    416  6.31%  6.89%  6.69%   0 Cat4k Mgmt HiPri
  61         816        171   4771  1.11%  0.99%  0.24%   1 SSH Process
 115    24551844  123358787    199  0.95%  0.85%  0.86%   0 Spanning Tree
 218     1499404  547141208      2  0.23%  0.20%  0.21%   0 HSRP Common
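For anyone who wants to work with this output offline, here is a minimal Python sketch that parses the process rows and lists the ones at or above 10% in the 5Sec column. The names `parse_rows`, `usecs_consistent` and `busy` are made up for illustration. The row for PID 51 is entered with Runtime = 458641964 and Invoked = 1101973901, which is an assumption about where the two columns divide; it is the split for which Runtime divided by Invoked matches the uSecs column (average CPU time per invocation), as it does for the other rows.

```python
import re

# Four busiest rows from the output above. PID 51 is entered with
# Runtime=458641964 and Invoked=1101973901 (an assumed column split).
SAMPLE = """\
  52   440279804  322571476   1364 30.95% 29.32% 30.67%   0 Cat4k Mgmt LoPri
 147    28747668   96973101    296 28.31% 31.23% 33.73%   0 DHCPD Receive
 109    40035412  135991272    294 14.15% 15.25% 16.89%   0 IP Input
  51   458641964 1101973901    416  6.31%  6.89%  6.69%   0 Cat4k Mgmt HiPri
"""

# PID, Runtime(ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process name
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(.+?)\s*$"
)


def parse_rows(text):
    """Return one dict per process row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        pid, runtime, invoked, usecs, five_sec, _m1, _m5, _tty, name = m.groups()
        rows.append({
            "pid": int(pid),
            "runtime_ms": int(runtime),
            "invoked": int(invoked),
            "usecs": int(usecs),
            "five_sec": float(five_sec),
            "name": name,
        })
    return rows


def usecs_consistent(row):
    """True if Runtime(ms) / Invoked, truncated to whole microseconds, equals uSecs."""
    return row["runtime_ms"] * 1000 // row["invoked"] == row["usecs"]


def busy(rows, threshold=10.0):
    """Names of processes at or above `threshold` percent in the 5Sec column."""
    return [r["name"] for r in rows if r["five_sec"] >= threshold]


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print(busy(rows))
    print(all(usecs_consistent(r) for r in rows))
```

The consistency check is only a sanity test on the column split, not a diagnostic in itself.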

OK, so I assume a DHCP storm is causing my problems. I ran "debug ip dhcp server events" and it shows me this:

015699: Aug  7 15:21:16.317: DHCPD: Sending notification of DISCOVER:
015700: Aug  7 15:21:16.317:   DHCPD: htype 1 chaddr ffff.ffff.ffff
015701: Aug  7 15:21:16.317:   DHCPD: interface = Vlan141
015702: Aug  7 15:21:16.377: DHCPD: Sending notification of DISCOVER:
015703: Aug  7 15:21:16.377:   DHCPD: htype 1 chaddr ffff.ffff.ffff
015704: Aug  7 15:21:16.377:   DHCPD: interface = Vlan110
015705: Aug  7 15:21:16.421: DHCPD: Sending notification of DISCOVER:
015706: Aug  7 15:21:16.421:   DHCPD: htype 1 chaddr ffff.ffff.ffff
015707: Aug  7 15:21:16.421:   DHCPD: interface = Vlan250

over and over again.
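With a longer debug capture saved to a file, a rough tally per VLAN interface helps show which VLANs take the most hits. The sketch below is only an illustration run against the nine lines quoted above; `per_interface` and `chaddrs` are hypothetical helper names. It also collects the chaddr values, since every DISCOVER here carries ffff.ffff.ffff, the Ethernet broadcast address, where a real client would normally put its own MAC.

```python
import re
from collections import Counter

# The debug lines quoted above, unchanged.
DEBUG = """\
015699: Aug  7 15:21:16.317: DHCPD: Sending notification of DISCOVER:
015700: Aug  7 15:21:16.317:   DHCPD: htype 1 chaddr ffff.ffff.ffff
015701: Aug  7 15:21:16.317:   DHCPD: interface = Vlan141
015702: Aug  7 15:21:16.377: DHCPD: Sending notification of DISCOVER:
015703: Aug  7 15:21:16.377:   DHCPD: htype 1 chaddr ffff.ffff.ffff
015704: Aug  7 15:21:16.377:   DHCPD: interface = Vlan110
015705: Aug  7 15:21:16.421: DHCPD: Sending notification of DISCOVER:
015706: Aug  7 15:21:16.421:   DHCPD: htype 1 chaddr ffff.ffff.ffff
015707: Aug  7 15:21:16.421:   DHCPD: interface = Vlan250
"""

IFACE = re.compile(r"DHCPD: interface = (\S+)")
CHADDR = re.compile(r"chaddr ([0-9a-fA-F.]+)")


def per_interface(text):
    """Count DISCOVER notifications per receiving interface."""
    return Counter(IFACE.findall(text))


def chaddrs(text):
    """Set of distinct client hardware addresses seen in the debug text."""
    return set(CHADDR.findall(text))


if __name__ == "__main__":
    print(per_interface(DEBUG))
    print(chaddrs(DEBUG))
```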

I did a packet capture and I see a TON of packets like this.

Source IP          Destination IP     Info
192.168.102.176    192.168.110.255    DHCPDISCOVER
192.168.102.176    192.168.141.255    DHCPDISCOVER
192.168.102.176    192.168.250.255    DHCPDISCOVER

Source MAC         Destination MAC
<core switch>      ffff.ffff.ffff

about 1000 packets/second.

(Note: 192.168.110.0/24 = VLAN 110, 192.168.141.0/24 = VLAN 141, 192.168.250.0/24 = VLAN 250.)

The fact that the source MAC address is that of my core switch leads me to believe the broadcast is being propagated from somewhere else. A directed broadcast, by chance?
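To make the directed-broadcast suspicion concrete, the following sketch uses Python's standard `ipaddress` module to check two things from the capture: that each destination is exactly the subnet broadcast address of one of the three VLANs, and that the source 192.168.102.176 does not sit inside any of them. The subnet-to-VLAN mapping is taken from the note above; `classify` and `source_is_local` are illustrative names. This is consistent with packets being routed toward subnet broadcast addresses, but it does not by itself prove where they originate.

```python
import ipaddress

# Subnets per VLAN, as given in the note above.
VLAN_SUBNETS = {
    110: ipaddress.ip_network("192.168.110.0/24"),
    141: ipaddress.ip_network("192.168.141.0/24"),
    250: ipaddress.ip_network("192.168.250.0/24"),
}

SOURCE = ipaddress.ip_address("192.168.102.176")
DESTINATIONS = ["192.168.110.255", "192.168.141.255", "192.168.250.255"]


def classify(dst):
    """Return the VLAN whose subnet broadcast address equals dst, else None."""
    addr = ipaddress.ip_address(dst)
    for vlan, net in VLAN_SUBNETS.items():
        if addr == net.broadcast_address:
            return vlan
    return None


def source_is_local(src=SOURCE):
    """True if the source address belongs to any of the flooded subnets."""
    return any(src in net for net in VLAN_SUBNETS.values())


if __name__ == "__main__":
    for dst in DESTINATIONS:
        print(dst, "-> broadcast address of VLAN", classify(dst))
    print("source inside a flooded subnet:", source_is_local())
```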

OK, just for troubleshooting purposes, I removed these VLANs from the trunks to one of the switches with high CPU. As expected, the CPU dropped back to normal.

I added VLAN 110 back and the DHCPD process jumped to 8%; I added VLAN 141 back and it jumped to 12-16%.

So it would seem that these three VLANs are getting beaten up. I enabled storm-control on all my access ports at a fairly low level; on the 4500s, I used a threshold of 0.2%. Storm-control kicked in on my uplink ports to my core, but nowhere else.
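As a rough feel for what a 0.2% threshold means, here is a back-of-the-envelope calculation. It rests on assumptions not stated in the post: that the level is a percentage of link bandwidth, that each DHCPDISCOVER frame is about 342 bytes, and that the uplinks are either 1 Gb/s or 10 Gb/s. Only the 1000 packets/second figure comes from the capture.

```python
ASSUMED_FRAME_BYTES = 342   # assumption: a minimal-size DHCPDISCOVER frame
OBSERVED_PPS = 1000         # "about 1000 packets/second" from the capture


def flood_bps(pps, frame_bytes):
    """Bits per second carried by `pps` frames of `frame_bytes` each."""
    return pps * frame_bytes * 8


def threshold_bps(link_bps, level_percent):
    """Bits per second that correspond to a percentage of link bandwidth."""
    return link_bps * level_percent / 100


def exceeds(link_bps, level_percent,
            pps=OBSERVED_PPS, frame_bytes=ASSUMED_FRAME_BYTES):
    """True if the assumed flood rate is above the threshold on this link."""
    return flood_bps(pps, frame_bytes) > threshold_bps(link_bps, level_percent)


if __name__ == "__main__":
    print("flood rate (bit/s):", flood_bps(OBSERVED_PPS, ASSUMED_FRAME_BYTES))
    print("exceeds 0.2% of 1 Gb/s:", exceeds(10**9, 0.2))
    print("exceeds 0.2% of 10 Gb/s:", exceeds(10**10, 0.2))
```

Under these assumptions the flood is above 0.2% of a 1 Gb/s link but below 0.2% of a 10 Gb/s link. Storm-control counts traffic received on a port, so it tripping only on the uplinks fits the broadcasts arriving from the core side rather than from an access port.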

The server with the IP 192.168.102.176 (VLAN 102) doesn't seem to be doing anything. I actually turned it off, and the packets are STILL flowing...

I verified that directed-broadcast is disabled on the VLAN interfaces.

At this point I assume SOMETHING is spoofing packets somewhere and it needs to stop. I jumped on my core switches, a pair of Nexus 5548s (the 4500s are my access layer), to do a bunch of captures. I captured on every interface with the rx option set, the idea being that I'd find the interface where these packets are being received and narrow it down from there. Yeah, not so much. I don't see them coming in on ANY ports on my Nexus cores: not the uplinks to my access-layer switches, not the port-channels to my ESX hosts, not the regular server access ports.

I don't know where to look from here. I have 4x Cat 4506s running at nearly max CPU and I don't have a solution to stop it. Anyone have any ideas? Thanks.
