I'm stumped...
Yesterday afternoon, the CPUs on four of my Cat 4506s (Sup-II+) spiked to over 80%.
Running "show proc cpu sort | e 0.00" shows the following:
sh proc cpu sort | e 0.00
CPU utilization for five seconds: 85%/2%; one minute: 88%; five minutes: 92%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
52 440279804 322571476 1364 30.95% 29.32% 30.67% 0 Cat4k Mgmt LoPri
147 28747668 96973101 296 28.31% 31.23% 33.73% 0 DHCPD Receive
109 40035412 135991272 294 14.15% 15.25% 16.89% 0 IP Input
51 458641964 1101973901 416 6.31% 6.89% 6.69% 0 Cat4k Mgmt HiPri
61 816 171 4771 1.11% 0.99% 0.24% 1 SSH Process
115 24551844 123358787 199 0.95% 0.85% 0.86% 0 Spanning Tree
218 1499404 547141208 2 0.23% 0.20% 0.21% 0 HSRP Common
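The columns of this output are internally consistent: uSecs is the average CPU time per invocation, i.e. Runtime(ms) * 1000 / Invoked (truncated). That makes each row easy to sanity-check. For example, the Cat4k Mgmt HiPri row only adds up if Runtime is 458641964 ms and Invoked is 1101973901. Below is a minimal parsing sketch, assuming whitespace-separated rows in exactly the column order shown above; SAMPLE reuses rows from this output, and the helper names are made up for illustration.

```python
from dataclasses import dataclass

# Rows copied from the 'show proc cpu sort' output above.
SAMPLE = """\
52 440279804 322571476 1364 30.95% 29.32% 30.67% 0 Cat4k Mgmt LoPri
147 28747668 96973101 296 28.31% 31.23% 33.73% 0 DHCPD Receive
109 40035412 135991272 294 14.15% 15.25% 16.89% 0 IP Input
51 458641964 1101973901 416 6.31% 6.89% 6.69% 0 Cat4k Mgmt HiPri"""


@dataclass
class CpuRow:
    pid: int
    runtime_ms: int
    invoked: int
    usecs: int
    sec5: float
    min1: float
    min5: float
    tty: int
    process: str


def parse_row(line: str) -> CpuRow:
    # Process names contain spaces, so split at most 8 times.
    pid, runtime, invoked, usecs, s5, m1, m5, tty, name = line.split(maxsplit=8)
    return CpuRow(int(pid), int(runtime), int(invoked), int(usecs),
                  float(s5.rstrip("%")), float(m1.rstrip("%")),
                  float(m5.rstrip("%")), int(tty), name)


def usecs_consistent(row: CpuRow, tolerance: int = 1) -> bool:
    # uSecs should equal Runtime(ms) * 1000 / Invoked, truncated to an integer.
    if row.invoked == 0:
        return row.usecs == 0
    return abs(row.runtime_ms * 1000 // row.invoked - row.usecs) <= tolerance


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE.splitlines()]
    for r in sorted(rows, key=lambda r: r.min5, reverse=True):
        print(f"{r.process:<18} 5min={r.min5:6.2f}%  uSecs consistent: {usecs_consistent(r)}")
```

Ranking by the 5-minute column puts DHCPD Receive on top, with Cat4k Mgmt LoPri close behind.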
OK, so I assume it's a DHCP storm causing my problems. I run "debug ip dhcp server events" and it shows me this:
015699: Aug 7 15:21:16.317: DHCPD: Sending notification of DISCOVER:
015700: Aug 7 15:21:16.317: DHCPD: htype 1 chaddr ffff.ffff.ffff
015701: Aug 7 15:21:16.317: DHCPD: interface = Vlan141
015702: Aug 7 15:21:16.377: DHCPD: Sending notification of DISCOVER:
015703: Aug 7 15:21:16.377: DHCPD: htype 1 chaddr ffff.ffff.ffff
015704: Aug 7 15:21:16.377: DHCPD: interface = Vlan110
015705: Aug 7 15:21:16.421: DHCPD: Sending notification of DISCOVER:
015706: Aug 7 15:21:16.421: DHCPD: htype 1 chaddr ffff.ffff.ffff
015707: Aug 7 15:21:16.421: DHCPD: interface = Vlan250
...and so on, over and over again.
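With debug output scrolling by this fast, a quick tally of the DISCOVER notifications per interface and per chaddr can be easier to read than the raw log. Here is a minimal sketch, assuming the log lines have the exact format shown above (sequence number, timestamp, "DHCPD: ..."); the function name is hypothetical. The rate it reports reflects only the debug lines that were actually logged, so it may understate the real packet rate.

```python
import re
from collections import Counter

# Debug lines copied from the output above.
SAMPLE_DEBUG = """\
015699: Aug 7 15:21:16.317: DHCPD: Sending notification of DISCOVER:
015700: Aug 7 15:21:16.317: DHCPD: htype 1 chaddr ffff.ffff.ffff
015701: Aug 7 15:21:16.317: DHCPD: interface = Vlan141
015702: Aug 7 15:21:16.377: DHCPD: Sending notification of DISCOVER:
015703: Aug 7 15:21:16.377: DHCPD: htype 1 chaddr ffff.ffff.ffff
015704: Aug 7 15:21:16.377: DHCPD: interface = Vlan110
015705: Aug 7 15:21:16.421: DHCPD: Sending notification of DISCOVER:
015706: Aug 7 15:21:16.421: DHCPD: htype 1 chaddr ffff.ffff.ffff
015707: Aug 7 15:21:16.421: DHCPD: interface = Vlan250"""

# "HH:MM:SS.mmm: DHCPD: interface = VlanNNN"
IFACE_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?): DHCPD: interface = (\S+)")
CHADDR_RE = re.compile(r"DHCPD: htype \d+ chaddr (\S+)")


def tally(lines):
    """Count DISCOVER notifications per interface and per chaddr, plus a rough rate."""
    per_iface, per_chaddr = Counter(), Counter()
    stamps = []
    for line in lines:
        m = IFACE_RE.search(line)
        if m:
            hh, mm, ss, iface = m.groups()
            per_iface[iface] += 1
            stamps.append(int(hh) * 3600 + int(mm) * 60 + float(ss))
            continue
        m = CHADDR_RE.search(line)
        if m:
            per_chaddr[m.group(1)] += 1
    # Events per second between the first and last timestamp (ignores midnight rollover).
    span = max(stamps) - min(stamps) if len(stamps) > 1 else 0.0
    rate = (len(stamps) - 1) / span if span > 0 else 0.0
    return per_iface, per_chaddr, rate


if __name__ == "__main__":
    iface, chaddr, rate = tally(SAMPLE_DEBUG.splitlines())
    print(iface.most_common(), chaddr.most_common(), f"{rate:.1f}/s")
```

For a longer capture, the same function can be fed a saved log file line by line.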
I ran a packet capture and I see a TON of packets like these:
Source IP -> Destination IP         Info
192.168.102.176 -> 192.168.110.255  DHCPDISCOVER
192.168.102.176 -> 192.168.141.255  DHCPDISCOVER
192.168.102.176 -> 192.168.250.255  DHCPDISCOVER
Source MAC -> Destination MAC
<core switch> -> ffff.ffff.ffff
They arrive at about 1000 packets/second.
(Note: 192.168.110.0/24 = VLAN 110, 192.168.141.0/24 = VLAN 141, 192.168.250.0/24 = VLAN 250.)
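If the capture is saved as a classic libpcap file, a short stdlib-only script can tally the DHCPDISCOVERs by (source MAC, source IP, destination IP), which shows at a glance whether every one of them carries the core switch's MAC. This is a sketch under several assumptions: classic pcap only (not pcapng), Ethernet link type, at most one 802.1Q tag, and the capture file name passed on the command line. The helper names are hypothetical.

```python
import ipaddress
import struct
import sys
from collections import Counter

# Classic libpcap magic numbers (microsecond and nanosecond variants) -> byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<", b"\xa1\xb2\xc3\xd4": ">",
    b"\x4d\x3c\xb2\xa1": "<", b"\xa1\xb2\x3c\x4d": ">",
}
DHCP_COOKIE = b"\x63\x82\x53\x63"


def cisco_mac(raw: bytes) -> str:
    h = raw.hex()  # format as Cisco-style dotted MAC, e.g. 0000.0c9f.f001
    return f"{h[0:4]}.{h[4:8]}.{h[8:12]}"


def dhcp_message_type(bootp: bytes):
    """Return DHCP option 53 (message type), or None if absent."""
    if len(bootp) < 240 or bootp[236:240] != DHCP_COOKIE:
        return None
    i = 240  # options follow the 236-byte BOOTP header and the 4-byte cookie
    while i + 1 < len(bootp):
        opt = bootp[i]
        if opt == 255:  # End option
            break
        if opt == 0:  # Pad option
            i += 1
            continue
        length = bootp[i + 1]
        if opt == 53 and length >= 1 and i + 2 < len(bootp):
            return bootp[i + 2]
        i += 2 + length
    return None


def discover_key(frame: bytes):
    """(src MAC, src IP, dst IP) for an Ethernet/IPv4/UDP DHCPDISCOVER, else None."""
    if len(frame) < 14:
        return None
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    off = 14
    if ethertype == 0x8100 and len(frame) >= 18:  # skip a single 802.1Q tag
        ethertype = struct.unpack_from("!H", frame, 16)[0]
        off = 18
    if ethertype != 0x0800 or len(frame) < off + 20 or frame[off + 9] != 17:
        return None  # not IPv4/UDP
    udp = off + (frame[off] & 0x0F) * 4
    if len(frame) < udp + 8 or struct.unpack_from("!H", frame, udp + 2)[0] != 67:
        return None  # not addressed to the DHCP server port
    if dhcp_message_type(frame[udp + 8:]) != 1:  # 1 = DHCPDISCOVER
        return None
    src_ip = ipaddress.IPv4Address(frame[off + 12:off + 16])
    dst_ip = ipaddress.IPv4Address(frame[off + 16:off + 20])
    return cisco_mac(frame[6:12]), str(src_ip), str(dst_ip)


def count_discovers(data: bytes) -> Counter:
    """Tally DHCPDISCOVERs in the contents of a classic pcap file."""
    order = PCAP_MAGICS.get(data[:4])
    if order is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    if struct.unpack_from(order + "I", data, 20)[0] != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")
    counts = Counter()
    off = 24  # skip the global header
    while off + 16 <= len(data):
        incl_len = struct.unpack_from(order + "I", data, off + 8)[0]
        frame = data[off + 16:off + 16 + incl_len]
        off += 16 + incl_len
        key = discover_key(frame)
        if key:
            counts[key] += 1
    return counts


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for (smac, sip, dip), n in count_discovers(f.read()).most_common(10):
            print(f"{n:8d}  {smac}  {sip} -> {dip}")
```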
The fact that the source MAC address is that of my core switch leads me to believe the broadcast is being propagated from somewhere else. A directed broadcast, by chance?
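The directed-broadcast idea can be checked mechanically: a packet is a directed broadcast when its destination is the broadcast address of a subnet other than the sender's own. Python's standard ipaddress module makes that comparison explicit. This sketch uses the VLAN-to-subnet mapping from the note above; the /24 mask for VLAN 102 is an assumption, since only the server's address was given.

```python
import ipaddress

# VLAN-to-subnet mapping from the note above. VLAN 102 as a /24 is an assumption.
VLAN_SUBNETS = {
    110: ipaddress.ip_network("192.168.110.0/24"),
    141: ipaddress.ip_network("192.168.141.0/24"),
    250: ipaddress.ip_network("192.168.250.0/24"),
    102: ipaddress.ip_network("192.168.102.0/24"),
}


def classify(src: str, dst: str):
    """Return (target VLAN, kind) for a destination address."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    src_vlan = next((v for v, net in VLAN_SUBNETS.items() if s in net), None)
    for vlan, net in VLAN_SUBNETS.items():
        if d == net.broadcast_address:
            # Same subnet as the sender: ordinary local broadcast; otherwise directed.
            kind = "local broadcast" if vlan == src_vlan else "directed broadcast"
            return vlan, kind
    return None, "not a known subnet broadcast"


if __name__ == "__main__":
    for dst in ("192.168.110.255", "192.168.141.255", "192.168.250.255"):
        print(dst, classify("192.168.102.176", dst))
```

All three destinations in the capture come out as directed broadcasts from VLAN 102 into VLANs 110, 141 and 250.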
OK, just for troubleshooting purposes, I remove these VLANs from the trunks to one of the switches with high CPU... as expected, the CPU drops back to normal.
I add VLAN 110 back and the DHCPD Receive process jumps to 8%; I add VLAN 141 back and it jumps to 12-16%...
OK, so it would seem that these three VLANs are getting beaten up. I enable storm control on all my access ports at a fairly low level; on the 4500s I used a threshold of 0.2%. Storm control kicked in on my uplink ports to my core, but nowhere else.
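Rough arithmetic helps explain why a 0.2% level trips on the uplinks. This sketch treats the storm-control level as a plain percentage of link bandwidth and ignores preamble and inter-frame gap. The ~342-byte frame size is an assumption (a common on-the-wire size for a DHCPDISCOVER, not measured in this capture), and so are the link speeds.

```python
def bits_per_second(pps: float, frame_bytes: int) -> float:
    # Frame bytes only; preamble and inter-frame gap are ignored.
    return pps * frame_bytes * 8


def percent_of_link(pps: float, frame_bytes: int, link_bps: float) -> float:
    return 100.0 * bits_per_second(pps, frame_bytes) / link_bps


def threshold_pps(level_percent: float, frame_bytes: int, link_bps: float) -> float:
    # Packet rate at which a broadcast stream reaches the given storm-control level.
    return link_bps * level_percent / 100.0 / (frame_bytes * 8)


if __name__ == "__main__":
    FRAME = 342  # assumed DISCOVER frame size in bytes
    for link in (1e9, 10e9):  # assumed uplink speeds
        print(f"{link / 1e9:.0f} Gbps: {percent_of_link(1000, FRAME, link):.4f}% "
              f"(0.2% reached at ~{threshold_pps(0.2, FRAME, link):.0f} pps)")
```

Under those assumptions, 1000 pps is about 2.7 Mbps: roughly 0.27% of a 1 Gbps link, above a 0.2% level, but only about 0.027% of a 10 Gbps link. On 1 Gbps the 0.2% level is reached at about 731 pps.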
The server with the IP 192.168.102.176 (VLAN 102) doesn't seem to be doing anything; I actually turn it off. Packets are STILL flowing...
I verified that ip directed-broadcast is disabled on the VLAN interfaces, which it is.
At this point I assume SOMETHING is spoofing packets somewhere and it needs to stop. I jump on my core switches, a pair of Nexus 5548s (the 4500s are my access layer), to run a bunch of captures. I capture on every interface with the rx option set, the idea being that I'll find the interface where these packets are being received and narrow it down from there. Yeah, not so much. I don't see them coming in on ANY port of my Nexus cores: not the uplinks to my access-layer switches, not the port-channels to my ESX hosts, not the regular server access ports.
I don't know where to look from here. I have four Cat 4506s running at nearly max CPU and I don't have a solution to stop it. Anyone have any ideas?? Thanks.