06-20-2012 05:35 AM - edited 03-07-2019 07:21 AM
Hi all,
I recently had a problem on a LAN where a 4500 acts as the L3 switch. A blade center server had a network card driver bug that flooded the 4500, so its CPU reached 100%, stopping the whole production network.
The sup card is the following: 1 2 Supervisor IV 1000BaseX (GBIC) WS-X4515 JAE0935K45J
chassis cisco WS-C4506 (MPC8245) processor (revision 7) with 524288K bytes of memory.
During the outage I could still run the commands show platform health and show platform cpu packet statistics, and I got the output below. Note the large number of L2 Fwd Low packets.
The problem was stopped by identifying which server was flooding (shutting down interface VLANs one at a time, then watching the servers on the VLAN identified) and then stopping that server.
Now, the main point for me is to be able to contain such a problem in the future, for example by limiting CPU utilisation if any server on the LAN "ARP floods" the core switch for any reason. I have heard about DAI, but it seems to be used together with DHCP snooping and to build a table in the configuration, and I have too many servers in this network (hundreds). I will also configure storm control, but the 4500s here are old and unicast suppression is not available on them, only broadcast (and multicast in the latest IOS version).
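For reference, broadcast storm control on a server-facing port might look roughly like the sketch below. The interface name and the 5.00 percent level are assumptions chosen for illustration, and the exact options available depend on the supervisor and IOS release, so check them against the configuration guide for your version.

```
! Hypothetical server-facing port; the 5.00 percent level is illustrative only
interface GigabitEthernet3/1
 ! Suppress broadcast traffic above 5 percent of the port bandwidth
 storm-control broadcast level 5.00
 ! Optional and assumed here: send a trap when a storm is detected
 ! ("shutdown" would err-disable the port instead)
 storm-control action trap
```

A level that is too low risks cutting legitimate broadcast traffic from hosts running many virtual machines, so the value would need to be tuned against normal traffic.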
Someone told me about MLS rate limiting, but I don't know this feature.
Can someone give me some guidance on a command that would prevent the core switch from reaching 100% CPU, for example by limiting ARP requests? That is what I need, but any other good idea is welcome. Thank you for taking the time to read this post.
show platform cpu packet statistics
Packets Dropped In Hardware By CPU Subport (txQueueNotAvail)
CPU Subport TxQueue 0 TxQueue 1 TxQueue 2 TxQueue 3
------------ --------------- --------------- --------------- ---------------
0 0 0 0 281812
2 0 311220 0 0
7 0 0 0 682198047
RkiosSysPacketMan:
Packet allocation falures: 0
Packet Buffer(Software Common) allocation falures: 0
Packet Buffer(Software ESMP) allocation falures: 0
Packet Buffer(Software EOBC) allocation falures: 0
IOS Packet Buffer Wrapper allocation falures: 0
Packets Dropped In Processing Overall
Total 5 sec avg 1 min avg 5 min avg 1 hour avg
-------------------- --------- --------- --------- ----------
64 0 0 0 0
Packets Dropped In Processing by CPU event
Event Total 5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Input Acl 25 0 0 0 0
SA Miss 15 0 0 0 0
Packets Dropped In Processing by Priority
Priority Total 5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Normal 24 0 0 0 0
Medium 39 0 0 0 0
High 25 0 0 0 0
Packets Dropped In Processing by Reason
Reason Total 5 sec avg 1 min avg 5 min avg 1 hour avg
------------------ -------------------- --------- --------- --------- ----------
SrcAddrTableFilt 2 0 0 0 0
L2DstDrop 13 0 0 0 0
NoDstPorts 24 0 0 0 0
NoFloodPorts 25 0 0 0 0
Total packet queues 16
Packets Received by Packet Queue
Queue Total 5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Esmp 387108 200 191 157 52
Control 3585 0 0 0 0
Host Learning 36405 0 0 0 0
L3 Fwd High 12462 4 2 0 0
L3 Fwd Medium 765 0 0 0 0
L3 Fwd Low 91200 56 33 27 10
L2 Fwd Low 13773746 7809 7870 6478 2012
L3 Rx Low 10409 3 2 0 0
ACL fwd(snooping) 131185 68 60 48 16
ACL sw processing 1 0 0 0 0
Packets Dropped by Packet Queue
Queue Total 5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Host Learning 210028 0 0 0 25
L2 Fwd Low 620345446 359549 355739 289024 90541
06-23-2012 08:24 AM
Hello,
DAI might help. According to the documentation, ARP ACLs can be used in non-DHCP environments like yours.
An ARP ACL is needed per VLAN (SVI), listing all the permitted IP/MAC pairs of the servers in that IP subnet.
With hundreds of servers it looks like a long job, but it should be feasible with one ACL per IP subnet.
With default settings, each untrusted port is limited to 15 ARP packets per second.
Besides this, the configuration guide reports that enabling DAI increases CPU usage. See this note:
>>"When you enable DAI, all ARP packets are forwarded by CPU (software forwarding, the slow path). With this mechanism, whenever a packet exits through multiple ports, the CPU must create as many copies of the packet as there are egress ports. The number of egress ports is a multiplying factor for the CPU. When QoS policing is applied on egress packets that were forwarded by CPU, QoS must be applied in the CPU as well. (You cannot apply QoS in hardware on CPU generated packets because the hardware forwarding path is turned off for CPU generated packets.) Both factors can drive the CPU to a very high utilization level."
Before attempting this big job, I would open a TAC service request to ask whether DAI is a suitable tool for your environment or not a viable option.
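As a rough sketch of the ARP ACL approach described above, the configuration could look something like this. The ACL name, VLAN number, and IP/MAC addresses are all hypothetical placeholders, and feature support on a Supervisor IV should be confirmed for your IOS release before relying on it.

```
! Hypothetical ARP ACL listing the permitted IP/MAC pairs for one subnet
arp access-list SERVERS-VLAN10
 permit ip host 10.0.10.11 mac host 0011.2233.4455
 permit ip host 10.0.10.12 mac host 0011.2233.4466
!
! Enable DAI on the VLAN (VLAN 10 is an assumption)
ip arp inspection vlan 10
!
! Apply the ARP ACL to the VLAN; "static" means packets that do not match
! the ACL are dropped without consulting the DHCP snooping database
ip arp inspection filter SERVERS-VLAN10 vlan 10 static
```

With the static keyword, any server missing from the ACL loses ARP connectivity, so the list would have to be complete before DAI is enabled on the VLAN.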
There is a chapter about CoPP (Control Plane Policing), but I don't see ARP mentioned in it, apart from this note:
>> ARP policing is not supported on either the classic series supervisor engines or fixed configuration switches. It is supported on the Catalyst 4900M and 4948E switches, Supervisor Engine 6-E, and Supervisor Engine 6L-E.
So no CoPP for ARP, since you have a Sup IV.
Hope to help
Giuseppe
06-27-2012 10:45 AM
Giuseppe, first of all I would like to thank you for your response. It is kind of you to look into my issue.
Have you had the opportunity to implement DAI on any networks? If so, how complex is the implementation in practice, and is there any unwanted behavior to expect?
This customer runs a lot of blade centers connected to the Cisco distribution switches, as well as some ESX hosts running many virtual images. The risk would be setting a bad threshold for the broadcast packet count and throttling normal traffic, as those devices run between 12 and 30 images each.
If I understand the implementation, a generic ACL per subnet matching the traffic would be configured first, and then DAI would be configured per port with a specific threshold. Am I correct?
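The per-port part of such a setup might be sketched as follows. The interface names and the rate of 100 packets per second are assumptions for illustration only; a port carrying 12 to 30 virtual machines would likely need a value well above the default of 15, found by measuring normal ARP rates first.

```
! Hypothetical uplink to another switch: trusted, so ARP is not inspected here
interface GigabitEthernet1/1
 ip arp inspection trust
!
! Hypothetical blade center / ESX port: untrusted, with a raised ARP rate limit
interface GigabitEthernet3/1
 ip arp inspection limit rate 100 burst interval 1
!
! A port exceeding the limit is err-disabled; recover it automatically
errdisable recovery cause arp-inspection
errdisable recovery interval 300
```

Note that this rate limit applies to ARP packets only, so by itself it would not throttle other kinds of traffic arriving on the port.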
The main cause of the initial issue was a unicast flood between the server and the core switch gateway. Do you think DAI can provide protection against unicast floods as well as broadcast floods?
Our support team is currently opening a ticket with the Cisco TAC to discuss this and get their recommendations as well.
Have a nice day, Giuseppe.