06-26-2019 10:45 AM
Hi,
I would like some feedback on our network design and on possible drawbacks of the way we deploy ACLs on our Layer 3 switches. When these ACLs are applied and enough traffic traverses the interface, the switches' CPU climbs to 100% (ios-base at 62% and iprouting.iosproc at 37%), and performance degrades dangerously.
In our deployment, two Cisco 6504 switches (IOS 12.2SX) running HSRP route between more than 30 customer VLANs. To control traffic between the VLANs, we apply ACLs to each VLAN interface in both the inbound and outbound directions. Not every interface has them: for example, the Internet-facing interface has no ACL because the perimeter firewall covers it.
As an example, one VLAN interface has the following two ACLs (with comments to explain them):
Applied inbound in the interface ("ip access-group ACL_XXX in"):
ip access-list extended ACL_XXX
permit udp any host 224.0.0.2 eq 1985 reflect udpHSRP
permit icmp any any echo
permit icmp any any echo-reply
!outbound ACL reflexive traffic
evaluate tcp_22
evaluate tcp_5666
evaluate tcp_NAT1
evaluate tcp_NAT2
!Denying traffic to the other internal VLANs
deny ip 10.10.X.0 0.0.0.255 10.10.0.0 0.0.255.255
!Permitting rest of Internet traffic
permit tcp 10.10.X.0 0.0.0.255 any reflect salida_tcp
permit udp 10.10.X.0 0.0.0.255 any reflect salida_udp
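To make the wildcard logic of the "deny ip 10.10.X.0 0.0.0.255 10.10.0.0 0.0.255.255" line concrete, here is a small Python sketch of how an IOS-style address/wildcard pair matches: bits set to 1 in the wildcard are ignored, bits set to 0 must match. The subnet 10.10.5.0 is a hypothetical stand-in for the elided 10.10.X.0, and the model checks this one entry in isolation rather than the whole top-down, first-match list.

```python
import ipaddress


def wildcard_match(addr: str, base: str, wildcard: str) -> bool:
    """Return True if addr matches an IOS-style (base, wildcard) pair.

    Wildcard bits set to 1 are "don't care"; bits set to 0 must equal base.
    """
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    w = int(ipaddress.IPv4Address(wildcard))
    care = ~w & 0xFFFFFFFF  # bits that must match
    return (a & care) == (b & care)


def inter_vlan_denied(src: str, dst: str, vlan_net: str) -> bool:
    """Model of: deny ip <vlan_net> 0.0.0.255 10.10.0.0 0.0.255.255"""
    return (wildcard_match(src, vlan_net, "0.0.0.255")
            and wildcard_match(dst, "10.10.0.0", "0.0.255.255"))


# Hypothetical VLAN subnet 10.10.5.0/24 standing in for 10.10.X.0
print(inter_vlan_denied("10.10.5.20", "10.10.7.30", "10.10.5.0"))  # True: another internal VLAN
print(inter_vlan_denied("10.10.5.20", "8.8.8.8", "10.10.5.0"))     # False: falls through to the permits
```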
Applied outbound in the interface ("ip access-group ACL_YYY out"):
ip access-list extended ACL_YYY
permit udp any host 224.0.0.2 eq 1985 reflect udpHSRP
permit icmp any any echo
permit icmp any any echo-reply
!Some administrative permit rules
permit tcp 192.168.0.0 0.0.0.255 10.10.X.0 0.0.0.255 eq 22 reflect tcp_22
permit tcp host 192.168.0.1 10.10.X.0 0.0.0.255 eq 5666 reflect tcp_5666
!External NAT publication rules, that have correspondent on the perimeter firewall
permit tcp any host 10.10.X.X eq 443 reflect tcp_NAT1
permit tcp any host 10.10.X.Y eq 8443 reflect tcp_NAT2
!Reflexive traffic to Internet
evaluate salida_tcp
evaluate salida_udp
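The reflect/evaluate pairs in ACL_XXX and ACL_YYY keep per-session state, which a static entry does not. The following Python sketch is a simplified model of that behaviour, not of how IOS or the PFC implements it: a packet matching a "permit ... reflect" entry records the mirrored session, and "evaluate" permits only packets that match a recorded session. Idle timeouts and TCP FIN/RST cleanup are left out, and the host and server addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    proto: str
    src: str
    sport: int
    dst: str
    dport: int

    def reversed(self) -> "Flow":
        # Swap source and destination to describe the return traffic.
        return Flow(self.proto, self.dst, self.dport, self.src, self.sport)


@dataclass
class ReflexiveList:
    """Simplified model of a reflexive list such as salida_tcp (no timeouts)."""
    name: str
    entries: set = field(default_factory=set)

    def reflect(self, pkt: Flow) -> None:
        # "permit ... reflect <name>": remember the mirrored session.
        self.entries.add(pkt.reversed())

    def evaluate(self, pkt: Flow) -> bool:
        # "evaluate <name>": permit only packets that mirror a known session.
        return pkt in self.entries


salida_tcp = ReflexiveList("salida_tcp")

# Hypothetical host opens HTTPS to the Internet; the packet enters the SVI and
# matches "permit tcp 10.10.X.0 0.0.0.255 any reflect salida_tcp" in ACL_XXX.
out_pkt = Flow("tcp", "10.10.5.20", 51000, "203.0.113.10", 443)
salida_tcp.reflect(out_pkt)

# The reply leaves the SVI and is checked by "evaluate salida_tcp" in ACL_YYY.
reply = Flow("tcp", "203.0.113.10", 443, "10.10.5.20", 51000)
print(salida_tcp.evaluate(reply))  # True
print(salida_tcp.evaluate(Flow("tcp", "203.0.113.10", 443, "10.10.5.20", 51001)))  # False
print(len(salida_tcp.entries))  # 1: one entry per active session
```

Every new session adds an entry, so the list changes continuously while traffic flows. Whether the Sup32/PFC3B can handle those entries in hardware, or needs the MSFC for part of each flow, is worth confirming in the 12.2SX ACL documentation rather than assuming.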
As mentioned, applying one of them while there is traffic on that particular VLAN raises CPU usage, and removing them makes CPU usage drop immediately and things calm down. Points we have been reading about and are taking into account:
- Our ACLs are supposed to be processed via CEF on the interfaces rather than by the CPU (so-called "process switching"). To that end, we configure "no ip unreachables" and "no ip redirects" on each interface, and no ACL entry has an explicit "log" keyword (that is enough to disable logging, isn't it?)
- We know that CPU usage spikes briefly at the moment an ACL is applied, because of the calculations that must be done. But in our case the high CPU persists for as long as traffic is flowing.
- Since we are using reflexive ACLs, can they be processed in hardware (CEF), or do they force process switching? Should we avoid them?
- Another question: why are "object groups" not available in our IOS version? They would let us simplify the ACL design.
- We do not have much flexibility to change things, as this is a production environment.
Can you advise us on what we are doing wrong? Is there a design flaw, or another way to achieve this without the performance hit?
Thanks in advance!!
06-26-2019 01:40 PM
Cisco 6504 (with IOS 12.2SX): I'm not sure which supervisor you have here, and the code is also quite old.
Can you tell us which supervisor you have? Yes, ACLs will have an impact on CPU processing.
06-26-2019 02:27 PM
Hi Balaji,
Thanks for the fast reply.
Yes, the deployed systems are quite old (we have to look into upgrade possibilities). The supervisor is a WS-SUP32-GE-3B.
I see you say it is normal for ACLs to have an impact on the process. But isn't it a huge impact when just one ACL pair (inbound and outbound on one interface) can drive CPU usage to 100% whenever there is traffic on that VLAN?
Regards,
06-27-2019 12:48 AM
WS-SUP32-GE-3B: quite old.
It's not about how many ACLs you have, but how often the CPU is involved in processing them.
Here are some high-CPU tips, but in your situation I suggest staying away from ACLs and introducing a small L2 bridge-mode firewall to mitigate the issue.
06-27-2019 10:39 AM
Thanks for the link, Balaji, I will take a close look at it. We need ACL processing not to hit the CPU, and we really don't know why the CPU is involved at all, since it shouldn't be.
When you say "L2 Bridge FW", do you mean a separate device on each VLAN or a single shared device?
The goal of this design and of the devices we chose was a fast, global L2/L3 architecture with "basic L3 security capabilities" (ACLs):
- If we cannot implement ACLs, we have an insecure design.
- If we need a deep-inspection firewall, we lose fast switching capacity.
- If we need an L2 device in each VLAN, our physical and cabling design becomes unmanageable.
Regards,
06-28-2019 09:27 AM
From your message, you were sure the problem is caused by the ACLs, so my suggestion was based on that information.
If you think the high CPU is not caused by the ACLs, post the output of "show processes cpu sorted" whenever the CPU is high so we can look at it.
As for the bridge firewall I suggested, we would again need to look at your network to suggest the best option.
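When posting "show processes cpu sorted" output, the header line is the first thing to read: in classic IOS the five-second figure is shown as total/interrupt, so the difference is time spent in scheduled processes. The Python sketch below splits that line, assuming the classic format; Modular IOS (where process names such as ios-base and iprouting.iosproc appear) may present its output differently. The sample line is hypothetical, not taken from the poster's switch.

```python
import re

# Classic IOS header, e.g.
# "CPU utilization for five seconds: 99%/45%; one minute: 98%; five minutes: 97%"
CPU_HEADER = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%;"
    r"\s*one minute: (\d+)%;\s*five minutes: (\d+)%"
)


def parse_cpu_header(line: str) -> dict:
    """Split the classic 'show processes cpu' header into its components."""
    m = CPU_HEADER.search(line)
    if m is None:
        raise ValueError("not a classic 'show processes cpu' header line")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "five_sec_total": total,
        "five_sec_interrupt": interrupt,        # time spent at interrupt level
        "five_sec_process": total - interrupt,  # time spent in scheduled processes
        "one_min": one_min,
        "five_min": five_min,
    }


# Hypothetical sample line
sample = "CPU utilization for five seconds: 99%/45%; one minute: 98%; five minutes: 97%"
print(parse_cpu_header(sample))
```

A large interrupt share generally suggests packets being switched by the CPU, while a high total with a low interrupt share points towards a specific process. Comparing samples taken with the ACL applied and removed makes the difference concrete.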
07-01-2019 01:24 AM
Reading the CPU tips document you posted carefully, I found this:
"Policy-routed traffic, with use of match length, set ip precedence, or other unsupported parameters"
We still have to run some tests to verify this, but the VLANs that drive the CPU high (not sure if all of them) have a small route-map applied, used only to set a different default next hop. This is the configuration:
The route-map:
route-map ZZZ permit 10
match ip address YYY
set ip default next-hop x.x.x.x
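For reference, "set ip default next-hop" only takes effect when the routing table has no explicit route to the destination; a default route does not count as explicit. The Python sketch below models that decision to show that a routing lookup is needed before the policy can act. It is a simplified model, not the Sup32 forwarding path, and the routing table, next hops and ACL result are hypothetical stand-ins for YYY and the elided x.x.x.x.

```python
import ipaddress


def pbr_default_next_hop(dst, routing_table, acl_match, default_nh):
    """Simplified model of route-map ZZZ with 'set ip default next-hop'.

    routing_table: list of (ip_network, next_hop); 0.0.0.0/0 is the default route.
    acl_match: True if the packet matches 'match ip address YYY'.
    Returns the next hop the packet would be forwarded to, or None if unroutable.
    """
    addr = ipaddress.ip_address(dst)
    # Explicit routes only: the default route does not stop the policy from applying.
    explicit = [(net, nh) for net, nh in routing_table
                if net.prefixlen > 0 and addr in net]
    if explicit:
        # Normal routing wins: longest-prefix match among explicit routes.
        return max(explicit, key=lambda route: route[0].prefixlen)[1]
    if acl_match:
        # No explicit route and the packet matches the policy: use the default next hop.
        return default_nh
    # Otherwise fall back to the ordinary default route, if any.
    for net, nh in routing_table:
        if net.prefixlen == 0:
            return nh
    return None


# Hypothetical routing table and next hops
table = [
    (ipaddress.ip_network("10.10.0.0/16"), "connected"),
    (ipaddress.ip_network("0.0.0.0/0"), "198.51.100.1"),
]
print(pbr_default_next_hop("10.10.7.30", table, True, "192.0.2.1"))  # connected
print(pbr_default_next_hop("8.8.8.8", table, True, "192.0.2.1"))     # 192.0.2.1
print(pbr_default_next_hop("8.8.8.8", table, False, "192.0.2.1"))    # 198.51.100.1
```

Whether this route-map can be handled in hardware on this platform is the open question here, and it is worth checking against the 12.2SX policy-based routing documentation.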
How we apply them in each interface:
interface Vlan XXX
ip address 10.10.X.X 255.255.255.0
ip access-group ACL_XXX in
ip access-group inbound_122 out
no ip redirects
no ip unreachables
ip policy route-map ZZZ
standby XXX ip 10.10.X.Z
standby XXX timers 5 15
standby XXX priority 150
standby XXX preempt
We don't really understand whether our configuration meets the conditions for being process switched. What do you think?
Anyway, we will verify whether the CPU rises only on the route-mapped VLANs, and if so, we will change the configuration to stop using these route-maps. This could be our definitive solution!
Will post back when tests are made.
Thanks again!