05-10-2021 08:15 AM - edited 05-10-2021 11:19 AM
EDIT: This post is now academic. About 24 hours after implementing the configuration, the switch stack lost network connectivity and started spitting weird errors pertaining to stackport STP. After a reload, it worked for a while, then went offline again with the classic CPUHB_RECV_STARVE error. At that point, I disabled ip routing and stretched layer-2 over the WAN. My takeaway is: just because the 2960X has layer-3 features does not mean you can actually use them.
I have a stack of three Catalyst 2960X switches running 15.2(7)E2, operating as a multilayer switch and routing for seven VLANs. I know the 2960X is not marketed for this job, but it should be able to do it (it even runs OSPF now) and it's what I have available. The only alternative would be to stretch layer-2 over a four-mile WAN link.
Five of the seven SVIs have an ip access-group configured, but only one of them is working consistently. The other four are not blocking any of the traffic they are supposed to block. They're indicating a few "deny" hits, but in practice they don't seem to be doing anything.
The access lists use object/service groups pretty extensively, but I'm nowhere near the TCAM limits:
ce-mdf#show platform tcam utilization asic all

CAM Utilization for ASIC# 0                 Max            Used
                                            Masks/Values   Masks/values

 Unicast mac addresses:                     32988/32988    495/495
 IPv4 IGMP groups + multicast routes:       1072/1072      3/3
 IPv4 unicast directly-connected routes:    2048/2048      492/492
 IPv4 unicast indirectly-connected routes:  1024/1024      116/116
 IPv6 Multicast groups:                     1072/1072      11/11
 IPv6 unicast directly-connected routes:    2048/2048      0/0
 IPv6 unicast indirectly-connected routes:  1024/1024      3/3
 IPv4 policy based routing aces:            504/504        14/14
 IPv4 qos aces:                             504/504        51/51
 IPv4 security aces:                        600/600        89/89
 IPv6 policy based routing aces:            20/20          8/8
 IPv6 qos aces:                             500/500        44/44
 IPv6 security aces:                        600/600        18/18

Note: Allocation of TCAM entries per feature uses
a complex algorithm. The above information is meant
to provide an abstract view of the current TCAM utilization

CAM Utilization for ASIC# 1                 Max            Used
                                            Masks/Values   Masks/values

 Unicast mac addresses:                     32988/32988    495/495
 IPv4 IGMP groups + multicast routes:       1072/1072      3/3
 IPv4 unicast directly-connected routes:    2048/2048      492/492
 IPv4 unicast indirectly-connected routes:  1024/1024      116/116
 IPv6 Multicast groups:                     1072/1072      11/11
 IPv6 unicast directly-connected routes:    2048/2048      0/0
 IPv6 unicast indirectly-connected routes:  1024/1024      3/3
 IPv4 policy based routing aces:            504/504        0/0
 IPv4 qos aces:                             504/504        51/51
 IPv4 security aces:                        600/600        89/89
 IPv6 policy based routing aces:            20/20          0/0
 IPv6 qos aces:                             500/500        44/44
 IPv6 security aces:                        600/600        18/18

Note: Allocation of TCAM entries per feature uses
a complex algorithm. The above information is meant
to provide an abstract view of the current TCAM utilization
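For a rough sense of "nowhere near the limits", the Max and Used figures from the ASIC 0 output can be turned into percentages. This is only a minimal Python sketch: the `utilization_pct` helper and the `TCAM_ASIC0` dictionary are hypothetical names, the numbers are copied by hand from the output in this post, and the switch itself warns that this view is abstract, so low percentages here do not prove that every ACE was programmed into hardware.

```python
# (max, used) pairs copied by hand from the ASIC 0 output in this post.
TCAM_ASIC0 = {
    "IPv4 unicast indirectly-connected routes": (1024, 116),
    "IPv4 policy based routing aces": (504, 14),
    "IPv4 qos aces": (504, 51),
    "IPv4 security aces": (600, 89),
    "IPv6 security aces": (600, 18),
}


def utilization_pct(max_entries: int, used: int) -> float:
    """Return the used share of a TCAM region as a percentage."""
    return 100.0 * used / max_entries


for feature, (max_entries, used) in TCAM_ASIC0.items():
    print(f"{feature}: {utilization_pct(max_entries, used):.1f}% used")
```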
Below are the object group, ACL, SVI, and routing configurations, with some of the IPs changed for sanitization purposes. These are maintained centrally via Ansible on what we loosely call "distro" switches at six different sites in the WAN. The ACLs and VLAN names are the same across all buildings, but the subnets and VLAN numbers are different, which is why some ACL entries reference specific hosts that could not possibly exist in the LAN with this switch.
The Guest-ACL on Vlan616 is doing exactly what it's supposed to do. The others are not blocking anything. For example, host 10.6.8.65 in the Production network should not be able to SSH to host 10.2.1.143 because of line 330 on the Production-ACL, but it can.
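As a sanity check on that expectation, here is a minimal Python sketch of how an IOS-style wildcard mask is compared against an address. The `wildcard_match` function is a hypothetical helper written for this illustration, not anything from the switch or from Ansible, and it only checks the destination match for entry 330; it does not evaluate the earlier permit entries in the list.

```python
import ipaddress


def wildcard_match(addr: str, base: str, wildcard: str) -> bool:
    """Return True if addr matches base under an IOS-style wildcard mask.

    Bits set to 1 in the wildcard are "don't care" bits;
    every other bit of addr must equal the same bit of base.
    """
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    w = int(ipaddress.IPv4Address(wildcard))
    care = ~w & 0xFFFFFFFF  # invert the wildcard to get the bits that matter
    return (a & care) == (b & care)


# Entry 330 of Production-ACL: deny ip any 10.2.1.0 0.0.0.255
ssh_target = "10.2.1.143"
print(wildcard_match(ssh_target, "10.2.1.0", "0.0.0.255"))

# The source host sits in the Vlan68 subnet (10.6.8.1 255.255.248.0),
# where Production-ACL is applied inbound.
vlan68 = ipaddress.ip_network("10.6.8.0/255.255.248.0")
print(ipaddress.ip_address("10.6.8.65") in vlan68)
```

Both checks agree with the post: the destination falls inside the range denied by entry 330, and the source is in a VLAN where Production-ACL is applied inbound, so on paper the SSH session should be dropped.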
I see nothing suspicious in the switch logs or CPU usage history. The Guest-ACL is the shortest one, so is it possible the others are too substantial for the 2960X, regardless of TCAM utilization? Or could I be hitting an undocumented firmware bug here?
Network object group Administrators-OGP
 host 10.2.8.20
 host 10.2.8.54
 host 10.2.8.47
 172.27.224.0 255.255.240.0
 172.27.240.0 255.255.240.0

Network object group Domain-Controllers-OGP
 host 10.2.1.174
 host 10.2.1.175
 host 10.2.1.176

Network object group Jumpboxes-OGP
 host 10.2.1.10
 host 10.2.1.15
 host 10.2.1.16
 host 10.2.1.17
 host 10.2.1.138
 host 10.2.1.120

Service object group AD-Client-SGP
 udp eq domain
 udp eq 88
 tcp eq 88
 udp eq 464
 tcp eq 464
 tcp range 3268 3269
 udp eq 445
 tcp eq 445
 udp eq 135
 tcp eq 135
 udp range 49152 65535
 tcp range 49152 65535
 udp eq 636
 tcp eq 636
 udp eq 389
 tcp eq 389
 udp eq ntp

Service object group Cert-Enrollment-SGP
 tcp eq www
 tcp eq 443
 udp eq 135
 tcp eq 135
 udp range 49152 65535
 tcp range 49152 65535

Service object group Genetec-Mobile-SGP
 tcp eq www
 tcp eq 443
 tcp range 8100 8101

Service object group Jumpbox-SGP
 udp eq 3389
 tcp eq 3389
 tcp eq 445
 tcp eq 22

Service object group PaperCut-SGP
 tcp range 9163 9164
 tcp range 9173 9174
 tcp eq 9191
 udp eq 445
 tcp eq 445

Service object group Security-Desk-SGP
 tcp eq 443
 tcp eq 5500
 tcp eq 8012
 udp eq 554
 tcp eq 554
 udp eq 560
 tcp eq 560
 udp eq 5004
 tcp eq 5004

Service object group Web-Services-SGP
 tcp eq www
 tcp eq 443

Extended IP access list Production-ACL
    10 permit icmp any any
    20 permit udp any eq bootpc object-group Domain-Controllers-OGP eq bootps
    30 permit object-group AD-Client-SGP any object-group Domain-Controllers-OGP
    40 permit object-group Cert-Enrollment-SGP any host 10.2.1.158
    50 permit tcp any host 10.2.1.158 eq 1688
    60 permit object-group Jumpbox-SGP any object-group Jumpboxes-OGP
    70 permit ip any object-group Administrators-OGP
    80 permit udp any host 10.2.1.120 eq ntp
    90 permit object-group Web-Services-SGP any host 10.2.1.21
    100 permit object-group Web-Services-SGP any host 10.2.1.205
    110 permit object-group Genetec-Mobile-SGP any host 10.2.1.233
    120 permit object-group Security-Desk-SGP any host 10.2.1.26
    130 permit ip any host 10.2.1.141
    140 permit object-group Web-Services-SGP any host 10.2.1.31
    150 permit tcp any host 10.2.1.140 range 4505 4506
    160 permit ip any host 10.2.1.30 (3 matches)
    170 permit object-group Web-Services-SGP any host 10.2.1.144
    180 permit object-group Web-Services-SGP any host 10.2.1.137
    190 permit object-group Web-Services-SGP any host 10.2.1.33
    200 permit tcp any host 10.2.1.33 eq 445
    210 permit object-group Web-Services-SGP any host 10.2.1.22
    220 permit tcp any host 10.2.1.117 eq 443
    230 permit tcp any host 10.2.1.155 range 55222 55225
    240 permit object-group PaperCut-SGP any host 10.2.1.112
    250 permit object-group Web-Services-SGP any host 10.2.1.134
    260 permit ip any host 10.2.1.37 (5 matches)
    270 permit tcp any host 10.2.1.96 eq 445
    280 permit tcp any host 10.2.1.204 eq www
    290 permit ip any host 10.2.1.206
    300 permit ip any host 10.2.1.142
    310 permit tcp any host 10.2.1.226 eq 3333
    320 permit tcp any host 10.2.1.226 eq 8888
    330 deny ip any 10.2.1.0 0.0.0.255 (147 matches)
    340 deny ip any 172.16.254.0 0.0.0.255
    350 permit ip any any (3 matches)

Extended IP access list Guest-ACL
    10 permit udp any eq bootpc object-group Domain-Controllers-OGP eq bootps
    20 permit udp any object-group Domain-Controllers-OGP eq domain
    30 permit object-group Web-Services-SGP any host 10.2.1.21
    40 permit object-group Web-Services-SGP any host 10.2.1.205
    50 permit object-group Genetec-Mobile-SGP any host 10.2.1.233
    60 deny ip any 10.0.0.0 0.255.255.255
    70 deny ip any 172.16.254.0 0.0.0.255
    80 permit ip any any

Extended IP access list BldgMgmt-ACL
    10 permit icmp any any
    20 permit udp any eq bootpc object-group Domain-Controllers-OGP eq bootps
    30 permit udp any object-group Domain-Controllers-OGP eq domain
    40 permit tcp any object-group Domain-Controllers-OGP eq 636
    50 permit ip any object-group Jumpboxes-OGP
    60 permit ip any object-group Administrators-OGP
    70 permit udp any host 10.2.1.139 eq snmp
    80 permit udp any host 10.2.1.120 eq ntp
    90 permit tcp any host 10.2.1.201 eq smtp
    100 permit ip any host 10.2.1.30
    110 permit ip any host 10.2.1.22
    120 permit ip any host 10.2.1.112
    130 permit ip any host 10.2.1.26
    140 deny ip any 10.2.1.0 0.0.0.255
    150 deny ip any 172.16.254.0 0.0.0.255
    160 permit ip any any

Extended IP access list Security-Cameras-ACL
    10 permit icmp any any
    20 permit ip any host 10.5.3.145
    30 permit ip any host 10.5.3.132
    40 permit ip any host 10.5.3.135
    50 permit ip any host 10.5.3.136
    60 permit ip 10.2.1.0 0.0.0.255 any
    70 permit ip 10.2.2.0 0.0.0.255 any (222 matches)
    80 permit ip host 10.2.1.120 any
    90 permit ip object-group Administrators-OGP any
    100 permit ip 172.16.254.0 0.0.0.255 any
    110 permit ip 172.16.255.0 0.0.0.255 any
    120 permit ip 172.16.1.0 0.0.0.255 any
    130 permit ip 172.3.1.0 0.0.0.255 any
    140 permit ip 172.4.1.0 0.0.0.255 any
    150 permit ip 172.5.1.0 0.0.0.255 any
    160 permit ip 172.16.6.0 0.0.0.255 any
    170 permit ip 172.9.1.0 0.0.0.255 any

interface Vlan61
 description CE Mgmt
 ip address 10.6.1.2 255.255.255.0
 ip access-group BldgMgmt-ACL in
 ip helper-address 10.2.1.174
 ip helper-address 10.2.1.175
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip route-cache policy

interface Vlan62
 description CE Inline
 ip address 10.6.2.1 255.255.255.0
 ip access-group Production-ACL in
 ip helper-address 10.2.1.174
 ip helper-address 10.2.1.175
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip route-cache policy

interface Vlan63
 description CE Cameras
 ip address 10.6.3.1 255.255.255.0
 ip access-group Security-Cameras-ACL out
 ip helper-address 10.2.1.174
 ip helper-address 10.2.1.175
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip route-cache policy

interface Vlan68
 description CE Relay
 ip address 10.6.8.1 255.255.248.0
 ip access-group Production-ACL in
 ip helper-address 10.2.1.174
 ip helper-address 10.2.1.175
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip route-cache policy

interface Vlan616
 description CE Guest
 ip address 10.6.16.1 255.255.248.0
 ip access-group Guest-ACL in
 ip helper-address 10.2.1.174
 ip helper-address 10.2.1.175
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip route-cache policy

interface TenGigabitEthernet1/0/2
 description WAN Link
 no switchport
 ip address 10.6.255.2 255.255.255.0
 no ip redirects
 no ip proxy-arp
 ip pim sparse-mode

Gateway of last resort is 10.6.255.1 to network 0.0.0.0
S*    0.0.0.0/0 [1/0] via 10.6.255.1
Any thoughts are appreciated!
05-10-2021 01:43 PM
At a high level I do not see a reason for this, but I need to check some more information.
Can you post the complete configuration ("show run")?
Please also post the output of the following:
show vlan
show ip interface brief
05-11-2021 05:24 AM
I appreciate your willingness to invest the time, but I've disabled IP routing for now and stretched layer-2 over the WAN, so the config no longer exists. The ACL inconsistencies were only the beginning; after about 24 hours, the switch stack lost connectivity while spitting these errors on all VLANs:
May 10 13:25:38.943: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent peer vlan id 616 on StackPort1 VLAN68.
May 10 13:25:38.943: %SPANTREE-2-BLOCK_PVID_PEER: Blocking StackPort1 on VLAN0616. Inconsistent peer vlan.
After some troubleshooting and a reload, it started working but then, two hours later, went offline again while filling up with these:
May 10 15:25:26.262: %SUPQ-4-CPUHB_RECV_STARVE: Still seeing receive queue stuck after throttling
At that point I cut bait, offloaded the routing for this site to a core switch at another site, reverted the 2960X to layer-2, and everything has been fine ever since. I don't plan to attempt this again with a 2960X. I'll try to budget for a Cat 9300 to do this properly, but in the meantime, I have not seen any ill effects from stretching layer 2.
05-11-2021 05:31 AM - edited 05-11-2021 05:31 AM
Thanks for the feedback. I know the 2960s are not as powerful as one would expect; yes, a Cat 9300 is the way to go moving forward.
Hope all is good now.
05-10-2021 10:50 PM - edited 05-10-2021 10:52 PM
Hello
Your problem is the ACLs.
Besides being convoluted due to the nesting, they are incorrect in some areas.
For example:
Security-Cameras-ACL ACEs 20-50 are incorrect.
When applying ACLs to SVIs, the logic is:
IN = sourced from within the VLAN
OUT = sourced from outside the VLAN
So if you see hits on only certain ACEs within an ACL, it's because the logic works for those particular ACEs and not for the others, or simply because no traffic is matching the other ACEs.
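The IN/OUT rule above can be sketched in a few lines of Python. This is an illustration only: the `acls_on_path` function and the `SVI_ACLS` table are hypothetical names, the bindings are copied from the SVI configuration posted in this thread, and the sketch models nothing beyond which ACLs a routed packet would be checked against.

```python
from typing import Dict, List, Tuple

# ACL bindings copied from the SVI configuration posted in this thread.
SVI_ACLS: Dict[str, Dict[str, str]] = {
    "Vlan61": {"in": "BldgMgmt-ACL"},
    "Vlan62": {"in": "Production-ACL"},
    "Vlan63": {"out": "Security-Cameras-ACL"},
    "Vlan68": {"in": "Production-ACL"},
    "Vlan616": {"in": "Guest-ACL"},
}


def acls_on_path(ingress: str, egress: str) -> List[Tuple[str, str, str]]:
    """List (interface, direction, acl) checks for a packet routed
    from the ingress interface to the egress interface."""
    if ingress == egress:
        # Traffic that stays inside one VLAN is bridged, not routed,
        # so ACLs applied to the SVI are never consulted.
        return []
    checks = []
    in_acl = SVI_ACLS.get(ingress, {}).get("in")
    if in_acl:
        checks.append((ingress, "in", in_acl))
    out_acl = SVI_ACLS.get(egress, {}).get("out")
    if out_acl:
        checks.append((egress, "out", out_acl))
    return checks


# A host in Vlan62 talking to a camera in Vlan63:
print(acls_on_path("Vlan62", "Vlan63"))
# Traffic arriving on the routed WAN port toward the cameras:
print(acls_on_path("TenGigabitEthernet1/0/2", "Vlan63"))
```

Under this model, the source fields in Security-Cameras-ACL describe hosts outside Vlan63 and the destination fields describe the cameras, because that list is applied in the OUT direction.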
Also, the 2960X can perform L3, but you may need to change the SDM template to allocate additional resources for ACLs and IPv4 routing. Check the current template with:
sh sdm prefer
05-11-2021 05:17 AM
Thanks for your reply. The ACEs you identify are actually correct under the circumstances. Unlike the others, Security-Cameras-ACL is placed OUT on the Vlan63 SVI by design, and as indicated in the original post, ACLs are pushed to all sites via a central Ansible playbook. This leads to a handful of inconsistencies like the ones you identified, since there are no 10.5.x.x hosts at this site. Those ACEs would just be ignored here, and I wasn't expecting to see hits on them. There are only a few ACEs like that, and I should probably have just edited the config to omit them from the original post, as it's a bit confusing.
The SDM template is lanbase-default. I'd considered this as well, but I'd figured we'd see TCAM utilization issues if the template were wrong. Could be mistaken.