
Issue with CoA? ISE 2.7/9800-80

Erik Allen
Level 1

Hello,

 

Recently we've been working on deploying new 9800-80s and currently have them set up in our test environment, running 17.3.3. We are also using ISE 2.7.0.356.

 

We've set up a CWA Guest Portal and our redirects are working. The user joins the Guest SSID we created and is presented with the captive portal page. Upon pressing "accept" to gain wireless access, the client is properly moved to the run state and a successful log appears in the ISE Live Logs saying all is well.

 

The problem, however, is that the client does not recognize that it has been moved into the run state and does not gain internet access unless it disassociates from the SSID and re-associates. The client runs into no further issues upon re-association. We've tested this on multiple iPhones running iOS 14.7, a MacBook Air running macOS Catalina and three Motorola Android devices. All exhibit the same issue.

 

Checking radioactive traces shows the clients move through L2 and L3 auth as expected and move into the run state. The only errors generated pertain to 11w not being enabled, as this is an open guest SSID.

 

Any thoughts on what to try would be appreciated.

 

Thanks.

 

 

20 Replies

Hi Erik,

1) What is the WLC code: is it 17.3.3 (just for confirmation)?

2) What are the AP models?

3) What is the AP mode? Local or Flex? If the AP mode is Flex, does the problem happen with local mode APs too?

4) Is this a Foreign/Anchor scenario?

5) Is DHCP external or local, meaning is the 9800 acting as the DHCP server or do you have an external DHCP server?

6) If you have an external DHCP server, where did you configure the ip helper address: on the 9800 or on the connected switch?

7) Does the client VLAN configured on the WLC have a Layer 3 interface or just Layer 2? (I have seen many problems with Layer 3 interfaces for the client’s subnet, and the best practice is to use a Layer 2 interface for all clients unless you need mDNS, DHCP relay or the internal DHCP server.) Best Practices Doc is here: https://www.cisco.com/c/en/us/products/collateral/wireless/catalyst-9800-series-wireless-controllers/guide-c07-743627.html

(DHCP bridging is the recommended mode.)

I’m not TAC, but I really suspect a DHCP problem here (not with your DHCP server itself, but with how the DHCP packets are handled between the AP, WLC and switch).

 

If you want to troubleshoot this further without TAC, collect a radioactive (RA) trace as explained here https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/213949-wireless-debugging-and-log-collection-on.html#anc13 and check whether you see the messages below:

Failed to get ewlc dot11 packet handler. Dot11 action processing error. Dropping request

Skipping DHCP TLVs for further processing. DHCP based classification isn't enabled

If that’s the case, then you will see only

DHCP_DISCOVER

DHCP_OFFER

with no DHCP_REQUEST.

The client then sends another discover because it didn’t get an offer. That would confirm that you have a problem with DHCP handling.
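
If the trace is long, the check above can be roughed out in a few lines of Python. This is only a sketch under assumptions: it presumes the exported RA trace is plain text and that the message types appear as the literal tokens DHCP_DISCOVER, DHCP_OFFER and DHCP_REQUEST, which may not match the exact wording in your trace. The function name and the file name in the usage note are made up for illustration.

```python
import re
from collections import Counter

# Assumed literal tokens; adjust them to whatever your trace actually prints.
DHCP_TOKENS = ("DHCP_DISCOVER", "DHCP_OFFER", "DHCP_REQUEST")


def dhcp_summary(trace_text):
    """Count each DHCP message type and flag a handshake that stalls after the offer."""
    found = Counter(re.findall("|".join(DHCP_TOKENS), trace_text))
    counts = {token: found[token] for token in DHCP_TOKENS}
    # Stalled: discovers and offers are present, but no request ever follows.
    stalled = (
        counts["DHCP_DISCOVER"] > 0
        and counts["DHCP_OFFER"] > 0
        and counts["DHCP_REQUEST"] == 0
    )
    return counts, stalled
```

You would call it on the trace contents, for example dhcp_summary(open("ra_trace.txt").read()), where ra_trace.txt is a placeholder file name. A stalled result of True matches the discover/offer loop described above; it only counts tokens, so it does not tell you which hop dropped the packet.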

Oh, I forgot to mention that I did test CWA with ISE 3.0 Patch 2 and Patch 3 and it’s working fine for me with iOS 14.7.1

Hi Grendizer,

1. Version is 17.3.3

2. We were initially using an AIR-AP3802i-B-K9, but switched to using AIR-AP2802i-B-K9. I've also tried a 3702, problem persists across models.

3. APs are in Local mode; we don't utilize Flex.

4. This is not a Foreign/Anchor scenario.

5. We are utilizing our external DHCP server on site.

6. We configured IP helper addresses on each of our Layer 2 VLANs. Additionally, our connected switches have an ip helper address configured. These IPs match.

7. The client VLANs are configured as Layer 2 interfaces only. The configuration of one of them looks like this:

 

interface Vlan1101
description DC WiFi Guest1
ip address 10.18.0.11 255.255.248.0
ip helper-address 172.21.64.101

 

8. Radioactive Trace is attached, and I unfortunately do not see those statements in there. I ran this through the debug analyzer and don't see anything that sticks out, but will admit that this is a little over my head.

 

Thoughts?

 

OK, try two. (I didn't sanitize my debug enough the first time; I've double-checked and removed every identifying element from it.)

 

1. WLC Code is 17.3.3, and we're using ISE 2.7 on Patch 4.

2. We've tried various AP models: a 3802i, 2802i, 3702i, and a 2702i. The issue persists on all of these models.

3. AP mode is Local, as we do not utilize flex.

4. This is not a Foreign/Anchor scenario. The controller is acting alone.

5. DHCP server is external. The 9800 is not acting as our DHCP server.

6. The IP helper address is configured per interface on the 9800. I also have the ip helper address on our downstream switches to aid wired clients, etc.

7. The client vlans are purely layer 2 interfaces. We wanted to avoid the layer 3 interfaces due to reported issues. Our DHCP mode is in bridging mode.

 

It looks like DHCP is working as expected. However, I'm not too well versed in deciphering the log and the parser seems to suggest that things look ok? Not quite sure.

 

Lastly, I've run a debug and radioactive trace, and attached it after sanitizing any identifying information. (I've removed IP addresses, MAC addresses, etc. and substituted them with their corresponding purpose, i.e. "[Wireless Controller]", "[Client Mac]" and so on.)

OK, two things need to be checked.
After the client accepts the AUP page/portal, ISE sends two attributes back to the WLC to apply to the client session (yes, you will still see a successful log in ISE, and the WLC moves the client to the run state with the following applied):

1. ACL: DENY_GUEST_INTERNAL
2. security-group-tag=0006
ISE shouldn't send a group tag in this case.
For the ACL, best practice is not to use an ACL to control guest traffic; instead, use an Anchor deployment in the DMZ or a VRF design so you can isolate the guest traffic from corp traffic. In this case the ACL might be causing the problem somehow. Start by removing the security tag from the ISE config; if that doesn't fix the problem, remove the ACL from the ISE reply as well and make it a simple "Permit Access", as in the screenshot below.
I'm positive one of the two is causing the issue you have.
CWA ISE AuthZ Policy Set.jpg

From the RA trace:
2021/08/06 15:23:46.036756 {wncd_x_R0-0}{1}: [radius] [24995]: (info): RADIUS: Cisco AVpair [1] 32 "cts:security-group-tag=0006-00"
.
.
2021/08/06 15:23:46.036773 {wncd_x_R0-0}{1}: [radius] [24995]: (info): RADIUS: Airespace-ACL-Name [6] 21 "DENY_GUEST_INTERNAL"
.
.
2021/08/06 15:23:46.040199 {wncd_x_R0-0}{1}: [aaa-attr-inf] [24995]: (info): [ Applied attribute : security-group-tag 0 "0006-00" ]
.
.
2021/08/06 15:23:46.040201 {wncd_x_R0-0}{1}: [aaa-attr-inf] [24995]: (info): [ Applied attribute : bsn-acl-name 0 "DENY_GUEST_INTERNAL" ]
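
If you'd rather not scroll through the whole trace for these, a small Python sketch along the following lines could list every attribute the WLC applied to the session. It is a rough sketch, not a Cisco tool: the regular expression is modelled only on the two "Applied attribute" lines quoted above and may need adjusting for other releases, and the function name is made up for illustration.

```python
import re

# Pattern modelled on the "Applied attribute" lines quoted above (an assumption
# about the format; other software versions may print these lines differently).
APPLIED_RE = re.compile(r'\[ Applied attribute\s*:\s*(\S+)\s+\d+\s+"([^"]*)"\s*\]')


def applied_attributes(trace_text):
    """Return (name, value) pairs for every 'Applied attribute' entry found."""
    return APPLIED_RE.findall(trace_text)


sample = '''\
2021/08/06 15:23:46.040199 {wncd_x_R0-0}{1}: [aaa-attr-inf] [24995]: (info): [ Applied attribute : security-group-tag 0 "0006-00" ]
2021/08/06 15:23:46.040201 {wncd_x_R0-0}{1}: [aaa-attr-inf] [24995]: (info): [ Applied attribute : bsn-acl-name 0 "DENY_GUEST_INTERNAL" ]
'''

for name, value in applied_attributes(sample):
    print(name, "=", value)
```

Anything unexpected in that list after the portal is accepted (here the group tag and the ACL) is a candidate to remove from the ISE authorization result.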

Good morning Grendizer,

First, thank you for your continued help. I really appreciate you taking the time to help me try and solve this issue.

Secondly, it was the ACL. We're going to look into the best practices for deploying the VRF design, but as soon as I removed the ACL... it worked.

Thank you!

I'm glad I was able to help.
For the guest traffic, you have another option: "Dual Home" (Multi-chassis Link Aggregation Group, "Multi LAG"). Multi LAG is supported on 17.2.1 and later and lets you connect multiple uplinks from the controller to separate uplink switches, isolating guest traffic on a completely different switch/network from enterprise traffic. Each LAG must be connected to a single switch (VSS and vPC pairs also count as a single logical switch, so you don't have to use a single switch without redundancy), and different VLANs must be assigned to different LAGs. Just an option for your deployment. The example below is for a 9800-40, but the concept is the same for the 9800-80.
Multi Lag.jpg
