04-30-2023 09:05 PM - edited 04-30-2023 10:02 PM
Hi Cisco Experts,
We have a WLC: 9800-CL running 17.3.6 with C9120AXI APs.
1. Two WLANs, both in the untrusted zone:
WLAN1: Staff , 802.1x (AD users)
WLAN2: Guest, WPA2, and WebAuthentication (L3 consent, captive portal)
2. Both use the same Access Policy, so they are in the same VLAN and use the same DHCP pool.
The issue: when a user connects to the Staff WLAN with an Android phone, it sometimes reports "no internet access", just as if the user had connected to the Guest WLAN without clicking "consent" in the captive portal.
I can see this in the RadioActive trace:
2023/05/01 11:29:28.007558 {wncd_x_R0-0}{1}: [client-orch-state] [21445]: (note): MAC: 6083.34b8.1b91 Client state transition: S_CO_IP_LEARN_IN_PROGRESS -> S_CO_RUN
but the phone had actually already got an IP address.
Is it possible that:
1. The phone is stuck on Web Authentication, if it connected to the Guest WLAN before but the user didn't "consent" in the portal? (Staff and Guest are in the same broadcast domain/VLAN.)
2. I tried setting a static IP address on the Android phone, and it still has the issue.
3. It doesn't occur all the time. Sometimes it works well and you can switch between Staff and Guest without any issue: (1) connect to Staff with 802.1x authentication successfully and access the internet; (2) connect to Guest with the password, then consent on the portal to access the internet, as if the issue had never occurred. At those times I can't replicate the issue.
All of the above tests were on Android phones; iPhones seem to have no issue.
Any suggestions on how to replicate and troubleshoot the issue?
Thanks in advance.
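To compare working and non-working sessions, it can help to pull every state transition for one client out of a saved RadioActive trace. Below is a minimal Python sketch, assuming the trace has been exported to a text file and that transition lines follow the layout of the single line quoted above (any other line is ignored). `parse_transition` and `transitions_for` are hypothetical helpers, not Cisco tooling.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout, inferred only from the one trace line quoted in this post:
# <date> <time> {<process>}{<n>}: [<module>] [<pid>]: (<level>): MAC: <mac> Client state transition: <from> -> <to>
TRANSITION_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) .*?"
    r"MAC: (?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}) "
    r"Client state transition: (?P<src>\S+) -> (?P<dst>\S+)\s*$"
)


class Transition(NamedTuple):
    timestamp: str
    mac: str
    src: str
    dst: str


def parse_transition(line: str) -> Optional[Transition]:
    """Return the client state transition in one trace line, or None if it has none."""
    m = TRANSITION_RE.match(line.strip())
    if m is None:
        return None
    return Transition(f"{m['date']} {m['time']}", m["mac"], m["src"], m["dst"])


def transitions_for(lines: Iterable[str], mac: str) -> Iterator[Transition]:
    """Yield every state transition logged for one client MAC, in file order."""
    wanted = mac.lower()
    for line in lines:
        t = parse_transition(line)
        if t is not None and t.mac == wanted:
            yield t
```

For example, `list(transitions_for(open("ra_trace.log"), "6083.34b8.1b91"))` would list the states one session went through; the file name is a placeholder.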
04-30-2023 09:48 PM
@117222400 First, my suggestion is to separate these WLANs into different VLANs with different IP ranges. It's not recommended to keep the Guest VLAN and the internal VLAN in the same subnet.
04-30-2023 10:10 PM - edited 04-30-2023 10:45 PM
Hi Kasun,
Thanks very much for your reply.
Do you mean that putting different authentication methods in the same VLAN is not supported? If so, why does the system allow it to be configured?
Sometimes it works very well and I can't replicate the issue... then several days later the issue occurs again, and I don't know how to make it work. That's very strange.
P.S. Just now I ran ping 8.8.8.8 from an affected Android phone; the first 2 packets were lost, then it got through... and the Wi-Fi was working! I'm still watching to see when the issue will occur again.
Thanks again.
Best regards
04-30-2023 11:58 PM
>...But why the system allow it to configure so?
- Whether the configuration follows good practice can be checked by running the CLI command show tech wireless and having the output analyzed with: https://cway.cisco.com/wireless-config-analyzer/
Check out all advisories,
M.
05-01-2023 12:22 AM - edited 05-01-2023 12:23 AM
@117222400 It's OK for 2 WLANs to share the same VLAN; it's supported. My point is a recommendation, because this is a Guest VLAN (which needs to be separated in every way) and an internal VLAN.
Debug the client and analyze the issue with this tool:
https://cway.cisco.com/wireless-debug-analyzer/
05-01-2023 10:05 AM
Agreed that it should work, but as Kasun said, normally you'd want guest to be kept completely separate from staff. That said, 17.3 has had a lot of bugs and is approaching end of life https://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/ios-xe-17/ios-xe-17-3-x-eol.html (there are already no more routine bug fixes), so you should really consider migrating to 17.6 or 17.9 (currently 17.6.5 and 17.9.3). Also refer to the TAC recommended versions below.
05-03-2023 05:30 PM - edited 05-03-2023 09:32 PM
Hi Kasun and Rich,
Thanks very much for your replies.
My company's IT management is very strict, and I can't simply advise upgrading IOS or separating the WLANs into different VLANs, as that might cause new issues affecting the whole company's internet access. In the current setup, all other WLANs are working well, and iPhones work well on the WLANs mentioned above. So I think I need to find the root cause before I can advise any changes.
I suspect Web Authentication is blocking internet access, because:
(1) The phones pass L2 authentication, get connected, and get a DHCP IP address. Usually that is enough to access the internet (DNS is set to 8.8.8.8 in the DHCP pool). iPhones work well, so it is not a firewall issue.
(2) Web Authentication is L3 authentication; in my opinion, it can only use the IP address/MAC address as conditions to decide whether the client has "agreed" to the captive portal. In this case the 2 WLANs are in the same VLAN, so the phone will get the same IP address.
Will Web Authentication also distinguish by L2 authentication (802.1x or WPA2)? (If not, both the 802.1x and WPA2 WLANs might be impacted.)
(3) Once I pinged 8.8.8.8 and got through, the issue disappeared, and I could forget and rejoin the network without problems. But several days later the issue happened again. That's why I suspect it was stuck on authentication.
Is it possible to collect Web Authentication logs? If yes, I can check whether the 802.1x WLAN is impacted or not.
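Point (2) boils down to which client attributes a web-auth state is tied to. The toy Python model below is purely hypothetical: it is not how the 9800 actually stores client state (this thread doesn't establish that). It only shows why the answer matters: if the state were keyed by IP alone, a pending Guest record would also match the same phone on Staff when both WLANs hand out the same IP, whereas keying by MAC and WLAN keeps the two sessions apart. `Session`, `WebAuthTable` and the key functions are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable


@dataclass(frozen=True)
class Session:
    mac: str
    ip: str
    wlan: str


# Hypothetical key choices: which attributes a web-auth "consent" record is tied to.
def key_by_ip(s: Session) -> Hashable:
    return s.ip


def key_by_mac_and_wlan(s: Session) -> Hashable:
    return (s.mac, s.wlan)


@dataclass
class WebAuthTable:
    """Toy model of a web-auth state table; NOT the 9800's real implementation."""
    key: Callable[[Session], Hashable]
    state: Dict[Hashable, str] = field(default_factory=dict)

    def join_guest(self, s: Session) -> None:
        # A Guest client starts out pending until it consents on the portal.
        self.state[self.key(s)] = "WEBAUTH_PENDING"

    def consent(self, s: Session) -> None:
        self.state[self.key(s)] = "RUN"

    def blocked(self, s: Session) -> bool:
        # Traffic is dropped only if a pending record matches this session's key.
        return self.state.get(self.key(s)) == "WEBAUTH_PENDING"
```

With `guest = Session("6083.34b8.1b91", "10.0.0.5", "Guest")` and the same phone as `staff = Session("6083.34b8.1b91", "10.0.0.5", "Staff")`, an IP-keyed table that saw `join_guest(guest)` reports `blocked(staff)` as true, while a MAC-and-WLAN-keyed table does not. The addresses are made up.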
Thanks again.
05-04-2023 02:38 AM
> Is it possible to collect Web Authentication logs ?
Yes, enable a RadioActive trace for the device MAC address. Remember it may use a different MAC address for each WLAN, so you'd need to add both.
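Recent Android versions use a randomized, per-network MAC by default, so when adding trace entries it can help to check whether an address seen on the WLC has the locally administered bit set (randomized addresses do; burned-in vendor addresses don't). A minimal Python sketch; `is_locally_administered` is a hypothetical helper name.

```python
import re

# Accepts Cisco dotted (6083.34b8.1b91), colon, hyphen or bare 12-hex-digit forms.
_MAC_RE = re.compile(r"^[0-9a-f]{2}(?:[:.-]?[0-9a-f]{2}){5}$", re.IGNORECASE)


def is_locally_administered(mac: str) -> bool:
    """True if the universal/local bit (0x02 of the first octet) is set, as in randomized MACs."""
    if not _MAC_RE.match(mac):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    first_octet = int(mac[:2], 16)
    return bool(first_octet & 0x02)
```

For the MAC in the trace earlier in this thread, `is_locally_administered("6083.34b8.1b91")` is `False` (first octet 0x60), which suggests that particular session was not using a randomized address.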
05-04-2023 09:25 PM
Hi Rich,
1. Thanks very much for your advice. I used the analyzer and got the result below; everything seems to be working.
2. I used Wireshark to capture packets:
(1) On the affected laptop: it doesn't receive the DNS query response from 8.8.8.8. I can see the query traffic was allowed through the firewall to the internet on UDP 53, but no response packets were captured on the laptop. The firewall also shows no response traffic (I'm not sure whether the firewall monitors UDP response packets at all, since the UDP response traffic to the working laptop didn't show up on the firewall either).
(2) On the working laptop: it queries 8.8.8.8 and gets a response very quickly. Both the query and response packets are captured, and it shows "internet access".
The difference is that the working laptop receives the DNS response but the non-working one doesn't.
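To reproduce this check without a full capture, a client-side probe can send one DNS query to 8.8.8.8 over UDP 53 and wait for the matching reply, for example on a schedule from an affected laptop while waiting for the issue to recur. A minimal Python sketch using only the standard library; the query name `www.cisco.com` and the 2-second timeout are arbitrary assumptions, and `build_query`/`probe_dns` are hypothetical helpers. A `None` result corresponds to the "query sent, no response" case in (1).

```python
import random
import socket
import struct
import time
from typing import Optional


def build_query(name: str, txid: int) -> bytes:
    """Build a minimal DNS query (RFC 1035) for an A record, recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe_dns(server: str = "8.8.8.8", name: str = "www.cisco.com",
              timeout: float = 2.0) -> Optional[float]:
    """Send one UDP/53 query; return the RTT in seconds, or None if no matching reply arrives."""
    txid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(name, txid), (server, 53))
        try:
            while True:
                data, _ = sock.recvfrom(512)
                # Ignore stray datagrams whose transaction ID doesn't match ours.
                if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == txid:
                    return time.monotonic() - start
        except socket.timeout:
            return None
```

Logging `time.ctime()` together with `probe_dns()` every minute would show when replies stop arriving, which could then be matched against a RadioActive trace for the same period.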
05-05-2023 04:26 AM - edited 05-05-2023 04:31 AM
That sounds suspiciously similar to the bugs which plagued the Wave 2 APs (not supposed to affect the 9100 series); see Leo's list below. Those are now supposed to be fixed in 17.3.6. However, there are a number of other critical bugs in 17.3.6 for which Cisco has released APSPs. If you don't already have those APSPs installed, you should install them as a priority, even if you aren't going to upgrade the IOS version yet:
https://software.cisco.com/download/home/286322605/type/286325254/release/17.3.6
If you insist on sticking with 17.3, note that 17.3.7 is now released with a lot more fixes:
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-3/release-notes/rn-17-3-9800.html#resolved-caveats-for-cisco-ios-xe-amsterdam-17.3.7
All the 17.3.6 APSPs will be rolled up into 17.3.7.
P.S. The Wave 2 bugs were related to MU-MIMO, so it might be worth disabling MU-MIMO to see whether that cures your problem. Notes are at CSCwa73245 (Bug Search Tool, cisco.com). As I said, these were not supposed to affect the 9100 series, but sometimes the code overlaps anyway. This is just an idea; it may be better to open a TAC case and let them investigate further.
05-05-2023 05:46 AM - edited 05-05-2023 05:52 AM
Thanks very much for your reply; I will look into the post carefully.
By the way, as mentioned before, on today's affected laptop I again used ping 8.8.8.8 to fix it temporarily. But I can't ask end users to do that every time the issue occurs.
After the ICMP pings to 8.8.8.8 failed twice (both "no response"), DNS response packets were captured soon after, and everything worked: ping, domain name resolution, forgetting and reconnecting to the Wi-Fi. The issue was gone!
For now, I think I need to wait a few days until it happens again.
05-16-2023 05:53 AM
Today I found that the DHCP pool settings for the affected VLAN don't have "DNS proxy" enabled, although the guide https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-3/config-guide/b_wl_17_3_cg/m_dhcp_wlan_9800.html recommends enabling it, as below:
I can't find any article or discussion about this option.
Given that we have seen so many lost DNS response packets, I suspect this is the root cause of the issue.
05-16-2023 06:31 AM
Better still, don't use the WLC as a DHCP server.
Refer to https://www.cisco.com/c/en/us/products/collateral/wireless/catalyst-9800-series-wireless-controllers/guide-c07-743627.html#DHCPproxy