
Some WPA2-PSK Devices Not Connecting or Taking Hours to Connect

jcdrew
Level 1

Hoping to get some help here... we are baffled.

There are 400+ commercial IoT devices (all the same) on the wireless network across several campuses.  These devices have always connected very quickly and have been very stable.

Internet shortages, chip changes, etc... The manufacturer has had to change radios to a new generation from their supplier.  The supplier is one of the large global suppliers based in the US with a very good reputation (not mentioning names here since this is a new radio and there's a chance that there's a bug in the hardware or stack, though that is thought to be unlikely).

The problem is that 99.9% of the time the new devices cannot connect to the network and reach the cloud service they use.  Here's what we know and why we (everyone) are stumped.  Hopefully, there's some excellent insight to be had from the community.

Network Basics-
- 30+ campuses with thousands of connections
- Wi-Fi policy is WPA2 Only
- large static block of 850+ devices
- outbound ports are all properly configured to allow the devices to connect to their cloud application
- same SSID/credentials deployed at all APs at all campuses
- 400+ of the older devices connected across 12+ campuses (circa 2022 mfg with circa 2014 radios)

Previous Device & Settings
- stable with 400+ devices installed
- b/g radio, 2.4, WPA or WPA2-PSK capable (WPA2-PSK used for this network)
- IPv4 static addressing, subnet mask, gateway, and DNS all verified to be correct
- SSID name, WPA2-PSK, credentials all checked 

NEW Device Settings (circa 2023 mfg with circa 2022 radios)
- b/g/n radio, 2.4, WPA or WPA2-PSK, or WPA/WPA2-PSK MIXED capable (WPA2-PSK used for this network)
- IPv4 static addressing, subnet mask, gateway, and DNS all verified to be correct
- SSID name, WPA2-PSK, credentials all checked

NEW Device Connection Issues 
- Devices will usually not connect to the cloud service (2 campuses attempted with multiple devices) 
- Have had 2 devices connect on one campus after the devices sat powered up for 10+ hours.  When the device connects, the cloud service also authenticates and the device is stable.  Additional devices installed at the same campus never complete the connection.

- If the device is changed to DHCP, the device connects to the network and to the cloud service, but only stays connected for 10-15 minutes, then the connection to the cloud disappears.  Further analysis seems to indicate a DHCP scope overlap with static addressing (thinking that's a different issue?).
- Changing the device to a static address within another campus's range (and physically moving it to that campus) does not result in a successful connection either.
- IT manager states he can see the device connected to the AP in the room, but no data is being transmitted and the device cannot be pinged. 
- Placing a Windows laptop onto the network, using Wi-Fi only, with the same static IP addressing as the new device works like a champ. Instant connectivity to the network and to the internet.
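
On the suspected DHCP scope overlap with static addressing, a quick sanity check is to list which static addresses fall inside the DHCP pool. The snippet below is only a minimal sketch: the function name `statics_inside_pool` is hypothetical, and the addresses are documentation placeholders (192.0.2.0/24), not values from this network.

```python
import ipaddress


def statics_inside_pool(pool_start, pool_end, static_addrs):
    """Return the static addresses that fall inside the DHCP pool (inclusive)."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    if start > end:
        raise ValueError("pool_start must not be greater than pool_end")
    # Any address returned here could also be leased to another client by DHCP.
    return [a for a in static_addrs if start <= ipaddress.ip_address(a) <= end]


# Placeholder values only; substitute the real scope and the static block.
conflicts = statics_inside_pool(
    "192.0.2.100", "192.0.2.200",
    ["192.0.2.50", "192.0.2.150", "192.0.2.200"],
)
print(conflicts)
```

An empty list would suggest the static block and the DHCP scope do not collide, at least for the addresses that were checked.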

Device manufacturer has verified many different networks/routers/APs to be operational with the new device hardware, but does not have access to full Cisco environment for testing.  Tests have been performed in both DHCP and static addressing, with the exact same static addressing, subnet mask, gateway and router schema.

So..
1) What could be different in the way the Cisco APs handle the connections between these two devices?
2) If this is a radio or radio stack issue, since the radios are proven to work with many other APs from various manufacturers, where should the investigation start?

Thanks in advance for any help!

12 Replies

Hi

  I would like to see logs. If you could enable debug while trying to connect those devices and share the logs here, that would be great.

But if I were the network admin there, this piece of information would be my starting point:

"- Placing a Windows laptop onto the network, using Wi-Fi only, with the same static IP addressing as the new device works like a champ. Instant connectivity to the network and to the internet."

 And then the question to ask is not this one:

 "What could be different in the way the Cisco APs handle the connections between these two devices?"

but this:

 "What could be different in the way these two devices handle the connections with the Cisco AP?"

If the laptop connects just fine, suspicion must fall on this device and not on the network.

 

jcdrew
Level 1

Flavio - I agree with your comment. I probably asked the wrong question. It's just bizarre that this connection issue with the new devices only seems to happen when they try to connect to this Cisco network.   All other networks tested connect very quickly.

Working to get a sniffer on the network to try to see full handshake info on wireshark, and compare the 'old' and the 'new'. 

Will report what is found.
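
For the 'old' versus 'new' comparison in Wireshark, it helps to narrow each capture to one client's management frames and EAPOL handshake. Below is a minimal sketch that builds such a display filter from a MAC address written in any common notation. The function name `build_handshake_filter` is hypothetical and the MAC is a placeholder; the filter only uses the standard Wireshark fields `wlan.addr` and `wlan.fc.type` plus the `eapol` protocol name.

```python
import re


def build_handshake_filter(mac: str) -> str:
    """Return a Wireshark display filter for one client's management and EAPOL frames."""
    # Drop separators (and anything else that is not a hex digit), so that
    # aabb.ccdd.eeff, AA-BB-CC-DD-EE-FF and aa:bb:cc:dd:ee:ff are all accepted.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    normalized = ":".join(digits[i:i + 2] for i in range(0, 12, 2)).lower()
    # wlan.fc.type == 0 selects 802.11 management frames (probe, auth, association).
    return f"wlan.addr == {normalized} && (wlan.fc.type == 0 || eapol)"


# Placeholder MAC; use the address of one old device and one new device in turn.
print(build_handshake_filter("AABB.CCDD.EEFF"))
```

Applying the same filter to a capture of a working device and of a failing one makes it easier to compare the association request and the key exchange side by side.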

Rich R
VIP

1. What WLC model, what version of software, what AP model(s)?

2. Run a debug for the client device MAC address on the WLC.  If it's 9800 WLC then radioactive trace for the client MAC.  Run the results through https://cway.cisco.com/wireless-debug-analyzer/ 

3. Check the WLC config with https://cway.cisco.com/wireless-config-analyzer/ - for AireOS use output of "show run-config", for 9800 use output of "show tech wireless".

4. A problem like this might be related to what features you have enabled/disabled on the WLAN so start looking at those - things like 11k and 11v BSS Transition Support for example (amongst others).

Can't the vendor provide you with any debug logs from the device?

I think you're on the right track here.  I had a bunch of medical devices that would continually drop.  For some reason, Aironet IE was on for that SSID - turned it off and all of my problems went away!

Leo Laohoo
Hall of Fame

A "new generation" device that only talks 802.11b/g and "n"?

That only means the device is running a very ancient wireless NIC driver. 

Try with open authentication and see if the wireless clients are able to consistently connect.  

Exactly as Leo said ^^^
I had a similar case where a customer couldn't get equipment to work. After extensive troubleshooting, and with the reseller being unhelpful and uninterested, I eventually had to tell the customer we couldn't help them and they should look for an alternative piece of equipment.
Many of these vendors tend to look for the oldest, cheapest chipset they can find, which often comes with a 10+ year old reference driver implementation (released with the original chipset) that they've never updated, so compatibility with modern networks is patchy at best.  That also means it's likely to be full of security vulnerabilities.

Rich - I get it, and fortunately that's not the case here. The device manufacturer is all over it.  They are puzzled as well, as it doesn't seem to be happening on any other networks.  The manufacturer is working with the silicon provider, setting up sniffing gear to be able to capture packets and see what's different between the 'old' and the 'new' devices.  Again, these are brand new chipsets from a major US supplier (not a cheap Chinese toy device) and there's a potential the issue is in silicon or the new stack in the chipset.

Ok good that you've got engagement with the vendor because they'll need to debug it on the device side.
Over the air captures should help identify what's going wrong.

Leo - Yeah, as strange as it seems, it's not uncommon that even newer IoT devices (bare metal microcontrollers, not devices with an OS) have only 2.4 and b/g/n. Data that gets transferred is minimal and range is often an issue, so 2.4 often performs better.   Open auth has the same result.

The devices can be seen by the APs, just not transmitting data and they can not be pinged.

2.4 gets a bad rap.  It's perfect for these kinds of applications.  You are essentially getting three additional channels to work with!

Wondering if anyone found a solution to this. At my office we are getting new Logitech Tap Schedulers. At first none of these could connect to the WPA2/AES128 PSK network, but if I made an open one they would connect fine. Then out of the blue 90% finally connected, but we can't get the other 10% to connect. Got Logitech involved, but they want to blame my network because those devices connect to other networks just fine.


@William Foster wrote:
because those devices connect to other networks just fine. 

Yeah, I've heard those lines so many times from various equipment vendors.  

They only stopped singing the same refrain after we dragged them in and told them to troubleshoot or we would not sign the purchase order.
