Perplexing Client/Server Communication Problem on LWAPP Wireless Network

tmoffett · ‎12-10-2006

Hi,

I am troubleshooting a problem in a building that we recently converted to LWAPP. A few internal web applications will not finish loading. They work fine on the wired Ethernet network, as well as on the old autonomous wireless network.

The LWAPP solution is a pair of 4404-100 controllers connected to a core 6509 switch with dual SUP 720s.

The two EtherChannels, connecting the two controllers, span a pair of Ethernet modules - two links per module.

I have tried changing the MTU on the 6509, just to see if that was the issue, with no success.

The controllers are running the latest 4.0 version 179.

Any suggestions would be greatly appreciated!

Thanks!

scottmac · ‎12-10-2006

How far down did you adjust the MTU?

After you adjusted the MTU, did you also reset the client?

There are potentially a couple of tunnels-in-tunnels going on: LWAPP sets up a tunnel from the controller to the AP, plus if you have L3 roaming up, that's another tunnel, plus SSL for secure pages ... maybe more (Got VPN?)

Go low to start, no higher than 1396 and give it a shot. You should also reset/reboot the client to make sure that the MTU is scratched and re-negotiated.

MTU / fragmentation is about the only thing it could be.
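To put rough numbers on the tunnels-in-tunnels point, here is a minimal Python sketch of the arithmetic. The outer IP and UDP header sizes are the standard 20 and 8 bytes; the `tunnel_header` value is only a placeholder (an assumption, not a figure from this thread or from the LWAPP spec), and both function names are made up for illustration.

```python
def effective_mtu(link_mtu: int, overheads: dict) -> int:
    """Bytes left for the inner packet after each encapsulation layer takes its share."""
    return link_mtu - sum(overheads.values())


def max_tcp_segment(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one packet of the given MTU (no IP/TCP options)."""
    return mtu - ip_header - tcp_header


# Per-layer overhead in bytes. "tunnel_header" is a placeholder value,
# not the real LWAPP header size; substitute whatever your capture shows.
overheads = {"outer_ip": 20, "outer_udp": 8, "tunnel_header": 16}

inner_mtu = effective_mtu(1500, overheads)
print("inner MTU:", inner_mtu)
print("max TCP segment at that MTU:", max_tcp_segment(inner_mtu))
print("max TCP segment at MTU 1396:", max_tcp_segment(1396))
```

Each extra tunnel (L3 roaming, VPN) is just another entry in `overheads`; a full-size packet from a host that assumes 1500 bytes will need fragmenting once the total exceeds what the path carries.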

Good Luck

Scott

tmoffett · ‎12-10-2006

Thanks Scott. There's no L3 roaming going on. Each site is a separate subnet.

Just to provide a bit more information:

There are three global SSIDs on the controllers, one for voice, one for data and the other for guest access.

For each site, there is a site specific VLAN for voice and internal data. One common VLAN exists for guests.

The voice has been given platinum QoS, the internal data has silver.

No SSL, L3 roaming, etc. Only the LWAPP tunnel to the APs.

I set the MTU down to 1300 bytes. I could see it reflected in the packet captures I took. I did, however, see the don't fragment bit set in traffic coming from the server. It seems as if the client acknowledges the server's packets and the server never receives the acknowledgments. This happens even when I am the only WLAN client in the building.
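For anyone checking the same thing in their own capture, here is a small Python sketch of reading the don't fragment bit straight from the first bytes of an IPv4 header. The sample header is invented for illustration (addresses, ID and a zeroed checksum are not from this capture), and `parse_ipv4_flags` is a hypothetical helper name.

```python
import struct


def parse_ipv4_flags(header: bytes) -> dict:
    """Extract length and fragmentation fields from a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("IPv4 header needs at least 20 bytes")
    # First 8 bytes: version/IHL, TOS, total length, identification, flags+offset
    version_ihl, _tos, total_length, _ident, flags_frag = struct.unpack(
        "!BBHHH", header[:8]
    )
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 header")
    return {
        "total_length": total_length,
        "dont_fragment": bool(flags_frag & 0x4000),   # DF bit
        "more_fragments": bool(flags_frag & 0x2000),  # MF bit
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # offset in bytes
    }


# Made-up 20-byte header: 1500-byte TCP packet with DF set, checksum left as zero.
sample = bytes.fromhex("450005dc1c464000400600000a0000010a000002")
info = parse_ipv4_flags(sample)
print(info)
```

A packet with `dont_fragment` set and a `total_length` larger than the path can carry gets dropped rather than fragmented, so the sender depends on an ICMP "fragmentation needed" message making it back.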

Another strange thing that has been happening - there are multiple arp entries in the router for some WLAN clients. Not sure why. I don't think this is related.

Tim M.

scottmac · ‎12-10-2006

Since you are doing traces (Good Thing!), check again to see if perhaps the missing traffic (client ack to the server) is maybe ending up on the native VLAN.

There was a similar problem in the early IOS for the autonomous (Aironet) APs and DHCP. The client would broadcast a DHCP request (on a non-native VLAN), the server would respond (unicast), but the response would be dropped into the native VLAN as it passed through the AP on the way to the client.

The response would make it through several routers and switches just fine, but the AP would drop it into the native VLAN ... weird stuff, and I believe it was corrected later (I haven't heard of it happening since 12.3{something}).

I suppose it could be tied to the ARP thing you mentioned ... i.e., the AP doesn't see the client and re-ARPS ... adding a duplicate entry that later confuses the AP ... hard to say.

It might also be interesting to see if the problem with the web server happens on native as well as non-native VLAN/SSIDs.

I'm pretty sure you're seeing a bug, it might be worth a call to the TAC, or, if you have a CCO account, you can do your own bug check. If you don't have a CCO account, they're free (the ones with no software downloads) ..... I think that'll give you access to the "Support Tools" pages (output interpreter and bug tracker).

Check out the traffic stuff and let us know. If I get a chance, I'll tip-toe through the bug tracker and see if I can get a hit.

Good Luck

Scott

tmoffett · ‎12-11-2006

Thanks again. The DHCP issue only seems to rear its ugly head with the LWAPP installation, not the autonomous. It's definitely strange.

I did a browse through the TAC case collection and found nothing. I will have a Cisco buddy do an internal search as well.

Also seeing lots of disassociation attacks at this site. Not sure that they're legitimate, as they seem to happen randomly throughout the day and night. There is a mix of converted 1231 APs and 1131s...

I will keep you in the loop!

Tim

Darren Ramsey · ‎12-11-2006

You might want to check the EtherChannel load-balancing method. I ran into strange problems using "port-channel load-balance src-dst-port" on the 6509 connected to the 4404 and WiSM. Parts of the LWAPP conversation were getting split across the EtherChannel members, and the WiSM/4404 does not know how to reassemble them. The default L3 hash disperses traffic differently than the L4 hash. Just a thought.
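As a toy illustration of how one conversation can end up split across member links, here is a Python sketch. It is not the 6509's actual hash; the XOR-and-modulo functions, the fallback to an IP hash for non-first fragments (which carry no UDP header), and the addresses and port numbers are all assumptions made up for the example.

```python
import ipaddress


def l3_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Toy src-dst-ip hash: XOR the two addresses, pick a member link."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return key % n_links


def l4_link(src_ip, dst_ip, src_port, dst_port, n_links: int) -> int:
    """Toy src-dst-port hash. Packets without visible ports (for example
    non-first IP fragments) are assumed to fall back to the IP hash."""
    if src_port is None or dst_port is None:
        return l3_link(src_ip, dst_ip, n_links)
    return (src_port ^ dst_port) % n_links


controller, ap = "10.0.0.5", "10.0.1.20"  # hypothetical addresses
links = 4

# One tunnelled datagram split into two IP fragments (ports are placeholders).
first_fragment = l4_link(controller, ap, 12222, 32770, links)
later_fragment = l4_link(controller, ap, None, None, links)
print("L4 hash: first fragment on link", first_fragment,
      "later fragment on link", later_fragment)
print("L3 hash: both fragments on link", l3_link(controller, ap, links))
```

With the IP-only hash every packet between the same pair of addresses stays on one link, which is why switching the method is a cheap thing to try.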

scottmac · ‎12-12-2006

Is there a chance that there is another Big / Company / Enterprise wireless system in the neighborhood (or, maybe you're using a WLSE on the autonomous system?) that is set for rogue detection?

One of the mechanisms for rogue mitigation is to send "disassociate" commands to prevent the rogues from catching a connection.

One of my favorite stories is about a company on the sixth floor of a building. They brought up wireless, enabled rogue detection / mitigation, and shut down other companies' wireless systems for probably a mile around or more.

That would explain why the Aironet APs work OK (they're not rogues to the WLSE that supports them)... but the new Airespace system isn't working (because they're new, considered rogues, and are being ECM'd).

Worth a shot ... if you aren't doing it, maybe someone else in your neighborhood is.

Good Luck

Scott

sethgarnar · ‎12-11-2006

I had a problem like this where clients had issues connecting back to local servers. A page would partially load, then hang and time out. We are using WiSMs, but any server connected to the 6509 through the WS-X6548-GE-TX card had this issue. We got a new WS-X6548-GE-TX card with updated firmware and this solved the problem.

The TAC number for our case was 604179723.

tmoffett · ‎12-11-2006

Thanks guys.

I changed the load balancing on the EtherChannel earlier today with no luck.

I even shut down three of the four interfaces to each controller to see if it would help - with no success.

Oddly enough - the MAC wireless clients seem to work just fine.

Both servers in question are running Windows Server 2003 R2.

I am going to comb through the packet capture I did earlier to see what's going on. It appears that one or more of the server's packets never reach the client.

I was unable to find information on the TAC case that was shown. Can you post more information on it?

The customer has two 6148s, one older and one newer... The EtherChannels span cards, but I have proven that not to be an issue...

ANY information and suggestions are appreciated!

Tim

tmoffett · ‎12-12-2006

Thanks Seth (I think that's your name?).

That was exactly our problem with a 6148 - old firmware. All servers that were affected were connected to the old module.

Basically, the old module will not transmit frames smaller than 64 Bytes - it just drops them.

Tim

tmoffett · ‎12-12-2006

So, here's the Cisco advisory on the 6548 and 6148 GE modules...

http://www.cisco.com/en/US/products/hw/switches/ps700/products_field_notice09186a0080228f16.shtml

scottmac · ‎12-12-2006

A switch shouldn't pass frames of less than 64 bytes; that's the minimum frame size for Ethernet. Anything smaller than 64 bytes is considered a runt.
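For reference, a short Python sketch of the frame-size arithmetic, using the standard Ethernet numbers (14-byte header, 4-byte 802.1Q tag, 4-byte FCS, 64-byte minimum). The 40-byte payload is a bare TCP ACK with no options; the function names are made up for illustration, and nothing here is taken from the field notice itself.

```python
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
VLAN_TAG = 4      # 802.1Q tag
FCS = 4           # frame check sequence
MIN_FRAME = 64    # minimum Ethernet frame size, FCS included


def frame_size(payload_len: int, tagged: bool = False, pad: bool = True) -> int:
    """Size on the wire of a frame carrying payload_len bytes."""
    size = ETH_HEADER + (VLAN_TAG if tagged else 0) + payload_len + FCS
    return max(size, MIN_FRAME) if pad else size


def size_after_tag_strip(tagged_frame_size: int) -> int:
    """Size of a tagged frame once the 802.1Q tag is removed, before any re-padding."""
    return tagged_frame_size - VLAN_TAG


tcp_ack = 20 + 20  # IPv4 header + TCP header, no payload

print("unpadded untagged ACK:", frame_size(tcp_ack, pad=False))
print("padded untagged ACK:  ", frame_size(tcp_ack))
print("padded tagged ACK:    ", frame_size(tcp_ack, tagged=True))
print("tagged ACK after tag strip:",
      size_after_tag_strip(frame_size(tcp_ack, tagged=True)))
```

The last line is the interesting one: a minimum-size tagged frame drops below 64 bytes when its tag is stripped, so whatever forwards it has to pad it again or it goes out as a runt.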

FWIW

Scott

tmoffett · ‎12-13-2006

Perhaps, in theory.

In practice, though, that is what was happening here, and it seems to rear its ugly head with LWAPP wireless.