
Dominic Stalder (old profile) · ‎08-18-2016

Hi guys

We have a simple WGB scenario with one wired client behind the WGB (no VLANs on the WGB side), and we are trying to connect to a Cisco WLC running 8.3:

dot11 ssid Gimli
 authentication open eap PEAP 
 authentication network-eap PEAP 
 authentication key-management wpa version 2 cckm
 dot1x credentials PEAP
 dot1x eap profile PEAP
 infrastructure-ssid
!
!
!
eap profile PEAP
 method peap
!
...
!
dot1x credentials PEAP
 username gimli-bridge120
 password 7 01300F175804575D72
 pki-trustpoint XXX
!
interface Dot11Radio0
 no ip address
 !
 encryption mode ciphers aes-ccm 
 !
 ssid Gimli
 !
 antenna gain 0
 station-role workgroup-bridge
 mobile station scan 2412 2437 2462
 mobile station ignore neighbor-list
 mobile station period 20 threshold 70
 bridge-group 1
 bridge-group 1 spanning-disabled
!
interface GigabitEthernet0
 no ip address
 duplex auto
 speed auto
 bridge-group 1
 bridge-group 1 spanning-disabled
!
interface BVI1
 mac-address f0f7.5560.2cf6
 ip address dhcp client-id Dot11Radio0
!
bridge 1 route ip
!

The problem occurs when the client wants to re-bind its DHCP address or tries to get a new one:

*iappSocketTask: Aug 18 17:17:21.551: 3c:07:54:61:90:23 Ignoring wired client add as the WGB is not in RUN state.
*iappSocketTask: Aug 18 17:17:22.622: 3c:07:54:61:90:23 Ignoring wired client add as the WGB is not in RUN state.

*DHCP Socket Task: Aug 18 17:18:36.023: 3c:07:54:61:90:23 DHCP received op BOOTREQUEST (1) (len 308,vlan 49, port 1, encap 0xec03, xid 0xea053280)
*DHCP Socket Task: Aug 18 17:18:36.023: 3c:07:54:61:90:23 DHCP dropping packet (no mscb) found - (giaddr 0.0.0.0, pktInfo->srcPort 68, op: 'BOOTREQUEST')

Has anyone seen this problem before, and is there a workaround?

Thanks in advance and best regards

Dominic

Rasika Nayanajith · ‎08-18-2016

Hi Dominic,

Looking at my old notes (see below), the "config wgb vlan enable" command seems to be required on the WLC. Try configuring this on your WLC and see whether it helps.

https://mrncciew.com/2013/04/30/wgb-with-capwap/

HTH

Rasika

*** Pls rate all useful responses ***
----------------------------------------------------------------
Need to learn WiFi, Please check out these sites
www.wifitraining.com/about-wifi/ - WiFi Training by WiFi Experts
www.my80211.com - George's Blog
www.mrncciew.com - My Blog
----------------------------------------------------------------

Dominic Stalder (old profile) · ‎08-18-2016

Hi Rasika

Thanks, but I already tried that before; I saw it in your blog today. By the way, great posts on mrncciew, I follow them a lot.

Maybe I have to test it again.

Regards

Dominic

skneip4891 · ‎09-28-2016

Dominic,

I was reading through your post, and I am experiencing a very similar issue with our WGBs since we upgraded to the 8.x version of code. We have a Cisco 5508 WLC running version 8.0.133.0. We use 3502 APs in FlexConnect mode with local switching. We use 1231s, 1602i, and 1702i as WGBs in the field. The WGBs worked normally on code version 7.3.112; however, we had to upgrade to the 8.x platform for compatibility with ISE for our non-WGB wireless networks.

Since the upgrade, devices behind the WGB lose connection randomly. If multiple devices are connected to the WGB, they do not lose connectivity at the same time. We are always able to reach the WGB, but not the client or clients behind it. When we log into a client behind the WGB, it shows either a 169.x.x.x address or its acquired DHCP address. A DHCP release / renew does not work. From the devices behind the WGB we can ping the bridge but not the gateway. Rebooting the client device does not help; it still cannot get past the bridge. Rebooting the bridge does not help either. This problem occurs on all models of WGBs that we use.

As a workaround, we remove the bridge's association to the WAP on the WLC, and connectivity to the client is restored until the client drops again. We have seen devices stay online for 1 to 12 days; the timing of the drop is completely random.

We have a TAC case open with Cisco. Debugs and packet captures showed that the WGB is forwarding its bridging table to the wireless access points as normal via IAPP, and the Dot11 radio interface on the WAP is receiving the IAPP messages. After a period of time, however, the information is "black holed" and is not sent back to the switch where the WAP is connected (this can be seen in the MAC address table on that switch).

For troubleshooting, TAC had us try these code versions: 8.0.121, 8.0.133, 7.4.150, 8.2.121, and 8.3.102, all with the same results. We also tried other things, such as adding static ARP entries on our equipment, disabling the session timeout on the SSID, disabling CCKM, and adding ARP caching to our test controller. All gave the same failed result.

Finally, TAC opened bug CSCvb46216 for us, and we are waiting to hear back from the developers.

Does this sound like the issue you are experiencing as well? Did you have any success finding a work around?

Dominic Stalder (old profile) · ‎09-29-2016

Hey skneip4891

Thanks for the precise information about your problem and the bug you are hitting. We do not yet have a solution or workaround for our problem. On the one hand, I think we are hitting multiple problems / bugs; on the other hand, our onsite engineer is not available at the moment.

In the meantime we ran some more tests, and these are the results so far:

We are using Cisco WLC 8.0.115.0 and all the access points are running in LOCAL mode; we don't have FlexConnect in this setup.
I think the first problem, the client not being reachable behind the WGB, is solved by enabling the passive client feature.
After we enabled passive client, we noticed that the WGB itself has problems while moving around, something like a roaming issue. The WGB sends 802.11 probe requests and gets multiple 802.11 probe responses from nearby access points, but completely ignores them and tries to connect to an access point far away that has not even sent an 802.11 probe response! See the attached Wireshark screenshot taken near the WGB (Wireshark_Probe_Responses.png).

Using the Cisco Bug Search tool we found the following bug: CSCut07170 (https://bst.cloudapps.cisco.com/bugsearch/bug/CSCut07170)

Even though we are using a Cisco Aironet 1702I as WGB (which should not be affected according to the bug description) and the latest Cisco IOS version 15.3(3)JD (where the bug should be fixed), I think we are hitting this bug.
We have an open TAC case and the engineer has escalated it internally. We will have a TAC session in the coming week and will keep you updated.

As you can see, this seems to be a different problem from the one you are experiencing, but I would still be interested to hear whether you find a solution for your problem too.

Best regards

Dominic

davidstillert · ‎01-16-2017

Did you ever find a solution to this problem? I have 1142 WGBs that are exhibiting the same behavior. A reboot of the WGB is the only way to bring the client devices back to life.

TAC has been working on it for some time now without any success.

Dominic Stalder (old profile) · ‎01-17-2017

Hi David

Not yet, but at the moment my colleagues are testing an escalation build of the Aironet software created by the wireless business unit. We have an open TAC case too, and they located a bug in the Aironet software in version 153-3.JD, so we are testing a pre-release build of 153-3.JD2.

If I learn more, I can get back to you. Or you could try to get this build from Cisco TAC too.

Best regards

Dominic

Matthias · ‎01-25-2017

Hi David,

I'm a colleague of Dominic.

Do you have the "roaming" or the DHCP problem?

Best regards

Matthias

WiFi Trainers · ‎08-18-2016

Hi Dominic,

The error log that you posted, "*iappSocketTask: Aug 18 17:17:21.551: 3c:07:54:61:90:23 Ignoring wired client add as the WGB is not in RUN state", is quite interesting. It shows that the WGB itself is not in RUN state, which can happen when the WGB is roaming, for example. Any client request received before it reaches RUN state will be ignored.

Can you please give more information on the issue:

-Does this happen each time the client tries to do re-DHCP or randomly?

-Have you observed what state the WGB is in on the WLC when the issue is seen? If not, it would be a good idea to check this.

I would also suggest running simultaneous debugs for both the wired client and WGB when the issue is seen so that we get the complete picture. Please also let me know about the exact version of WLC and model of WGB.

Best Regards,

www.wifitrainers.com

Learn from the Best To be the Best

Change the way you look at wireless client connectivity forever by registering to watch this free webinar and also stand a chance to win our Wireless starter kit worth $8000 for free!!

Dominic Stalder (old profile) · ‎08-18-2016

Hi

1. It does not happen every time. If I just run a ping and the client initiates a DHCP renew by itself, everything works fine. Also, when I force roaming by disabling the AP the WGB is currently connected to, AES CCKM comes into play and the connection stays up.

I can reproduce the problem in the lab by changing some parameters, or even just re-applying the same settings, under WLANs on the WLC. I assume this sends a deauth to the WGB and clients. The WGB can reconnect, but the client cannot.

2. I ran simultaneous debugs in one CLI session and saw that the WGB is still running through the whole 802.1X process while DHCP requests from the wired client are already arriving, which produces the IAPP error. Once the WGB is in RUN state, the MSCB error starts to pop up. I will do a debug when I am back in the office or at the customer site to show the details.

WLC: 8.3

WGB: Aironet 1042 with the latest IOS (I think 15.3.3JD)

It would be great to know how to force the MSCB to be recognized in the correct subnet. Any idea?

Regards

Dominic

WiFi Trainers · ‎08-18-2016

Hi Dominic,

Disabling the AP does not simulate exact roaming, because the client entry gets deleted from the WLC as well. This will not be noticeable, as the WGB will try to initiate a new connection immediately. In this case complete dot1x authentication will always happen.

What you mentioned in point two is important. In a scenario like this, when the WGB is not in RUN state, wired client info will still be sent in IAPP messages, causing it to be blocked with the error reported earlier. In your setup, does the wired client stay unable to connect for a long period of time? How do you ensure it gets an IP address and connects? I would like to have a look at the debugs as well when the issue occurs (not by disabling the AP, as the conditions change).

I did not understand what you mean by getting the MSCB to recognize the correct subnet. The MSCB is the internal table on the WLC. It does not have anything to do with the subnet.

Best Regards,

www.wifitrainers.com


Dominic Stalder (old profile) · ‎08-18-2016

Hi

"When we disable the AP this does not simulate exact roaming as the client entry gets deleted from the WLC as well. This will not be noticeable as the WGB will try to initiate the new connection immediately. In this case complete dot1x authentication will always happen."

You are right, the client entry gets deleted on the WLC, but that's the way to reproduce it. When that happens, the WGB goes through the whole dot1x process without any problem and reconnects as fast as possible. But as I said, the client is not able to regain its connection. Even when the WGB gets deleted from the client table, the wired client behind the WGB should somehow be able to reconnect.

"In your setup does the wired client stay unable to get connected for a long period of time? How do you ensure it gets the IP address and connects."

The client stays disconnected; it is not able to reconnect by itself. I need to shut / no shut interface Dot11Radio0 and re-plug the client's network cable manually. The same happens at the customer site: the medical staff have to restart the whole medical device before it is able to get a connection again, even when the WGB is already in RUN state. That is the point that seems very strange / problematic to me.

"I would like to have a look at the debugs as well when the issue occurs (not by disabling the AP as the conditions change)."

I will upload the debugs as soon as someone is in the office to restart the lab WGB ;-)

"I did not understand what you mean by MSCB to recognize the correct subnet. MSCB is the internal table on the WLC. Does not have anything to do with subnet."

Sorry, it was a little early in the morning (3am) ;-) What I meant is that the DHCP request reaches the WLC, but the WLC has no MSCB entry for the client, and that's why it does not answer. Disabling the DHCP proxy feature makes no difference.

By the way, I tried "config wgb vlan enable", but this did not help.

Best regards

Dominic
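
The sequence discussed in the posts above can be sketched as a toy model. This is only an illustration of what the two debug messages suggest, not the WLC's actual implementation: the class, its method names, and the intermediate state label are hypothetical, and the real controller logic is not visible in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyController:
    """Hypothetical, simplified stand-in for the WLC client table (MSCB)."""
    wgb_state: dict = field(default_factory=dict)  # WGB MAC -> state label
    mscb: set = field(default_factory=set)         # MACs that have a client entry

    def wgb_associating(self, wgb_mac):
        # Placeholder label for "still authenticating"; not taken from the debugs.
        self.wgb_state[wgb_mac] = "AUTHENTICATING"

    def wgb_run(self, wgb_mac):
        self.wgb_state[wgb_mac] = "RUN"
        self.mscb.add(wgb_mac)

    def iapp_wired_client_add(self, wgb_mac, client_mac):
        # Mirrors "Ignoring wired client add as the WGB is not in RUN state."
        if self.wgb_state.get(wgb_mac) != "RUN":
            return "ignored: WGB not in RUN state"
        self.mscb.add(client_mac)
        return "added"

    def dhcp_request(self, client_mac):
        # Mirrors "DHCP dropping packet (no mscb) found".
        if client_mac not in self.mscb:
            return "dropped: no mscb"
        return "forwarded"


if __name__ == "__main__":
    wlc = ToyController()
    wgb, client = "f0:f7:55:60:2c:f6", "3c:07:54:61:90:23"
    wlc.wgb_associating(wgb)
    print(wlc.iapp_wired_client_add(wgb, client))  # add arrives too early
    wlc.wgb_run(wgb)
    print(wlc.dhcp_request(client))                # no entry for the client yet
    print(wlc.iapp_wired_client_add(wgb, client))  # a later add would succeed
    print(wlc.dhcp_request(client))
```

In this model the wired client only recovers if a second IAPP add arrives after the WGB has reached RUN, which would be consistent with the observation that the client's cable has to be re-plugged before it can connect again.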

WiFi Trainers · ‎08-18-2016

Hi Dominic,

Precise clarification. Appreciate it :).

You are right. The wired client should get connected. To be honest, I suspect buggy behavior. I would recommend opening a TAC case in parallel while we look into the debugs. It will save us time if they can identify something.

I found this bug, which looks similar:

https://bst.cloudapps.cisco.com/bugsearch/bug/cscux90060

It is supposed to be fixed in 8.3.102.0. I am not sure which 8.3 release you are running, but it is still worth getting it investigated, as there is not much info in the description.

Best Regards,

www.wifitrainers.com


Dominic Stalder (old profile) · ‎08-18-2016

Hi

"I am suspecting a buggy behavior to be honest. I would recommend opening a TAC case as well simultaneously while we look into the debugs. Will save us time if they can identify something."

Absolutely right. I am on my way (by train) to the customer site to collect all the logs there and open the TAC case as soon as possible. I am pretty sure there is a bug somewhere, because the configs look correct.

The WLC version we are using in the lab is 8.3.102.0. Unfortunately I cannot see the bug details for cscux90060, as I do not have sufficient permissions. But I had already found some other bugs related to the same problem, for example this one:

CSCum80836 - WLC error "DHCP dropping packet (no mscb) found" for client in RUN state

The details state that this affected only the 7.4 train, and I cannot find it in any release notes for Cisco WLC 8.x.

Will keep you updated, the debugs are coming soon.

Best regards and thanks a lot for your help so far!

Dominic

Dominic Stalder (old profile) · ‎08-18-2016

Here we go with the debugs:

debug client f0:f7:55:60:2c:f6 3c:07:54:61:90:23
debug dhcp packet enable

f0:f7:55:60:2c:f6 --> Cisco Aironet 1040 (WGB)

3c:07:54:61:90:23 --> Wired Client behind WGB

1. Restarted Cisco WGB

2. WGB gets authenticated (dot1x)

3. While the WGB is still being associated / authenticated, IAPP already kicks in:

*iappSocketTask: Aug 19 07:16:11.960: 3c:07:54:61:90:23 Ignoring wired client add as the WGB is not in RUN state.

4. The WGB gets to the RUN state:

*DHCP Socket Task: Aug 19 07:16:15.077: f0:f7:55:60:2c:f6 192.168.49.207 DHCP_REQD (7) Change state to RUN (20) last state DHCP_REQD (7)

5. The wired client behind tries to get an IP address too:

*DHCP Socket Task: Aug 19 07:16:48.506: 3c:07:54:61:90:23 DHCP received op BOOTREQUEST (1) (len 308,vlan 49, port 1, encap 0xec03, xid 0x2f7f9e86)
*DHCP Socket Task: Aug 19 07:16:48.506: 3c:07:54:61:90:23 DHCP dropping packet (no mscb) found - (giaddr 0.0.0.0, pktInfo->srcPort 68, op: 'BOOTREQUEST')

See attachment debug_client_wgb1.txt

Thanks and best regards

Dominic
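
For anyone collecting the same debugs, the ordering of events per MAC address can be pulled out of the WLC output with a short script. This is a minimal sketch that assumes the line format seen in this thread (`*Task: timestamp: MAC message`); the function names and event labels are made up for illustration, and the sample lines are copied from the debug above.

```python
import re
from collections import defaultdict

# "*<task>: <Mon DD HH:MM:SS.mmm>: <client MAC> <message>"
LINE_RE = re.compile(
    r"^\*(?P<task>[^:]+): (?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d\.\d+): "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) (?P<msg>.*)$"
)

# Substring seen in the debug -> short label (labels are hypothetical).
EVENTS = [
    ("Ignoring wired client add", "iapp_add_ignored"),
    ("Change state to RUN", "reached_run"),
    ("DHCP dropping packet (no mscb)", "dhcp_no_mscb"),
]


def classify(msg):
    """Return the label for a known message, or None for anything else."""
    for needle, label in EVENTS:
        if needle in msg:
            return label
    return None


def timeline(lines):
    """Group recognised events by MAC address, in the order they appear."""
    per_mac = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        label = classify(m["msg"])
        if label:
            per_mac[m["mac"]].append((m["ts"], label))
    return dict(per_mac)


SAMPLE = [
    "*iappSocketTask: Aug 19 07:16:11.960: 3c:07:54:61:90:23 Ignoring wired client add as the WGB is not in RUN state.",
    "*DHCP Socket Task: Aug 19 07:16:15.077: f0:f7:55:60:2c:f6 192.168.49.207 DHCP_REQD (7) Change state to RUN (20) last state DHCP_REQD (7)",
    "*DHCP Socket Task: Aug 19 07:16:48.506: 3c:07:54:61:90:23 DHCP dropping packet (no mscb) found - (giaddr 0.0.0.0, pktInfo->srcPort 68, op: 'BOOTREQUEST')",
]

if __name__ == "__main__":
    for mac, events in timeline(SAMPLE).items():
        print(mac, events)
```

Run against the three sample lines, the wired client shows an ignored IAPP add followed by a dropped DHCP request, with the WGB reaching RUN in between.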

Client behind WGB not allowed to get DHCP address