
9800-L-C Dirty VLAN + SVI + IP helper = DHCP + Traffic breakdown

Gehrig_W
Level 1

Hello Cisco WLAN Community,

We are struggling with a new 9800-L-C HA anchor WLC running 17.3.4 for guests in our hospital. The foreign WLCs are two 5520 WLCs and one 9800-80.

From time to time we run into a "Dirty VLAN" problem that causes complete DHCP and traffic breakdowns.

During peak times we see 3000+ users on this box. But suddenly all traffic can stop, and clients get stuck in the DHCP Discover phase. At that moment we see a lot of WLAN clients in "IP Learn" state on this single point of failure.

Wireshark traces taken on the adjacent switch clearly show that DHCP is working and that replies are coming back from the DHCP server. But the traces on the 9800-L-C itself show only DHCP Discover packets?!

At the moment we recover from this by rebooting the 9800-L-C.

Case is already open with Cisco TAC SR 694582039 Helpdesk#14257100

The 9800-L-C guest solution consists of two 9800-L-C controllers running in HA mode.

In the meantime I found out that this disaster might be caused by the "Dirty VLAN" functionality on this box! According to the config guide, this function apparently checks the DHCP packet flow continuously when an SVI Layer 3 interface + ip helper is active. The algorithm blocks DHCP traffic for 30 minutes when it detects that DHCP clients appear to be unable to receive IP addresses from DHCP. The blockade hits not only DHCP discovers but also DHCP renewals, which in a second step causes a complete communication breakdown, because the algorithm then keeps detecting DHCP client problems.

In our case the 9800-L-C shows Dirty-Counters in the thousands and has been blocking DHCP continuously for hours!

Here is the output from the failure situation for two of our guest VLANs after several hours of complete outage:

WLC-9800-Guest#show wireless vlan details
Vlan inforamation:   <-- There is a typo here, by the way
-------------------------------------------------------------------
DirtyTime displays time remaining for dirty vlan to become non-dirty
Dirty-Counter displays the number of times a vlan has become dirty
-------------------------------------------------------------------
Process          Vlan     Dirty    Dirty-Counter   DirtyTime(mm:ss)
-------------------------------------------------------------------
0                760      Yes      1206            28:44       <- During this time, no DHCP traffic is possible!
0                763      Yes      1186            29:57       <- During this time, no DHCP traffic is possible!
0                764      No       18              0
0                4080     No       0               0
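
For anyone who wants to watch these counters over time, here is a minimal Python sketch that parses the table above and lists the VLANs currently marked dirty. It assumes the column layout shown in this post (Process, Vlan, Dirty, Dirty-Counter, DirtyTime); the names VlanDirtyRow and parse_vlan_details are made up for illustration, and how the text is collected from the WLC (SSH, copy and paste, ...) is left open.

```python
import re
from typing import NamedTuple


class VlanDirtyRow(NamedTuple):
    process: int
    vlan: int
    dirty: bool
    dirty_counter: int
    dirty_time_s: int  # remaining dirty time, converted to seconds


# One data row: Process, Vlan, Dirty (Yes/No), Dirty-Counter, DirtyTime (mm:ss or 0).
# Anything after the DirtyTime column (e.g. a hand-written note) is ignored.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(Yes|No)\s+(\d+)\s+(\d+(?::\d+)?)")


def parse_dirty_time(text: str) -> int:
    """Convert 'mm:ss' (or a bare '0') to seconds."""
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + int(seconds)
    return int(text)


def parse_vlan_details(output: str) -> list[VlanDirtyRow]:
    """Extract the data rows; header, legend and separator lines do not match."""
    rows = []
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append(
                VlanDirtyRow(
                    process=int(m[1]),
                    vlan=int(m[2]),
                    dirty=(m[3] == "Yes"),
                    dirty_counter=int(m[4]),
                    dirty_time_s=parse_dirty_time(m[5]),
                )
            )
    return rows


# Sample copied from the output in this post.
SAMPLE = """\
Process          Vlan     Dirty    Dirty-Counter   DirtyTime(mm:ss)
-------------------------------------------------------------------
0                760         Yes        1206          28:44
0                763         Yes        1186          29:57
0                764         No         18            0
0                4080        No         0             0
"""

rows = parse_vlan_details(SAMPLE)
dirty_vlans = [r.vlan for r in rows if r.dirty]
print(dirty_vlans)  # -> [760, 763]
```

Run periodically, something like this could feed a syslog message or an alert as soon as a VLAN flips to "Yes", instead of noticing the outage through user complaints.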

 

Unfortunately I haven't found a command, either in the GUI or in the CLI, to deactivate "Dirty VLAN"! In my eyes it would be better named "Dirty DHCP". Apparently the only workaround is to deactivate the Layer 3 SVI interface and the ip helper on the 9800-L-C. This forces the WLC to bridge the DHCP broadcasts from clients instead of taking them over and sending unicasts from the IP address of its own L3 interface towards the DHCP server. It looks like in that case the "Dirty VLAN" algorithm is deactivated and can no longer block DHCP traffic by mistake!

Now I understand the "recommendations" from local Cisco experts that it is "best practice" to get rid of the SVI ip helper on the 9800 platform and to configure the ip helper on the next Layer 3 instance instead, for example the next router or firewall.

Therefore I would like to know:

Is it possible to deactivate "Dirty VLAN", or better "Dirty DHCP"?

Is my assumption correct that the "Dirty VLAN" algorithm is based on an SVI Layer 3 IP interface + ip helper on the 9800 WLC?

Is Cisco hiding the unacceptable DHCP blocking problem caused by the "Dirty VLAN" algorithm from customers by simply urging them not to use Layer 3 SVI interfaces on the 9800 platform, which apparently cause this DHCP and traffic blockade?

Are there any known open bugs related to "Dirty VLAN" and the unacceptable DHCP blockade caused by 9800 WLCs?

I also propose renaming "Dirty VLAN" to "Dirty DHCP", which would better reflect the purpose and the bad outcome of this feature. New users of the 9800 WLC in particular would then better understand what this algorithm really does.

Kind regards

Wini

 

 

 

 

 

 

 

 

 


marce1000
VIP

 

 - Review these bug reports: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvw69665 and https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvu71930. For the 9800-L-C, check its configuration with the CLI command show tech wireless and have the output analyzed by https://cway.cisco.com/tools/WirelessAnalyzer/. Please note: do not use the classic show tech-support (short version); use the show tech wireless command noted above for Wireless Analyzer. Check all advisories, especially those concerning the use of SVIs.

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

Arshad Safrulla
VIP Alumni

Hi Gehrig_W,

Get rid of the Layer 3 SVIs on the 9800 WLC completely. You don't need them unless you use mDNS. I assume that you have an upstream device that can support ip helper. You just need to allow the VLANs on the trunk connecting to the upstream device. This is the Cisco Validated Design, and as soon as you raise your issue this will be one of the recommendations any expert will give.

Another option is to get rid of the VLAN group and use one large subnet with a single VLAN. You need to conduct a feasibility study and review your requirements before making this change. I am not sure whether this will be accepted by security teams, as our auditors have always flagged this as an issue, although I see none when P2P blocking is enabled.

  • Is it possible to deactivate "Dirty-VLAN" or better "Dirty-DHCP"? No.
  • Is my assumption correct that the "Dirty-VLAN" algorithm is based on SVI Layer 3 IP interface + ip helper on the 9800 WLC? No, Dirty VLANs are marked when you have VLAN groups. It is not mandatory to have L3 SVIs and ip helper.
  • Is Cisco hiding the unacceptable DHCP blocking problem caused by the "Dirty-VLAN" algorithm from customers by simply urging them not to use Layer 3 SVI interfaces on the 9800 platform, which apparently cause this DHCP and traffic blockade? As mentioned before, Dirty VLAN marking will happen as soon as you configure a VLAN group. The WLC does not care whether any L3 SVIs or ip helper are configured.
  • Are there any known open bugs related to "Dirty-VLAN" and the unacceptable DHCP blockade caused by 9800 WLCs? Reach out to TAC or search in the Cisco bug tracking tool.
  • I also propose renaming "Dirty-VLAN" to "Dirty-DHCP", which would better reflect the purpose and the bad outcome of this feature. Please reach out to your local Cisco team and submit a feature request, or you can even ask TAC to open a bug for this request.

 

Hello Arshad,

Thank you for your well-structured answers. Looking at our 9800-L-C box, I still need some clarification regarding "Dirty VLAN" and VLAN groups.

The configuration of this guest WLC consists mainly of 3 VLANs. Two of them are grouped for eduroam guests and eduroam employees, while the single VLAN 764 is used for @BayernWLAN users, a free Internet access service of the Bavarian government.

760 eduroam_gast active
763 eduroam_mitarbeiter active
764 bayernwlan active

vlan group eduVLANs : 760, 763

The output of "show wireless vlan detail" shows that the Dirty-Counter of the ungrouped VLAN 764 has increased over the past days:

Process          Vlan     Dirty    Dirty-Counter   DirtyTime(mm:ss)
-------------------------------------------------------------------
0                760      No       0               0
0                763      No       0               0
0                764      No       52              0
0                4080     No       0               0

Now my questions to you:

1. Why do we see an increase of the Dirty VLAN counter in a VLAN which is not grouped?

2. What does the counter "52" mean after the last box reload, which happened around 2 weeks ago? Have our VLAN 764 users faced 52 x 30 minutes = 1560 minutes = 26 hours of DHCP traffic outage and blocking during the last 2 weeks?

3. Where in the logs of the 9800-L-C can I find more information about an ongoing Dirty VLAN situation, or about DHCP traffic blockades that already happened in the recent past? Does anyone know a good troubleshooting command to record this misbehaviour?
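
Until someone points to a proper log source, the counters themselves can be turned into a crude event record by comparing two snapshots of "show wireless vlan detail". The Python sketch below does that and also reproduces the arithmetic from question 2. Everything in it is an assumption for illustration: the function names are invented, the 30 minute hold time is the value described earlier in this thread, and the "before" snapshot assumes the counters were reset to 0 by the reload. The result is a rough ceiling, not a measured outage, since it is not confirmed here that every counter increment means a full 30 minutes of blocked DHCP.

```python
# Assumption: dirty hold time in minutes, as described earlier in this thread.
DIRTY_HOLD_MINUTES = 30


def counter_deltas(before: dict[int, int], after: dict[int, int]) -> dict[int, int]:
    """Return {vlan: increase} for every VLAN whose Dirty-Counter grew."""
    return {
        vlan: count - before.get(vlan, 0)
        for vlan, count in after.items()
        if count > before.get(vlan, 0)
    }


def max_dirty_minutes(events: int, hold_minutes: int = DIRTY_HOLD_MINUTES) -> int:
    """Upper bound on dirty time if every event ran the full hold time without overlap."""
    return events * hold_minutes


# Hypothetical snapshot right after the reload (counters assumed to start at 0).
before = {760: 0, 763: 0, 764: 0, 4080: 0}
# Dirty-Counter values from the output quoted in this post.
after = {760: 0, 763: 0, 764: 52, 4080: 0}

deltas = counter_deltas(before, after)
print(deltas)  # -> {764: 52}

for vlan, events in deltas.items():
    minutes = max_dirty_minutes(events)
    print(f"VLAN {vlan}: {events} dirty events, at most {minutes} min ({minutes / 60} h)")
```

Taking a snapshot every few minutes and logging each non-empty delta with a timestamp would at least show when a VLAN was marked dirty, which can then be matched against user complaints.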

Thank you for any good tips.

Greetings from Franconia

Wini

 

 

 

 

 

 

 

 

NickyJKernow
Level 1

Hi,

Did you ever manage to resolve this issue?
We are experiencing the same, and unfortunately TAC has so far been unable to solve it.

Thanks

We started with two big /21s and added more over time. But we ran into exhaustion because the client gets assigned a VLAN to get DHCP from, instead of broadcasting to all the subnets and picking one.
Instead, a VLAN is picked via the dirty-DHCP mechanism, possibly a different one than the client is requesting, since the client doesn't know the VLAN is different. So something in that mechanism is broken. Our entire wireless system melted down last week. We had worked with TAC on multiple occasions on this and on the ISE integration of a new guest portal. None of them ever suggested looking for anything called "Dirty".
Today I will monitor the stats, now that I have something to look at with the command. It would be nice if there was a way to clear it, or even to set it, i.e. to make the VLAN dirty. We had other situations where the server scope was 100% full, yet the controller kept assigning that exhausted VLAN over and over. There is some hashing going on that creates a client ID. That ID seems to play a role in where you go, or how well your request goes out. Funnily enough, when we force the machine to drop its remembered lease, things are very smooth (if there are leases available and the controller hasn't started a client blockade).
There is no reason why 3,000 clients would need 8,000 IPs, yet it happens. Some of that is due to the MAC address randomization that happens by default on mobile devices.
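
To get a feeling for how much of the lease consumption comes from randomized MACs, one heuristic is to check the locally administered bit (0x02 in the first octet), which private/randomized addresses on mobile devices have set. A minimal Python sketch follows; the function name and the sample addresses are made up, and a locally administered address is not proof of randomization, only a strong hint.

```python
import string


def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit (0x02 of the first octet) is set.

    Accepts common notations such as aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff
    and the Cisco style aabb.ccdd.eeff.
    """
    hex_digits = "".join(ch for ch in mac if ch in string.hexdigits)
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    first_octet = int(hex_digits[:2], 16)
    return bool(first_octet & 0x02)


# Made-up sample addresses, e.g. as they might appear in a DHCP lease export.
leases = ["da:a1:19:00:00:01", "00:1b:63:84:45:e6", "3a5f.1c22.9b01"]

randomized = [mac for mac in leases if is_locally_administered(mac)]
print(f"{len(randomized)} of {len(leases)} leases look randomized")  # -> 2 of 3
```

Run against a real lease export, the ratio gives a rough idea of how many addresses are held by devices that may come back with a different MAC and take a fresh lease.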

 

 
