08-26-2019 03:11 PM
Hello fellow experts,
We are facing an issue on our Catalyst switch stacks that makes them unresponsive to administration. We have an 802.1X configuration against a third-party vendor, and sometimes we get a burst of messages of this type:
%DOT1X_SWITCH-5-ERR_VLAN_EQ_VVLAN: Data VLAN <<>> on port <<PORT>> cannot be equivalent to the Voice VLAN AuditSessionID 2
The issue does not affect all ports, only 2 or 3 of them. When it appears, the switch logs fill up quickly and the switches start responding with several seconds of delay, making them inaccessible via VTY or even SNMP. We think the issue is caused by a wrong configuration on some clients (IP phones) and we are investigating in that direction, but we need to stop the blocking behaviour. It appears that the logging rate is so high that the system cannot write all the logs, and that I/O bottleneck is causing the switch to get stuck.
Is there any way to stop those logs from being written to flash, or at least to reduce how often they appear? Or, better, is it possible to configure an err-disable policy that blocks those ports as soon as the misconfiguration appears? We have dug through the Cisco documentation and the only related guidance we found was "Change either the voice VLAN or the IEEE 802.1x-assigned VLAN on the interface so that they are not the same.", but that is not enough to stop the blocking.
Any idea will be appreciated.
Best regards.
09-26-2019 02:15 PM
Hello all, and thank you for your responses. At this moment we have a workaround in place and everything is working well, but we still have some doubts and want to share them with you, in case someone finds them useful.
We checked our configuration and procedures and found a race condition triggered by one of our change procedures, like this:
switch(config)#interface GigabitEthernetX/Y/Z
switch(config-if)#no switchport voice vlan A
switch(config-if)#switchport voice vlan B
The error pops up immediately and the switch starts to slow down:
%DOT1X_SWITCH-5-ERR_VLAN_EQ_VVLAN: Data VLAN ^A on port <<PORT>> cannot be equivalent to the Voice VLAN AuditSessionID 2
We think this is due to some logic in IOS that lacks validation or is tightly coupled to another data structure: when we issue "no switchport voice vlan", some variable is left empty and another process misbehaves because of it (the data VLAN field is empty, and a non-numeric character "^A" appears in its place). The logs also show:
%PARSER-6-WMLRETRY: Write memory lock currently held by pid <<PID>>, automatic retry
To solve this, we changed the procedure to use one of these two sequences. Either reset the interface to its defaults:
switch(config)#default interface GigabitEthernetX/Y/Z
Or remove the server-dead voice authorization, change the voice VLAN, and then restore it:
switch(config)# interface GigabitEthernetX/Y/Z
switch(config-if)# no authentication event server dead action authorize voice
switch(config-if)# no switchport voice vlan A
switch(config-if)# switchport voice vlan B
switch(config-if)# authentication event server dead action authorize voice
And we were happy with that... BUT Cisco TAC was looking into the issue too, and told us that the whole issue was caused by a bad practice in the port configuration and that the solution was to increase the dot1x timeout tx-period to 10 seconds. This proved to work too, but we found that answer rather unsatisfying. Maybe it really was that simple. The configuration is:
switch(config)# interface GigabitEthernetX/Y/Z
switch(config-if)# dot1x timeout tx-period 10
It is up to you which one to use.
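Whichever option is used, one possible way to check the result afterwards is sketched below. This is only an illustration using the same interface placeholder; the exact output, and the availability of the legacy "show authentication sessions" form, may vary by release.

```
! Confirm the access and voice VLANs the port is actually using
switch#show interfaces GigabitEthernetX/Y/Z switchport
! Inspect the 802.1X/MAB sessions (data and voice domains) on the port
switch#show authentication sessions interface GigabitEthernetX/Y/Z details
```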
Thank you all for the ideas and the collaboration, we hope this post helps somebody else out there.
08-27-2019 12:34 AM
- You may also want to look at the current software version being used on your platform. Sometimes such issues are caused by bugs.
M.
08-28-2019 09:47 AM
We upgraded our switches on our Cisco channel's recommendation; at the moment we are running Version 16.8.1r [FC4], and we have not found any bug specifically related to this. In any case, we will ask Cisco directly about it. Thank you.
08-27-2019 01:55 AM
Hello,
If you want to get rid of the log messages until you have found the cause of the problem, you can configure a logging discriminator:
switch(config)#logging discriminator DOT1X mnemonics drops ERR_VLAN_EQ_VVLAN
switch(config)#logging buffered discriminator DOT1X
switch(config)#logging console discriminator DOT1X
switch(config)#logging monitor discriminator DOT1X
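As a complementary sketch: a discriminator only filters the destinations it is attached to, so a remote syslog host would need it as well, and a global rate limit can cap bursts from any message. The host address 192.0.2.10 and the limit of 10 messages per second are placeholder assumptions, and it is worth confirming that both commands are available on your release before relying on them.

```
! Assumption: 192.0.2.10 is a placeholder syslog server address
switch(config)#logging host 192.0.2.10 discriminator DOT1X
! Assumption: cap logging at 10 messages per second,
! but never throttle messages of severity critical or worse
switch(config)#logging rate-limit all 10 except critical
```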
08-28-2019 09:59 AM
Thank you! We will check that and let you know how it goes.
08-27-2019 05:21 AM - edited 08-27-2019 05:22 AM
Please supply the configuration of a port with the problem, along with the model of the IP phone attached, and the same for a port without the problem.
Also run "sh proc cpu sorted" to see if a specific process is driving high CPU.
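For reference, a minimal sketch of how that CPU check is often run. The exclude filter is just a common idiom to hide idle processes, not something specific to this problem.

```
! Hide processes that are idle in the 5-second, 1-minute and 5-minute columns
switch#show processes cpu sorted | exclude 0.00%__0.00%__0.00%
! ASCII graphs of CPU load over the last 60 seconds, 60 minutes and 72 hours
switch#show processes cpu history
```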
08-28-2019 09:53 AM
Hi, the port configuration is "standard" in all the stack and looks like this one:
interface <<INTERFACE>>
 switchport access vlan <<DATAVLAN>>
 switchport mode access
 switchport voice vlan <<VOICEVLAN>>
 authentication event fail action authorize vlan <<PREAUTHVLAN>>
 authentication event server dead action authorize vlan <<DATAVLAN>>
 authentication event server dead action authorize voice
 authentication event no-response action authorize vlan <<PREAUTHVLAN>>
 authentication host-mode multi-domain
 authentication order mab dot1x
 authentication priority dot1x mab
 authentication port-control auto
 authentication periodic
 authentication timer reauthenticate 28800
 authentication violation restrict
 mab
 storm-control broadcast level 0.50
 storm-control multicast level 0.50
 storm-control action shutdown
 dot1x pae authenticator
 dot1x timeout tx-period 1
 spanning-tree portfast
 spanning-tree bpduguard enable
end
The IP phone is a Cisco CP-7841.
The "sh proc cpu sorted" output could not be checked; the stack became unresponsive from the console as well. We needed to power cycle the stack to recover management access.
08-28-2019 09:58 AM
We have an update.
We checked the last logs before power cycling the stack and found this message:
%PARSER-6-WMLRETRY: Write memory lock currently held by pid '504', automatic retry.
Sadly, we could not check the PID owner at that moment, and PID 504 is now held by NTP (we do not think that is related).
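If the message shows up again while the CLI still responds, a lookup along these lines could identify the owner. This is a sketch that assumes the PID is the first column of the output, and the result is only meaningful while the message is still being logged.

```
! Show the process table row whose first column is PID 504
! ("_" matches the whitespace after the PID, so 5040 etc. are not matched)
switch#show processes cpu | include ^ *504_
```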
08-29-2019 01:49 AM
I think I would try swapping phones between ports to see if the issue follows the phone. I would also try simplifying the configuration on a port by disabling dot1x, to see how it behaves as a normal access port with voice enabled. Another possibility is to break the stack and see how it behaves then. These are divide-and-conquer troubleshooting approaches, meant to limit the scope.
It would be nice to have the CPU process information, but I can see that this is a difficult one. Perhaps check the flash for crashdumps that could reveal some more information. Is a syslog server configured? Perhaps logging is still sent to this one while the console is being unresponsive.
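A rough sketch of how those two checks might look from the CLI. The file system name is an assumption: on a stack each member usually has its own flash, and some platforms keep crash files on a separate file system.

```
! List crash-related files on the active switch's flash
switch#dir flash: | include crash
! Show the logging configuration, including any remote syslog host
switch#show running-config | include ^logging
```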
Cisco CLI Analyzer could be used to check the configuration.