Re: %MLS-4-OVERFLOW messages

lwhalley · ‎09-20-2004

We have a 5509 core switch that is getting constant messages like:

2004 Sep 20 12:05:32 central -05:00 %MLS-4-MOVEOVERFLOW:Too many moves, stop MLS for 5 sec(40000000)

2004 Sep 20 12:05:37 central -05:00 %MLS-4-RESUMESC:Resume MLS after detecting too many moves

I have seen the document that indicates that the problem might be due to the following reasons:

- a permanent L2 (spanning-tree) loop

- one or more faulty switch ports

- a bad cable (for example, a unidirectional fiber link)

- other bad hardware (not necessarily on the switch generating the messages)

- misconfigured device (for example, a traffic generator sending traffic to two switch ports using the same MAC address)

I have checked the spanning-tree portfast parameters, port counters and the neighboring switches, but have not found anything obvious.

Does anyone have any ideas for tracking this down other than disabling ports one-by-one to see if the messages stop?
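If ports do end up being disabled one at a time, it helps to measure whether the message rate actually changes rather than eyeballing the console. Below is a minimal Python sketch that tallies %MLS-4-MOVEOVERFLOW messages per minute from saved syslog lines. It assumes each message sits on a single line in the format shown above; the function names and the regular expression are illustrative, not part of any Cisco tool.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout, based on the two messages quoted in this thread:
# "<YYYY Mon DD HH:MM:SS> <zone name> <UTC offset> %FACILITY-SEV-MNEMONIC:text"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\S+ [+-]\d{2}:\d{2} "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+):"
    r"(?P<text>.*)$"
)


def parse_line(line):
    """Return (timestamp, facility, mnemonic), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Collapse repeated spaces so single-digit days also parse.
    ts_text = " ".join(m.group("ts").split())
    ts = datetime.strptime(ts_text, "%Y %b %d %H:%M:%S")
    return ts, m.group("facility"), m.group("mnemonic")


def overflow_counts_per_minute(lines):
    """Count MLS MOVEOVERFLOW messages, bucketed by minute."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, facility, mnemonic = parsed
        if facility == "MLS" and mnemonic == "MOVEOVERFLOW":
            counts[ts.replace(second=0)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2004 Sep 20 12:05:32 central -05:00 %MLS-4-MOVEOVERFLOW:"
        "Too many moves, stop MLS for 5 sec(40000000)",
        "2004 Sep 20 12:05:37 central -05:00 %MLS-4-RESUMESC:"
        "Resume MLS after detecting too many moves",
    ]
    for minute, count in sorted(overflow_counts_per_minute(sample).items()):
        print(minute, count)
```

Comparing the per-minute counts before and after each port change shows whether that port was contributing to the moves.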

Thanks in advance.

Prashanth Krishnappa · ‎09-20-2004

The most common cause is an STP loop. Draw out your topology and check for any redundant links. Check for an STP loop starting from the root switch.

If you have EtherChannels, check for any possible misconfigurations.

The following page should get you started

http://www.cisco.com/warp/public/473/16.html

Open a TAC case if you need assistance.

lwhalley · ‎09-21-2004

I did resolve the problem. It is a rogue hub or switch that has two connections back to the core switch. We're in the process of tracing it back and eliminating the extra link. Spantree portfast had been enabled on those ports. Disabling portfast by itself didn't seem to resolve the problem. I had to disable/re-enable the ports to stop the message traffic.

Thanks for the help.

Kevin Dorrell · ‎09-21-2004

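Disabling portfast would not have any effect on the current situation, because portfast only determines the behaviour when the link comes up (the transition from offline to online). A port that is already forwarding stays forwarding, which is why you had to disable and re-enable the ports.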

There is a bpdu-guard feature that you can enable to prevent this sort of situation: as soon as it sees a Bridge PDU on a portfast port, it zaps the port. See http://www.cisco.com/univercd/cc/td/doc/product/lan/cat5000/rel_6_3/cmd_ref/setsn_su.htm#wp1100200
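To make the two behaviours concrete, here is a toy Python model of a port. It is only a sketch of the logic described above, not how the switch is implemented: the `Port` class and its method names are made up for illustration, and the only real-world number in it is the default 802.1D forward delay of 15 seconds (spent once in listening and once in learning).

```python
FORWARD_DELAY = 15  # seconds; default 802.1D forward delay


class Port:
    """Hypothetical model of a switch port with portfast and BPDU guard."""

    def __init__(self, portfast=False, bpdu_guard=False):
        self.portfast = portfast
        self.bpdu_guard = bpdu_guard
        self.state = "down"

    def link_up(self):
        """Bring the link up and return the seconds spent before forwarding."""
        self.state = "forwarding"
        # Portfast skips listening and learning; otherwise both are waited out.
        return 0 if self.portfast else 2 * FORWARD_DELAY

    def link_down(self):
        self.state = "down"

    def receive_bpdu(self):
        """BPDU guard shuts down a forwarding portfast port that hears a BPDU."""
        if self.portfast and self.bpdu_guard and self.state == "forwarding":
            self.state = "errdisable"


if __name__ == "__main__":
    port = Port(portfast=True)
    port.link_up()
    port.portfast = False   # changing the setting on a live port...
    print(port.state)       # ...does not change the state it is already in
```

In this model, clearing `portfast` on a port that is already up changes nothing until the next `link_down()` and `link_up()`, whereas a port created with `bpdu_guard=True` goes to `errdisable` on the first `receive_bpdu()` call.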

Kevin Dorrell

Luxembourg

Kevin Dorrell · ‎09-21-2004

Just one further thought. I had this "too many moves" situation as well, but in my case it was nothing to do with a rogue hub or switch. I'll tell the story anyway, because it is interesting. It was due to a large population of a certain model of PC.

If you apply power to one of these PCs, but you do not switch it on, then the NIC goes into standby, but generates occasional frames sourced from MAC 00:00:00:00:00:00. So, imagine a site of 1000 PCs after a major power outage: there are a load of hosts generating spurious frames with source 00:00:00:00:00:00. To MLS, this looks like a load of rapid moves, and so you can get the message shown. And they are a devil to track down. On a 4000, if you "show cam", then it filters 00:00:00:00:00:00 from the table, and tells you so, which helps.
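To show why this looks like rapid moves, here is a minimal Python sketch that counts how often each source MAC appears on a different port than the one it was last seen on. It is a simplified model of address learning, not the actual MLS logic, and the MAC addresses, port names and the `count_moves` function are made up for the example.

```python
from collections import Counter


def count_moves(observations):
    """Count port changes per source MAC.

    `observations` is an iterable of (mac, port) pairs in arrival order.
    A move is counted each time a MAC is seen on a port other than the
    one it was last seen on.
    """
    last_port = {}
    moves = Counter()
    for mac, port in observations:
        previous = last_port.get(mac)
        if previous is not None and previous != port:
            moves[mac] += 1
        last_port[mac] = port
    return moves


if __name__ == "__main__":
    # Hypothetical frames: three standby NICs on different ports all use
    # the all-zero source MAC, while one normal host stays on its port.
    frames = [
        ("00:00:00:00:00:00", "3/1"),
        ("00:10:7b:aa:bb:cc", "2/5"),
        ("00:00:00:00:00:00", "3/2"),
        ("00:00:00:00:00:00", "3/7"),
        ("00:10:7b:aa:bb:cc", "2/5"),
    ]
    for mac, n in count_moves(frames).items():
        print(mac, n)
```

With hundreds of standby NICs sharing the same source address, almost every such frame registers as a move, even though nothing is actually looping.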


ibakanchev · ‎09-27-2004

I have seen this situation twice.

Both cases happened in large networks based on the Catalyst 5000 family.

I think that it is a bug in CatOS for the 5000 family.

I recommend disabling the MLS functionality.

lwhalley · ‎09-27-2004

We have tracked this problem down to a server that has two NICs and two connections to the switch. It is a VMware ESX "virtual machine" server, which has internal switching. To work properly, it needs to be in a separate VLAN and have 802.1Q trunking enabled on the neighboring switch. Here is the link to a white paper that discusses the issue.

http://www.vmware.com/pdf/esx21_vlan.pdf

We are in the process of implementing the suggested solution. These servers are new to our environment, but are apparently being deployed across our network.