dot3StatsInternalMacRxErrs on Cat 5500

Kevin Dorrell · ‎09-27-2004

I have a Catalyst 5500, and on some of the heavily used 100BaseTX ports I have quite a high error rate - sometimes as much as 500 parts per million. This happens on quite a few ports, spread over several cards.

The "show counters" tells me that these are mainly dot3StatsInternalMacRxErrs. I have excluded duplex mismatches, as those tend to give alignment and runt errors, and I have excluded cable problems as they usually give FCS errors. In fact, I do not believe that these errors are port-side; I think they are more likely to do with the inner workings of the 5500.

Can someone with a detailed knowledge of the 5500 please tell me how to interpret dot3StatsInternalMacRxErrs, and suggest what I can do to avoid them?

Kevin Dorrell

Luxembourg

Prashanth Krishnappa · ‎09-27-2004

Here is an explanation of these errors:

dot3StatsInternalMacRxErrs

RFC 1398 (802.3) counter:

"A count of frames for which reception on a particular interface fails due to an internal MAC sublayer receive error. A frame is only counted by an instance of this object if it is not counted by the corresponding instance of either the dot3StatsFrameTooLongs object, the dot3StatsAlignmentErrors object, or the dot3StatsFCSErrors object. The precise meaning of the count represented by an instance of this object is implementation-specific. In particular, an instance of this object may represent a count of receive errors on a particular interface that are not otherwise counted."

This counter is the equivalent of 'Rcv-Err' in sh port and 'In-Lost' in sh mac. So do a sh mac and see whether these counters increment. The explanation of these counters is here:

http://www.cisco.com/warp/public/473/53.shtml#Show_Mac_for_CatOS_and_Show_Interfaces_f
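
One way to check whether the counters are incrementing is to take two readings some minutes apart and compare them. The following is only a minimal Python sketch, not something taken from the Cisco document. The dictionary keys `in_lost` and `rcv_frames` are hypothetical names for values copied by hand from two runs of sh mac, the 32-bit counter width is an assumption, and the rate is assumed to mean lost frames per million received frames.

```python
def counter_delta(before, after, width_bits=32):
    """Difference between two readings of a counter that may have wrapped.

    The 32-bit default width is an assumption; adjust it if the counter
    being read is wider.
    """
    return (after - before) % (1 << width_bits)


def error_rate_ppm(before, after):
    """Lost frames per million received frames between two snapshots.

    'in_lost' and 'rcv_frames' are hypothetical keys holding values
    copied by hand from the switch output.
    """
    lost = counter_delta(before["in_lost"], after["in_lost"])
    received = counter_delta(before["rcv_frames"], after["rcv_frames"])
    if received == 0:
        return 0.0  # no traffic in the interval, so no meaningful rate
    return lost * 1_000_000 / received


# Made-up example values, for illustration only.
first = {"in_lost": 100, "rcv_frames": 1_000_000}
second = {"in_lost": 600, "rcv_frames": 2_000_000}
print(error_rate_ppm(first, second))
```

If the rate is near zero when the port is idle and climbs during backups, that would be consistent with the losses being load related.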

Kevin Dorrell · ‎09-27-2004

Thank you Prashanth, that looks like a really useful document.

So, I see that the frames are lost due to receive buffer problems. I presume this means that the port had exhausted all its available buffer space, and did not have one available in which to receive the frame. That fits in with my observation that the error rate is greatest during heavy traffic, e.g. backups.

So, I am looking at ways to reduce this error count.

Now, if I remember rightly, in the 5500 each port has a dedicated 128 Kb of buffer memory on the blade for the reception queue and 64 Kb for the transmission queue (or maybe vice versa). Which presumably means that there is no mileage in matching one heavily used port with several lightly used ones on the same card; the buffer resources cannot be re-allocated from a lightly used port to a heavily used one.

Also, if I remember rightly, the 5500 has a 3.6 Gbps backplane which is common to all the data paths. So maybe I will get rid of this problem if I move to a 4000, which has 1 Gbps dedicated to each group of (4 or 8) ports, if I spread the load carefully.

Do you know any tricks I can do with the 5500 to avoid these receive buffer errors?

Kevin Dorrell

Luxembourg

Prashanth Krishnappa · ‎09-27-2004

One thing I would check for is whether there is any unnecessary broadcast/multicast or unknown unicast flooding that is causing the buffer starvation. Check the broadcast/multicast counters in the output of sh mac. If you see them increment a lot, put a sniffer in that VLAN (without any SPAN session) and see what the traffic is. If you see other unicast traffic not meant for the sniffer port, it could be unicast flooding.

http://www.cisco.com/warp/public/473/143.html
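
The sniffer check described above can be roughed out in Python. This is only a sketch under stated assumptions: the destination MAC addresses are assumed to have been exported from the capture as colon-separated strings, and `classify_frames` is a hypothetical helper, not part of any sniffer tool. It relies on the standard Ethernet rule that the least significant bit of the first octet marks a group (multicast or broadcast) address.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_address(mac):
    """True if the I/G bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def classify_frames(dest_macs, sniffer_mac):
    """Count captured frames by destination type.

    dest_macs is a hypothetical list of destination MAC strings exported
    from the capture; sniffer_mac is the sniffer's own address.
    """
    counts = {"broadcast": 0, "multicast": 0,
              "own_unicast": 0, "flooded_unicast": 0}
    own = sniffer_mac.lower()
    for mac in dest_macs:
        mac = mac.lower()
        if mac == BROADCAST:
            counts["broadcast"] += 1
        elif is_group_address(mac):
            counts["multicast"] += 1
        elif mac == own:
            counts["own_unicast"] += 1
        else:
            # Unicast frame addressed to another station: on a switched
            # port with no SPAN session this suggests flooding.
            counts["flooded_unicast"] += 1
    return counts


# Made-up addresses, for illustration only.
sample = ["ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:01",
          "00:11:22:33:44:55", "00:aa:bb:cc:dd:ee"]
print(classify_frames(sample, "00:11:22:33:44:55"))
```

A large `flooded_unicast` count relative to the others would point towards unknown unicast flooding rather than ordinary broadcast or multicast load.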