Re: WS-X6748-GE-TX Problem or Supervisor Problem?

mladentsvetkov · ‎09-28-2009

Hi Guys,

I have the following messages logged on my CAT65XX:

...

Sep 28 17:00:10: %SYS-CFC1-2-MMAP: mmap failed for size 24576 bytes Caller PC ...

...

Sep 28 17:00:21: %SYS-CFC1-2-MALLOCFAIL: Memory allocation of 21656 bytes failed from 0x72AFC54C, alignment 8

Pool: Processor Free: 92392 Cause: Memory fragmentation

Alternate Pool: None Free: 0 Cause: No Alternate pool

The question is whether the problem is with the "WS-F6700-CFC" daughter card of the WS-X6748-GE-TX or with the Supervisor.

Thanks in advance,

Mladen

Giuseppe Larosa · ‎09-28-2009

Hello Mladen,

This looks like a problem with the system-wide buffers that hold process-switched packets.

You can use show buffers to see the system buffer statistics.

However, note that the messages start with SYS-CFC1-.

Let's look at the message format in the system message guide:

%FACILITY-SUBFACILITY-SEVERITY-MNEMONIC: Message-text

see

http://www.cisco.com/en/US/partner/docs/ios/12_2sx/system/messages/sm2sxovr.html#wp22036

or

http://www.cisco.com/en/US/docs/ios/12_2sx/system/messages/sm2sxovr.html#wp22036

So SYS is the facility and CFC1 is the subfacility: the message appears to come from a CFC (I am not sure whether the 1 is the slot number), and the severity is 2.
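As a rough illustration of that layout, here is a minimal Python sketch that splits a logged line into its header fields. The regular expression and the parse_header name are my own assumptions for illustration, not something taken from the Cisco guide. The subfacility group is treated as optional because many messages carry only facility, severity and mnemonic.

```python
import re

# Header layout from the message guide:
#   %FACILITY-SUBFACILITY-SEVERITY-MNEMONIC: Message-text
# Assumption: each field is upper-case letters, digits or underscores,
# and the severity is a single digit from 0 to 7.
HEADER = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)"
    r"(?:-(?P<subfacility>[A-Z0-9_]+))?"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)


def parse_header(line):
    """Return the header fields of a logged message, or None if no header is found."""
    match = HEADER.search(line)  # search() skips any leading timestamp
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    return fields


if __name__ == "__main__":
    sample = "Sep 28 17:00:10: %SYS-CFC1-2-MMAP: mmap failed for size 24576 bytes Caller PC ..."
    print(parse_header(sample))
```

Applied to the first MMAP line from the original post, this gives facility SYS, subfacility CFC1, severity 2 and mnemonic MMAP.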

Why the CFC or the main CPU has to handle such a big PDU also needs to be investigated.

Hope this helps

Giuseppe

mladentsvetkov · ‎09-28-2009

Thanks for your answer.

What would you say about the following:

CSCsr09208

Memory allocation error of fragmentation when plenty memory available

Symptom:

Memory allocation error of fragmentation when plenty memory available.

Conditions:

Configure a large number of ACE's. In this DDTS, the submitter tried to configure more than 50k ACE's and it can cause memory fragmentation problem.

This memory fragmentation is gone after removing ACL's.

Workaround:

None.

CSCsm69827

Symptoms:

Customer switch running on modular IOS outputs the following error messages.

%SYS-CFC4-2-MMAP: mmap failed for size 24576 bytes Caller PC 0x72AFC54C errno 12 : ios-base : (PID=12307, TID=16) :

The "size" in the error message is often the same as the above but may be different as well.

Conditions:

The main way to differentiate this defect from other memory leak type defects is that show proc memory shows that the switch has not run out of memory at the time the error messages are seen.

When the problem occurs, the switch has usually been running for anywhere between two weeks and a couple of months.

Workaround:

There is no workaround apart from reloading the affected card.

Further Problem Description:

The problem occurs due to an accounting problem internal to the operating system kernel. The particular counter that causes the problem is not visible at the user level and can not be seen from any command line interface. The counter increments whenever a process requests memory from the kernel. The error message indicates that the counter has reached the threshold value at which point the kernel denies any further memory allocation requests from the process.

Even though the underlying accounting problem is always present the symptoms and its effects may never be seen under many customer scenarios. This is because the number of kernel allocations required to trigger the symptoms is very high and might practically never be reached.
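To make the description above more concrete, here is a toy Python model of the behaviour it describes. This is only a sketch: the ToyKernel class, the threshold value and the assumption that the counter is never decremented when memory is freed are all hypothetical, since the real counter is internal to the kernel and not visible from the CLI. The point is just that allocations can start failing with ENOMEM (errno 12 in the log) while plenty of memory is still free.

```python
import errno


class ToyKernel:
    """Toy model only, not the real kernel: a request counter with a hard limit."""

    def __init__(self, total_bytes, max_requests):
        self.free_bytes = total_bytes
        self.max_requests = max_requests  # hypothetical threshold value
        self.request_count = 0            # hypothetical hidden counter

    def alloc(self, size):
        """Return 0 on success or errno.ENOMEM when the request is denied."""
        # Once the counter reaches the threshold, every request is denied,
        # no matter how much memory is actually free.
        if self.request_count >= self.max_requests:
            return errno.ENOMEM
        if size > self.free_bytes:
            return errno.ENOMEM
        self.request_count += 1
        self.free_bytes -= size
        return 0

    def free(self, size):
        """Give memory back. Assumption: the counter is NOT decremented here."""
        self.free_bytes += size


def requests_until_failure(kernel, size):
    """Allocate and immediately free 'size' bytes until an allocation is denied."""
    done = 0
    while kernel.alloc(size) == 0:
        kernel.free(size)
        done += 1
    return done
```

With ToyKernel(total_bytes=1_000_000, max_requests=1000), requests_until_failure(kernel, 24576) stops after 1000 successful requests even though free_bytes is back at 1,000,000. That would be consistent with the messages only appearing after weeks of uptime, and with a reload of the card (which resets the counter in this model) being the only way to clear them.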

Giuseppe Larosa · ‎09-29-2009

Hello Mladen,

thanks for your kind remarks.

I would look at the second one (CSCsm69827), which is almost a perfect match, unless you have a lot of ACL lines configured on the device.

Hope this helps

Giuseppe