WLC4404 - pmalloc detected memory corruption

Erik Qvam
Visitor

Hi there

Yesterday I upgraded from K9-6-0-188-0.aes to K9-7-0-116-0.aes. The controller has always been stable (86xLAP1131AG-E-K9, 2xports installed, maximum 250 users). After the upgrade it began to reboot, with a maximum uptime of 5-20 minutes. The one exception was the middle of the night, when it stayed up for almost 6 hours until the users were back this morning. I have transferred some of the APs to our secondary controller, so "only" 61 are left for the 4404 to handle, and now it runs a little longer between reboots.

The crash file starts with:

----------------------------------- cut -----------------------------------

************************************************************
*             Dumping Registers                              *
************************************************************
NIP: 1000C0E8 XER: 20000000 LR: 1000C084 SP: 3B03AA10 REGS: 0x3b03a550 TRAP: 0300
MSR: 0002d000 EE: 1 PR: 1 FP: 0 ME: 1 IR/DR: 00
DAR: 00000000, DSISR: 00800000
Stack:
1000c084 0000032b 1125d6b8 1a2eaef0 11f70000 11a90000 00000000 1a2eaef0
10cc8bdc 1a2eaef0 11265f14 1a2eaef0 11257d94 00000100 1a2eaef0 00000000
3113fbd8 00000100 10ccd9b8 11f50000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 706d616c 0000032b 00000006 13bf5584
3113f344 11970000 3113f7d8 3113f2ac 10cce01c 00000003 19e83020 1199e3fc

************************************************************
*             Start Cisco Crash Handler                *
************************************************************
Sys Name:       wlc1
Model:          AIR-WLC4404-100-K9
Version:        7.0.116.0
Timestamp:      Tue Sep  6 12:18:29 2011
SystemUpTime:   0 days 0 hrs 57 mins 29 secs
signal:         11
pid:            963
TID:            720941
Task Name:      sshpmMainTask
Reason:         System Crash
si_signo:       11
si_errno:       0
si_code:        1
si_addr:        0x0
timer tcb:      0x4dc
timer cb:       0x101173d0 ('apfTimeGetInStr+88')
timer arg1:     0x18722194
timer arg2:     0x18722194

Long time taken timer call back inforamtion:
Time Stamp:     Tue Sep  6 12:09:51 2011
timer cb  :     0x10123f84 ('apfAaaUrlRedirectAclGetByMscb+220')
Duration  : 18602 usecs, cbCount= 1

------------------------------------------------------------
Analysis of Failure:

   Software was stopped for the following reason:
     pmalloc detected memory corruption
------------------------------------------------------------

pmalloc memory corruption type: ++PMALLOC_ENTRYMAGIC_0_CORRUPTION
-  Corruption detected at pmalloc entry address: (0x3113f2ac)
-  Corrupt entry: entryMagic_0(0x0004), entryMagic_1(0xbe91be91),
   trailer(0xead0ead0),poison(0x06a0),
   entrysize(1024),bytes(992),thread(Unknown task name, task id = (331306184)),
   file(sshbuffer.c),line(283), time(9),
   previous  access file(sshmp-integer-co)-line(220)

----------------------------------- cut -----------------------------------
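
For anyone comparing several crash files, the header fields above can be pulled out with a few lines of Python. This is only a minimal sketch: the function names `parse_crash_header` and `corruption_type` are hypothetical, and the parsing assumes the "Key:   value" layout seen in this excerpt, not a documented crash file format.

```python
import re

# Hypothetical helpers; the field layout is assumed from the excerpt above.
def parse_crash_header(text):
    """Collect 'Key:   value' lines from a crash file excerpt into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z_ ]+?):\s+(.*\S)\s*$", line)
        if match:
            # Keep the first value seen for each key.
            fields.setdefault(match.group(1).strip(), match.group(2))
    return fields

def corruption_type(text):
    """Return the pmalloc corruption type without the leading '++', or None."""
    match = re.search(r"pmalloc memory corruption type:\s*\+*(\w+)", text)
    return match.group(1) if match else None

sample = """\
Model:          AIR-WLC4404-100-K9
Version:        7.0.116.0
Timestamp:      Tue Sep  6 12:18:29 2011
SystemUpTime:   0 days 0 hrs 57 mins 29 secs
Task Name:      sshpmMainTask
pmalloc memory corruption type: ++PMALLOC_ENTRYMAGIC_0_CORRUPTION
"""

header = parse_crash_header(sample)
print(header["Version"], header["Task Name"], corruption_type(sample))
```

Having the version, task name and corruption type side by side makes it easier to check whether every crash matches the same signature.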

Does anyone know whether this could be faulty memory? Perhaps memory that I never hit before because the 6-0-188-0 image is smaller than 7-0-116-0?

Any comment might be helpful 🙂  Thanks in advance.

Message was edited by: Erik Qvam

1 Accepted Solution

Accepted Solutions

Surendra BG
Cisco Employee

I think we are hitting the bug below.

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtr17396

WLC crash : pmalloc memory corruption

Symptom:

Software was stopped for the following reason:
pmalloc detected memory corruption
------------------------------------------------------------
pmalloc memory corruption type: ++PMALLOC_ENTRYMAGIC_0_CORRUPTION

Conditions:
NA

Workaround:
rebooted back with backup code after which wlc stays up

Please don't forget to rate the useful posts!

Regards
Surendra BG

Thank You. This might be the solution. The symptom is familiar, but I don't understand the workaround.

What does "rebooted back with backup code" mean?

Boot the "Emergency Image Version 7.0.116.0"?

or

Install and boot, for instance, 6.0.188.0?

Regards Erik

Yes, it's 6.0.188.

Issue the command

config boot backup

then issue "show boot" to confirm that the backup image is marked active, and reboot the WLC. It will boot with the backup image.
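
If you capture the "show boot" output to a text file, the check can be scripted. The sketch below is only an illustration: `active_boot_image` is a hypothetical helper, and the line layout ("Primary Boot Image.... Code x.y.z (active)") is assumed from the show boot output posted in this thread, so it may need adjusting for other releases.

```python
import re

# Hypothetical helper; the line layout is assumed from output posted in this thread.
BOOT_LINE = re.compile(r"^(Primary|Backup) Boot Image\.*\s*(?:Code\s+)?(\S+)(.*)$")

def active_boot_image(show_boot_output):
    """Return (slot, version) for the image flagged '(active)', or None."""
    for line in show_boot_output.splitlines():
        match = BOOT_LINE.match(line.strip())
        if match and "(active)" in match.group(3):
            return match.group(1), match.group(2)
    return None

sample = """\
Primary Boot Image............................... Code 7.0.98.218
Backup Boot Image................................ Code 7.0.116.0 (default) (active)
"""

print(active_boot_image(sample))
```

Run it against the captured output before rebooting, so you know which image the controller will come up on.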

Let me know if this helps, and please don't forget to rate the useful posts!

Regards
Surendra BG

I can do that, Surendra, but then I will not be running 7.0.116.0 on both my controllers, will I? The 4402 upgraded flawlessly; the 4404 did not.

Could there be an upgrade path through multiple versions that avoids the CSCtr17396 bug? Downgrading shouldn't be the answer, I think.

Anyway, thank you for pointing me in the right direction. I will probably try 7.0.98.x or 6.0.202.0, since CSCtr17396 was first found in 7.0.116.0.

I will try to remember to post a follow-up if 7.0.98.x was the right way to go.

Regards

Erik

Feel free to open a TAC case to get a root cause analysis (RCA) done on this. Could you try the following: upload 7.0.98.218 and boot the WLC with 7.0.98.218, then boot the WLC with the backup image (which will be your 7.0.116.0) and see whether it still crashes?

Regards
Surendra BG

This sounds like a well structured approach.

Since we have a group of 24/7 users, I will not disturb them with a double image download to the 1131s now. So I will start by downloading 7.0.98.218 to the 4404 and then run config boot backup, so that I continue to use 7.0.116.0 booted from the "Backup Boot Image" position. If this doesn't make a difference, I will do the config boot primary (7.0.98.218) / config boot backup (7.0.116.0) sequence.

Thank you for all the help so far.

Let me know how it goes, and feel free to update the post. We will be more than happy to assist you!

Regards
Surendra BG

I'm sorry to say that running 7.0.116.0 from the Backup Boot Image was not a success. It was stable during the night:

(Cisco Controller) >show run-config

System Inventory
NAME: "Chassis"    , DESCR: "4400 Series WLAN Controller:100 APs"
PID: AIR-WLC4404-100-K9,  VID: V02,  SN: FOC1143F0J6
:
Product Version.................................. 7.0.116.0
RTOS Version..................................... 7.0.98.218
Bootloader Version............................... 7.0.116.0
Emergency Image Version.......................... 7.0.116.0
:
System Up Time................................... 0 days 8 hrs 6 mins 37 secs
:
Number of Active Clients......................... 68

(Cisco Controller) >show boot
Primary Boot Image............................... Code 7.0.98.218
Backup Boot Image................................ Code 7.0.116.0 (default) (active)

But the crash files from last evening tell a different story:

************************************************************
*             Start Cisco Crash Handler                *
************************************************************
Model:        AIR-WLC4404-100-K9
Version:      7.0.116.0
Timestamp:    Wed Sep  7 00:24:33 2011
SystemUpTime:     0 days 0 hrs 50 mins 33 secs

************************************************************
*             Start Cisco Crash Handler                *
************************************************************
Model:        AIR-WLC4404-100-K9
Version:      7.0.116.0
Timestamp:    Tue Sep  6 23:32:38 2011
SystemUpTime:     0 days 0 hrs 6 mins 35 secs

************************************************************
*             Start Cisco Crash Handler                *
************************************************************
Model:        AIR-WLC4404-100-K9
Version:      7.0.116.0
Timestamp:    Tue Sep  6 23:24:40 2011
SystemUpTime:     0 days 0 hrs 8 mins 3 secs

************************************************************
*             Start Cisco Crash Handler                *
************************************************************
Model:        AIR-WLC4404-100-K9
Version:      7.0.116.0
Timestamp:    Tue Sep  6 23:15:20 2011
SystemUpTime:     0 days 0 hrs 8 mins 6 secs
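
To put numbers on how long the controller survived before each crash, the SystemUpTime lines can be converted to seconds. This is a rough sketch only: `uptimes_in_seconds` is a hypothetical helper, and the "N days N hrs N mins N secs" layout is assumed from the crash records above.

```python
import re

# Hypothetical helper; the uptime layout is assumed from the crash records above.
UPTIME_RE = re.compile(r"SystemUpTime:\s*(\d+) days (\d+) hrs (\d+) mins (\d+) secs")

def uptimes_in_seconds(crash_text):
    """Return one uptime in seconds per SystemUpTime line, in file order."""
    return [
        int(days) * 86400 + int(hrs) * 3600 + int(mins) * 60 + int(secs)
        for days, hrs, mins, secs in UPTIME_RE.findall(crash_text)
    ]

sample = """\
SystemUpTime:     0 days 0 hrs 50 mins 33 secs
SystemUpTime:     0 days 0 hrs 6 mins 35 secs
SystemUpTime:     0 days 0 hrs 8 mins 3 secs
SystemUpTime:     0 days 0 hrs 8 mins 6 secs
"""

uptimes = uptimes_in_seconds(sample)
print("shortest:", min(uptimes), "s  longest:", max(uptimes), "s")
print("average:", sum(uptimes) / len(uptimes), "s")
```

Tracking these values across software versions gives a quick way to see whether a change actually improved stability.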

The interesting thing is that my 4402 has no problems with 7.0.116.0. Anyway, I will schedule both my controllers for a reboot into 7.0.98.218 tonight. Then I can probably confirm the workaround for http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtr17396

Regards

Erik

Hmm. Let me know how it goes!

Regards
Surendra BG

Downgrading from 7.0.116.0 to 7.0.98.218 resolved CSCtr17396. I could of course also have reverted to the backup code (6.0.188.0), as stated in the workaround, but we wanted to make the move from 6 to 7.

Note that only the WLC4404 was affected by the bug (CSCtr17396); the WLC4402 was not.

Thank you for all the help.

Regards

Erik
