
Catalyst 4510 - crashing iosd process

lydia.walther
Level 1

Hello,

On March 23rd, one of our switches (Catalyst 4510R+E, Version 03.07.03.E) crashed and rebooted at half past 5 pm (the switch clock was wrong; it was actually one hour later).

 

The logging says:

1986: Mar 23 17:28:31.535: %C4K_SWITCHINGENGINEMAN-4-VFEOCINTERRUPT: VFE OC aggToPhyMapParErr interrupt. valid: 1 addr: 0x82 data.rep: 0x1A000068004 parity: 1

What I could find about the general error message is:
Error Message    C4K_SWITCHINGENGINEMAN-4-VFEOCINTERRUPT: [char]
Explanation    An error in the Very-fast Forwarding Engine's Output Classification Module was detected. Contents of the log register are printed out. This could be a parity error in a table that software is capable of correcting or a fatal error.

Recommended Action    If this message is a fatal error, contact Cisco TAC. Otherwise, no further actions are necessary.

 

I looked into the crash file and found some messages referring to the reboot. The iosd service caused high CPU load and the switch killed the process, which triggered the reboot.

Mon Mar 23 17:28:34 2020> hap_sup_reset: Reason Code:[2] Reset Reason:Service [iosd] pid:[5612] terminated abnormally [6].

crashinfo: PID 5612 is taking too much time to collect crashinfo, sent SIGKILL

[1521277198:704624:ERROR:5:ha_mgr:b45d7490:process_get_next:1218] provider returned (rc=101, err=Unknown Error)

=========== top process sorted by rss =================

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

root      7860  0.0  0.0  19544  3832 ?        S     2016   0:00 /usr/binos/bin/xlogger -q -F -b installer-scripts -S

root      4009 81.1  0.0  15984  3720 ?        S    17:28   0:50 /usr/local/bin/ng_dumper -x 1 -p 5612 -u 0 -g 0 -s 6 -t 1584984514 -h vt45cat4510r-e-5 -e iosd -l 0

[03/23/20 17:28:34.814 UTC d27 5250] [HA]  bury_child: Service name: IOSd service, Pid: 5612, exit code: 6, cwd: /var/sysmgr/work, core dump = 0

[03/23/20 17:28:34.814 UTC d28 5250] [HB]  bury_child: Stopping the timer for service "IOSd service" , its getting terminated.
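For anyone who wants to pull the key facts out of the lines above, here is a minimal Python sketch. It assumes (this is a guess, not something confirmed by Cisco documentation) that the `-s` and `-t` arguments of `ng_dumper` are a Linux signal number and a Unix epoch timestamp. The function names are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Lines copied from the crash file quoted above.
RESET_LINE = (
    "Mon Mar 23 17:28:34 2020> hap_sup_reset: Reason Code:[2] "
    "Reset Reason:Service [iosd] pid:[5612] terminated abnormally [6]."
)
DUMPER_CMD = (
    "/usr/local/bin/ng_dumper -x 1 -p 5612 -u 0 -g 0 -s 6 "
    "-t 1584984514 -h vt45cat4510r-e-5 -e iosd -l 0"
)

# Standard Linux signal numbers (only the two relevant here).
LINUX_SIGNALS = {6: "SIGABRT", 9: "SIGKILL"}


def parse_reset(line):
    """Extract service name, pid and exit code from a hap_sup_reset line."""
    match = re.search(
        r"Service \[(\w+)\] pid:\[(\d+)\] terminated abnormally \[(\d+)\]", line
    )
    if match is None:
        raise ValueError("not a hap_sup_reset line")
    return {
        "service": match.group(1),
        "pid": int(match.group(2)),
        "code": int(match.group(3)),
    }


def parse_dumper(cmd):
    """Pair each '-x' style flag of the command line with its value."""
    tokens = cmd.split()
    return dict(zip(tokens[1::2], tokens[2::2]))


reset = parse_reset(RESET_LINE)
opts = parse_dumper(DUMPER_CMD)

# ASSUMPTION: -s is a signal number and -t is seconds since the Unix epoch.
signal_name = LINUX_SIGNALS.get(int(opts["-s"]), "unknown")
crash_time = datetime.fromtimestamp(int(opts["-t"]), tz=timezone.utc)

print(reset)
print(signal_name, crash_time.isoformat())
```

If those assumptions hold, the timestamp decodes to 2020-03-23 17:28:34 UTC, which matches the `hap_sup_reset` line, and signal 6 would be SIGABRT, i.e. iosd aborted itself before the dumper was sent SIGKILL.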

 

Does anybody know this problem? Could it be a one-time problem, a bug, or a hardware issue?

I only found two bugs with a similar, but not identical, log message:
https://quickview.cloudapps.cisco.com/quickview/bug/CSCvd05307
https://quickview.cloudapps.cisco.com/quickview/bug/CSCvd65647

 

After the reboot the switch did not work normally. It was sending wrong information to our network access control system (via SNMP, not 802.1X). For example, the interface that is normally "gi 4/4" was suddenly reported as "gi 10/8" (wrongly indexed?).
The solution was to delete the switch from the NAC and add it again.
But then we noticed that all the clients on module 8 of the switch had problems with their network connection. On the console the switchports showed as "connected", but the clients' MAC addresses were not visible, so the clients did not get an IP address from the DHCP server. Nothing we tried resolved this: we reset the switchports and rebooted the whole switch again.
Only resetting module 8, deleting its configuration, and reconfiguring the ports solved the issue.
Does anybody have an idea what was wrong? Could this be the beginning of a hardware issue with the module? The status of the module was "ok" the whole time.
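Regarding the wrong interface names seen by the NAC: one way to check for this kind of re-indexing is to compare an ifIndex-to-ifName snapshot taken before the reboot with one taken afterwards. The sketch below is only an illustration. The ifIndex numbers are hypothetical, and how the snapshots are collected (for example by walking IF-MIB::ifName) is left out.

```python
def changed_ifindexes(before, after):
    """Return {ifIndex: (old_name, new_name)} for indexes whose name changed."""
    changes = {}
    for index in sorted(before.keys() & after.keys()):
        if before[index] != after[index]:
            changes[index] = (before[index], after[index])
    return changes


# HYPOTHETICAL snapshots: the index numbers are invented for illustration,
# only the interface names Gi4/4 and Gi10/8 come from the post above.
before_reboot = {10: "Gi4/3", 11: "Gi4/4", 12: "Gi4/5"}
after_reboot = {10: "Gi4/3", 11: "Gi10/8", 12: "Gi4/5"}

print(changed_ifindexes(before_reboot, after_reboot))
```

An empty result would mean the indexes kept their names across the reboot; any entry points at an interface the NAC would now see under a different name.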

 

Mod Ports Card Type                              Model
---+-----+--------------------------------------+------------------
 1    48  10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
 3    48  10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
 4    48  10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
 5     8  Sup 8-E 10GE (SFP+), 1000BaseX (SFP)   WS-X45-SUP8-E
 7    48  10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
 8    48  10/100/1000BaseT Premium POE E Series  WS-X4648-RJ45V+E
 9    48  10/100/1000BaseT Premium POE E Series  WS-X4648-RJ45V+E
10    48  10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E

 

 

Unfortunately, our reseller found out that we no longer have support for this switch, so they can't open a TAC case for us.

Thanks for your help and any opinions.

 

Greetings Lydia


Richard Burts
Hall of Fame

Lydia

 

I do not have much insight into exactly what this problem is, and I hope that someone from Cisco might supply some. In the meantime, I would say it is unfortunate that you are not able to open a case with TAC, because that is probably the only way to really get an understanding and a resolution of this issue. If you cannot open a case with TAC, then I believe your only option is to let the switch run and hope that this was a one-time issue. If the problem does happen again, then you probably need to plan to replace the switch (or perhaps see if you can get a new service contract that covers the switch).

HTH

Rick

Leo Laohoo
Hall of Fame
Please post the complete output of the command "dir crashinfo:".

Hi Rick, Hi Leo,

 

thanks for your answers.

 

Here is the output.

 

dir crashinfo:
Directory of crashinfo:/

8066 -rw- 0 May 1 2016 22:24:21 +00:00 koops.dat
8067 drwx 1024 May 12 2016 19:04:38 +00:00 ap_crash
8068 -rw- 0 Mar 23 2020 19:55:34 +00:00 cilogs
8069 -rw- 0 Mar 23 2020 17:29:38 +00:00 deleted_crash_files
8070 -rwx 5593784 Mar 23 2020 17:29:38 +00:00 crashinfo_iosd_20200323-172834-UTC
8071 -rw- 44 Mar 23 2020 17:29:38 +00:00 last_crashinfo
8072 -rwx 72710529 Mar 23 2020 17:29:39 +00:00 fullcore_iosd_20200323-172834-UTC

 

Greetings Lydia


@lydia.walther wrote:
crashinfo_iosd_20200323-172834-UTC

fullcore_iosd_20200323-172834-UTC

Download these files and attach them.

Hi Leo,

 

First, here is the crashinfo iosd file.

 

Greetings Lydia

And here is the fullcore file.

I am having difficulty downloading the file.

I'm surprised that the fullcore post is there. I kept getting error messages when trying to upload it. I think the file is too big.

 

 

 - No access here either; the forum probably rejects it because it is indeed too big.

  M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

How about the other file?

Sent you a message.

 

Greetings Lydia

 

 - The file is garbled, but in any case: if this happens again, cold-start the switch with a power cycle, keep a console connected, and scrutinize the boot-up process, especially the phase where the modules undergo their tests.

 M.




Sorry for the garbled file. I'm not sure what the problem is... maybe it happened while copying the file from the TFTP server to my computer.
Thanks a lot for your answer.
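
If the copy step is the suspect, one way to check is to compare an MD5 hash computed on the switch (for example with the IOS `verify /md5` command, if your release supports it for that file system) with the hash of the local copy. A minimal Python sketch for the local side follows. The function names are made up, and the path and switch-side hash in the usage note are placeholders, not real values.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(local_path, switch_md5):
    """Compare the local digest with the one reported by the switch."""
    return md5_of_file(local_path) == switch_md5.strip().lower()
```

Usage would look like `same_file("crashinfo_copy.bin", "<hash printed by the switch>")`, where both arguments are placeholders. A result of `False` would suggest the file changed somewhere between the switch and the computer.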
