04-03-2020 05:51 AM
Hello,
on March 23rd one of our switches (Catalyst 4510R+E, version 03.07.03.E) crashed and rebooted at about half past 5 pm (the switch clock was wrong; it was actually one hour later).
The log says:
1986: Mar 23 17:28:31.535: %C4K_SWITCHINGENGINEMAN-4-VFEOCINTERRUPT: VFE OC aggToPhyMapParErr interrupt. valid: 1 addr: 0x82 data.rep: 0x1A000068004 parity: 1
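To compare this message against other occurrences or against the bug descriptions, it can help to split it into its fields. The following is only a minimal sketch: the regular expression and the function name `parse_vfe_interrupt` are illustrative assumptions based on this single log line, not a documented Cisco format, and the meaning of the register fields is not interpreted here.

```python
import re

# Pattern built from the single log line in this post (an assumption,
# not an official message grammar). The field names valid, addr,
# data.rep and parity are copied from the message itself.
LOG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<text>.*?) valid: (?P<valid>\d+) addr: (?P<addr>0x[0-9A-Fa-f]+) "
    r"data\.rep: (?P<data>0x[0-9A-Fa-f]+) parity: (?P<parity>\d+)"
)


def parse_vfe_interrupt(line):
    """Return the message fields as a dict, or None if the line does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {
        "facility": m["facility"],
        "severity": int(m["severity"]),
        "mnemonic": m["mnemonic"],
        "text": m["text"],
        "valid": int(m["valid"]),
        "addr": int(m["addr"], 16),    # hex string to integer
        "data": int(m["data"], 16),    # hex string to integer
        "parity": int(m["parity"]),
    }


line = ("1986: Mar 23 17:28:31.535: %C4K_SWITCHINGENGINEMAN-4-VFEOCINTERRUPT: "
        "VFE OC aggToPhyMapParErr interrupt. valid: 1 addr: 0x82 "
        "data.rep: 0x1A000068004 parity: 1")
info = parse_vfe_interrupt(line)
print(info["mnemonic"], "|", info["text"], "|", hex(info["addr"]))
```

The interrupt name in the text field (`aggToPhyMapParErr`) is the part to compare with the wording in the bug reports.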
Here is what you can find about the general error message:
Error Message C4K_SWITCHINGENGINEMAN-4-VFEOCINTERRUPT: [char]
Explanation An error in the Very-fast Forwarding Engine's Output Classification Module was detected. Contents of the log register are printed out. This could be a parity error in a table that software is capable of correcting or a fatal error.
Recommended Action If this message is a fatal error, contact Cisco TAC. Otherwise, no further actions are necessary.
I looked into the crash file and found some messages relating to the reboot. The iosd service caused high CPU load and the switch killed the process, which triggered the reboot.
Mon Mar 23 17:28:34 2020> hap_sup_reset: Reason Code:[2] Reset Reason:Service [iosd] pid:[5612] terminated abnormally [6].
…
crashinfo: PID 5612 is taking too much time to collect crashinfo, sent SIGKILL
…
[1521277198:704624:ERROR:5:ha_mgr:b45d7490:process_get_next:1218] provider returned (rc=101, err=Unknown Error)
…
=========== top process sorted by rss =================
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 7860 0.0 0.0 19544 3832 ? S 2016 0:00 /usr/binos/bin/xlogger -q -F -b installer-scripts -S
root 4009 81.1 0.0 15984 3720 ? S 17:28 0:50 /usr/local/bin/ng_dumper -x 1 -p 5612 -u 0 -g 0 -s 6 -t 1584984514 -h vt45cat4510r-e-5 -e iosd -l 0
…
[03/23/20 17:28:34.814 UTC d27 5250] [HA] bury_child: Service name: IOSd service, Pid: 5612, exit code: 6, cwd: /var/sysmgr/work, core dump = 0
[03/23/20 17:28:34.814 UTC d28 5250] [HB] bury_child: Stopping the timer for service "IOSd service" , its getting terminated.
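One small cross-check on the excerpt above: the `ng_dumper` command line carries `-t 1584984514`. Assuming this value is a Unix timestamp in seconds (an assumption; the flag is not documented in this thread), it can be converted to UTC as in the sketch below.

```python
from datetime import datetime, timezone

# Value passed to ng_dumper via -t in the process listing above.
# Assumption: seconds since 1970-01-01 00:00:00 UTC.
DUMPER_T = 1584984514

crash_time = datetime.fromtimestamp(DUMPER_T, tz=timezone.utc)
print(crash_time.strftime("%Y-%m-%d %H:%M:%S %Z"))
# 2020-03-23 17:28:34 UTC
```

The result agrees with the `hap_sup_reset` line (Mon Mar 23 17:28:34 2020) and the `bury_child` entries (17:28:34.814 UTC), so the dump was started about three seconds after the VFEOCINTERRUPT message at 17:28:31, as measured by the switch's own clock.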
Does anybody know this problem? Could it be a one-time problem? Or a bug? Or a hardware issue?
I only found two bugs with a similar, but not identical, log message:
https://quickview.cloudapps.cisco.com/quickview/bug/CSCvd05307
https://quickview.cloudapps.cisco.com/quickview/bug/CSCvd65647
After the reboot the switch did not work normally. It was sending wrong information to our network access control system (SNMP, not 802.1x). For example, the interface that is normally "gi 4/4" was suddenly reported as "gi 10/8" (wrongly indexed?).
The solution was to delete the switch from the NAC and add it again.
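If the cause was a shift in the SNMP interface indexes after the reboot (an assumption; nothing in this thread confirms it), such a shift could be detected by comparing an index-to-name mapping saved before the reboot with one collected afterwards. The sketch below only shows the comparison. The function name `changed_ports` and the ifIndex numbers are hypothetical; only the interface names come from the description above.

```python
def changed_ports(before, after):
    """Return {ifIndex: (old_name, new_name)} for indexes whose name changed."""
    return {
        idx: (before[idx], after[idx])
        for idx in sorted(before.keys() & after.keys())
        if before[idx] != after[idx]
    }


# Hypothetical ifIndex values, for illustration only.
nac_cached = {52: "Gi4/4", 53: "Gi4/5"}      # mapping held by the NAC
after_reboot = {52: "Gi10/8", 53: "Gi4/5"}   # mapping polled after the reboot

print(changed_ports(nac_cached, after_reboot))
# {52: ('Gi4/4', 'Gi10/8')}
```

Any entry in the result means the NAC's cached mapping is stale for that index, which would be consistent with the observation that removing and re-adding the switch fixed the reporting.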
But then we noticed that all the clients on module 8 of the switch had problems with their network connection. On the console the switchports showed as "connected", but the MAC addresses of the clients were not visible, so the clients did not get an IP address from the DHCP server. Nothing we tried resolved this: we reset the switchports and rebooted the whole switch again.
Only resetting module 8, deleting its config, and reconfiguring the ports solved the issue.
Does anybody have an idea what was wrong? Could this be the beginning of a hardware issue with the module? The status of the module was "ok" the whole time.
Mod Ports Card Type                              Model
---+-----+--------------------------------------+------------------
  1    48 10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
  3    48 10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
  4    48 10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
  5     8 Sup 8-E 10GE (SFP+), 1000BaseX (SFP)   WS-X45-SUP8-E
  7    48 10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
  8    48 10/100/1000BaseT Premium POE E Series  WS-X4648-RJ45V+E
  9    48 10/100/1000BaseT Premium POE E Series  WS-X4648-RJ45V+E
 10    48 10/100/1000BaseT Premium POE E Series  WS-X4748-RJ45V+E
Unfortunately our reseller found out that we no longer have support for this switch, so they can't open a TAC case for us.
Thanks for your help and any opinions.
Greetings Lydia
04-03-2020 06:34 AM
Lydia
I do not have much insight into exactly what this problem is and hope that perhaps someone from Cisco might supply some insight. But in the meantime I would say that it is unfortunate that you are not able to open a case with TAC, because that is probably the only way to really get an understanding and a resolution of this issue. If you cannot open a case with TAC, then I believe that your only option is to let the switch run and hope that this was a one-time issue. If the problem does happen again, then you probably need to plan to replace the switch (or perhaps see if you can get a new service contract that covers the switch).
04-03-2020 05:30 PM
04-04-2020 08:37 AM
Hi Rick, Hi Leo,
thanks for your answers.
Here is the output.
dir crashinfo:
Directory of crashinfo:/
8066 -rw- 0 May 1 2016 22:24:21 +00:00 koops.dat
8067 drwx 1024 May 12 2016 19:04:38 +00:00 ap_crash
8068 -rw- 0 Mar 23 2020 19:55:34 +00:00 cilogs
8069 -rw- 0 Mar 23 2020 17:29:38 +00:00 deleted_crash_files
8070 -rwx 5593784 Mar 23 2020 17:29:38 +00:00 crashinfo_iosd_20200323-172834-UTC
8071 -rw- 44 Mar 23 2020 17:29:38 +00:00 last_crashinfo
8072 -rwx 72710529 Mar 23 2020 17:29:39 +00:00 fullcore_iosd_20200323-172834-UTC
Greetings Lydia
04-04-2020 04:22 PM - edited 04-04-2020 05:17 PM
@lydia.walther wrote:
crashinfo_iosd_20200323-172834-UTC
fullcore_iosd_20200323-172834-UTC
04-05-2020 04:06 AM
04-05-2020 04:21 AM
04-05-2020 05:06 AM
04-05-2020 05:42 AM
I'm surprised that the fullcore post is there. I got error messages every time I tried to upload it. I think it's too big.
04-05-2020 06:20 AM
04-05-2020 08:37 AM
- No access here either; the forum probably rejects it because it is indeed too big.
M.
04-05-2020 02:40 PM
04-06-2020 01:21 AM
Sent you a message.
Greetings Lydia
04-06-2020 01:51 AM
- The file is garbled, but anyway: if this happens again, cold-start the switch with a power cycle, have a console connected, and scrutinize the boot-up process, especially the phase where the modules undergo their tests.
M.
04-06-2020 04:56 AM
Sorry for the garbled file. I'm not sure what the problem is... maybe the process of copying the file from the TFTP server to my computer.
Thanks a lot for your answer.