05-15-2023 01:12 PM
Hello all,
I'm using an SG350X-24P and I upgraded from firmware 2.5.8.15 to firmware 2.5.9.16. After my workstation with a Mellanox 10G card (connected via fiber to the uplink ports) came up, the switch crashed immediately and went into an endless loop of hard reboots until the fiber connection was removed.
The following error was logged via syslog to my small server system:
2023-05-15T21:22:41.387964+02:00 octopus.mgmt.siski.de %LINK-I-CHNGCOMBOMEDIA: Media changed from copper media to fiber media on port te1/0/2.
2023-05-15T21:34:00.980374+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1 ***** FATAL ERROR ***** Reporting Task: HCLT. Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52) base_address=0x00444000
2023-05-15T21:34:00.980966+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xbdcc98] ros(HOSTG_fatal_error+0x14)[0xbe0130] ros(OSSYSG_fatal_error+0x258)[0x10f3370] ros(OSSYSG_fatal_error_formatted+0x44)[0x10f3510] ros(+0x1046f9c)[0x148af9c]
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x148b2e8] ros(EXTHWG_SF_dispatch+0x78)[0x148cef4] ros(HALP_config_phy_set_power_modules_4+0x14)[0x1444748] ros(HALP_config_phy_perform_phy_operation+0xe8)[0x1430e08] ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x1445588] ros(+0xfcb458)[0x140f458] ros(HALC_config_if_dispatch+0x230)[0x1418eac] ros(+0xfdd12c)[0x142112c] ros(+0xfdd350)[0x1421350] ros(HALP_config_main_copy_big_dev_data+0x0)[0x14213d0] /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6f3a840] ***** END OF FATAL ERROR *****
2023-05-15T21:34:00.981567+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1 ***** FATAL ERROR ***** Reporting Task: HCLT. Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52) base_address=0x004c5000
2023-05-15T21:34:00.982076+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xc5dc98] ros(HOSTG_fatal_error+0x14)[0xc61130] ros(OSSYSG_fatal_error+0x258)[0x1174370] ros(OSSYSG_fatal_error_formatted+0x44)[0x1174510] ros(+0x1046f9c)[0x150bf9c]
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x150c2e8] ros(EXTHWG_SF_dispatch+0x78)[0x150def4] ros(HALP_config_phy_set_power_modules_4+0x14)[0x14c5748] ros(HALP_config_phy_perform_phy_operation+0xe8)[0x14b1e08] ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x14c6588] ros(+0xfcb458)[0x1490458] ros(HALC_config_if_dispatch+0x230)[0x1499eac] ros(+0xfdd12c)[0x14a212c] ros(+0xfdd350)[0x14a2350] ros(HALP_config_main_copy_big_dev_data+0x0)[0x14a23d0] /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6ea9840] ***** END OF FATAL ERROR *****
2023-05-15T21:34:00.982566+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1 ***** FATAL ERROR ***** Reporting Task: HCLT. Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52) base_address=0x00499000
2023-05-15T21:34:00.983063+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xc31c98] ros(HOSTG_fatal_error+0x14)[0xc35130] ros(OSSYSG_fatal_error+0x258)[0x1148370] ros(OSSYSG_fatal_error_formatted+0x44)[0x1148510] ros(+0x1046f9c)[0x14dff9c]
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x14e02e8] ros(EXTHWG_SF_dispatch+0x78)[0x14e1ef4] ros(HALP_config_phy_set_power_modules_4+0x14)[0x1499748] ros(HALP_config_phy_perform_phy_operation+0xe8)[0x1485e08] ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x149a588] ros(+0xfcb458)[0x1464458] ros(HALC_config_if_dispatch+0x230)[0x146deac] ros(+0xfdd12c)[0x147612c] ros(+0xfdd350)[0x1476350] ros(HALP_config_main_copy_big_dev_data+0x0)[0x14763d0] /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6f73840] ***** END OF FATAL ERROR *****
2023-05-15T21:34:00.983690+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1 ***** FATAL ERROR ***** Reporting Task: HCLT. Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52) base_address=0x004fb000
2023-05-15T21:34:00.984208+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xc93c98] ros(HOSTG_fatal_error+0x14)[0xc97130] ros(OSSYSG_fatal_error+0x258)[0x11aa370] ros(OSSYSG_fatal_error_formatted+0x44)[0x11aa510] ros(+0x1046f9c)[0x1541f9c]
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x15422e8] ros(EXTHWG_SF_dispatch+0x78)[0x1543ef4] ros(HALP_config_phy_set_power_modules_4+0x14)[0x14fb748] ros(HALP_config_phy_perform_phy_operation+0xe8)[0x14e7e08] ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x14fc588] ros(+0xfcb458)[0x14c6458] ros(HALC_config_if_dispatch+0x230)[0x14cfeac] ros(+0xfdd12c)[0x14d812c] ros(+0xfdd350)[0x14d8350] ros(HALP_config_main_copy_big_dev_data+0x0)[0x14d83d0] /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6f56840] ***** END OF FATAL ERROR *****
2023-05-15T21:34:00.986547+02:00 octopus.mgmt.siski.de %SYSLOG-N-LOGGING: Logging started.
2023-05-15T21:34:05.070855+02:00 sw3-2l.mgmt.siski.de %BOOTP_DHCP_CL-I-DHCPCONFIGURED: The device has been configured on interface Vlan 30 , IP 172.16.1.23, mask 255.255.255.0, DHCP server 172.16.1.17
2023-05-15T21:34:05.234446+02:00 sw5-1l.mgmt.siski.de %BOOTP_DHCP_CL-I-DHCPCONFIGURED: The device has been configured on interface Vlan 30 , IP 172.16.1.22, mask 255.255.255.0, DHCP server 172.16.1.17
2023-05-15T21:34:26.954738+02:00 octopus.mgmt.siski.de %LINK-I-Up: gi1/0/24
2023-05-15T21:34:27.618387+02:00 octopus.mgmt.siski.de %LINK-W-Down: gi1/0/24
2023-05-15T21:34:30.401167+02:00 octopus.mgmt.siski.de %LINK-I-Up: gi1/0/24
2023-05-15T21:34:32.090383+02:00 octopus.mgmt.siski.de %LINK-I-Up: gi1/0/16
2023-05-15T21:35:00.719544+02:00 octopus.mgmt.siski.de %LINK-W-Down: gi1/0/24
2023-05-15T21:35:10.431903+02:00 octopus.mgmt.siski.de %LINK-I-CHNGCOMBOMEDIA: Media changed from copper media to fiber media on port te1/0/2.
There is no issue with firmware 2.5.8.15. The network card on the other side that causes firmware 2.5.9.16 to crash is a Mellanox Technologies MT27710 Family [ConnectX-4 Lx]. It uses the mlx5 driver from Ubuntu 22.04.
The Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] card does not cause problems and also works with the newer firmware (but it uses the mlx4 driver on Debian buster).
The SFP+ modules (Finisar and similar brands) have been running in this switch for years and never caused any issues until now.
A detailed description can be provided on request.
Regards
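A side note on reading the traces above: each fatal-error record reports a different base_address, but subtracting it from the bracketed absolute addresses gives the same offsets every time (for example 0xbdcc98 - 0x00444000 = 0x798c98, which matches the ros(+0x798c98) frame). That suggests all four crashes went through the same code path. Below is a minimal Python sketch of that normalization, assuming the frame format shown in the log. The function and pattern names are illustrative only and are not part of any Cisco tooling.

```python
import re

# One backtrace frame, e.g. "ros(HOSTG_fatal_error+0x14)[0xbe0130]".
FRAME_RE = re.compile(r"(\S+?)\(([^)]*)\)\[(0x[0-9a-fA-F]+)\]")
# The load address printed in the fatal-error header.
BASE_RE = re.compile(r"base_address=(0x[0-9a-fA-F]+)")


def relative_frames(record):
    """Return (symbol, offset) pairs for frames in the 'ros' binary.

    The offset is the absolute address minus base_address, so records
    captured at different load addresses become directly comparable.
    """
    base_match = BASE_RE.search(record)
    if base_match is None:
        raise ValueError("no base_address found in record")
    base = int(base_match.group(1), 16)

    frames = []
    for module, symbol, address in FRAME_RE.findall(record):
        if module != "ros":
            continue  # shared libraries are loaded at a different base
        frames.append((symbol, hex(int(address, 16) - base)))
    return frames


if __name__ == "__main__":
    # Shortened excerpts of two records from the syslog output above.
    first = ("base_address=0x00444000 ros(+0x798c98)[0xbdcc98] "
             "ros(HOSTG_fatal_error+0x14)[0xbe0130]")
    second = ("base_address=0x004c5000 ros(+0x798c98)[0xc5dc98] "
              "ros(HOSTG_fatal_error+0x14)[0xc61130]")
    print(relative_frames(first) == relative_frames(second))
```

If the normalized frames of two records are equal, the crashes very likely share a call path, which is useful to mention when opening a support case.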
08-02-2023 02:13 PM - edited 08-03-2023 11:19 PM
I ran into a similar issue as well. Two Supermicro servers are connected to the switch with 10G SFP+ modules.
After a power outage (the servers were shut down by the UPS), the systems never came back up. I started analyzing and saw that the SG350X-24P switch was stuck in an infinite boot loop.
I tried disconnecting the ports and it came back up. So I started plugging the cables back in one by one. As soon as I attached one of the Supermicro servers, it crashed. It doesn't matter whether the SFP+ is plugged in; the switch crashes as soon as the link comes up, once the cable is connected on both ends.
Log:
%SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_ifIndex].external_phyId, sleep_time_ms) failed with 0x1 ***** FATAL ERROR ***** Reporting Task: HCLT. Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52) base_address=0x0048a000 ros(+0x798c98)[0xc22c98] ros(HOSTG_fatal_error+0x14)[0xc26130] ros(OSSYSG_fatal_error+0x258)[0x1139370] ros(OSSYSG_fatal_error_formatted+0x44)[0x1139510] ros(+0x1046f9c)[0x14d0f9c] ros(EXTHWP_SF_set_power_modules_2+0x268)[0x14d12e8] ros(EXTHWG_SF_dispatch+0x
This issue occurs at least on 2.5.9.15 and 2.5.9.16. The switch had been running fine for some months in the exact same configuration. I'm now trying to downgrade to 2.5.8.15 as mentioned in the original post.
Edit: one more thing: the SFP+ modules are connected to the combo ports while the corresponding copper ports are empty.
Update: After downgrading to 2.5.8.15, the SFP+ ports and links have now been stable for more than 24 hours without any issues.
12-24-2024 09:48 AM - edited 12-24-2024 09:58 AM
Hello, I have a similar problem. More details at this link: https://community.cisco.com/t5/switches-small-business/system-parameters-reset-to-zero-sg350-28mp/m-p/5240168#M29248.
I only have access to the interface for 1 minute. In PuTTY, via the console cable, I can't enter the username.
I got this after plugging in the console cable:
***** FATAL ERROR *****
Reporting Task: DH6C.
Software Version: 2.5.0.83 (date Jun 18 2019 time 16:44:23)
base_address=0xb4733000
ros(+0x78a5f0)[0xb4ebd5f0]
ros(HOSTG_fatal_error+0x10)[0xb4ebfbf4]
ros(OSSYSG_fatal_error+0x2a0)[0xb54895dc]
ros(+0xb58348)[0xb528b348]
ros(+0x849270)[0xb4f7c270]
ros(+0x84ba74)[0xb4f7ea74]
ros(+0x84857c)[0xb4f7b57c]
ros(+0x84d8c4)[0xb4f808c4]
ros(DHCPV6CLIENTP_task+0x3ec)[0xb4f84638]
/lib/libp2linux.so.1(task_run+0xf4)[0xb46c3818]
My biggest concern: have you managed to reboot the system?
12-24-2024 10:13 AM
As long as I stay on firmware version 2.5.8.15, the switch boots normally with all interfaces and SFP+ modules connected.
As soon as I try a 2.5.9.x version, the issue reappears. So for now I have to stick with 2.5.8.15.
12-24-2024 10:19 AM
When the switch kept restarting, how did you update it?
In my case, I only have access to the web interface for 30 seconds. When I upload the file ‘image_tesla_hybrid_2.5.9.54_release_cisco_signed.bin’, the upload stops while still in progress.
12-24-2024 10:31 AM
As mentioned, the issue seems to be related to SFP+ when the link comes up on the 2.5.9.x versions. So I just disconnected the SFP+ modules for the downgrade. With the SFP+ modules disconnected, the switch came up normally on 2.5.9.x.
12-24-2024 10:39 AM
I don't have an SFP+ module in the switch.
12-24-2024 10:46 AM - edited 12-24-2024 11:07 AM
This is what I got after pressing the ‘reset’ button for 10 seconds with no RJ45 port connected:
To perform reset to factory defaults do not release the button for 10 seconds.
Resetting device to factory defaults.
**************************************************
***************** SYSTEM RESET *****************
**************************************************
Restarting system.
BootROM 1.41
Booting from NAND flash
General initialization - Version: 1.0.0
Serdes initialization - Version: 1.0.2
PEX: pexIdx 0, detected no link
DDR3 Training Sequence - Ver TIP-1.56.0
DDR3 Training Sequence - Switching XBAR Window to FastPath Window
Updated Physical Mem size is from 0x20000000 to 10000000
DDR3 Training Sequence - Ended Successfully
BootROM: Image checksum verification PASSED
ROS Booton: May 26 2019 14:16:26
Press x to choose XMODEM...
Booting from NAND flash
Running UBOOT...
U-Boot 2013.01 (Jun 18 2019 - 16:47:02) Marvell version: 2014_T3.0_eng_dropv6 2.5.18
Loading system/images/active-image ...
secure boot not supported
Uncompressing Linux... done, booting the kernel.
I2C frequency 100 kHz (Tclk 200 MHz, freq_m 12, freq_n 3)
MAC address : ac:4a:56:77:4a:9f.
Autoboot in 2 seconds - press RETURN or Esc. to abort and enter prom.
*******************************************************************
*** Running SW Ver. 2.5.0.83 Date Jun 18 2019 Time 16:44:23 ***
*******************************************************************
HW version is V07
Serial Number is DNIxxxxxxxWG
Base Mac address is: ac:aa:aa:aa:4a:9f
Dram size is : 512M bytes
Flash size is: 256M
18-Jun-2019 04:45:19 %CDB-I-LOADCONFIG: Loading running configuration.
18-Jun-2019 04:45:19 %CDB-I-LOADCONFIG: Loading startup configuration.
Device configuration:
Slot 1 - SG350-28MP
Device 0: CPSS_98DX3235 (AlleyCat3)
CPLD version is: 0x03
CPU speed: 800 MHz
------------------------------------
-- Unit Factory Default --
------------------------------------
18-Jun-2019 04:45:33 %INIT-I-InitCompleted: Initialization task is completed
>
-----------------------------------
-- Unit Number 1 Master Enabled --
-----------------------------------
18-Jun-2019 04:45:43 %Environment-W-RPS-STAT-MSG: Power supply source changed to Main Power Supply.
18-Jun-2019 04:45:43 %MLDP-I-MASTER: Switching to the Master Mode.
18-Jun-2019 04:45:46 %Entity-I-SEND-ENT-CONF-CHANGE-TRAP: entity configuration change trap.
18-Jun-2019 04:45:46 %SNMP-I-CDBITEMSNUM: Number of running configuration items loaded: 0
18-Jun-2019 04:45:46 %SNMP-I-CDBITEMSNUM: Number of startup configuration items loaded: 0
The SSH Server is generating a default RSA key.
This may take a few minutes, depending on the key size.
18-Jun-2019 04:45:47 %NT_poe-I-PoEPowerSourceChange: Active power source set to PS for unit 1
The SSH Server is generating a default DSA key.
This may take a few minutes, depending on the key size.
18-Jun-2019 04:45:51 %Environment-I-FAN-STAT-CHNG: FAN# 1 status changed to operational.
18-Jun-2019 04:45:51 %Environment-I-FAN-STAT-CHNG: FAN# 2 status changed to operational.
The SSH Client is generating a default RSA key.
This may take a few minutes, depending on the key size.
The SSH Client is generating a default DSA key.
This may take a few minutes, depending on the key size.
18-Jun-2019 04:46:00 %SSL-I-SSLCTASK: Starting autogeneration of self-signed certificate - 2048 bits
Generating RSA private key, 2048 bit long modulus
18-Jun-2019 04:46:13 %SSL-I-SSLCTASK: Autogeneration of self-signed certificate was successfully completed
Generating RSA private key, 2048 bit long modulus
>lcli
Console baud-rate auto detection is enabled, press Enter twice to complete the detection process
User Name :
Detected speed: 115200
User Name:cisco
Password:*****
Please change your username AND password from the default settings.
Change of credentials is required for better protection of your network.
Please note that new password must follow password complexity rules.
Enter new username: az
Enter new password: ********
Confirm new password: ********
Username and password were successfully updated.
switch774a9f#24-Dec-2024 19:00:58 %DHCPV6CLIENT-I-ADDR: DHCPv6 Address :: received on vlan 1 from DHCP Server fe80::6a3f:7dff:fe3d:6ef0 was renewed
24-Dec-2024 19:00:58 %DHCPV6CLIENT-I-ADDR: DHCPv6 Address :: received on vlan 1 from DHCP Server fe80::6a3f:7dff:fe3d:6ef0 was renewed
24-Dec-2024 19:00:58 %DHCPV6CLIENT-I-ADDR: DHCPv6 Address :: received on vlan 1 from DHCP Server fe80::6a3f:7dff:fe3d:6ef0 was renewed
24-Dec-2024 19:00:58 %DHCPV6CLIENT-I-ADDR: DHCPv6 Address :: received on vlan 1 from DHCP Server fe80::6a3f:7dff:fe3d:6ef0 was renewed
24-Dec-2024 19:00:58 %DHCPV6CLIENT-I-ADDR: DHCPv6 Address :: received on vlan 1 from DHCP Server fe80::6a3f:7dff:fe3d:6ef0 was renewed
24-Dec-2024 19:01:09 %DHCPV6CLIENT-F-HASHINCONS: Hash table inconsistancy, table - DHCPV6CLIENTP_update_address
***** FATAL ERROR *****
Reporting Task: DH6C.
Software Version: 2.5.0.83 (date Jun 18 2019 time 16:44:23)
base_address=0xb46e0000
ros(+0x78a5f0)[0xb4e6a5f0]
ros(HOSTG_fatal_error+0x10)[0xb4e6cbf4]
ros(OSSYSG_fatal_error+0x2a0)[0xb54365dc]
ros(+0xb58348)[0xb5238348]
ros(+0x849270)[0xb4f29270]
ros(+0x84ba74)[0xb4f2ba74]
ros(+0x84857c)[0xb4f2857c]
ros(+0x84d8c4)[0xb4f2d8c4]
ros(DHCPV6CLIENTP_task+0x3ec)[0xb4f31638]
ros(+0x84d8c4)[0xb4f2d8c4]
ros(DHCPV6CLIENTP_task+0x3ec)[0xb4f31638]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4670818]
***** END OF FATAL ERROR *****
**************************************************
***************** SYSTEM RESET *****************
**************************************************
Restarting system.
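The crash reports in this thread name two different reporting tasks: HCLT in the SFP+ case on 2.5.9.16 and DH6C in the console capture above on 2.5.0.83. To tell such cases apart quickly in a longer capture, here is a minimal Python sketch that pulls the "Reporting Task" and "Software Version" fields out of each fatal-error block. It assumes the field layout seen in the logs here; the function name is illustrative only.

```python
import re

TASK_RE = re.compile(r"Reporting Task:\s*(\w+)")
VERSION_RE = re.compile(r"Software Version:\s*([\d.]+)")
# Marker that opens each crash block; the closing marker
# "***** END OF FATAL ERROR *****" does not contain this exact string.
START_MARKER = "***** FATAL ERROR *****"


def summarize_fatal_errors(log_text):
    """Return the sorted unique (task, version) pairs found in log_text."""
    pairs = set()
    # Everything before the first marker is ordinary log output.
    for block in log_text.split(START_MARKER)[1:]:
        task = TASK_RE.search(block)
        version = VERSION_RE.search(block)
        if task and version:
            pairs.add((task.group(1), version.group(1)))
    return sorted(pairs)


if __name__ == "__main__":
    sample = (
        "***** FATAL ERROR ***** Reporting Task: HCLT. "
        "Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52)\n"
        "***** END OF FATAL ERROR *****\n"
        "***** FATAL ERROR *****\n"
        "Reporting Task: DH6C.\n"
        "Software Version: 2.5.0.83 (date Jun 18 2019 time 16:44:23)\n"
        "***** END OF FATAL ERROR *****\n"
    )
    for task, version in summarize_fatal_errors(sample):
        print(task, version)
```

Different task names in the summary are a hint that the reboots may have different triggers, even though the visible symptom (a reboot loop) is the same.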
12-24-2024 11:32 AM
I can't believe it's the DHCPv6 on the livebox that's making everything crash.
12-24-2024 11:49 AM
That could make sense in my case too.
My Supermicro server is hosting multiple VMs. One of the VMs is a firewall that provides IPv6 to the connected networks.
12-25-2024 06:56 AM
There's a good chance, because since IPv6 was deactivated I've had no more problems.
12-25-2024 07:47 AM
The bigger question is: why is Cisco not fixing the issue at all? It has existed for more than 1.5 years now.
12-28-2024 12:25 AM
I retested to be sure: with DHCPv6 activated, the switch restarts continuously with the version image_tesla_hybrid_2.5.9.54_release_cisco_signed.bin.