
SG350X-24P - Crash with constant reboot with Firmware 2.5.9.16

Hello all,

I'm using an SG350X-24P and upgraded it from firmware 2.5.8.15 to 2.5.9.16. As soon as my workstation with a Mellanox 10G card (connected via fiber to the uplink ports) came up, the switch crashed and hard-rebooted in an endless loop until the fiber connection was removed.

The following errors were logged via syslog to my small server system:

2023-05-15T21:22:41.387964+02:00 octopus.mgmt.siski.de %LINK-I-CHNGCOMBOMEDIA: Media changed from copper media to fiber media on port te1/0/2.   
2023-05-15T21:34:00.980374+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1  ***** FATAL ERROR *****   Reporting Task: HCLT.  Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52)  base_address=0x00444000   
2023-05-15T21:34:00.980966+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xbdcc98]  ros(HOSTG_fatal_error+0x14)[0xbe0130]  ros(OSSYSG_fatal_error+0x258)[0x10f3370]  ros(OSSYSG_fatal_error_formatted+0x44)[0x10f3510]  ros(+0x1046f9c)[0x148af9c]  
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x148b2e8]  ros(EXTHWG_SF_dispatch+0x78)[0x148cef4]  ros(HALP_config_phy_set_power_modules_4+0x14)[0x1444748]  ros(HALP_config_phy_perform_phy_operation+0xe8)[0x1430e08]  ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x1445588]  ros(+0xfcb458)[0x140f458]  ros(HALC_config_if_dispatch+0x230)[0x1418eac]  ros(+0xfdd12c)[0x142112c]  ros(+0xfdd350)[0x1421350]  ros(HALP_config_main_copy_big_dev_data+0x0)[0x14213d0]  /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6f3a840]    ***** END OF FATAL ERROR *****    
2023-05-15T21:34:00.981567+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1  ***** FATAL ERROR *****   Reporting Task: HCLT.  Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52)  base_address=0x004c5000   
2023-05-15T21:34:00.982076+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xc5dc98]  ros(HOSTG_fatal_error+0x14)[0xc61130]  ros(OSSYSG_fatal_error+0x258)[0x1174370]  ros(OSSYSG_fatal_error_formatted+0x44)[0x1174510]  ros(+0x1046f9c)[0x150bf9c]
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x150c2e8]  ros(EXTHWG_SF_dispatch+0x78)[0x150def4]  ros(HALP_config_phy_set_power_modules_4+0x14)[0x14c5748]  ros(HALP_config_phy_perform_phy_operation+0xe8)[0x14b1e08]  ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x14c6588]  ros(+0xfcb458)[0x1490458]  ros(HALC_config_if_dispatch+0x230)[0x1499eac]  ros(+0xfdd12c)[0x14a212c]  ros(+0xfdd350)[0x14a2350]  ros(HALP_config_main_copy_big_dev_data+0x0)[0x14a23d0]  /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6ea9840]    ***** END OF FATAL ERROR *****    
2023-05-15T21:34:00.982566+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1  ***** FATAL ERROR *****   Reporting Task: HCLT.  Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52)  base_address=0x00499000   
2023-05-15T21:34:00.983063+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xc31c98]  ros(HOSTG_fatal_error+0x14)[0xc35130]  ros(OSSYSG_fatal_error+0x258)[0x1148370]  ros(OSSYSG_fatal_error_formatted+0x44)[0x1148510]  ros(+0x1046f9c)[0x14dff9c]
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x14e02e8]  ros(EXTHWG_SF_dispatch+0x78)[0x14e1ef4]  ros(HALP_config_phy_set_power_modules_4+0x14)[0x1499748]  ros(HALP_config_phy_perform_phy_operation+0xe8)[0x1485e08]  ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x149a588]  ros(+0xfcb458)[0x1464458]  ros(HALC_config_if_dispatch+0x230)[0x146deac]  ros(+0xfdd12c)[0x147612c]  ros(+0xfdd350)[0x1476350]  ros(HALP_config_main_copy_big_dev_data+0x0)[0x14763d0]  /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6f73840]    ***** END OF FATAL ERROR *****    
2023-05-15T21:34:00.983690+02:00 octopus.mgmt.siski.de %SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_i
fIndex].external_phyId, sleep_time_ms) failed with 0x1  ***** FATAL ERROR *****   Reporting Task: HCLT.  Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52)  base_address=0x004fb000   
2023-05-15T21:34:00.984208+02:00 octopus.mgmt.siski.de ros(+0x798c98)[0xc93c98]  ros(HOSTG_fatal_error+0x14)[0xc97130]  ros(OSSYSG_fatal_error+0x258)[0x11aa370]  ros(OSSYSG_fatal_error_formatted+0x44)[0x11aa510]  ros(+0x1046f9c)[0x1541f9c]
ros(EXTHWP_SF_set_power_modules_2+0x268)[0x15422e8]  ros(EXTHWG_SF_dispatch+0x78)[0x1543ef4]  ros(HALP_config_phy_set_power_modules_4+0x14)[0x14fb748]  ros(HALP_config_phy_perform_phy_operation+0xe8)[0x14e7e08]  ros(HALC_config_phy_perform
_phy_operation+0xfc)[0x14fc588]  ros(+0xfcb458)[0x14c6458]  ros(HALC_config_if_dispatch+0x230)[0x14cfeac]  ros(+0xfdd12c)[0x14d812c]  ros(+0xfdd350)[0x14d8350]  ros(HALP_config_main_copy_big_dev_data+0x0)[0x14d83d0]  /lib/libp2linux.so.1(ta
sk_run+0xf4)[0xb6f56840]    ***** END OF FATAL ERROR *****    
2023-05-15T21:34:00.986547+02:00 octopus.mgmt.siski.de %SYSLOG-N-LOGGING: Logging started.   
2023-05-15T21:34:05.070855+02:00 sw3-2l.mgmt.siski.de %BOOTP_DHCP_CL-I-DHCPCONFIGURED: The device has been configured on interface Vlan 30 , IP 172.16.1.23, mask 255.255.255.0, DHCP server 172.16.1.17    
2023-05-15T21:34:05.234446+02:00 sw5-1l.mgmt.siski.de %BOOTP_DHCP_CL-I-DHCPCONFIGURED: The device has been configured on interface Vlan 30 , IP 172.16.1.22, mask 255.255.255.0, DHCP server 172.16.1.17    
2023-05-15T21:34:26.954738+02:00 octopus.mgmt.siski.de %LINK-I-Up:  gi1/0/24   
2023-05-15T21:34:27.618387+02:00 octopus.mgmt.siski.de %LINK-W-Down:  gi1/0/24   
2023-05-15T21:34:30.401167+02:00 octopus.mgmt.siski.de %LINK-I-Up:  gi1/0/24   
2023-05-15T21:34:32.090383+02:00 octopus.mgmt.siski.de %LINK-I-Up:  gi1/0/16   
2023-05-15T21:35:00.719544+02:00 octopus.mgmt.siski.de %LINK-W-Down:  gi1/0/24   
2023-05-15T21:35:10.431903+02:00 octopus.mgmt.siski.de %LINK-I-CHNGCOMBOMEDIA: Media changed from copper media to fiber media on port te1/0/2.  
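
The four fatal errors in the log above report different absolute addresses, but each one also prints a `base_address`. A minimal sketch, assuming the bracketed value in each `ros(...)[...]` frame is an absolute address and that subtracting `base_address` yields the offset inside the `ros` image (the helper name `normalize_frames` is made up for illustration), shows how to check whether two crash reports point to the same code location:

```python
import re

# Matches backtrace frames such as "ros(+0x798c98)[0xbdcc98]" or
# "ros(HOSTG_fatal_error+0x14)[0xbe0130]".
FRAME_RE = re.compile(r"ros\(([^)]*)\)\[(0x[0-9a-fA-F]+)\]")


def normalize_frames(base_address, trace):
    """Return (label, absolute address - base_address) for each ros frame."""
    return [
        (label, int(addr, 16) - base_address)
        for label, addr in FRAME_RE.findall(trace)
    ]


# First two frames of the first two crash reports, copied from the log above.
crash_1 = normalize_frames(
    0x00444000,
    "ros(+0x798c98)[0xbdcc98]  ros(HOSTG_fatal_error+0x14)[0xbe0130]",
)
crash_2 = normalize_frames(
    0x004C5000,
    "ros(+0x798c98)[0xc5dc98]  ros(HOSTG_fatal_error+0x14)[0xc61130]",
)

for label, offset in crash_1:
    print(f"{label:30s} {offset:#x}")

# True when both reports resolve to the same offsets.
print("same offsets:", crash_1 == crash_2)
```

For these two reports the normalized offsets are identical, and the first one equals the `+0x798c98` printed in the frame itself. That suggests the repeated entries are the same failure reported with a different load address each time, rather than several distinct crashes.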

There is no issue with firmware 2.5.8.15. The network card on the other end that triggers the crash with firmware 2.5.9.16 is a Mellanox Technologies MT27710 Family [ConnectX-4 Lx]. It uses the mlx5 driver from Ubuntu 22.04.

A Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] card does not cause problems and also works with the newer firmware (but it uses the mlx4 driver on Debian buster).
The SFP+ modules (Finisar and similar brands) have been running in this switch for years and had never caused any issues until now.

A more detailed description can be provided on request.

Regards

1 Reply

vistalba
Level 1

I ran into a similar issue as well. Two Supermicro servers are connected to the switch via 10G SFP+ modules.
After a power outage (the servers were shut down by the UPS), the systems never came back up. I started analyzing and saw that the SG350X-24P switch was stuck in an infinite boot loop.
I disconnected the ports and the switch came back up, so I started plugging the cables back in one by one. As soon as I attached one of the Supermicro servers, the switch crashed. It doesn't matter whether the SFP+ is plugged in; the switch crashes as soon as the link comes up with the cable connected on both ends.

Log:

 

 

%SYSLOG-F-OSFATAL: mtdSoftwareReset(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), HALP_config_phy_port_db[rel_ifIndex].external_phyId, sleep_time_ms) failed with 0x1 ***** FATAL ERROR *****  Reporting Task: HCLT. Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52) base_address=0x0048a000 ros(+0x798c98)[0xc22c98] ros(HOSTG_fatal_error+0x14)[0xc26130] ros(OSSYSG_fatal_error+0x258)[0x1139370] ros(OSSYSG_fatal_error_formatted+0x44)[0x1139510] ros(+0x1046f9c)[0x14d0f9c] ros(EXTHWP_SF_set_power_modules_2+0x268)[0x14d12e8] ros(EXTHWG_SF_dispatch+0x
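
To compare this report with the one in the original post, it can help to reduce each fatal message to a short signature. The following is only a sketch: the regular expression and the `signature` helper are assumptions written against the message layout seen in this thread, not a documented Cisco log format.

```python
import re

# Pulls the failing call, its return code, the reporting task and the
# firmware version out of a %SYSLOG-F-OSFATAL message.
HEADER_RE = re.compile(
    r"%SYSLOG-F-OSFATAL: (?P<call>\w+)\(.*\) failed with (?P<code>0x[0-9a-fA-F]+)"
    r".*?Reporting Task: (?P<task>\w+)\."
    r"\s+Software Version: (?P<version>[\d.]+)",
    re.DOTALL,
)


def signature(message):
    """Return (call, code, task, version), or None if the layout differs."""
    match = HEADER_RE.search(message)
    if match is None:
        return None
    return (match["call"], match["code"], match["task"], match["version"])


# Header part of both reports, as posted in this thread.
ARGS = (
    "(((rel_ifIndex < (64 * 2))?EXTHWP_SF_phy_port_db_ARR[rel_ifIndex]->mtd_object:"
    "EXTHWP_SF_phy_port_db_ARR[0]->mtd_object), "
    "HALP_config_phy_port_db[rel_ifIndex].external_phyId, sleep_time_ms)"
)
original_post = (
    "%SYSLOG-F-OSFATAL: mtdSoftwareReset" + ARGS +
    " failed with 0x1  ***** FATAL ERROR *****   Reporting Task: HCLT.  "
    "Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52)  base_address=0x00444000"
)
reply = (
    "%SYSLOG-F-OSFATAL: mtdSoftwareReset" + ARGS +
    " failed with 0x1 ***** FATAL ERROR *****  Reporting Task: HCLT. "
    "Software Version: 2.5.9.16 (date Feb 27 2023 time 16:53:52) base_address=0x0048a000"
)

print(signature(original_post))
print("same signature:", signature(original_post) == signature(reply))
```

Both messages reduce to the same call (`mtdSoftwareReset`), return code, reporting task and firmware version; only `base_address` differs, which is consistent with both switches hitting the same failure.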

 

 

This issue occurs at least on 2.5.9.15 and 2.5.9.16. The switch had been running fine for some months in the exact same configuration. I am now trying to downgrade to 2.5.8.15, as mentioned in the original post.

Edit: one more thing: the SFP+ modules are connected to the combo ports, while the corresponding copper ports are empty.

Update: After downgrading to 2.5.8.15, the SFP+ ports and links have been stable for more than 24 hours without any issues.