Random reboots on master switch in stack of SG350X-48-K9 V02

Hello everybody.

 

As the title says, the master switch in our stack has randomly rebooted for the second time in the last 2 weeks.

 

The stack consists of 3 SG350X-48-K9 V02 units (one of which is the master) and one SG350X-48MP-K9 V04 unit.

 

The first reboot produced this error:

08-Jul-2019 07:14:10 :%STCK SYSL-A-UNITMSG: UNIT ID 1,Msg:ros(+0x78a510)[0xb4e63510] ros(HOSTG_fatal_error+0x10)[0xb4e65b14]
ros(OSSYSG_fatal_error+0x2a0)[0xb542f4fc]
ros(+0x7850d4)[0xb4e5e0d4]
/lib/libc.so.6(+0x25050)[0xb44aa050]
ros(OSSYSG_fatal_error+0x60)[0xb542f2bc]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56fc820]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56fd1a4]
ros(+0x10835b8)[0xb575c5b8]
ros(+0x10838a0)[0xb575c8a0]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4669818]

***** END OF FATAL ERROR *****

***** END OF FATAL ERROR *****

08-Jul-2019 07:14:10 :%STCK SYSL-A-UNITMSG: UNIT ID
1,Msg:%SYSLOG-F-OSFATAL: caught segmentation fault exception at address
0xb6ad46b8

signal-num = 11 (SIGSEGV)
signal-code = 2 (SEGV_ACCERR)

reg[00] = 0x1
reg[01] = 0x20
reg[02] = 0xb6ad46b9
reg[03] = 0xa
reg[04] = 0xb771bdfc
reg[05] = 0x1
reg[06] = 0x1e9
reg[07] = 0x5
reg[08] = 0xe
reg[09] = 0x0
reg[10] = 0x0
FP = 0xb71c2b58
IP = 0x20
SP = 0x56bfeb10
LR = 0xb542f514
PC = 0xb542f2bc
CPSR = 0x20000010
Fault Address = 0xb6ad46b8
Trap no = 0xe
Err Code = 0x81f
Old Mask = 0x0

***** FATAL ERROR *****
Reporting Task: EVRX.

 

After this problem we upgraded the firmware from 2.5.0.82 to 2.5.0.83. The second unit became the master of the stack.

Within the last week the stack somehow changed the master from unit 2 to unit 1 by itself. Today the master switch rebooted itself again.

 

The logs show:

%STCK SYSL-A-UNITMSG: UNIT ID 1,Msg:ros(+0x78a5f0)[0xb4e3c5f0]
ros(HOSTG_fatal_error+0x10)[0xb4e3ebf4]
ros(OSSYSG_fatal_error+0x2a0)[0xb54085dc]
ros(+0x7851b4)[0xb4e371b4]
/lib/libc.so.6(+0x25050)[0xb4483050]
ros(OSSYSG_fatal_error+0x60)[0xb540839c]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56d5900]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56d6284]
ros(+0x10837c0)[0xb57357c0]
ros(+0x1083aa8)[0xb5735aa8]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4642818]

***** END OF FATAL ERROR *****

***** END OF FATAL ERROR *****

%STCK SYSL-A-UNITMSG: UNIT ID 1,Msg:%SYSLOG-F-OSFATAL: caught segmentation fault exception at address 0xb6aad928

signal-num = 11 (SIGSEGV)
signal-code = 2 (SEGV_ACCERR)

reg[00] = 0x1
reg[01] = 0x20
reg[02] = 0xb6aad929
reg[03] = 0xa
reg[04] = 0xb76f4e2c
reg[05] = 0x0
reg[06] = 0x21a
reg[07] = 0x5
reg[08] = 0x19
reg[09] = 0x0
reg[10] = 0x0
FP = 0xb719bb88
IP = 0x20
SP = 0x56bfeb10
LR = 0xb54085f4
PC = 0xb540839c
CPSR = 0x20080010
Fault Address = 0xb6aad928
Trap no = 0xe
Err Code = 0x81f
Old Mask = 0x0

***** FATAL ERROR *****

Reporting Task: EVRX. Software Version: 2.5.0.83 (date Jun 18 2019 time 16:44
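
In case it helps anyone comparing the two crashes, here is a rough Python sketch that parses the backtrace frames from both logs and checks whether the named call path is the same. The helper names (`parse_frames`, `named_frames`, `load_bases`) are my own, and the load-base calculation assumes that unnamed frames such as `ros(+0x78a510)` carry an offset relative to the module's load address, as glibc-style backtraces do. I have not confirmed that for this firmware.

```python
import re

# Backtrace frames copied from the two logs in this post (syslog prefix dropped).
TRACE_1 = """
ros(+0x78a510)[0xb4e63510]
ros(HOSTG_fatal_error+0x10)[0xb4e65b14]
ros(OSSYSG_fatal_error+0x2a0)[0xb542f4fc]
ros(+0x7850d4)[0xb4e5e0d4]
/lib/libc.so.6(+0x25050)[0xb44aa050]
ros(OSSYSG_fatal_error+0x60)[0xb542f2bc]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56fc820]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56fd1a4]
ros(+0x10835b8)[0xb575c5b8]
ros(+0x10838a0)[0xb575c8a0]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4669818]
"""

TRACE_2 = """
ros(+0x78a5f0)[0xb4e3c5f0]
ros(HOSTG_fatal_error+0x10)[0xb4e3ebf4]
ros(OSSYSG_fatal_error+0x2a0)[0xb54085dc]
ros(+0x7851b4)[0xb4e371b4]
/lib/libc.so.6(+0x25050)[0xb4483050]
ros(OSSYSG_fatal_error+0x60)[0xb540839c]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56d5900]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56d6284]
ros(+0x10837c0)[0xb57357c0]
ros(+0x1083aa8)[0xb5735aa8]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4642818]
"""

# A frame looks like module(symbol+0xOFFSET)[0xADDRESS]; symbol may be empty.
FRAME_RE = re.compile(r"(\S+?)\((\w*)\+0x([0-9a-fA-F]+)\)\[0x([0-9a-fA-F]+)\]")


def parse_frames(trace):
    """Return (module, symbol, offset, address) for every frame in the text."""
    return [(mod, sym, int(off, 16), int(addr, 16))
            for mod, sym, off, addr in FRAME_RE.findall(trace)]


def named_frames(trace):
    """Keep only frames with a symbol name, as (symbol, offset) pairs."""
    return [(sym, off) for _, sym, off, _ in parse_frames(trace) if sym]


def load_bases(trace, module="ros"):
    """Assumed load base(s) of a module: address minus offset of unnamed frames."""
    return {addr - off for mod, sym, off, addr in parse_frames(trace)
            if mod == module and not sym}


if __name__ == "__main__":
    print("same named call path:", named_frames(TRACE_1) == named_frames(TRACE_2))
    print("ros base, first crash: ", [hex(b) for b in load_bases(TRACE_1)])
    print("ros base, second crash:", [hex(b) for b in load_bases(TRACE_2)])
```

If the named frames match, both reboots went through `SW2C_packet_rx_buf_handler` and `SW2C_buf_mgr_rx_pkt_handler` at the same offsets, which would suggest the same fault on both firmware versions rather than two unrelated crashes.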

 

Other problems we have with the stack:

  • spikes of high CPU utilisation
  • dropped frames on many different interfaces, mostly on units 1 and 2 (it doesn't matter whether the port connects to another switch, a cash terminal, a computer, etc.)

Other useful information:

  • the stack is connected to a FortiGate100E with an aggregated link (4 ports, one for every switch in the stack)
  • the stack is the root for STP
  • we used the Cisco FindIT probe to display the topology of our network, but have turned it off now.

Any help will be much appreciated.
