Hello everybody.
As in the title: the master switch in our stack has randomly rebooted for the second time in the last 2 weeks.
The stack consists of 3 SG350X-48-K9 V02 units (one of which is the master) and one SG350X-48MP-K9 V04 unit.
The first reboot produced this error:
08-Jul-2019 07:14:10 :%STCK SYSL-A-UNITMSG: UNIT ID 1,Msg:ros(+0x78a510)[0xb4e63510] ros(HOSTG_fatal_error+0x10)[0xb4e65b14]
ros(OSSYSG_fatal_error+0x2a0)[0xb542f4fc]
ros(+0x7850d4)[0xb4e5e0d4]
/lib/libc.so.6(+0x25050)[0xb44aa050]
ros(OSSYSG_fatal_error+0x60)[0xb542f2bc]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56fc820]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56fd1a4]
ros(+0x10835b8)[0xb575c5b8]
ros(+0x10838a0)[0xb575c8a0]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4669818]
***** END OF FATAL ERROR *****
***** END OF FATAL ERROR *****
08-Jul-2019 07:14:10 :%STCK SYSL-A-UNITMSG: UNIT ID 1,Msg:%SYSLOG-F-OSFATAL: caught segmentation fault exception at address 0xb6ad46b8
signal-num = 11 (SIGSEGV)
signal-code = 2 (SEGV_ACCERR)
reg[00] = 0x1
reg[01] = 0x20
reg[02] = 0xb6ad46b9
reg[03] = 0xa
reg[04] = 0xb771bdfc
reg[05] = 0x1
reg[06] = 0x1e9
reg[07] = 0x5
reg[08] = 0xe
reg[09] = 0x0
reg[10] = 0x0
FP = 0xb71c2b58
IP = 0x20
SP = 0x56bfeb10
LR = 0xb542f514
PC = 0xb542f2bc
CPSR = 0x20000010
Fault Address = 0xb6ad46b8
Trap no = 0xe
Err Code = 0x81f
Old Mask = 0x0
***** FATAL ERROR *****
Reporting Task: EVRX.
After this problem we upgraded the firmware from 2.5.0.82 to 2.5.0.83. The second unit became master of the stack.
At some point in the last week the stack changed the master from unit 2 to unit 1 by itself. Today the master switch rebooted itself again.
The logs show:
%STCK SYSL-A-UNITMSG: UNIT ID 1,Msg:ros(+0x78a5f0)[0xb4e3c5f0]
ros(HOSTG_fatal_error+0x10)[0xb4e3ebf4]
ros(OSSYSG_fatal_error+0x2a0)[0xb54085dc]
ros(+0x7851b4)[0xb4e371b4]
/lib/libc.so.6(+0x25050)[0xb4483050]
ros(OSSYSG_fatal_error+0x60)[0xb540839c]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56d5900]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56d6284]
ros(+0x10837c0)[0xb57357c0]
ros(+0x1083aa8)[0xb5735aa8]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4642818]
***** END OF FATAL ERROR *****
***** END OF FATAL ERROR *****
%STCK SYSL-A-UNITMSG: UNIT ID 1,Msg:%SYSLOG-F-OSFATAL: caught segmentation fault exception at address 0xb6aad928
signal-num = 11 (SIGSEGV)
signal-code = 2 (SEGV_ACCERR)
reg[00] = 0x1
reg[01] = 0x20
reg[02] = 0xb6aad929
reg[03] = 0xa
reg[04] = 0xb76f4e2c
reg[05] = 0x0
reg[06] = 0x21a
reg[07] = 0x5
reg[08] = 0x19
reg[09] = 0x0
reg[10] = 0x0
FP = 0xb719bb88
IP = 0x20
SP = 0x56bfeb10
LR = 0xb54085f4
PC = 0xb540839c
CPSR = 0x20080010
Fault Address = 0xb6aad928
Trap no = 0xe
Err Code = 0x81f
Old Mask = 0x0
***** FATAL ERROR *****
Reporting Task: EVRX. Software Version: 2.5.0.83 (date Jun 18 2019 time 16:44
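To check whether both crashes follow the same code path, the named frames of the two backtraces can be compared. Below is a minimal Python sketch, not a Cisco tool. The helper name named_frames and the regular expression are my own assumptions about the frame format seen in the logs above. Frames without a symbol name, such as ros(+0x78a510), are skipped on purpose because their offsets differ between the two firmware builds.

```python
import re

# Matches named frames such as "ros(HOSTG_fatal_error+0x10)[0xb4e65b14]".
# Unnamed frames like "ros(+0x78a510)[...]" do not match and are ignored.
FRAME = re.compile(r"\(([A-Za-z_]\w*)\+(0x[0-9a-fA-F]+)\)")

def named_frames(trace: str) -> list[tuple[str, str]]:
    """Return (symbol, offset) pairs in the order they appear in the trace."""
    return FRAME.findall(trace)

# Backtrace from the first reboot (firmware 2.5.0.82), copied from the log.
first = """
ros(+0x78a510)[0xb4e63510]
ros(HOSTG_fatal_error+0x10)[0xb4e65b14]
ros(OSSYSG_fatal_error+0x2a0)[0xb542f4fc]
ros(+0x7850d4)[0xb4e5e0d4]
/lib/libc.so.6(+0x25050)[0xb44aa050]
ros(OSSYSG_fatal_error+0x60)[0xb542f2bc]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56fc820]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56fd1a4]
ros(+0x10835b8)[0xb575c5b8]
ros(+0x10838a0)[0xb575c8a0]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4669818]
"""

# Backtrace from the second reboot (firmware 2.5.0.83), copied from the log.
second = """
ros(+0x78a5f0)[0xb4e3c5f0]
ros(HOSTG_fatal_error+0x10)[0xb4e3ebf4]
ros(OSSYSG_fatal_error+0x2a0)[0xb54085dc]
ros(+0x7851b4)[0xb4e371b4]
/lib/libc.so.6(+0x25050)[0xb4483050]
ros(OSSYSG_fatal_error+0x60)[0xb540839c]
ros(SW2C_buf_mgr_rx_pkt_handler+0x130)[0xb56d5900]
ros(SW2C_packet_rx_buf_handler+0x1c)[0xb56d6284]
ros(+0x10837c0)[0xb57357c0]
ros(+0x1083aa8)[0xb5735aa8]
/lib/libp2linux.so.1(task_run+0xf4)[0xb4642818]
"""

same_path = named_frames(first) == named_frames(second)
print("same named call path:", same_path)
for symbol, offset in named_frames(second):
    print(f"  {symbol}+{offset}")
```

For the two traces above the named frames match one for one, including SW2C_buf_mgr_rx_pkt_handler+0x130, so the firmware upgrade does not seem to have changed where the EVRX task faults.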
Other problems we have with the stack:
- high CPU utilisation spikes
- dropped frames on many different interfaces, mostly on units 1 and 2 (it doesn't matter whether the port connects to another switch, a cash terminal, a computer, etc.)
Other useful information:
- the stack is connected to a FortiGate100E via an aggregated link (4 ports, 1 for every switch in the stack)
- the stack is the root for STP
- we used the Cisco FindIT probe to display the topology of our network, but it is turned off now.
Any help will be much appreciated.