
LACP needs reset to get online (?)

tfhelp
Level 1

Over the last few months I've tried many times (it has to be done at night, so it's sometimes frustrating and even scary), but I finally got my LACP port group to work. Even though it works now, I hope you can help me understand the situation, because I don't understand it and that makes me anxious about new problems in production.

 

To be clear: it works now. I just don't understand what went wrong or how to prevent it.

 

The problem: I've got 2 Cisco SG350XG-2F10 12-Port 10G Stackable Managed Switches connected to, among other things, 4 virtual hosts (VRT11 through VRT14). I could set up all but one virtual host. Only one (VRT12) gave me nightmares.

I tried many things: swapping network cards, reinstalling Linux, changing switches, using other ports, checking the configuration 100 times, etc. For weeks I went back to my old switch for this server because I couldn't get it up.

 

What happened is:

- I configured the server to use LACP (bonding, Linux Debian Stretch / Proxmox)

- I connected the network cables to the ports

- The link doesn't come up (no lights, no connection...)
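
For anyone trying to reproduce this, here is a minimal sketch of how the per-port link state could be read on the Linux side. It assumes the bonding driver exposes its status under /proc/net/bonding/ with "Slave Interface:" and "MII Status:" lines, and that the bond is named bond0; that name and the function name are placeholders I made up, not taken from my actual setup.

```python
from pathlib import Path


def slave_link_states(bonding_text):
    """Map each slave interface name to the 'MII Status' reported for it."""
    states = {}
    current = None  # name of the slave section we are currently inside
    for line in bonding_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" appears before any slave section
            # and is skipped because current is still None at that point.
            states[current] = value
    return states


if __name__ == "__main__":
    # Assumption: the bond is called bond0; adjust to the real name.
    path = Path("/proc/net/bonding/bond0")
    if path.exists():
        for name, state in slave_link_states(path.read_text()).items():
            print(f"{name}: {state}")
```

This only shows what the server sees for each port; it says nothing about why the switch would keep a port down.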

 

Last night I tried again (swapped server hardware, reinstalled Linux, different network cards), all without any improvement... but then something interesting happened. When I connected the server to the 5th (empty) LACP group that I had set up for testing, the connection suddenly came online. But when I moved the cables back to the 2nd LACP group (the one the server belongs to), it stopped working, and it stayed down even when I reconnected to the 5th group. It was as if it had been blocked after that one time.

After getting more and more frustrated, I disabled the LACP group 2 port and re-enabled it. Suddenly everything worked.

 

It seems that the switch actively refused the connection in most cases, and that disabling and re-enabling the port cleared the block. Is it possible this is related to (R)STP and/or another detection mechanism? As far as I know, I didn't set up any extra security or complicated configuration... Why was it necessary to disable and re-enable the port? And does this "always" solve the issue, or was it just luck? (I've reconfigured the LACP groups and ports so many times that I thought this wouldn't matter.)

 

(Because it's a production environment and the situation is so strange, I'd prefer not to experiment much, but I hope to understand what happened so I know what to do if it happens again.)

 

(P.S. I'm sorry, I just read that I should have posted this on the Small Business forum, but I can't find a move option or action button.)
