11-02-2018 09:13 AM - edited 03-08-2019 04:32 PM
Hello,
I am having an odd problem in a client's network and it is causing big issues. Please see the (simple) star topology below:
5x Cisco Small Business Switch SG220-50
1x Fortinet FortiWifi 60D firewall
A whole bunch of desktops and printers and servers
The problem we are having is that at very random times, no consistency whatsoever, internal clients lose connectivity to only the gateway which is at x.x.x.1. When this happens the entire office loses their internet connection. All internal resources such as servers and printers are still available and reachable, except the gateway.
When this problem occurred I ran a continuous ping (ping -t) to the gateway's IP and saw intermittent replies and timeouts. Because only the gateway is affected, I suspected that another machine in the network was assuming the gateway's IP address and causing an IP conflict, but when checking the ARP table on a computer and the MAC address table on the switches, I do not see anything conflicting. Also, when I disconnect the internal interface of the firewall from the network, all pings time out, so no other device in the network is answering for the gateway's IP address.
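The ARP check described above can be repeated automatically instead of by eye. Below is a minimal Python sketch, assuming the text comes from the Windows `arp -a` command and that the gateway is at 192.168.1.1 (a placeholder, since the real address is masked as x.x.x.1). The function names are hypothetical. It only reports whether the workstation ever saw more than one MAC address for the gateway across several captures.

```python
import re

# Matches one entry line of Windows `arp -a` output, for example:
#   192.168.1.1           00-11-22-33-44-55     dynamic
ARP_LINE = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9A-Fa-f]{2}[-:]){5}[0-9A-Fa-f]{2})\s+\w+"
)


def parse_arp_table(text):
    """Return {ip: mac}, with MACs normalised to lowercase and colons."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line)
        if match:
            ip, mac = match.groups()
            table[ip] = mac.lower().replace("-", ":")
    return table


def gateway_macs_seen(snapshots, gateway_ip):
    """List the distinct MACs seen for gateway_ip, in order of first appearance.

    `snapshots` is a list of raw `arp -a` outputs captured over time.
    More than one MAC in the result suggests another device answered
    ARP requests for the gateway address at some point.
    """
    seen = []
    for text in snapshots:
        mac = parse_arp_table(text).get(gateway_ip)
        if mac is not None and mac not in seen:
            seen.append(mac)
    return seen
```

If `gateway_macs_seen` returns a single MAC that matches the firewall, an IP conflict on the gateway address becomes less likely, which is consistent with what was observed manually.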
Now here comes the weird part I cannot explain. While working on this issue I was convinced there was a device in the network causing this. I disconnected cables one by one from the switches and at some point the connectivity to the gateway was restored. After tracing the cable to the specific workstation I found a computer in sleep mode, so it wasn't even on. I turned it on and ran ipconfig. It had a normal IP address from the DHCP pool. Anyway, the connectivity to the gateway was restored and I called it a night.
The next day the office's connection ran perfectly fine until the end of the day. Then the issue started occurring again. To fix it I had to do the exact same thing, but this time the connection was restored after disconnecting different cables on another switch. Again, when tracing the cable to a workstation, there was no IP conflict on the computer. Also, once the connection to the gateway was restored, I reconnected the workstations to the switch and everything was still working fine.
However, the connection to the gateway keeps going down randomly and the only way to fix it is by disconnecting cables from the switches. I can't figure out what is going on: it happens at random times, and every time I have to disconnect different cables in order to fix the problem.
Also, when this problem occurred I tried connecting my laptop straight into the inside interface of the Fortinet firewall and that worked perfectly fine, so I do not think the problem is caused by the firewall.
What can be the issue here?
Any help is greatly appreciated.
11-02-2018 09:28 AM
Hello,
a few initial thoughts:
Make sure the DHCP pool excludes the IP address of the default gateway. Also, since the Fortigate seems to be the exit point for the Internet, do you see anything in its logs?
Also, make sure all the SG switches are running the latest firmware, currently release 1.1.4.1.
11-02-2018 09:45 AM
11-02-2018 09:50 AM
Hello,
I don't know what you already did, so I might just mention to check the uptime of the switches and maybe even the Fortigate. Did you reboot all devices ?
11-02-2018 09:56 AM
You could also try and change the uplink port from the SG220 to the Fortigate...
11-02-2018 10:02 AM - edited 11-02-2018 10:06 AM
The switches got rebooted 2 days ago when I upgraded the firmware. After the reboot, the connection was restored until the next day. It seems that every time the end devices get disconnected and reconnected to the network, that fixes the issue temporarily...
For the Fortinet, I replaced the entire device with a spare they had on the shelf: I restored the config on the spare and swapped it in for the production firewall. When the issue occurred, I also connected the internal interface of the firewall to multiple ports on multiple switches (one at a time of course), but that didn't do anything either.
11-05-2018 10:13 AM - edited 11-05-2018 10:23 AM
So far everything has been up and running fine since Friday morning. It happened once more on Friday in the early morning (after upgrading the firmware on all the switches on Thursday night). The guy on-site ran an arp -a command on a workstation to check for incorrect or duplicate entries. There were none, and the gateway's MAC address was that of the firewall. After he issued the arp -a command on the workstation, the connection was stable again, and I cannot explain what an arp -a has to do with fixing this issue.
No outages on Friday or over the weekend and still doing fine so far on Monday morning, but I still don't have peace of mind on it as I still don't know what the root cause of the problem is. I feel like it is just a matter of time before it'll happen again.
11-05-2018 10:30 AM
Hello Martijn,
I wonder what happens if you set a permanent ping (ping -t) from one of the workstations to the Fortigate; sort of a surrogate keepalive (the SG switches don't have that option) ?
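A permanent ping is more useful if the outages are recorded with timestamps, so they can be lined up with switch and firewall logs afterwards. Here is a rough Python sketch of that idea. It shells out to the system `ping` command; the option handling assumes Windows or Linux style flags, and all function names are made up for illustration.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request with the system ping; True if it exited with 0.

    Caveat: on Windows, ping can exit with 0 for some "unreachable"
    replies, so treat the result as a hint rather than proof.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux-style options assumed: -c count, -W timeout in seconds.
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage windows.

    start is the time of the first failed probe, end is the time of the
    next successful probe, or None if the host was still down at the end.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(host, interval_s=1.0, count=60):
    """Probe `host` `count` times and return the outage windows observed."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return outage_windows(samples)
```

Running `monitor` against the gateway from two workstations on different switches would also show whether both lose the gateway at the same moments.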
11-05-2018 12:53 PM
11-05-2018 11:49 AM - edited 11-05-2018 11:50 AM
Hello
@Martijn de Loos wrote:
I can't figure out what is going on: it happens at random times, and every time I have to disconnect different cables in order to fix the problem.
Also, when this problem occurred I tried connecting my laptop straight into the inside interface of the Fortinet firewall and that worked perfectly fine, so I do not think the problem is caused by the firewall.
What can be the issue here?
Any help is greatly appreciated.
1) Is it possible you are exceeding your maximum concurrent registered internet user allocation on the firewall?
2) You may be experiencing an intermittent loop in your network. One possible way to find the source would be to start extended pings from some users and, at the same time, disconnect and reconnect the uplinks from the core switch to each access closet individually (one at a time). If the ping recovers while a given uplink is disconnected, you have a starting point for where this possible loop is occurring. Then it is just a matter of repeating the same test downstream until you find the switch/host port that is causing the problem.
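The isolation procedure in point 2 can be written down as a small loop. In the Python sketch below, `disconnect`, `reconnect` and `gateway_reachable` are hypothetical callbacks. In practice they stand for a person pulling and re-seating a cable (or shutting a port) and watching the ping, so this is an illustration of the procedure, not a tool that talks to the switches.

```python
def find_suspect_uplinks(uplinks, disconnect, reconnect, gateway_reachable):
    """Disconnect one uplink at a time and note when the gateway recovers.

    Returns the uplinks whose removal made gateway_reachable() return
    True. Each uplink is reconnected before the next one is tested, so
    only one link is ever down at a time.
    """
    suspects = []
    for uplink in uplinks:
        disconnect(uplink)
        try:
            if gateway_reachable():
                suspects.append(uplink)
        finally:
            # Always restore the link, even if the check raised an error.
            reconnect(uplink)
    return suspects
```

Each suspect uplink then becomes the starting point for the same test one level further down, on the ports of the switch behind it.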
11-05-2018 12:50 PM
11-05-2018 01:22 PM - edited 11-05-2018 01:24 PM
Hello
I think this is going to be a matter of discovery, as it does indeed sound like you have an issue with some device looping. Do these clients have wifi/wired capability at the same time? Is it possible a client is using some sort of bridging via their network cards?
Do your switches have redundant interconnects to the core? As you only have 5 switches, you could check which ports should be in a forwarding state and which should be blocking as a baseline, and then check again at the time of an outage.
Apply some port protection such as BPDU guard, port security (maximum/violation) and storm control (broadcast/multicast) on the access ports, make sure you don't have BPDU filtering enabled where you shouldn't, and disable any unused ports.
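The baseline comparison suggested above (which ports forward and which block, in normal operation versus during an outage) is easy to do by script once the port states have been written down. A minimal Python sketch, assuming the states were transcribed by hand from each switch into dictionaries; the port names and the function name are only examples:

```python
def diff_port_states(baseline, current):
    """Return {port: (baseline_state, current_state)} for ports that differ.

    Both arguments map a port name to its spanning-tree state, e.g.
    {"gi49": "forwarding", "gi50": "blocking"}. A port present on only
    one side is reported with None for the missing state.
    """
    changes = {}
    for port in sorted(set(baseline) | set(current)):
        before = baseline.get(port)
        after = current.get(port)
        if before != after:
            changes[port] = (before, after)
    return changes
```

A port that is blocking in the baseline but forwarding during the outage would be the first place to look for a loop.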
11-05-2018 01:27 PM
11-05-2018 01:29 PM - edited 11-05-2018 01:31 PM
Hello Martin
Well then, may I suggest you also apply the UDLD and loop guard features. UDLD monitors physical unidirectional failures and loop guard detects logical failures.
One thing I forgot to mention: DON'T enable any error recovery, so you can capture any failure reported by the features I have mentioned.
11-07-2018 09:07 AM
I have been monitoring the network for the past couple of days. We had zero issues for 5 consecutive days, but last night the disruptions to the gateway came back. Despite lots of delays and disconnects, I was able to connect to the Cisco switches and check the logs. I had even turned on debug logging a while back, but again the logs did not show any information that could be related to this issue. I then started to shut down ports one by one. At some point I found the port that was causing the issue. As soon as I shut it down, the connection became stable again. When I re-enabled the port, the issue returned. I did this a couple of times back and forth to confirm it was indeed a device behind this connection causing the disruptions. I then left the port off until the next morning.
This morning we physically traced the cable and there was a small 6-port D-Link switch behind it with 2 computers and a printer connected to it. The computers and printer had no conflicting network settings. While we were checking the computers, strangely enough the disruption happened again. We then physically disconnected the D-Link switch from the wall jack and the connection became stable again, even though the switchport was already shut down on the switch itself, so I don't know why this made a difference.
Only an hour later the disruption to the gateway returned, while that D-Link switch, the computers and the printer were still physically disconnected from the network. Again, the switch logs didn't tell me anything, but even the core switch, to which the inside interface of the firewall is directly connected, was unable to ping the gateway. I checked the ARP and MAC tables and confirmed that the IP address and MAC address of the firewall were associated with the right switchport on the core switch. Moments later the connection became stable again without us doing anything.
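A single look at the MAC table can miss a MAC address that briefly moves to another port and back, which is a typical symptom of a loop. One way to catch that is to capture the table repeatedly during an outage and compare the captures. The Python sketch below assumes each capture has already been reduced to a `{mac: port}` dictionary; the function name and port names are hypothetical.

```python
def mac_moves(snapshots):
    """Report MAC addresses that were learned on more than one port.

    `snapshots` is a list of {mac: port} dictionaries in capture order.
    Returns {mac: [ports in the order they were seen]} for every MAC
    whose port changed at least once between captures.
    """
    history = {}
    for table in snapshots:
        for mac, port in table.items():
            ports = history.setdefault(mac, [])
            # Record a port only when it differs from the previous one.
            if not ports or ports[-1] != port:
                ports.append(port)
    return {mac: ports for mac, ports in history.items() if len(ports) > 1}
```

If the firewall's MAC shows up in the result, the port it moved to points toward the part of the network to investigate.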
I'm running out of ideas and just cannot explain why that particular port caused the issues last night (I switched it on and off to confirm the problem came from that port, and it seemed it did) while this morning the problem seems to be coming from somewhere else again. And again, it is only the connection to the gateway that is having issues. Any other device in the LAN can be reached without an issue. We already replaced the firewall with a spare unit and that did not fix the issue either.
As these SG220 switches are brand new and the issues came up only a few days after installing them, I am considering switching back to the old switches and see how that goes. I'm running out of options with how inconsistent this problem is.