08-20-2012 10:36 PM - edited 03-07-2019 08:27 AM
Hi All,
We have a 3750x switch with some ARP issues. Our servers become unreachable for some time, but when I clear the ARP cache, I am able to ping the servers again. I have tried adjusting the ARP timeout, but I don't think that would be a permanent solution. Any help is greatly appreciated.
08-21-2012 12:05 AM
Hi All,
Any ideas why this is happening? Is there anything you need me to provide?
Thanks!
08-21-2012 01:22 AM
Hi Mark,
A couple of questions:
Thank you!
Best regards,
Peter
08-21-2012 06:35 AM
Hi Peter,
Thanks for the reply! Answers below:
1) C3750E Software (C3750E-UNIVERSALK9-M), Version 12.2(53)SE2
2-3) Yes, they are VMs configured with NIC teaming, but not using Route based on IP hash (where you would need to aggregate ports on the switch). Previously, they were just connected on an unmanaged 3Com switch. We had the problem after we moved them to the 3750.
4) I don't think so. Can you advise how to check?
5) Yes, the MAC address entry in the ARP table is the same before and after I clear the ARP cache.
6) I will get back to you when we get the chance, as there is no pattern to when the servers become unreachable, and I have set the ARP timeout to 5 minutes.
7) The switch is actually a stack of two 3750 switches. We are running STP on the 3750 stack to connect redundant links to the access layer switches. If clear arp-cache fixes the corrupted ARP entries for the servers, what could be the reason they keep getting corrupted again in less than an hour?
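For reference, the checks in items 5 and 6 would look roughly like the sketch below on IOS. The IP address, MAC address, and the choice of Vlan10 as the server SVI are placeholders for illustration, not values taken from this switch.

```
! Placeholders: 192.0.2.10 and 0050.56aa.bbcc stand in for one server's IP and MAC
! Compare the ARP entry with the port the MAC is currently learned on
show ip arp 192.0.2.10
show mac address-table address 0050.56aa.bbcc
!
! Flush the ARP table (the workaround described above)
clear arp-cache
!
! 5-minute ARP timeout, set per interface in seconds
! (Vlan10 is assumed here to be the server SVI)
configure terminal
 interface Vlan10
  arp timeout 300
 end
```

If the ARP entry stays the same but the MAC address table shows the address moving between ports, that would point at the teamed NICs rather than at ARP itself.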
Hope you could help me out.
Thanks!
08-21-2012 09:05 AM
Is the old 3Com switch still in play on the network?
Have you tried sniffing the traffic with Wireshark? Set up a SPAN port on the server interface, then start looking at the ARP traffic.
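A minimal sketch of such a SPAN session is below. The port numbers are hypothetical: Gi1/0/5 stands for the server-facing port and Gi1/0/24 for the port where the Wireshark PC is plugged in.

```
! Hypothetical ports: Gi1/0/5 = server port, Gi1/0/24 = Wireshark PC
configure terminal
 ! Mirror traffic in both directions from the server port
 monitor session 1 source interface GigabitEthernet1/0/5 both
 ! Send the mirrored traffic to the capture port
 monitor session 1 destination interface GigabitEthernet1/0/24
end
! Verify the session
show monitor session 1
```

In Wireshark, the display filter `arp` narrows the capture to ARP requests and replies. Note that the destination port stops forwarding normal traffic while the session is active.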
08-21-2012 09:21 AM
Hi James,
Thanks for the suggestion. I will do that when no one is using the network, since we need to reproduce the downtime to check on the issue. For now, we can't afford downtime because the servers are also being accessed by our overseas users in different time zones.
Btw, the 3Com has been removed from the network.
08-22-2012 04:18 PM
1) C3750E Software (C3750E-UNIVERSALK9-M), Version 12.2(53)SE2
Stay away from the 12.2(53)SE version (and later).
Try upgrading to 12.2(55)SE5 (or SE6) or 15.0(2)SE.
08-22-2012 04:40 AM
08-22-2012 07:29 AM
Hi Mark,
I was wondering if you could configure the command switchport mode access on all the "access vlan 10" and "access vlan 100" ports, and give us an idea of how big the broadcast domain is. Can you also confirm that the ip routing command is on?
Alessio
Update: you should shut down all the interfaces that are not operational and use storm-control broadcast level 10.
By the way, one VLAN for 150 users might not be the ideal choice, especially because the broadcast domain is not limited to the 3750. Enable STP on the 3750 and disable it on the switchports directly connecting the users. You can use portfast on the server switchports on the 3750 stack (VLAN 10). I would also do a sniffing session with Wireshark to confirm whether too many broadcast and multicast packets are crossing your network.
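As a rough sketch, the access-mode, portfast, and storm-control suggestions would look like this on one server port. Gi1/0/5 is a hypothetical interface name; repeat the block for each VLAN 10 server port.

```
! Hypothetical port: Gi1/0/5 = one of the VLAN 10 server ports
configure terminal
 interface GigabitEthernet1/0/5
  switchport mode access
  switchport access vlan 10
  ! Skip the STP listening/learning delay on this edge port
  spanning-tree portfast
  ! Suppress broadcast traffic above 10 percent of port bandwidth
  storm-control broadcast level 10
 end
! Check the configured threshold and current broadcast level
show storm-control GigabitEthernet1/0/5 broadcast
```

Portfast belongs only on ports that connect to end hosts, not on links to other switches.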
08-22-2012 07:57 AM
Hi Alessio,
Thanks, I will set the ports to access mode. IP routing is on.
VLAN 100, which is for the users, has about 120-150 devices (no user is directly connected to the 3750).
VLAN 10, which is for the servers, has about 20 NICs (12 of which are connected to the 3750).
We tried putting back the old dummy switch and the servers worked as they should. Really weird.