
Cisco 3750 Switch Need to Constantly Clear ARP Cache

markrebuldad
Level 1

Hi All,

We have a 3750X switch with an ARP issue. Our servers become unreachable for a while, but when I clear the ARP cache, I am able to ping them again. I have tried adjusting the ARP timeout, but I don't think that would be a permanent solution. Any help is greatly appreciated.


markrebuldad
Level 1

Hi All,

Any ideas why this is happening? Is there anything you need me to provide?

Thanks!

Hi Mark,

A couple of questions:

  1. What exact IOS version are you running?
  2. Are your servers connected to the network by multiple NICs?
  3. If yes, are you using any high-availability feature on your servers that manages the activation and use of multiple NICs?
  4. Is there any possibility of an ARP spoofing attack, either deliberate or inadvertent?
  5. When you are unable to ping a server, is the IP/MAC mapping shown in the show ip arp output correct, i.e. do you see the correct MAC mapped to the server's IP?
  6. If this mapping appears to be correct but the server still cannot be pinged, issue the show adjacency IP encapsulation command, where IP is the IP address of the server. The output shows the pre-built frame header in hexadecimal; the first twelve hexadecimal digits (i.e. the first 6 bytes) are the destination MAC address. Verify that this MAC address is correct (the server's MAC) and that it is identical to the MAC address seen in the show ip arp output from the previous step.
  7. Are you running protocols such as HSRP on the routers/multilayer switches that connect your servers to other networks? Is it possible that there is an IP address conflict on the routers themselves? I am thinking about a problem in the reverse direction: the servers may actually be receiving the pings but responding to the wrong device. The clear arp-cache command may help because the switch itself sends gratuitous ARPs after this command, possibly correcting the servers' corrupted ARP caches.
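Comparing the adjacency rewrite header with the ARP entry in step 6 by eye is easy to get wrong. A minimal Python sketch of that check, assuming you paste in the hex string from show adjacency IP encapsulation and the MAC from show ip arp; the sample values below are hypothetical, not taken from this switch:

```python
def normalize_mac(mac: str) -> str:
    """Reduce Cisco (aabb.ccdd.eeff), colon or dash notation to 12 lowercase hex digits."""
    digits = "".join(ch for ch in mac if ch not in ".:-").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def to_cisco(mac12: str) -> str:
    """Format 12 hex digits the way show ip arp prints them: xxxx.xxxx.xxxx."""
    return ".".join(mac12[i:i + 4] for i in range(0, 12, 4))


def dest_mac_from_encap(encap_hex: str) -> str:
    """The first 6 bytes (12 hex digits) of the rewrite header are the destination MAC."""
    digits = "".join(encap_hex.split()).lower()
    if len(digits) < 12:
        raise ValueError("encapsulation string is shorter than 6 bytes")
    return normalize_mac(digits[:12])


def adjacency_matches_arp(encap_hex: str, arp_mac: str) -> bool:
    """True when the adjacency would send frames to the MAC listed in show ip arp."""
    return dest_mac_from_encap(encap_hex) == normalize_mac(arp_mac)


# Hypothetical 14-byte Ethernet rewrite: dst MAC, src MAC, EtherType 0x0800.
encap = "0050568A1B2C 0019AA33BB44 0800"
print(to_cisco(dest_mac_from_encap(encap)))            # 0050.568a.1b2c
print(adjacency_matches_arp(encap, "0050.568a.1b2c"))  # True
```

Normalizing both sides to bare hex digits means the comparison works whether the MAC was copied in Cisco dotted form or in the colon form a server's own tools print.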

Thank you!

Best regards,

Peter

Hi Peter,

Thanks for the reply! Answers below:

1) C3750E Software (C3750E-UNIVERSALK9-M), Version 12.2(53)SE2

2-3) Yes, they are VMs configured with NIC teaming, but not using Route based on IP hash (where you would need to aggregate ports on the switch). Previously, they were just connected on an unmanaged 3Com switch. We had the problem after we moved them to the 3750.

4) I don't think so. Can you advise how to check?

5) Yes, the MAC address entry in the ARP table is the same before and after I clear the ARP cache.

6) I will get back to you when we get the chance, as there is no pattern to when the servers become unreachable. I have set the ARP timeout to 5 minutes.

7) The switch is actually a stack of two 3750 switches. We are running STP on the 3750 stack to connect redundant links to the access-layer switches. If clear arp-cache corrects the servers' corrupted ARP caches, why would they keep getting corrupted again in less than an hour?
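Answer 5 compares the ARP entry before and after clear arp-cache. To make that comparison systematic across all servers, a small Python sketch can diff two saved show ip arp outputs. It assumes the common column layout shown in the hypothetical samples (Protocol, Address, Age, Hardware Addr, Type, Interface); the addresses are made up. If the diff is empty while a server is unreachable, Peter's steps 6 and 7 are the next things to check.

```python
from typing import Dict, List, Tuple


def parse_show_ip_arp(text: str) -> Dict[str, str]:
    """Map IP -> hardware address from show ip arp output.

    Assumes rows of the form:
    Internet  <address>  <age>  <hardware addr>  <type>  <interface>
    The header line and anything not starting with "Internet" are skipped.
    """
    table: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[0] == "Internet":
            table[parts[1]] = parts[3].lower()
    return table


def diff_arp(before: Dict[str, str], after: Dict[str, str]
             ) -> Tuple[Dict[str, Tuple[str, str]], List[str], List[str]]:
    """Return (changed, added, removed) between two ARP snapshots."""
    changed = {ip: (before[ip], after[ip])
               for ip in sorted(before.keys() & after.keys())
               if before[ip] != after[ip]}
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    return changed, added, removed


before = parse_show_ip_arp("""
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.1.10.21              3   0050.568a.1b2c  ARPA   Vlan10
Internet  10.1.10.22              1   0050.568a.0001  ARPA   Vlan10
""")
after = parse_show_ip_arp("""
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.1.10.21              0   0050.568a.1b2c  ARPA   Vlan10
Internet  10.1.10.22              0   0050.568a.00ff  ARPA   Vlan10
""")
print(diff_arp(before, after))
# ({'10.1.10.22': ('0050.568a.0001', '0050.568a.00ff')}, [], [])
```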

Hope you could help me out.

Thanks!

Is the old 3Com switch still in play on the network?

Have you tried sniffing the traffic with Wireshark? Set up a SPAN port on the server interface, then look for the ARP traffic.
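Once ARP frames from the SPAN session have been captured, one useful thing to look for is a single IP address being claimed by more than one MAC, which would also bear on question 4 about ARP spoofing. A minimal Python sketch, assuming the 28-byte Ethernet/IPv4 ARP bodies (everything after the Ethernet header) have already been exported from the capture; reading the capture file is not shown, and `build_arp` plus the sample addresses are hypothetical, existing only to produce test input.

```python
import ipaddress
import struct
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Ethernet/IPv4 ARP body (RFC 826 layout): htype, ptype, hlen, plen, oper,
# sender MAC, sender IP, target MAC, target IP -> 28 bytes, no padding.
ARP_FMT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FMT)  # 28


def _cisco_mac(raw: bytes) -> str:
    h = raw.hex()
    return ".".join(h[i:i + 4] for i in range(0, 12, 4))


def parse_arp(payload: bytes) -> Tuple[int, str, str, str]:
    """Return (opcode, sender MAC, sender IP, target IP) from an ARP body."""
    if len(payload) < ARP_LEN:
        raise ValueError("payload too short for Ethernet/IPv4 ARP")
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = struct.unpack(
        ARP_FMT, payload[:ARP_LEN])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP packet")
    return (oper, _cisco_mac(sha),
            str(ipaddress.IPv4Address(spa)), str(ipaddress.IPv4Address(tpa)))


def build_arp(oper: int, sha_hex: str, spa: str, tha_hex: str, tpa: str) -> bytes:
    """Hypothetical helper that builds an ARP body, used only for test input."""
    return struct.pack(ARP_FMT, 1, 0x0800, 6, 4, oper,
                       bytes.fromhex(sha_hex), ipaddress.IPv4Address(spa).packed,
                       bytes.fromhex(tha_hex), ipaddress.IPv4Address(tpa).packed)


def find_conflicts(payloads: Iterable[bytes]) -> Dict[str, List[str]]:
    """IPs that were claimed by more than one sender MAC in the capture."""
    seen = defaultdict(set)
    for p in payloads:
        _oper, mac, sender_ip, _target = parse_arp(p)
        if sender_ip == "0.0.0.0":  # ARP probes carry no sender IP
            continue
        seen[sender_ip].add(mac)
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


capture = [
    build_arp(2, "0050568a1b2c", "10.1.10.21", "0019aa33bb44", "10.1.10.1"),
    build_arp(2, "0050568affee", "10.1.10.21", "0019aa33bb44", "10.1.10.1"),
    build_arp(1, "0050568a0001", "10.1.10.22", "000000000000", "10.1.10.1"),
]
print(find_conflicts(capture))
# {'10.1.10.21': ['0050.568a.1b2c', '0050.568a.ffee']}
```

A hit here is a lead to investigate rather than proof of spoofing: legitimate failover, such as a NIC team moving traffic to another adapter, can also make an IP appear behind a new MAC.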

Hi James,

Thanks for the suggestion. I will do that when no one is using the network, since we would need to reproduce the outage to investigate the issue. For now, we can't afford downtime, because the servers are accessed by our users overseas in different time zones as well.

Btw, the 3Com has been removed from the network.

1) C3750E Software (C3750E-UNIVERSALK9-M), Version 12.2(53)SE2

Stay away from the 12.2(53)SE version (and later).

Try upgrading to 12.2(55)SE5 (or SE6) or 15.0(2)SE.

Hi Guys,

I think it would help if I attach the show tech-support. Any more thoughts?

Thanks!

Hi Mark,

I was wondering if you could configure the command switchport mode access on all the "access vlan 10" and "access vlan 100" ports, and give us an idea of how big the broadcast domain is. Can you also confirm that the ip routing command is enabled?

Alessio

Update: you should shut down all the interfaces that are not in use and apply storm-control broadcast level 10.

By the way, one VLAN for 150 users might not be the ideal choice, especially because the broadcast domain is not limited to the 3750. Enable STP on the 3750 and disable it on the switchports directly connecting the users. You can use PortFast on the server switchports on the 3750 stack (VLAN 10). I would even run a sniffing session with Wireshark to check whether an excessive amount of broadcast and multicast traffic is crossing your network.
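Alessio's suggestions come down to a handful of interface commands, which are easy to apply inconsistently across a two-switch stack. A small Python sketch can generate them per port. The interface names and port map are hypothetical, and the generated lines should be checked against the command reference for the IOS release on the stack before they are pasted in.

```python
from typing import Dict, List, Optional


def access_port_config(interface: str, vlan: Optional[int], server_port: bool = False,
                       storm_level: int = 10) -> List[str]:
    """IOS interface config lines reflecting the advice in this thread.

    vlan=None marks an unused port, which is simply shut down.
    """
    lines = [f"interface {interface}"]
    if vlan is None:
        lines.append(" shutdown")
        return lines
    lines += [
        " switchport mode access",
        f" switchport access vlan {vlan}",
        f" storm-control broadcast level {storm_level}",
    ]
    if server_port:  # PortFast only on the server ports, as suggested above
        lines.append(" spanning-tree portfast")
    return lines


def render(ports: Dict[str, dict]) -> str:
    """Join per-port blocks with '!' separators, as in a running config."""
    blocks = ["\n".join(access_port_config(name, **opts)) for name, opts in ports.items()]
    return "\n!\n".join(blocks)


# Hypothetical port map: two server ports in VLAN 10 and one unused port.
ports = {
    "GigabitEthernet1/0/1": {"vlan": 10, "server_port": True},
    "GigabitEthernet2/0/1": {"vlan": 10, "server_port": True},
    "GigabitEthernet1/0/48": {"vlan": None},
}
print(render(ports))
```

Keeping the port map in one place makes it easier to see at a glance which ports are server ports and which are meant to be shut down.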

Hi Alessio,

Thanks, I will issue the access mode. IP routing is on.

VLAN 100, which is for the users, has about 120-150 devices (no user is directly connected to the 3750).

VLAN 10, which is for the servers, has about 20 NICs (12 of which are connected to the 3750).

We tried putting the old unmanaged switch back, and the servers worked as they should. Really weird.
