06-15-2012 02:16 PM - edited 03-07-2019 07:16 AM
Principal Facts:
1 x 6509 distribution switch, L2 & L3, up for over 1 year
2 x 6509 access switches, L2 only
2 vlans, source vlan 160 on 1 access switch, dest vlan 1115, multiple ports and ip addrs on both access switches.
Some of the dest ports are servers and some are unintelligent virtual devices with little to no configurations.
All ports are up/connected and pingable from the dist switch. All vlan 1115 IP addresses are on the same subnet with a /24 mask.
Problem:
Some of the dest IP addresses are pingable from the source vlan 160 IP addresses and some are not.
(No ACL interference), (end units on vlan 1115 configured properly with the right default gateway, DFGW), (could not determine any common threads)
(Some vlan 1115 IP addresses were pingable, some were not)
Everything pointed to a configuration problem on the end units (vlan 1115), but we could not confirm that.
We were close. It turned out NOT to be a problem in the configurations, but an obvious problem within the ARP table of the distribution switch.
Once the ARP table was cleared, all connectivity resumed. The tech working the issue determined that one possibility was that the servers on vlan 1115
did not have the right MAC address of their default gateway to get back out to the source. He figured that clearing the ARP table would force them to re-learn the right MAC information.
One thing I forgot to mention: this was all working in the morning. At about 2pm that day, the customer started noticing the problem.
No changes in the network or on the servers were made that day.
Question is:
Does anyone have a theory on how this could have happened? And more importantly, how can we prevent this surprise from happening again?
Thanks. Tony
06-15-2012 11:54 PM
Hello Tony,
Is it at least remotely possible that someone carried out an ARP spoofing attack on your network? That would explain the behavior - poisoned ARP caches would cause your devices to incorrectly address outbound frames containing IP packets. This is of course hard to prove at this point.
Are you using any gateway redundancy protocol like HSRP, VRRP or GLBP? Also, were you perhaps able to specifically detect an anomaly in the ARP tables, like an obviously incorrect MAC address being mapped to an IP address? Where exactly were the ARP tables incorrect - on the distribution switch or on the servers (end hosts)?
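One way to look for the kind of anomaly asked about above is to save the output of `show ip arp` on the distribution switch while things work and again while they are broken, then compare the two. The Python sketch below assumes the usual IOS column layout (Protocol, Address, Age, Hardware Addr, Type, Interface). The function names are hypothetical and any addresses you feed it are your own captures; nothing here comes from Tony's actual tables.

```python
import re

# Assumed IOS-style row, e.g.:
# Internet  10.1.115.10   12   0011.2233.4455  ARPA   Vlan1115
# The interface column is optional because incomplete entries may omit it.
ARP_LINE = re.compile(
    r"^Internet\s+(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\S+)\s+ARPA(?:\s+(\S+))?\s*$"
)


def parse_show_ip_arp(text):
    """Return {ip: mac} parsed from 'show ip arp' style output."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:
            ip, _age, mac, _interface = match.groups()
            table[ip] = mac.lower()
    return table


def changed_entries(before, after):
    """IPs present in both snapshots whose MAC address differs."""
    return {
        ip: (before[ip], after[ip])
        for ip in before
        if ip in after and before[ip] != after[ip]
    }


def shared_macs(table):
    """MACs mapped to more than one IP (a common sign of spoofing or proxy ARP)."""
    by_mac = {}
    for ip, mac in table.items():
        if mac != "incomplete":
            by_mac.setdefault(mac, []).append(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}
```

An IP whose MAC changed between the two snapshots, or one MAC answering for several server IPs, would be the "obviously incorrect" mapping worth chasing. A MAC shared by several IPs is not always an attack (proxy ARP or a multi-homed host can look the same), so treat the result as a pointer, not proof.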
Best regards,
Peter
06-18-2012 06:43 AM
Peter,
Thanks for the response. We are using HSRP on the VLAN1115 interface and on the Vlan160 interface on the dist switch. I don't think he specifically saw the anomaly, but his extended, non-timeout ping of the server IP address, combined with the incrementing matches on the ACL, told him the server was getting the ping. It became obvious that the server did not know the return path. (We had more or less determined that before anyway.) So he figured that, since some of these devices had no real configuration, the problem was either a bug or the server had the wrong MAC address for its DFGW. Hence, he cleared the ARP cache. Still trying to make sense of it.
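A side note on the HSRP point: hosts behind an HSRP gateway normally resolve the gateway IP to the group's virtual MAC, which is 0000.0c07.acXX for HSRP version 1 (XX is the group number in hex) and 0000.0c9f.fXXX for version 2, unless the group has been configured to use a different MAC. Below is a minimal Python sketch for checking a server's ARP entry for its DFGW against that value. The thread does not say which group number or version is in use, so those are inputs you would have to supply, and the function names are hypothetical.

```python
import re


def normalize_mac(mac):
    """Convert any common MAC notation to Cisco dotted form (xxxx.xxxx.xxxx)."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError("not a 48-bit MAC address: %r" % mac)
    return ".".join(digits[i:i + 4] for i in (0, 4, 8))


def hsrp_virtual_mac(group, version=1):
    """Default HSRP virtual MAC for a group (assumes no custom MAC is configured)."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 group must be 0-255")
        return normalize_mac("00000c07ac%02x" % group)
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 group must be 0-4095")
        return normalize_mac("00000c9ff%03x" % group)
    raise ValueError("unknown HSRP version: %r" % version)


def gateway_mac_ok(host_arp_mac, group, version=1):
    """True if the MAC a host holds for its gateway is the expected virtual MAC."""
    return normalize_mac(host_arp_mac) == hsrp_virtual_mac(group, version)
```

For example, if a server's ARP cache shows the gateway at a MAC that fails `gateway_mac_ok(...)` for the configured group, the server is sending its return traffic to something other than the HSRP virtual address, which would match the one-way behavior described here.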
Tony