ARP/Ping Issues

mperdue20 · ‎12-01-2016

We recently had a power outage in our building which caused our core and 3 switch stacks to lose power. 2 switch stacks on a separate side of the building did not lose power, thanks to a generator owned by another company in the building. My core is a 4507 and my switch stacks are 2960Xs.

The issue we are having is that the core can ping every switch stack except one. When I do a show ip arp x.x.x.x for that stack, the ARP table says Incomplete. When I SSH into each of the other stacks and ping the troubled stack, or do a sh ip arp for it, all of them except one cannot ping it and have either no ARP entry or an incomplete ARP entry. One switch stack out of 5 can ping the troubled stack and has an ARP entry for it.

So I'm confused as to why the core and the other stacks cannot ping it and have bad ARP entries when one stack can ping it and has a valid ARP entry. I have cleared the ARP cache on all switch stacks and the core, so at this point I'm lost. Can anyone assist?

Also, I can SSH into the troubled stack with no issue, but when I ping anything from that stack only one switch stack responds, and only one switch stack has an ARP entry for the troubled stack.

GRANT3779 · ‎12-01-2016

You say you can SSH into the troubled stack. What source address are you coming in from when you SSH? Is it in the same range that you are trying to source your pings from on the Core?

Could some unsaved config on any of the trunk links have been lost when the power went out?

When trying to ping your "troubled" switch, have you sourced from all SVIs on the Core?

What VLANs are allowed over any trunks to the "troubled" stack?

mperdue20 · ‎12-01-2016

Yes, my PC is in the same subnet as the core and all the switches.

I checked the saved config file and everything matches.

I have SSHed into every switch to try to ping the troubled switch. Out of 5 switches including the core, only one switch can ping the troubled switch. The core and other 4 switches cannot.

The appropriate VLANs are on all the trunk ports as well.

GRANT3779 · ‎12-02-2016

From what I know, when you ping from, say, your core device, it automatically creates an incomplete entry in the ARP table. If it doesn't receive a reply, the entry stays there as Incomplete for a certain amount of time before being removed. I just tested this myself to check:

sh ip int brief
Interface IP-Address OK? Method Status Protocol

Vlan970 10.44.70.1 YES manual up up

UK-6880X-CORE-01#ping 10.44.70.101 ( This address does not exist on the connected interface)
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.44.70.101, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

UK-6880X-CORE-01#sh arp | inc 101
Internet 10.44.0.5 4 00a0.c900.0101 ARPA Vlan500
Internet 10.44.0.13 101 0025.b5a1.010b ARPA Vlan500
Internet 10.44.1.101 1 842b.2b16.87b2 ARPA Vlan501
Internet 10.44.23.101 1 f8b1.56da.b855 ARPA Vlan523
Internet 10.44.45.101 18 f41f.c267.be90 ARPA Vlan645
Internet 10.44.61.101 5 0020.6b9b.3f9e ARPA Vlan861
Internet 10.44.70.5 3 00a0.c901.0101 ARPA Vlan970
Internet 10.44.70.101 0 Incomplete ARPA

This entry flushed itself out not too long afterwards.

UK-6880X-CORE-01#sh arp | inc 101
Internet 10.44.0.5 9 00a0.c900.0101 ARPA Vlan500
Internet 10.44.1.101 2 842b.2b16.87b2 ARPA Vlan501
Internet 10.44.23.101 2 f8b1.56da.b855 ARPA Vlan523
Internet 10.44.45.101 22 f41f.c267.be90 ARPA Vlan645
Internet 10.44.61.101 0 0020.6b9b.3f9e ARPA Vlan861
Internet 10.44.61.113 101 000d.2700.0951 ARPA Vlan861
Internet 10.44.70.5 8 00a0.c901.0101 ARPA Vlan970

This is not an answer as to what is stopping the ARP replies in your case, but just some information in general.
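
To make the behaviour above a bit more concrete, here is a rough Python sketch of a toy ARP cache. It is only a model of what I observed, not how IOS is implemented: the class name, the method names and the 60-second lifetime for incomplete entries are all assumptions made up for illustration, not Cisco defaults.

```python
INCOMPLETE = "Incomplete"


class ToyArpCache:
    """Toy ARP cache. The timeout is an illustrative value, not an IOS default."""

    def __init__(self, incomplete_ttl=60.0):
        self.incomplete_ttl = incomplete_ttl
        self.entries = {}  # ip -> (mac or INCOMPLETE, time the entry was written)

    def request(self, ip, now):
        """Sending an ARP request for an unknown IP leaves an incomplete entry."""
        if ip not in self.entries:
            self.entries[ip] = (INCOMPLETE, now)

    def reply(self, ip, mac, now):
        """An ARP reply resolves the entry to a MAC address."""
        self.entries[ip] = (mac, now)

    def expire(self, now):
        """Drop incomplete entries that have gone unanswered for too long."""
        for ip, (mac, created) in list(self.entries.items()):
            if mac == INCOMPLETE and now - created >= self.incomplete_ttl:
                del self.entries[ip]

    def lookup(self, ip):
        entry = self.entries.get(ip)
        return entry[0] if entry else None


if __name__ == "__main__":
    cache = ToyArpCache()
    cache.request("10.44.70.101", now=0.0)   # ping to a host that never answers
    print(cache.lookup("10.44.70.101"))      # entry exists but is unresolved
    cache.expire(now=120.0)                  # later, the stale entry is flushed
    print(cache.lookup("10.44.70.101"))
```

The point is only that an Incomplete entry means "a request went out and no reply came back", so it tells you the ARP exchange is failing, not why.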

A topology might help in understanding your issue.

mperdue20 · ‎12-02-2016

So I have a 4507 core, and connected to that core I have the following:

core int g1/1 directly connected via fiber to switch stack #1 (4 - 2960x's)

core int g1/2 directly connected via fiber to switch stack #2 (4 - 2960x's)

core int g1/3 directly connected via fiber to switch stack #3 (4 - 2960x's)

core int g1/4 directly connected via fiber to switch stack #4 (2 - 2960x's) - troubled switch

core int g1/5 directly connected via fiber to switch #5 (single switch)

switch #5 int g1/0/51 directly connected via fiber to switch #6 (single switch)

core can ping and ARP all switches EXCEPT #4

#3 CAN ping and ARP #4, AND all other switches/core

#1, #2, #5, #6 cannot ping or ARP #4, BUT can ping and ARP other switches/core

#4 can ONLY ping and ARP #3

Hopefully this helps.
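
The ping/ARP results listed above can also be written down as a small reachability table, which makes it easier to see who reaches #4 and whether the failure is one-way or two-way. This is just a Python sketch that restates the observations; the function names are made up for illustration and it does not diagnose anything.

```python
# Device labels as used in the topology description.
DEVICES = ["core", "#1", "#2", "#3", "#4", "#5", "#6"]


def build_reachability():
    """Return device -> set of devices it can ping, per the observations."""
    reach = {d: set() for d in DEVICES}
    healthy = [d for d in DEVICES if d != "#4"]
    # Everything except #4 can ping and ARP everything else except #4.
    for a in healthy:
        for b in healthy:
            if a != b:
                reach[a].add(b)
    # The one exception: #3 and #4 can ping and ARP each other.
    reach["#3"].add("#4")
    reach["#4"].add("#3")
    return reach


def who_can_reach(reach, target):
    """List the devices that can ping the target."""
    return sorted(d for d, peers in reach.items() if target in peers)


def asymmetric_pairs(reach):
    """List (a, b) pairs where a can ping b but b cannot ping a."""
    return sorted((a, b) for a in reach for b in reach[a] if a not in reach[b])


if __name__ == "__main__":
    reach = build_reachability()
    print("Can reach #4:", who_can_reach(reach, "#4"))
    print("One-way pairs:", asymmetric_pairs(reach))
```

Laid out this way, the failures are the same in both directions: every device that cannot reach #4 is also one that #4 cannot reach, and #3 is the only exception either way.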

GRANT3779 · ‎12-02-2016

I am assuming all switches involved share a common management subnet and these are the addresses being used for your testing? You said you could SSH to the troubled switch from a machine. Where is this PC plugged into your network?

You also said there were two stacks that did not reboot during the power outage. Were these stacks 3 and 4? It might be worth reloading the troubled switch stack one member at a time.

mperdue20 · ‎12-02-2016

#4, #5, and #6 did not reboot during the outage. My PC is on #1.

Yes, all switches and the core have the same VLAN and subnet information. The allowed VLANs on the trunk ports are all set and correct as well.

Reloading the troubled switches is what I was thinking too. I'll have to do that over the weekend, after hours. I'll try that and see if it helps.