Unable to ping until ARP cache cleared

samirshaikh52 · 02-04-2014

Hello Experts

I have 3 servers connected to a Cisco Catalyst C2960 switch. This switch has an uplink to one of our access switches, and that access switch is in turn connected to our 2 core switches.

We are running HSRP, and the core switches have a direct link between them.

Today I encountered an issue: these servers are unreachable from VLANs other than their own. I cleared the ARP cache and they started responding to ping.

Can you please help? This is happening repeatedly.

Thanks

samirshaikh52 · 02-06-2014

I am also trying to understand why a continuous ping to the Server 2 IP addresses (i.e. 10.1.1.15, .21, .23 and the VIP 10.1.1.17) causes no issues, but once I stop the ping, they go dead.

JohnTylerPearce · 02-06-2014

So you're clearing the ARP cache on the core and generating a ping packet to 10.1.1.17, 10.1.1.21, 10.1.1.15, and 10.1.1.23, right?

If that's the case, there is a stale ARP/MAC entry somewhere, or a bug.
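A stale entry of this kind can be pictured with a toy model of a router ARP cache. This is only an illustration: the `ArpCache` class and its methods are hypothetical, the failover scenario is assumed rather than observed in this thread, and the 4-hour timeout is the IOS default mentioned later on. It shows a cached MAC still being used after the address has moved, until the entry times out or the cache is cleared.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArpEntry:
    mac: str
    learned_at: float  # seconds


class ArpCache:
    """Toy model of a router's ARP cache (hypothetical; not IOS internals)."""

    def __init__(self, timeout_s: float = 4 * 60 * 60):  # 4 h IOS default
        self.timeout_s = timeout_s
        self.entries: dict[str, ArpEntry] = {}

    def learn(self, ip: str, mac: str, now: float) -> None:
        # An ARP reply or gratuitous ARP simply overwrites the cached MAC.
        self.entries[ip] = ArpEntry(mac, now)

    def lookup(self, ip: str, now: float) -> str | None:
        entry = self.entries.get(ip)
        if entry is None or now - entry.learned_at > self.timeout_s:
            self.entries.pop(ip, None)
            return None  # miss: a real router would send a new ARP request
        return entry.mac

    def clear(self) -> None:
        # Roughly what "clear arp-cache" does to the entries in this model.
        self.entries.clear()


cache = ArpCache()
cache.learn("10.1.1.17", "0014.5ebc.7466", now=0)    # VIP learned on one node
# Hypothetical failover: the VIP moves, but no gratuitous ARP reaches the router.
stale = cache.lookup("10.1.1.17", now=600)            # still the old MAC
cache.clear()
fresh = cache.lookup("10.1.1.17", now=601)            # None, so the router re-ARPs
cache.learn("10.1.1.17", "0014.5ebc.0c84", now=601)  # reply from the new owner
```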

samirshaikh52 · 02-06-2014

Yes. Once I clear the cache and generate ping packets to any one of these IPs, it works. Any guess where it could be? As you can see, I have no stale entry on the core, nor on the servers.

Thanks

samirshaikh52 · 02-06-2014

Also, could it be something related to the ARP aging time? Just a thought.

JohnTylerPearce · 02-06-2014

Well, that's why I asked if the cluster was active/active. I have no idea how it's distributing the traffic. The default ARP aging time is 4 hours, and the default MAC address aging time is 5 minutes.
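To put those two defaults side by side, here is a minimal sketch assuming both timers are left at 14400 s (ARP) and 300 s (MAC aging), and that every frame the server sends, such as an echo reply, refreshes its MAC table entry. The function `entries_present` is hypothetical. It shows the window in which the core still holds an ARP entry while the switch has already aged out the MAC, which is one possible reason a continuous ping keeps things working. In that window a switch floods frames for the unknown MAC rather than dropping them, so whether this explains the outage here is not established.

```python
# Defaults quoted in this thread; assumed unchanged on these switches.
ARP_TIMEOUT_S = 4 * 60 * 60  # router ARP entry lifetime: 14400 s
MAC_AGING_S = 5 * 60         # switch MAC table aging: 300 s


def entries_present(t, last_frame_from_host, arp_learned=0.0):
    """Toy model: which entries exist at time t (all values in seconds).

    - The core's ARP entry lasts ARP_TIMEOUT_S from when it was learned.
    - The switch's MAC entry lasts MAC_AGING_S from the last frame the host
      sent; every frame it sends (for example a ping reply) resets that timer.
    Returns (arp_entry_present, mac_entry_present).
    """
    arp = t - arp_learned < ARP_TIMEOUT_S
    mac = t - last_frame_from_host < MAC_AGING_S
    return arp, mac


# Continuous ping: the server replied 1 s ago, so both entries are present.
busy = entries_present(t=1000, last_frame_from_host=999)
# Server silent since t=0: ARP entry still cached, MAC entry already aged out.
quiet = entries_present(t=1000, last_frame_from_host=0)
# Longest stretch during which the ARP entry outlives the MAC entry.
gap_s = ARP_TIMEOUT_S - MAC_AGING_S
```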

With active/active server clusters, you can run into issues from time to time with this.

I just don't know how it's distributing the traffic. Users reach this from remote subnets via 10.1.1.17, the VIP, so the cluster software must have some way of distributing that traffic. Understanding how this works would help us out.

JohnTylerPearce · 02-06-2014

I did some research on the 2960G you mentioned above.

WS-C2960G-24TC-L - Catalyst 2960G IOS 12.2(25)SEE

According to Cisco, that specific image has issues, and the recommended upgrade is

12.2(44)SE

That's something I would look into doing, if it were me.

samirshaikh52 · 02-06-2014

Upgrading the switch will cause downtime.

These are medical servers, so it is very hard to reboot the switch.

Is there no other option we can try for now? :(

JohnTylerPearce · 02-06-2014

I completely understand; it can be very hard to get downtime for anything in those situations.

As Jon Marshall said much earlier in this thread:

The VIP is 10.1.1.17, and I'm a little confused about how the cluster service works. Logically, if users are going to the VIP and the ARP entry resolves to one MAC address, all the traffic is going to go to a single NIC/device, unless the cluster automatically answers new ARP requests for the VIP with a different MAC in a round-robin fashion.
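The round-robin idea can be sketched as an ARP responder that hands out a different member MAC for the VIP on each request. This is a hypothetical model (the class name and behaviour are assumptions, not how the cluster software in this thread is known to work), using the two server MACs from the ARP tables posted later in the thread.

```python
from __future__ import annotations

from itertools import cycle


class RoundRobinVipResponder:
    """Hypothetical cluster behaviour: answer each ARP request for the VIP
    with the next member NIC's MAC in turn."""

    def __init__(self, vip: str, member_macs: list[str]):
        self.vip = vip
        self._macs = cycle(member_macs)  # endless rotation over the members

    def answer(self, requested_ip: str) -> str | None:
        if requested_ip != self.vip:
            return None  # not our address: stay silent
        return next(self._macs)


responder = RoundRobinVipResponder("10.1.1.17", ["0014.5ebc.7466", "0014.5ebc.0c84"])
replies = [responder.answer("10.1.1.17") for _ in range(3)]
other = responder.answer("10.1.1.99")
```

Note that a router keeps the single reply it received until its ARP entry times out, so traffic routed through one core switch would still land on a single NIC; a rotation like this only spreads different requesters across the members.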

Generally, in an active/active system, you could do something like the following.

The software sends out a gratuitous ARP (which has a destination MAC of ff:ff:ff:ff:ff:ff and a source MAC of whichever device sent the gARP), and it could do this on a round-robin basis, so to speak.

Or, there could be a VIP, and the software hands out the real MAC addresses of all the ports in this "cluster" in a round-robin fashion, so to speak. I just have no idea how that software works in your situation.
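For reference, the gratuitous ARP described above can be built byte by byte. The sketch assumes an Ethernet II frame carrying an ARP request in which the sender and target IP are both the announced address and the destination is the broadcast MAC; the function names are hypothetical, and it only constructs the 42-byte frame without sending it (sending would need a raw socket and elevated privileges). The MAC and VIP are taken from this thread.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Accept Cisco dotted (0014.5ebc.7466), colon or dash notation."""
    return bytes.fromhex(mac.replace(".", "").replace(":", "").replace("-", ""))


def gratuitous_arp(sender_mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request frame (14-byte Ethernet + 28-byte ARP)."""
    src = mac_bytes(sender_mac)
    ip_b = socket.inet_aton(ip)
    # Ethernet II header: broadcast destination, sender source, EtherType 0x0806 (ARP).
    eth = b"\xff" * 6 + src + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                  # hardware type: Ethernet
        0x0800,             # protocol type: IPv4
        6, 4,               # hardware / protocol address lengths
        1,                  # opcode 1 = request (some stacks send gARP as a reply, opcode 2)
        src, ip_b,          # sender MAC / sender IP
        b"\x00" * 6, ip_b,  # target MAC (unused) / target IP == sender IP
    )
    return eth + arp


frame = gratuitous_arp("0014.5ebc.7466", "10.1.1.17")
```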

Have the server guys tried restarting the cluster service, rebooting the servers, etc.? I understand if you can't do this due to your situation.

samirshaikh52 · 02-06-2014

Well, so far I can say there is no problem on my core or access switches. I guess this is something within the server infrastructure. One of the following should resolve the issue:

- Update IOS on the switch. It is very outdated and, as you mentioned, this image has had some ARP-related issues in the past.

- Reboot the switches

- Reboot the server

JohnTylerPearce · 02-06-2014

Well, I didn't say there was an issue directly related to ARP, but that IOS software is very old, Cisco doesn't even support it anymore, and I noted the recommended upgrade above. If you go to support.cisco.com and then Downloads, you can search for your specific switch and look at known issues, fixes, etc.

samirshaikh52 · 02-06-2014

Oops, yes.

What if I upgrade directly to 15? Will there be any obstacles?

Thanks

JohnTylerPearce · 02-06-2014

I honestly couldn't tell you. Go to support.cisco.com, find your specific switch, look up that specific version, and check out the release notes. They list the known bugs and the issues fixed in that IOS version.

I also can't tell you that it will fix the issue. But since no one here knows how the *nix cluster servers work, among other things, and given that it was working perfectly fine and then stopped, upgrading the code wouldn't really be a bad thing, considering Cisco doesn't support that IOS anymore.

samirshaikh52 · 02-06-2014

Also, I have tried shutting down and re-enabling the port on the access switch to which the 2960 is connected.

samirshaikh52 · 02-08-2014

Actually, I received the wrong information from one of the admins. Sorry about that.

The configuration is Active/Passive.

They are saying there is a duplicate IP address, but I cannot see any duplication.
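Since a `show ip arp` snapshot holds only one MAC per IP, a duplicate address is easier to spot across repeated observations, for example ARP replies collected in a packet capture. A minimal sketch, assuming such (IP, MAC) pairs are available; the function name and the sample data are hypothetical. A legitimate active/passive failover also shows one IP moving between two MACs, so the timing of the observations matters when reading the result.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: sorted list of MACs} for every IP claimed by more than one MAC.

    observations: iterable of (ip, mac) pairs, e.g. sender fields of ARP
    replies seen in a capture. MACs are lower-cased so that case differences
    alone do not count as conflicts.
    """
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical sample: the passive node also answering for a logical address.
observations = [
    ("10.1.1.15", "0014.5ebc.7466"),
    ("10.1.1.14", "0014.5ebc.0c84"),
    ("10.1.1.21", "0014.5ebc.7466"),
    ("10.1.1.21", "0014.5EBC.0C84"),
]
conflicts = find_ip_conflicts(observations)
```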

Thanks

samirshaikh52 · 02-13-2014

Here is the latest update on this problem.

Today we tried shutting down the servers one at a time to see if the problem remained, and it did. First we turned off Server 2, so all the resources transferred to Server 1. After a certain period, I could not ping the real and logical IP addresses of Server 1.

Then we did the reverse, Server 1 down and Server 2 up: still the same problem.

Then we ran both servers at the same time (active/passive): same problem.

For more info:

Server 1 real IP address: 10.1.1.14

Server 2 real IP address: 10.1.1.15

Logical IP addresses: 10.1.1.18, 10.1.1.21, 10.1.1.23 (services are associated with them)

VIP: 10.1.1.17

When one of the servers is active, it takes over the above logical addresses, and the core switch's ARP table maps them all to a single MAC address.

The standby server has an ARP entry on the core only for its real IP address.

For example, if Server 1 is active, here is the ARP table on the core:

Protocol  Address      Age (min)  Hardware Addr   Type   Interface
Internet  10.1.1.15    0          0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.18    96         0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.17    96         0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.23    96         0014.5ebc.7466  ARPA   Vlan2
Internet  10.1.1.21    96         0014.5ebc.7466  ARPA   Vlan2

Server 2 ARP entry on the core:

Internet 10.1.1.14 0 0014.5ebc.0c84 ARPA Vlan2

And if Server 2 is active, this is the ARP table:

Protocol  Address      Age (min)  Hardware Addr   Type   Interface
Internet  10.1.1.14    0          0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.18    96         0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.17    96         0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.23    96         0014.5ebc.0c84  ARPA   Vlan2
Internet  10.1.1.21    96         0014.5ebc.0c84  ARPA   Vlan2

Server 1 ARP entry on the core:

Internet 10.1.1.15 0 0014.5ebc.7466 ARPA Vlan2
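The pasted output can be checked mechanically by grouping the IPs under each hardware address. A small sketch, assuming lines in the six-column `show ip arp` layout shown above (the helper name `ips_by_mac` is hypothetical); it runs on the first snapshot from this post.

```python
from collections import defaultdict


def ips_by_mac(show_ip_arp: str) -> dict:
    """Group IPs by hardware address from 'show ip arp'-style lines:
    Protocol Address Age(min) Hardware-Addr Type Interface."""
    groups = defaultdict(list)
    for line in show_ip_arp.splitlines():
        fields = line.split()
        # Skip headers, blank lines and short rows such as Incomplete entries.
        if len(fields) == 6 and fields[0] == "Internet":
            _, ip, _age, mac, _type, _iface = fields
            groups[mac].append(ip)
    return dict(groups)


SNAPSHOT = """
Internet 10.1.1.15 0 0014.5ebc.7466 ARPA Vlan2
Internet 10.1.1.18 96 0014.5ebc.7466 ARPA Vlan2
Internet 10.1.1.17 96 0014.5ebc.7466 ARPA Vlan2
Internet 10.1.1.23 96 0014.5ebc.7466 ARPA Vlan2
Internet 10.1.1.21 96 0014.5ebc.7466 ARPA Vlan2
Internet 10.1.1.14 0 0014.5ebc.0c84 ARPA Vlan2
"""

owners = ips_by_mac(SNAPSHOT)
```

On that snapshot, the VIP, the three logical addresses and one real IP share 0014.5ebc.7466, while the standby's real IP sits alone on 0014.5ebc.0c84, which matches the active/passive ownership described above.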