Problem with stale nd entry on server ASA devices

angelohongens · ‎02-05-2016

Hey,

I'm trying to debug an issue we have with our cluster software. The problem is that ipv6 failover doesn't work, and I suspect the Cisco ASAs are the culprit. But it could also be my lack of understanding ;)

We use several clustered machines in our network. The simplest setup is two simple linux vm's running corosync/pacemaker/heartbeat to keep one ipv4 and one ipv6 virtual ip up and running in a passive-active setup. They have a single nic eth0. We have a vmware setup and a hyper-v setup, I see the same issue on both networks. The machines are behind Cisco ASA devices.

I only have this problem in our internal networks behind the ASAs; we don't see the issue outside of our network (our ISP's routers seem to behave nicely).

Down to the nitty gritty (mac addresses and public ip’s have been altered a bit)

machine1 has mac 00:50:12:34:26:c6, dedicated ipv6 address 2001:1234:1234:30::91/64
machine2 has mac 00:50:12:34:10:42, dedicated ipv6 address 2001:1234:1234:30::92/64

The virtual ip they’re holding up is 2001:1234:1234:30::90. The ip will prefer to be on machine1.

If I do a ping from outside to ip 2001:1234:1234:30::90, that works perfectly. I see the pings going to machine1 using tcpdump.

If I look at the neighbour cache, I see what I would expect:

nmt-frw-01# sh ipv6 neighbor misc
IPv6 Address Age Link-layer Addr State Interface
2001:1234:1234:30::90 0 0050.1234.26c6 REACH MISC
2001:1234:1234:30::91 0 0050.1234.26c6 REACH MISC
2001:1234:1234:30::92 0 0050.1234.1042 REACH MISC
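
To make the table above easier to compare with the addresses listed earlier, here is a minimal Python sketch that converts the Cisco dotted MAC notation to the colon notation and reports which machine a row points at. The names `cisco_to_colon`, `owner_of` and `MACHINES` are made up for this illustration, and the parser assumes exactly the five columns shown in the output above.

```python
# Map of the (altered) MAC addresses from this post to machine names.
MACHINES = {
    "00:50:12:34:26:c6": "machine1",
    "00:50:12:34:10:42": "machine2",
}


def cisco_to_colon(mac: str) -> str:
    """Convert Cisco dotted notation (0050.1234.26c6) to colon notation."""
    digits = mac.replace(".", "").lower()
    if len(digits) != 12:
        raise ValueError(f"unexpected MAC format: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def owner_of(row: str) -> tuple[str, str]:
    """Return (ipv6_address, machine_name) for one row of the neighbor table.

    Assumes the five whitespace-separated columns shown above:
    address, age, link-layer address, state, interface.
    """
    address, _age, mac, _state, _interface = row.split()
    return address, MACHINES.get(cisco_to_colon(mac), "unknown")
```

Fed the first row of the table, `owner_of` reports that the virtual ip currently resolves to machine1.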

Three minutes later, I see the cache entries turned stale:

nmt-frw-01# sh ipv6 neighbor misc 
IPv6 Address Age Link-layer Addr State Interface
2001:1234:1234:30::90 3 0050.1234.26c6 STALE MISC
2001:1234:1234:30::91 3 0050.1234.26c6 STALE MISC
2001:1234:1234:30::92 3 0050.1234.1042 STALE MISC

That's not exactly what I would expect, since I still have a ping open that pings every second, but okay.

If I do a cluster failover (I set machine1 to go to standby), the vip moves to machine2, and machine2 sends out an unsolicited neighbour advertisement:

13:50:03.252717 IP6 2001:1234:1234:30::90 > ff02::1: ICMP6, neighbor advertisement, tgt is 2001:1234:1234:30::90, length 32
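
For reference, the "length 32" in that capture is consistent with a neighbor advertisement that carries a Target Link-Layer Address option: 8 bytes of ICMPv6 header and flags, 16 bytes of target address, and 8 bytes of option. The Python sketch below builds such a message body to show the layout. It is an illustration only: the function name is made up, the checksum is left at zero (a real one needs the IPv6 pseudo-header), and setting the Override flag is an assumption, since the one-line tcpdump output does not show the flags.

```python
import ipaddress
import struct

NA_TYPE = 136               # ICMPv6 type for Neighbor Advertisement
FLAG_ROUTER = 0x80000000    # R bit
FLAG_SOLICITED = 0x40000000 # S bit (clear in an unsolicited NA)
FLAG_OVERRIDE = 0x20000000  # O bit (assumed set here)
OPT_TARGET_LLADDR = 2       # Target Link-Layer Address option type


def build_unsolicited_na(target: str, mac: str) -> bytes:
    """Build the ICMPv6 body of an unsolicited NA (checksum left at 0)."""
    target_bytes = ipaddress.IPv6Address(target).packed
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    # type, code, checksum placeholder, then the 32-bit flags/reserved word
    header = struct.pack("!BBHI", NA_TYPE, 0, 0, FLAG_OVERRIDE)
    # option type, option length in units of 8 bytes, link-layer address
    option = struct.pack("!BB6s", OPT_TARGET_LLADDR, 1, mac_bytes)
    return header + target_bytes + option
```

Calling `build_unsolicited_na("2001:1234:1234:30::90", "00:50:12:34:10:42")` yields a 32-byte message whose last six bytes are machine2's MAC.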

If I look at the neighbour cache on the ASA, I see the entry is updated immediately, although still stale:

nmt-frw-01# sh ipv6 neighbor misc
IPv6 Address Age Link-layer Addr State Interface
2001:1234:1234:30::90 4 0050.1234.1042 STALE MISC
2001:1234:1234:30::91 4 0050.1234.26c6 STALE MISC
2001:1234:1234:30::92 4 0050.1234.1042 STALE MISC

However, the traffic still flows to machine1! I can still see the pings going to machine1 using tcpdump. And this, ladies and gentlemen, is my big issue!

If I do a clear ipv6 neighbors, the traffic is then sent to machine2, and everything works as expected again.

Does anyone have an idea what the problem is? If I read the documentation about ND, I would think that a STALE entry is exactly that: stale. When the cisco wants to send the next packet to the ip, it will do an NS again to confirm it's reachable. But that does not seem to happen. Are my expectations wrong? Or is this a known bug in our cisco device? I know we're not running the latest and greatest versions of the devices, but we don't have the time and money to upgrade them all yet (and not every device has the required memory I think).
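
To spell out the behaviour I would expect, here is a rough Python sketch of the neighbor cache state handling as I read RFC 4861 (sections 7.2.5 and 7.3.3). It is a simplified model, not the ASA's implementation: the class and method names are made up, the INCOMPLETE state and probe retransmission are left out, and timers are passed in as plain numbers.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    REACHABLE = "REACH"
    STALE = "STALE"
    DELAY = "DELAY"
    PROBE = "PROBE"


DELAY_FIRST_PROBE_TIME = 5.0  # seconds, default from RFC 4861


@dataclass
class NeighborEntry:
    lladdr: str
    state: State = State.REACHABLE
    timer_expires: float = 0.0

    def on_reachable_timeout(self) -> None:
        """No reachability confirmation within the reachable time."""
        if self.state is State.REACHABLE:
            self.state = State.STALE

    def on_send(self, now: float) -> str:
        """A packet is sent to this neighbor; returns the MAC that is used."""
        if self.state is State.STALE:
            # The packet still goes to the cached address, but a probe is scheduled.
            self.state = State.DELAY
            self.timer_expires = now + DELAY_FIRST_PROBE_TIME
        return self.lladdr

    def on_timer(self, now: float) -> bool:
        """Returns True when a unicast neighbor solicitation should be sent."""
        if self.state is State.DELAY and now >= self.timer_expires:
            self.state = State.PROBE
            return True
        return False

    def on_advertisement(self, lladdr: str, solicited: bool, override: bool) -> None:
        """Process a received neighbor advertisement carrying a link-layer address."""
        changed = lladdr != self.lladdr
        if changed and not override:
            # Without Override the cached address is kept;
            # a REACHABLE entry is demoted to STALE.
            if self.state is State.REACHABLE:
                self.state = State.STALE
            return
        self.lladdr = lladdr
        if solicited:
            self.state = State.REACHABLE
        elif changed:
            self.state = State.STALE
```

In this model, an unsolicited advertisement with Override set replaces the cached MAC and leaves the entry STALE, which matches what the ASA shows. The next `on_send` would then already use machine2's MAC, which is what does not seem to happen on our device.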

I see on this device we’re running asa821-k8.bin. Yes, it’s a bit older, I know. Running the default configuration on the interface:

nmt-frw-01# sh ipv6 interface MISC
MISC is up, line protocol is up
 IPv6 is enabled, link-local address is fe80::ca4c:73ff:fe52:1fcf 
 Global unicast address(es):
 2001:1234:1234:30::1, subnet is 2001:1234:1234:30::/64 
 Joined group address(es):
 ff02::1
 ff02::2
 ff02::1:ff00:1
 ff02::1:ff52:1fcf
 ICMP error messages limited to one every 100 milliseconds
 ICMP redirects are enabled
 ND DAD is enabled, number of DAD attempts: 1
 ND reachable time is 30000 milliseconds
 ND advertised reachable time is 0 milliseconds
 ND advertised retransmit interval is 1000 milliseconds
 ND router advertisements are sent every 200 seconds
 ND router advertisements live for 1800 seconds
 Hosts use stateless autoconfig for addresses.
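
A side note on the joined groups in that output: the two ff02::1:ff... entries are solicited-node multicast addresses, formed from the prefix ff02::1:ff00:0/104 plus the low 24 bits of each unicast address on the interface. Multicast neighbor solicitations for an address are sent to its solicited-node group. A small Python sketch of the derivation (the function name is made up for illustration):

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(address: str) -> str:
    """Return the solicited-node multicast address for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(address)) & 0xFFFFFF
    return str(ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24))
```

Applied to the interface's global address 2001:1234:1234:30::1 and its link-local address fe80::ca4c:73ff:fe52:1fcf, this gives the two groups listed above.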

angelohongens · ‎02-05-2016

Hmm.. Could be bug CSCto81636..