I have an IPsec L2L tunnel between two ASA 5525-x firewalls running 9.0(2), negotiating IKEv2 with certificate authentication of the endpoints. Frequently, as expected, SA's will rekey due to time or data rollover, logging things like %ASA-7-702307 ... is rekeying due to data rollover.
Sometimes, maybe 10% of the time, both ends try to rekey and the collision detection kicks in and deletes one of the sets of newly proposed replacement SA's. The winner logs %ASA-5-750005: ... IPsec rekey collision detected. I am lowest nonce initiator, deleting SA ... while the loser logs
%ASA-4-750003: ... Negotiation aborted due to ERROR: Failed to insert SA due to ipsec rekey collision. The new surviving SA pair takes over and my packets continue to flow across the tunnel.
Once in a while, the rekey fails, the tunnel dies, and ongoing TCP sessions crash. In this case at least one side will log something like:
%ASA-5-750007: ... SA DOWN. Reason: IPsec rekey collision handling failed
%ASA-4-113019: ... Session disconnected. Session Type: LAN-to-LAN, Duration: ... Reason: Phase 2 Error
Has anyone else seen this?
Does anyone have a fix or a workaround, like upgrading to 9.1 firmware, or downgrading to IKEv1 negotations, or some mystical tweak to the tunnel-group settings?
I'm not sure if it's killing 100% of the TCP sessions when the tunnel is renegotiated from scratch, or just the active ones, but our operational annoyance for the users and DBA's re-establishing application sessions and unlocking orphaned transactions is considerable. We could probably tolerate the 2 second wait for the tunnel to come back up, but not the dead TCP sessions.
My google-fu isn't turnning up a lot of examples of this, and I've opened a TAC about it too, but I'm hoping someone might have additional insight.
-- Jim Leinweber, WI State Lab of Hygiene