ASA 5585-X, slow OSPF convergence after failover, 180 seconds

d.pakhomov · ‎12-04-2021

Hello everyone,

We first ran into this issue more than a year ago. The software version is ASA 9.12.4 or 9.6.4. We have opened a few cases with Cisco (such as 692054705) and with our local support partner, but nobody seems willing to do anything about this bug.

After a failover event on a pair of ASA 5585-X devices, the ASAs can start to ignore OSPF update packets from the DR router. They stay stuck in the INIT/DROTHER state for 180 seconds. We see the following messages in the logs:

OSPF: AAA.B.232.193 address XXX.YY.90.193 on TO_INSIDE is dead, state DOWN
OSPF: DR/BDR election on TO_INSIDE
OSPF: OSPF: Rcv pkt from TO_INSIDE src XXX.YY.90.196 dst 224.0.0.5 id AAA.B.223.1 type 4 if_state 2 : ignored due to unknown neighbor
OSPF: rcv. v:2 t:4 l:64 rid:AAA.B.223.1 aid:0.0.0.0 chk:685e aut:0 auk: from TO_INSIDE
OSPF: OSPF: Rcv pkt from TO_INSIDE src XXX.YY.90.196 dst 224.0.0.5 id AAA.B.223.1 type 4 if_state 2 : ignored due to unknown neighbor

Usually, "ignored due to unknown neighbor" messages are caused by the LLS capability being enabled on the peers, but in our case we have already disabled it. It is also not related to any MTU issue, because all devices are connected back to back by wire and have a fixed MTU of 1500 on all interfaces. Everything also works flawlessly after manually clearing the OSPF process. Our topology has been the same for many years, and we made no changes to it before the problem started.
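
For anyone working through similar logs, here is a minimal sketch (not part of the original troubleshooting) that pulls the fields out of the "ignored due to unknown neighbor" lines. It assumes the debug format is exactly as quoted above; the function name `summarize_ignored` and the regular expression are illustrative only. The type codes come from the OSPFv2 specification, where type 4 is a Link State Update.

```python
import re

# OSPFv2 packet type codes (RFC 2328, appendix A.3.1).
OSPF_PACKET_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgment",
}

# Assumption: this pattern is derived only from the debug lines quoted
# in this thread; other software versions may format the line differently.
IGNORED_RE = re.compile(
    r"Rcv pkt from (?P<ifname>\S+) src (?P<src>\S+) dst (?P<dst>\S+) "
    r"id (?P<rid>\S+) type (?P<ptype>\d+) if_state (?P<if_state>\d+) : "
    r"ignored due to unknown neighbor"
)


def summarize_ignored(lines):
    """Return one dict per 'ignored due to unknown neighbor' log line."""
    result = []
    for line in lines:
        match = IGNORED_RE.search(line)
        if match is None:
            continue  # not an "ignored" line, skip it
        ptype = int(match.group("ptype"))
        result.append({
            "interface": match.group("ifname"),
            "src": match.group("src"),
            "router_id": match.group("rid"),
            "packet_type": OSPF_PACKET_TYPES.get(ptype, f"unknown ({ptype})"),
        })
    return result


if __name__ == "__main__":
    sample = [
        "OSPF: DR/BDR election on TO_INSIDE",
        "OSPF: OSPF: Rcv pkt from TO_INSIDE src XXX.YY.90.196 dst 224.0.0.5 "
        "id AAA.B.223.1 type 4 if_state 2 : ignored due to unknown neighbor",
    ]
    for entry in summarize_ignored(sample):
        print(entry)
```

On the lines quoted above, this reports Link State Update packets from XXX.YY.90.196 with router ID AAA.B.223.1 being dropped on TO_INSIDE.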

The topology is pretty simple. We have only three devices: 1xASA5585 and 2xC6500. I have attached our diagram to this message.

Steps we have already taken:

1. Checked all MTUs;

2. Disabled capability lls;

3. We ran another OSPF process between these devices, on different physical links. The new OSPF process works flawlessly;

My assumption is that this bug is definitely related to our exact topology/configuration/load. It is pretty hard to reproduce the whole environment in detail: we have about 10-20 Gbps of traffic and a large ACL with more than 600k entries.

I would really appreciate any help and ideas.

WBR,

Daniel

balaji.bandi · ‎12-04-2021

3. We ran another OSPF process between these devices, on different physical links. The new OSPF process works flawlessly;

If you already have one process running, how is the other OSPF process configured? Please post the relevant configuration for both OSPF processes here so we can take a look. Are you looking to run the same OSPF process on the same interface? How is this interface configured?

BB


d.pakhomov · ‎12-04-2021

Yes, we can configure them on the same physical interfaces, but with different subinterfaces. We can do it only after New Year, because of the change freeze / high season.

Our OSPF and interface configurations are below:

router ospf 106 //production process
 router-id AAA.B.232.193
 network XXX.YY.90.192 255.255.255.248 area 0 //network faced to SW
 area 0
 area AAA.B.0.0
 log-adj-changes
 redistribute connected metric-type 1 subnets route-map Redistribute_connected_to_OSPF
 redistribute static metric-type 1 subnets route-map Redistibute_static_to_OSPF
!
router ospf 100 //process for test
 router-id AAA.B.223.227
 network AAA.B.223.224 255.255.255.240 area 0
 area 0
 log-adj-changes
 redistribute static metric 1 subnets route-map TEST_Redistibute_static_to_OSPF
!
interface TenGigabitEthernet0/7.104  //production subinterface faced to SW
 description TO_INSIDE
 vlan 104
 nameif TO_INSIDE
 security-level 100
 ip address XXX.YY.90.193 255.255.255.248 standby XXX.YY.90.194
 ospf cost 10
!
interface GigabitEthernet0/3.113 //test subinterface faced to SW
 description EXT_ATM
 vlan 113
 nameif EXT_ATM
 security-level 56
 ip address AAA.B.223.227 255.255.255.240 standby AAA.B.223.228
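
As a quick sanity check of the network statements against the interface and neighbor addresses, a small sketch like the one below can help. The real prefixes are masked in this thread, so the code uses the documentation prefix 192.0.2.x as a stand-in (an assumption); only the last octet and the netmask mirror the production config above. The helper name `covered_by_network` is hypothetical.

```python
import ipaddress


def covered_by_network(address, network, netmask):
    """Return True if `address` falls inside the OSPF network statement."""
    # strict=True rejects a network statement whose host bits are set.
    net = ipaddress.ip_network(f"{network}/{netmask}", strict=True)
    return ipaddress.ip_address(address) in net


if __name__ == "__main__":
    # Stand-in for "network XXX.YY.90.192 255.255.255.248 area 0".
    network, netmask = "192.0.2.192", "255.255.255.248"
    # .193/.194 are the ASA active/standby addresses, .196 is the
    # switch address seen as the source of the ignored packets.
    for host in ("192.0.2.193", "192.0.2.194", "192.0.2.196"):
        print(host, covered_by_network(host, network, netmask))
```

With a /29 starting at .192, the addresses .193, .194 and .196 all fall inside the network statement, so a subnet mismatch would not explain the "unknown neighbor" message under these assumptions.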

MHM Cisco World · ‎12-04-2021

According to this,
you have two L3 switches with an SVI on each one.
Now, on the ASA pair (2xASA), you receive OSPF for VLAN 113 on the TO_INSIDE interface, so it is an unknown peer for both ASAs, which use OSPF on VLAN 104 on that interface.
Am I right?

d.pakhomov · ‎12-04-2021

No, VLAN 113 was created only for test purposes.

Let me try to describe the whole setup:

2xASA (Act/Stb) and 2xL3SW with SVIs on each one. Yes, we actually have two SVIs on each SW: one is Vlan 104, the second is Vlan 113 (for testing).

OSPF process 106 is for production. It runs across Vlan 104 and the TenGigabit interface. This is the VLAN where we receive the unknown peer messages and get stuck in the INIT state for 180 seconds after the failover event.

The second one, OSPF process 100, was created only for test purposes. It runs across Vlan 113 and the Gigabit interface. This configuration works flawlessly.

We do not share VLANs/networks between the OSPF processes. We created Vlan 113 and the new OSPF instance months after the original problem started.

MHM Cisco World · ‎12-04-2021

Have you already disabled OSPF LLS on the SW side?

d.pakhomov · ‎12-04-2021

Yes, on both SW:

interface Vlan104
 ip ospf lls disable

Also, we don't see the LLS bit in the OSPF Hello packets from the SW:

Options: 0x02, (E) External Routing
0... .... = DN: Not set
.0.. .... = O: Not set
..0. .... = (DC) Demand Circuits: Not supported
...0 .... = (L) LLS Data block: Not Present
.... 0... = (N) NSSA: Not supported
.... .0.. = (MC) Multicast: Not capable
.... ..1. = (E) External Routing: Capable
.... ...0 = (MT) Multi-Topology Routing: No
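
To check the options byte from a capture without reading the bit diagram by hand, a minimal sketch follows. The bit positions are taken from the dissector output pasted above; the names `decode_ospf_options` and `has_lls` are illustrative, not from any library.

```python
# Bit layout as shown in the capture above, most significant bit first.
OPTION_BITS = [
    (0x80, "DN"),
    (0x40, "O"),
    (0x20, "DC"),
    (0x10, "L"),   # LLS data block present
    (0x08, "N"),
    (0x04, "MC"),
    (0x02, "E"),
    (0x01, "MT"),
]


def decode_ospf_options(value):
    """Return the names of the option bits set in a one-byte options field."""
    if not 0 <= value <= 0xFF:
        raise ValueError("OSPF options field is a single byte")
    return [name for mask, name in OPTION_BITS if value & mask]


def has_lls(value):
    """Return True if the L (LLS data block) bit is set."""
    return bool(value & 0x10)


if __name__ == "__main__":
    options = 0x02  # value from the Hello packets in this thread
    print(decode_ospf_options(options), "LLS:", has_lls(options))
```

For the value 0x02 seen in the Hello packets here, only the E bit is set and the L bit is clear, which matches the statement that LLS is not being advertised by the switches.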

We also have other interfaces that are connected to the Nexus SW. They work fine. Nexuses by default have LLS enabled,