I have a customer with two FTD 2130 units in an HA pair, running 6.6.4 and managed by FMC. They have tried several software versions so far, with similar results on each.
Symptoms:
After about 4-8 hours in production, CPU on the active unit reaches 100%. The climb is gradual, starting from an idle level of about 5-6%. Once it hits 100%, all data-plane traffic stops, and the only fix is to fail over to the standby unit, which then begins the same gradual climb from about 5%. That climb again takes about 4-8 hours, depending on activity.
Other Notes:
- Behavior has been present since the initial deployment (3 months ago).
- The unit currently homes about 50 AnyConnect users running the latest client version (again, multiple AC versions have been tried).
- The goal is to home about 5000 AC users on the unit. Obviously, this is not possible if it cannot handle 50 users.
- The other 4950 AC users are currently homed to an old ASA 5585 and are doing just fine with a similar design.
- Internet traffic on the unit seems to only consume about 5-6% of the CPU
- We have noticed that the amount of ICMP traffic destined for the AC users gradually increases as the CPU increases. This has led us to believe the increase in ICMP traffic might be the cause of the climbing CPU.
- Customer found some 7-year-old KB for an ASA that recommended creating a more specific static route for the AC subnet that directed the traffic out the outside interface. This had no effect.
- It seems that the ICMP packets might simply never be leaving the network. This has led us to theorize that it might be some kind of session state timeout issue, or something to that effect.
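
To put numbers on the suspected link between ICMP volume and CPU, something like the following could be run over samples exported from the unit. This is only a rough sketch: the sample values are placeholders, not measurements from this deployment, and the function names are made up for illustration. It fits a straight line to the CPU readings to project when 100% would be reached, and computes a Pearson correlation between CPU and ICMP rate. A high correlation would not prove that ICMP is the cause, only that the two rise together.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def hours_to_saturation(hours, cpu_pct, limit=100.0):
    """Hour mark at which the fitted CPU line reaches `limit`, or None if not climbing."""
    slope, intercept = linear_fit(hours, cpu_pct)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Placeholder samples (assumed values, NOT taken from the customer's unit):
# hours since failover, CPU %, and ICMP packets/sec toward the AC subnet.
hours = [0, 1, 2, 3]
cpu_pct = [5, 25, 45, 65]
icmp_pps = [10, 50, 90, 130]

projected = hours_to_saturation(hours, cpu_pct)
correlation = pearson(cpu_pct, icmp_pps)
print(f"projected saturation at hour {projected:.2f}, CPU/ICMP correlation {correlation:.3f}")
```

If the real samples show CPU climbing while the ICMP rate stays flat, that would argue against the ICMP theory and point elsewhere.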
At this point, the customer has opened 3 TAC cases and has the regional SE involved. To date, no one involved has found a resolution.
Any help would be most appreciated.