Solved: SNMP Monitoring of Cisco ASA Firewalls in HA

ronit · 09-27-2022

What is the best practice when monitoring Cisco ASA Firewalls (ASA5525-X or FPR1120-ASA) configured in Active/Standby or Active/Active HA?

If we configure them as separate hosts in our SNMP software using their interface IPs, everything is fine until the primary unit fails and the primary IP moves to the backup unit. The NMS then misinterprets this as the backup unit failing instead.

Out-of-band monitoring through the dedicated management port wouldn't work either, because both firewalls share the same SNMP engineID.

It looks like the only correct approach is to monitor the primary IP only. If we do that, is there a way to:

1. Monitor the interface states of the backup unit by polling the primary unit?

2. Monitor the state of both physical units while polling only the primary IP?

Overall, I just want to understand the best practice for monitoring HA firewalls.

tvotna · 09-29-2022

Well, we monitor both of them (IF-MIB, CPU, memory, etc.), because it's not possible to monitor the standby unit's interfaces/memory by polling the active unit (although see the MIBs below).

The SNMP engineID is shared between the units in ASA 9.13 and below. In 9.14, Cisco moved to net-snmp, and each unit now responds to polls with its own unique engineID (CSCvu47989 snmpwalk on ASA does not return back the LOCAL engine ID, but only the active ID). This is a bad idea IMO. The NMS needs to rediscover the engineID after each failover, because the IP moves to the other unit in the pair but the engineID doesn't move with it. This also affects how SNMP traps are sent before and after failover. As far as the net-snmp implementation on ASA is concerned, it is awful, with plenty of bugs. So both decisions, to stop sharing the engineID and to migrate to net-snmp, were simply wrong.
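To illustrate the rediscovery problem: on 9.14 and later, the engineID seen behind a shared IP changes when that IP moves to the other unit. Below is a minimal Python sketch of how an NMS-side cache could notice that, assuming you already obtain the engineID through SNMPv3 discovery; the class, method, and engineID values are hypothetical, not part of any NMS product. On 9.13 and below the engineID is shared, so this check would never fire there.

```python
from dataclasses import dataclass, field


@dataclass
class EngineIdCache:
    """Hypothetical NMS-side cache: polled IP address -> last engineID seen there.

    Assumes ASA 9.14+ behaviour, where each unit answers with its own
    engineID, so the engineID behind a shared IP changes when the IP
    moves to the other unit after a failover.
    """
    seen: dict = field(default_factory=dict)

    def observe(self, ip: str, engine_id: str) -> bool:
        """Record the engineID discovered at `ip`.

        Returns True when this IP previously answered with a different
        engineID, i.e. the NMS must rediscover and a failover likely happened.
        """
        previous = self.seen.get(ip)
        self.seen[ip] = engine_id
        return previous is not None and previous != engine_id
```

Calling `observe()` on every poll returns False as long as the same physical unit owns the IP, and True on the first poll after the IP has moved.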

You can use CISCO-FIREWALL-MIB to let your NMS understand which unit is active and which is standby. Something like this:

Primary unit

CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.2.4 = STRING: "Failover LAN Interface"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.2.6 = STRING: "Primary unit (this device)"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.2.7 = STRING: "Secondary unit"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.3.4 = INTEGER: 2
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.3.6 = INTEGER: 9
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.3.7 = INTEGER: 10
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.4.4 = STRING: "folink Ethernet1/5 (system)"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.4.6 = STRING: "Active unit"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.4.7 = STRING: "Standby unit"

Secondary unit

CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.2.4 = STRING: "Failover LAN Interface"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.2.6 = STRING: "Primary unit"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.2.7 = STRING: "Secondary unit (this device)"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.3.4 = INTEGER: 2
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.3.6 = INTEGER: 9
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.3.7 = INTEGER: 10
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.4.4 = STRING: "folink Ethernet1/5 (system)"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.4.6 = STRING: "Active unit"
CISCO-SMI::ciscoMgmt.147.1.2.1.1.1.4.7 = STRING: "Standby unit"

Failover not configured

SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.2.4 = STRING: "Failover LAN Interface"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.2.6 = STRING: "Primary unit"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.2.7 = STRING: "Secondary unit (this device)"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.3.4 = INTEGER: 3
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.3.6 = INTEGER: 3
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.3.7 = INTEGER: 3
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.4.4 = STRING: "not Configured"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.4.6 = STRING: "Failover Off"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.4.7 = STRING: "Failover Off"

Primary unit, Secondary unit offline

SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.2.4 = STRING: "Failover LAN Interface"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.2.6 = STRING: "Primary unit (this device)"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.2.7 = STRING: "Secondary unit"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.3.4 = INTEGER: 4
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.3.6 = INTEGER: 9
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.3.7 = INTEGER: 1
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.4.4 = STRING: "fover GigabitEthernet0/4"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.4.6 = STRING: "Active unit"
SNMPv2-SMI::enterprises.9.9.147.1.2.1.1.1.4.7 = STRING: "No mate found"

Status codes:

- 1 - other
- 2 - up
- 3 - down
- 4 - error
- 5 - overTemp
- 6 - busy
- 7 - noMedia
- 8 - backup
- 9 - active
- 10 - standby
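A minimal Python sketch that parses snmpwalk output like the examples above and reports which unit is active and which physical unit answered the poll. It assumes columns .2/.3/.4 of the table are the description, status code and status detail (cfwHardwareInformation, cfwHardwareStatusValue and cfwHardwareStatusDetail in CISCO-FIREWALL-MIB), and that rows 4/6/7 are the failover LAN interface, primary unit and secondary unit, as in the outputs shown. The function names are hypothetical. Comparing the active unit between two polls is one way to notice a failover.

```python
import re

# Status codes listed above (cfwHardwareStatusValue).
STATUS = {1: "other", 2: "up", 3: "down", 4: "error", 5: "overTemp",
          6: "busy", 7: "noMedia", 8: "backup", 9: "active", 10: "standby"}

# Row indexes seen in the outputs above.
ROWS = {4: "failover_link", 6: "primary", 7: "secondary"}

# Matches both the CISCO-SMI::ciscoMgmt.147... and the
# SNMPv2-SMI::enterprises.9.9.147... rendering of the same OIDs.
HW_LINE = re.compile(
    r'147\.1\.2\.1\.1\.1\.(?P<col>[234])\.(?P<row>\d+)\s*=\s*'
    r'(?:STRING|INTEGER):\s*"?(?P<val>[^"]*)"?\s*$'
)


def parse_hw_status(walk: str) -> dict:
    """Return {row_name: {"info": ..., "status": ..., "detail": ...}}."""
    table: dict = {}
    for line in walk.splitlines():
        m = HW_LINE.search(line)
        if not m or int(m["row"]) not in ROWS:
            continue
        entry = table.setdefault(ROWS[int(m["row"])], {})
        if m["col"] == "2":
            entry["info"] = m["val"]
        elif m["col"] == "3":
            code = int(m["val"])
            entry["status"] = STATUS.get(code, f"unknown({code})")
        else:
            entry["detail"] = m["val"]
    return table


def _find(table: dict, key: str, predicate):
    # Check the primary row first, then the secondary row.
    for unit in ("primary", "secondary"):
        if predicate(table.get(unit, {}).get(key, "")):
            return unit
    return None


def active_unit(table: dict):
    """Return "primary" or "secondary" (whichever reports active), else None."""
    return _find(table, "status", lambda s: s == "active")


def this_device(table: dict):
    """Return the physical unit that answered, per its "(this device)" label."""
    return _find(table, "info", lambda s: "(this device)" in s)


def failover_detected(previous: dict, current: dict) -> bool:
    """True if the active unit changed between two polls."""
    before, after = active_unit(previous), active_unit(current)
    return before is not None and after is not None and before != after
```

Against the "Primary unit" output above, `active_unit()` and `this_device()` both return "primary"; against the "Secondary unit" output, `active_unit()` still returns "primary" while `this_device()` returns "secondary".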

In 9.15, CISCO-UNIFIED-FIREWALL-MIB has a few other OIDs:

show snmp oid | i 1.3.6.1.4.1.9.9.491.1.4
[601] .1.3.6.1.4.1.9.9.491.1.4.2.1.1.1 CISCO-UNIFIED-FIREWALL-MIB::cufwFOGroupIndex
[602] .1.3.6.1.4.1.9.9.491.1.4.2.1.1.2 CISCO-UNIFIED-FIREWALL-MIB::cufwFOGrpLastFailoverAt
[603] .1.3.6.1.4.1.9.9.491.1.4.2.1.1.3 CISCO-UNIFIED-FIREWALL-MIB::cufwFOGrpHAstate
[604] .1.3.6.1.4.1.9.9.491.1.4.2.1.1.4 CISCO-UNIFIED-FIREWALL-MIB::cufwFOGrpUpTime
[605] .1.3.6.1.4.1.9.9.491.1.4.2.1.1.5 CISCO-UNIFIED-FIREWALL-MIB::cufwFOGrpContextCount

E.g.

.1.3.6.1.4.1.9.9.491.1.4.2.1.1.1.0 = INTEGER: 0 <-- failover group
.1.3.6.1.4.1.9.9.491.1.4.2.1.1.2.0 = STRING: "12:14:55 MSK Feb 5 2021" <-- "show failover" Last Failover at: 12:14:55 MSK Feb 5 2021
.1.3.6.1.4.1.9.9.491.1.4.2.1.1.3.0 = INTEGER: 9 <-- role = active
.1.3.6.1.4.1.9.9.491.1.4.2.1.1.4.0 = Gauge32: 180065 <-- see MIB
.1.3.6.1.4.1.9.9.491.1.4.2.1.1.5.0 = Gauge32: 0 <-- number of contexts
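A sketch that turns a numeric walk of this table (for example net-snmp's `snmpwalk -On`, without the `<--` annotations) into one dictionary per failover group. Column names come from the `show snmp oid` listing above; only the value 9 = active is confirmed by the example, so other HA states are left as raw integers. The function names are hypothetical.

```python
import re

# Columns under .1.3.6.1.4.1.9.9.491.1.4.2.1.1, per "show snmp oid" above.
COLUMNS = {
    1: "cufwFOGroupIndex",
    2: "cufwFOGrpLastFailoverAt",
    3: "cufwFOGrpHAstate",
    4: "cufwFOGrpUpTime",
    5: "cufwFOGrpContextCount",
}
PREFIX = ".1.3.6.1.4.1.9.9.491.1.4.2.1.1."
FO_LINE = re.compile(r"^\s*(?P<oid>\S+)\s*=\s*(?P<type>\w+):\s*(?P<val>.*?)\s*$")
ACTIVE = 9  # "role = active" in the example above


def parse_fo_groups(walk: str) -> dict:
    """Return {row_index: {column_name: value}} from numeric snmpwalk output."""
    rows: dict = {}
    for line in walk.splitlines():
        m = FO_LINE.match(line)
        if not m or not m["oid"].startswith(PREFIX):
            continue
        col, _, row = m["oid"][len(PREFIX):].partition(".")
        name = COLUMNS.get(int(col))
        if name is None or not row:
            continue
        # STRING values are quoted; INTEGER and Gauge32 values are plain numbers.
        value = m["val"].strip('"') if m["type"] == "STRING" else int(m["val"])
        rows.setdefault(int(row), {})[name] = value
    return rows


def is_active(group: dict) -> bool:
    """True if the group's HA state is 9 (active)."""
    return group.get("cufwFOGrpHAstate") == ACTIVE
```

For the example above, `parse_fo_groups()` returns a single group 0 with HA state 9, so `is_active()` is True for it.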

ronit · 09-29-2022

Thank you for the clarification; it's very helpful. When you say you monitor both units using their own IPs, do you mean the interface failover IPs or the IPs configured on the dedicated management port?

We currently monitor them using the interface failover IPs, and the problem is that if Unit-1 goes down, its IP shifts to Unit-2, and the NMS thinks Unit-1 is still active whereas Unit-2 is down. How do you get around this issue?

tvotna · 09-30-2022

We poll over the management interface, but this interface also has a standby IP configured, so the IP addresses are swapped when failover happens. So, yes, the NMS thinks that the unit polled before is still active. But this is actually what we need, because the main purpose of monitoring for us is to track things like CPU and interface utilization. The drawback is that a failover event can go unnoticed, but for that, syslog is typically used, or the MIBs above (they expose the roles of the primary and secondary units, so you know which one is active and which one is standby). Also, an SNMP trap for the failover event is supported as of 9.15:

OID cufwFailoverRoleChanged in CISCO-UNIFIED-FIREWALL-MIB

snmp-server enable trap failover-state

- .1.3.6.1.4.1.9.9.491.1.4.2.1.1.1.{0|1|2} = {0|1|2} - failover group
- .1.3.6.1.4.1.9.9.491.1.4.2.1.1.3.{0|1|2} = state (9 - active)
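On the receiving side, a trap handler only needs the two varbinds listed above. A minimal Python sketch, assuming your trap receiver hands over the varbinds as (OID, value) pairs; the function names are hypothetical, and only 9 = active is mapped to a name, other states are reported as raw numbers.

```python
# Column OIDs carried in the failover-state trap, per the list above.
GROUP_COL = ".1.3.6.1.4.1.9.9.491.1.4.2.1.1.1."
STATE_COL = ".1.3.6.1.4.1.9.9.491.1.4.2.1.1.3."
ACTIVE = 9  # "9 - active"


def decode_failover_trap(varbinds):
    """Map failover-group number -> HA state from (oid, value) varbinds."""
    groups, states = {}, {}
    for oid, value in varbinds:
        oid = "." + oid.lstrip(".")  # receivers differ on the leading dot
        if oid.startswith(GROUP_COL):
            groups[oid[len(GROUP_COL):]] = int(value)
        elif oid.startswith(STATE_COL):
            states[oid[len(STATE_COL):]] = int(value)
        # Any other varbinds (e.g. sysUpTime) are ignored.
    # Prefer the group number carried in the trap; fall back to the row index.
    return {groups.get(idx, int(idx)): state for idx, state in states.items()}


def summarize(decoded):
    """Human-readable lines, one per failover group."""
    lines = []
    for group, state in sorted(decoded.items()):
        label = "active" if state == ACTIVE else f"state {state}"
        lines.append(f"failover group {group}: {label}")
    return lines
```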

ronit · 10-17-2022

I tested this in our lab and can see the different failover states. Now I just need to figure out how to use this data in our NMS (Zabbix) to get the desired results.