Health Monitoring Best Practices for Cisco ACE

Sandeep Singh · 01-17-2012

Introduction
Active Health Probe
General Guidelines
Probing Individual rservers versus serverfarms
Resource Limiting Probes per Context
Default Ports
Scripted TCL Probes
Example ICMP Probe
Related Information

Introduction

This document describes best practices for configuring health monitoring on the ACE. Any load balancing device must identify server failures quickly and take the malfunctioning devices out of operation. Proactive health monitoring regularly inspects the status of a device to ensure that it is operating properly. It is done with health probes, or keepalives, which range from a simple ICMP ping to a higher-level protocol probe that sends a request to a service and checks the response for specific values or return codes. The result determines whether the service is marked as passed or failed.

ACE classifies the health of a server into the following categories:
Passed: The server returns a valid response.
Failed: The server fails to provide a valid response to the ACE, or the ACE is unable to reach the server, for a specified number of retries.

Active Health Probe

When configured for health monitoring, the ACE periodically sends active probes to determine the server state. The ACE supports 1000 unique probe configurations, which include ICMP, TCP, HTTP, and other predefined health probes. It can execute at most 200 script probes concurrently and can have up to 2048 sockets open simultaneously.

You can associate the same probe with multiple real servers or server farms. Each time that you use the same probe again, the ACE counts it as another probe instance. You can allocate a maximum of 4000 probe instances. By default, no active health probes are configured on the ACE. You can configure health probes on the ACE to actively make connections and explicitly send traffic to servers. The probes determine whether the health status of a server passes or fails by its response.

General Guidelines

It is important to evaluate the needs of the environment when configuring health probes. The following items should be taken into consideration:

a) Probing Interval
While it is possible to set a service’s probe interval to a very low value and determine failures quickly, this has the side effect of introducing extra load on the server, as well as being resource intensive on the ACE. However, it is also important to not set a probing interval that is too high, as a service can go down and its failure won’t be detected for an unacceptably long period of time.

b) Initial Probe State
The initial state of a probe when first run is “INIT”. In this state, a single pass or fail is enough to move the probe from “INIT” to “PASS” or “FAILED”. Once that state is set, the regular rules for state transition apply, so if a service is not yet running when the probe first runs, the probe remains in the “FAILED” state until the passdetect interval has passed. Take this into account when first bringing up a serverfarm in an environment where the servers must be operational right away.

c) Keepalive Connection Termination Method
For TCP based probes, ACE will open the connection using a standard TCP 3-way handshake. However, unlike CSM, by default ACE will gracefully close the connections using a FIN instead of a RST. If this behavior is undesirable, it is possible to change it using the “connection term forced” command. Also note that some services have negative reactions to connections being closed forcefully, using a RST.

d) Staggering of Multiple Keepalives
When running multiple services on a single server, each requiring separate probing, it is preferable to stagger the intervals so that all probes are not run at the same time. This will allow ACE to run more probes, as well as reduce the load incurred on the servers as a result of probing.

e) ACE Probe Capacity
The ACE supports up to 4096 configured probes across all contexts and 2048 simultaneous sockets in use by probes. This means that if 3000 probes are scheduled to run every 2 seconds and all are triggered at the same time, only 2048 run in the first 2-second interval and the remaining 952 (3000 - 2048) are skipped in that interval. Those 952 probes are then first in the queue to be sent during the next 2-second interval. You can associate the same probe with multiple real servers or server farms. Each reuse of the same probe counts as another probe instance, and you can allocate a maximum of 16K probe instances. The ACE can execute at most 200 script probes concurrently.

f) Reverse DNS
Some services will perform a reverse DNS lookup on the IP address of a connecting host before allowing a connection, for logging purposes. In situations where your ACE does not have an entry in DNS, this can cause a delay in connections, since the service may attempt to do a DNS lookup, wait for the DNS resolution to fail and time out, and then finally allow the connection. In these cases, it is necessary to either add a DNS entry for your ACE, or to specify a longer open timer, on the ACE, using the “open” command. Note that this is not common.
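
The guidelines above map onto a handful of probe sub-commands. The following is a minimal sketch rather than a recommended configuration: the probe names, ports, and timer values are hypothetical and should be tuned to the environment. It assumes two TCP services on the same server. The "interval" and "faildetect" lines control how quickly a failure is detected, the "passdetect" lines control how a failed server is re-tested, "open" lengthens the connection setup timer (for example, to ride out a slow reverse DNS lookup), and "connection term forced" closes the probe connection with a RST instead of a FIN.

probe tcp APP-8080
  description EXAMPLE ONLY - hypothetical name and values
  port 8080
  interval 15
  faildetect 3
  passdetect interval 30
  passdetect count 3
  open 5
  connection term forced
probe tcp APP-8443
  description EXAMPLE ONLY - hypothetical name and values
  port 8443
  interval 20
  faildetect 3
  passdetect interval 30
  passdetect count 3

The two probes use different intervals (15 and 20 seconds) so that they do not fire together on every cycle, which is one simple way to stagger keepalives sent to the same server.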

Probing Individual rservers versus serverfarms

ACE allows health probes to be configured at both the rserver and the serverfarm level, and allows multiple probes to be associated with each. It is normally desirable to assign probes at both levels, as this helps configurations scale: more generic probes (ICMP, etc.) can be used at the rserver level, and more specific probes can be assigned to the serverfarms that accept connections for the service being probed. If a probe associated with an rserver fails, that rserver is taken out of rotation in all serverfarms it belongs to. However, if a probe associated with a serverfarm fails for a specific server, that server only stops servicing requests in that serverfarm.

For example, consider a scenario where a context has servers R1, R2 and R3 defined. Additionally, serverfarms SF1 and SF2 are defined, where SF1 is the destination for an FTP VIP and SF2 is the destination for an HTTP VIP. R1 and R2 are running FTP servers, while R2 and R3 are running web servers. In such a scenario, one could configure an ICMP probe to probe all three rservers (R1, R2 and R3), while configuring an FTP probe to probe only SF1 and an HTTP probe to probe only SF2. The benefit of such a design is that if the FTP service goes down on R2 and the probe detects this, but the HTTP service still responds to its probe, R2 continues to service requests in the SF2 (HTTP) serverfarm while being taken out of rotation in the SF1 (FTP) serverfarm.
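
The R1/R2/R3 scenario could be expressed roughly as follows. This is a sketch under stated assumptions: the IP addresses are placeholders from a documentation range, and the probes named ICMP-PROBE, FTP-PROBE, and HTTP-PROBE are hypothetical and assumed to be defined elsewhere in the context.

rserver host R1
  description EXAMPLE ONLY - placeholder address
  ip address 192.0.2.1
  probe ICMP-PROBE
  inservice
rserver host R2
  description EXAMPLE ONLY - placeholder address
  ip address 192.0.2.2
  probe ICMP-PROBE
  inservice
rserver host R3
  description EXAMPLE ONLY - placeholder address
  ip address 192.0.2.3
  probe ICMP-PROBE
  inservice
serverfarm host SF1
  probe FTP-PROBE
  rserver R1
    inservice
  rserver R2
    inservice
serverfarm host SF2
  probe HTTP-PROBE
  rserver R2
    inservice
  rserver R3
    inservice

With this layout, an ICMP-PROBE failure on R2 removes it from both SF1 and SF2, whereas an FTP-PROBE failure on R2 removes it from SF1 only.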

Resource Limiting Probes per Context

The connections created by health probes utilize “mgmt-connection” resources in the ACE. These connections are sent from the control plane and are therefore considered separate from data plane connections, which handle client-to-VIP/server traffic. For example, to restrict a context to 50 simultaneous connections for health monitoring, you can limit the mgmt-connections to 1% of all available connections:

resource-class abc
  limit-resource all minimum 0.00 maximum unlimited
  limit-resource mgmt-connections minimum 1.00 maximum equal-to-min
context FreeService
  allocate-interface vlan 10
  allocate-interface vlan 11
  member abc

Default Ports

By default the health monitoring subsystem determines the port number based on the type of probe. The following table shows these default port selections:

Table 1: Default Probe Ports

Type	Default Port

TCP	80
UDP	53
HTTP	80
HTTPS	443
FTP	21
Telnet	23
Echo	7
Finger	79
IMAP	143
POP3	110
SMTP	25
DNS	53
RADIUS	1812

A commonly overlooked misconfiguration is defining a TCP probe for a service that does not run on port 80 but omitting the port specification in the probe configuration. ACE 1.x does not inherit probe port numbers from serverfarm or VIP definitions, so the port must be defined in the probe. For this reason, if multiple services running on different ports require health monitoring, a separate probe must be defined for each unique port.
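
As an illustration of the pitfall, consider a service listening on TCP port 8080. In the sketch below, the probe names and the port are hypothetical. The first probe has no port statement and therefore checks the TCP default from Table 1 (port 80), while the second checks the intended port.

probe tcp TCP-NOPORT
  description EXAMPLE ONLY - no port set so this probes TCP 80
  interval 10
probe tcp TCP-8080
  description EXAMPLE ONLY - explicit port for a service on 8080
  port 8080
  interval 10

If TCP-NOPORT were attached to a serverfarm whose servers listen only on 8080, the probe would report on port 80 rather than on the service itself.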

Scripted TCL Probes

ACE supports the ability to use custom probes based on the TCL scripting language for performing server health checks. Scripted probes are especially useful for load balancing applications that do not use protocols for which ACE has built in probes, but still require application level health checking to ensure rapid failure detection.

ACE supports a maximum of 200 scripted probes executing simultaneously and up to 256 total installed scripted probes. The TCL interpreter used by ACE supports most commands except for those that perform file system I/O, including commands that reference other TCL files, such as “package” and “source”.
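
A scripted probe is typically configured in two steps: the script is loaded into the context, and a probe then references it by name. The following is a minimal sketch in which the script name CHECK_APP.tcl, the index 1, and the probe name are all hypothetical. It also assumes that the script file has already been copied to the ACE.

script file 1 CHECK_APP.tcl
probe scripted APP-SCRIPT
  description EXAMPLE ONLY - hypothetical script and probe names
  interval 10
  script CHECK_APP.tcl

The resulting probe can then be associated with an rserver or serverfarm in the same way as the predefined probe types.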

Example ICMP Probe

The ICMP probe should be configured for a 10 second interval and is used to verify basic connectivity to the server. A sample ICMP probe is provided below:

probe icmp probe1
  interval 10
  passdetect interval 10
  passdetect count 3

Related Information

Configuring Health Monitoring on ACE
Troubleshooting ACE Health Monitoring
