Outbound Queue Status Warnings

rcox825_ironport · 04-11-2007

Hello,

Is there any way to create an automatic warning if the outbound queue grows to a certain size or age? We had a situation where our Ironport was unable to communicate with our external DNS source, and mail queued on the box for over 5 hours until users began complaining. It would be great to have an alert kicked off after some threshold. Are there any SNMP MIBs or other solutions that someone has come up with? Ironport tech support did not know of any mechanism. Alerting that keeps email flowing seamlessly would seem to be something that Ironport would have built in.

Thanks!
Rod

si_ironport · 04-11-2007

Hi Rod,

We use Cricket and Nagios to monitor our Ironports. The Cricket checks hook into our backend ticket logging system so that we get a notification when something goes wrong.

I use Perl scripts to accomplish this. You can use either the output of "ssh host status" or SNMP via the following command. Ensure you have the MIB file in your mibs directory.

/usr/bin/snmpget -m ASYNCOS-MAIL-MIB -Ov -v 3 -l AuthNoPriv -u v3get -a MD5 -A $snmppasswd $hostname ironPort.asyncOSAppliances.asyncOSMail.asyncOSMailObjects.workQueueMessages.0
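
As a rough illustration of how a reading like the one above could be turned into an alert, here is a minimal Bash sketch. The function names and the 500/1000 thresholds are assumptions made for the example, not part of the scripts described in this thread, and the "Gauge32: 42" output format is what net-snmp typically prints with -Ov.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and thresholds are hypothetical.

# parse_snmp_value: read one line of "snmpget -Ov" output on stdin and
# print its trailing integer, e.g. "Gauge32: 42" becomes "42".
parse_snmp_value() {
  local line
  IFS= read -r line
  if [[ $line =~ ([0-9]+)[[:space:]]*$ ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
}

# classify_queue VALUE WARN CRIT
# Prints OK, WARNING, CRITICAL or UNKNOWN and returns 0, 1, 2 or 3,
# following the usual Nagios plugin exit-code convention.
classify_queue() {
  local value=$1 warn=$2 crit=$3
  # Anything that is not a non-negative integer (e.g. an SNMP error) is UNKNOWN.
  if ! [[ $value =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN"
    return 3
  fi
  if (( value >= crit )); then
    echo "CRITICAL"
    return 2
  elif (( value >= warn )); then
    echo "WARNING"
    return 1
  fi
  echo "OK"
  return 0
}

# Example wiring (not executed here; needs net-snmp and the MIB file):
#   count=$(/usr/bin/snmpget -m ASYNCOS-MAIL-MIB -Ov ... workQueueMessages.0 | parse_snmp_value)
#   classify_queue "$count" 500 1000
```

Keeping the parsing and the threshold logic in separate functions means the thresholds can be tried out without an appliance to query.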

There are a bunch of different parameters we monitor for either capacity or thresholds. Here are a few ideas from my script:

$ /opt/cricket/util/ironportmon.pl -h
Usage: ironportmon.pl
Available parameters are:
mem - % memory utilization
cpu - % CPU utilization
disk - % disk I/O utilization
queue - % of total queue capacity used
filesock - Number of open files or sockets
workqueue - Number of messages in the work queue
transthread - Number of threads performing mail transferring tasks
temp_cpu_a - Current temperature of CPU A (Degrees Celsius)
temp_cpu_b - Current temperature of CPU B (Degrees Celsius)
temp_ambient - Current ambient temperature (Degrees Celsius)
temp_planar - Current planar temperature (Degrees Celcius)
temp_riser - Current riser temperature (Degrees Celcius)
power - Current status of both power supplies (1)OK (2)WARN (3)ERROR
raid - Current status of all 4 RAID disks (1)OK (2)WARN (3)ERROR
license - Current license expiration (1)OK (2)WARN (3)ERROR

si

jaigill · 04-11-2007

You cannot monitor the number of active recipients via SNMP. The current AsyncOS MIB does not provide support for this field. However, the number of active recipients may be monitored via the following Perl script, which sends out an email when the number of active recipients exceeds 1000. Modify 'exp_active_recips' to trigger the email alert at a value of your choice. For instance, if it is set to 20000, an email alert will be sent out when the number of active recipients exceeds 20000. Also, modify the 'To' address to send the message to the correct recipient.

-bash-3.00$ more monitor_active_recips_amfam.pl
#!/usr/bin/perl

##### Perl script to monitor active recipients on 2 hosts ######
use strict;
use warnings;

# Discard curl's progress output and other noise on STDERR
open STDERR, ">/dev/null";

######### Set threshold for active recipients ########
my $exp_active_recips = 1000;

######### Add hosts that need to be monitored ########
my @host = ( "iron1.com", "iron2.com" );

for (my $i = 0; $i < @host; $i++) {
    ##################### Get Stats for hosts #########################
    my $val = `curl -k https://$host[$i]/xml/status -u admin:ironport`;
    my $trigger = 0;

    # Reset per host so a failed fetch cannot reuse the previous host's value
    my $active_recips = 0;
    if ($val =~ /active_recips\" current=\"(\d+)\"/s) {
        $active_recips = $1;
    }

    if ($active_recips > $exp_active_recips) {
        $trigger += 1;
    }

    if ($trigger == 1) {
        # Pipe the alert to sendmail; -t takes the recipients from the headers
        open(MAIL, "| /usr/lib/sendmail -oi -n -t");
        print MAIL <<EMAIL_TO_USER;
To:user1\@domain.com
From:LOG_MONITOR\@ironport.com
Subject: Critical Resource Utilization Values Exceeded

STATUS SUMMARY OF $host[$i]

active_recips = $active_recips

EMAIL_TO_USER
        close MAIL;
    }
}
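
For anyone who prefers shell, the extraction step of the Perl script can be sketched in Bash as follows. This is only an illustration: the function names are made up for the example, and the attribute layout (a quoted gauge name followed by current="N") is inferred from the regular expression in the script above, so the real status page may differ.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and the XML layout is an
# assumption inferred from the regex /active_recips" current="(\d+)"/.

# gauge_current NAME
# Reads status XML on stdin and prints the "current" value of the named
# gauge, or 0 if the gauge is not present.
gauge_current() {
  local name=$1 xml value
  xml=$(cat)
  value=$(printf '%s\n' "$xml" |
    sed -n "s/.*\"$name\" current=\"\([0-9][0-9]*\)\".*/\1/p" | head -n 1)
  echo "${value:-0}"
}

# over_threshold VALUE LIMIT
# Succeeds (exit status 0) only when VALUE is strictly greater than LIMIT.
over_threshold() {
  (( $1 > $2 ))
}

# Example wiring (not executed here; needs curl and a reachable appliance):
#   recips=$(curl -s -k https://iron1.com/xml/status -u admin:ironport | gauge_current active_recips)
#   over_threshold "$recips" 1000 && echo "active_recips = $recips"
```

Defaulting to 0 when the gauge is missing mirrors the per-host reset in the Perl version, so a failed fetch does not raise a false alarm.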

dbeste_ironport · 04-12-2007

Hello,

from the V5.1 release notes:

workqueue-count filter rule. This rule checks to see if the work queue count is equal to, greater or lesser than a specified value. The following example shows the filter used to skip spam filtering if the queue is greater than 1000:

wqfull: if (workqueue-count > 1000) {
skip-spamcheck();
}

Cheers
Dirk

salware_ironport · 04-13-2007

The SNMP variable counts the number of messages in the work queue. I use this counter in Zabbix and trigger alarms when it goes over 1000 messages.
When this happens, it may be because a large influx of messages is being scanned by CASE, with high CPU usage.

But it would be nice to have another SNMP variable that counts active recipients. I can only estimate it from the percentage of disk in use. Normally that is no higher than 1 or 2%, but it would be nice to be able to see via SNMP how many messages are waiting to be delivered.

shannon.hagan · 04-16-2007

We have bash scripts that query the XML status pages for a given domain and look for the oldest message and the queue size. They look at all of the machines that we have in the cluster and then send a mail message out if certain thresholds are exceeded. You can also query the XML status pages and put together a page that shows how many messages you have queued for the domains with the most mail. This is helpful if you have multiple machines and need to know which domains to look at for delivery problems.

Bash script to notify based on oldestmessage or queue size.
In the code below, you would need to change ownerlist, userpass, listofironports, and replace yourdomain.com with your domain name.


#!/usr/bin/bash

#--------------------------------------------------------
# This script checks on specified domains and attempts
# to send a message to warn a person if there are messages
# over 3600 seconds (1 hour) or more than 1,000 recipients
# in our queues for their domain.  If the domain is down
# and the email address given to us is on their domain
# or forwards to their domain then it won't be delivered
# until their domain is back up.
# The script also saves historical data so we can look at
# it.
#--------------------------------------------------------

#--------------------------------------------------------
# Get the directory of the script so we can send the output
# to the right place
#--------------------------------------------------------
outputdir=`dirname $0`

#--------------------------------------------------------
# Determine if running in test mode
#--------------------------------------------------------
if [[ "$1" == "test" ]]
then
  testing=1
else
  testing=0
fi

#--------------------------------------------------------
# Get the current date information
#--------------------------------------------------------
currentdateinfo=`date +'%m/%d/%y %H:%M:%S'`

#--------------------------------------------------------
# Setup list of domain owners
#--------------------------------------------------------
ownerlist=("domain1:notify1 domain2:notify2")

#--------------------------------------------------------
# Set the credentials used to query the XML status pages
#--------------------------------------------------------
userpass="username:password"

#--------------------------------------------------------
# Set list of mtas to check
#--------------------------------------------------------
listofironports="ironport1.mydomain.com ironport2.mydomain.com"

for curowner in $ownerlist
do
  hostname=`printf "%s\n" "$curowner"|cut -d ":" -f 1`
  hostowner=`printf "%s\n" "$curowner"|cut -d ":" -f 2`
  printf "%s - %s\n" "$hostname" "$hostowner"

  totalrecipients=0
  oldestmessage=0

  for curironport in $listofironports
  do
    hostinfo=`/usr/local/bin/curl -s -k "https://$curironport/xml/hoststatus?hostname=$hostname" -u "$userpass"`
    printf "%s\n" "$hostinfo" >> $outputdir/$curironport.txt
    hoststatus=`expr "$hostinfo" : '.*<\(host name="[^/]*\).*"'`
    status=`expr "$hoststatus" : '.*status="\([^"]*\)"'`
    tmpoldestmessage=`expr "$hoststatus" : '.*oldest_message="\([^"]*\)"'`
    if [[ "$tmpoldestmessage" = "" ]]
    then
      tmpoldestmessage=0
    fi
    if (( $tmpoldestmessage > $oldestmessage ))
    then
      oldestmessage=$tmpoldestmessage
    fi # if tmpoldestmessage > oldestmessage
    lastactivity=`expr "$hoststatus" : '.*last_activity="\([^"]*\)"'`
    activerecipients=`expr "$hostinfo" :  '.*"active_recips" current="\([^"]*\)"'`
    if [[ "$activerecipients" == "" ]]
    then
      activerecipients=0
    fi
    deliveredrecipients=`expr "$hostinfo" :  '.*"delivered_recips" current="\([^"]*\)"'`
    if [[ "$deliveredrecipients" == "" ]]
    then
      deliveredrecipients=0
    fi
    totalrecipients=`expr $totalrecipients + $activerecipients`
    printf "%s:%s:%s:%s\n" "$curironport" "$hostname" "$activerecipients" "$tmpoldestmessage" >> $outputdir/$hostname.info
  done  # for curironport

  printf "%s:%s:%s:%s\n\n" "$hostname" "$currentdateinfo" "$totalrecipients" "$oldestmessage" >> $outputdir/$hostname.info

  if (( $oldestmessage > 3600 || $totalrecipients > 1000 ))
  then
    printf "You have %s messages queued at yourdomain.com for you.\nThe oldest message in seconds is %s.\n" "$totalrecipients" "$oldestmessage"
    if [[ "$testing" == "0" ]]
    then
      printf "You have %s messages queued at yourdomain.com for you.\nThe oldest message in seconds is %s.\n" "$totalrecipients" "$oldestmessage"|mailx -s "Messages queued for $hostname as of $currentdateinfo GMT: $totalrecipients" -r postmaster@yourdomain.com $hostowner
    fi # if testing
  fi  # if oldest message
done # for curowner

To get the tophosts, you would do


 statusinfo=`/usr/local/bin/curl -s -k https://$curironport/xml/tophosts -u "$userpass" --max-time 30`

and then parse out the hostname and active recipients.
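
The parsing step might look something like the sketch below. It is heavily assumption-laden: the thread does not show the tophosts output, so this guesses that it uses the same host name="..." and "active_recips" current="..." markers that the hoststatus script matches, with one element per line. The top_hosts function name is invented for the example.

```bash
#!/usr/bin/env bash
# Sketch only: the tophosts XML layout is an ASSUMPTION (same markers as
# the hoststatus page, one element per line); adjust the patterns to
# match what your appliance actually returns.

# top_hosts
# Reads XML on stdin and prints "hostname active_recips" pairs,
# sorted with the busiest host first.
top_hosts() {
  awk '
    {
      # Remember the most recent host name seen.
      if (match($0, /host name="[^"]*"/)) {
        # Skip the 11 characters of host name=" and drop the closing quote.
        host = substr($0, RSTART + 11, RLENGTH - 12)
      }
      # When an active_recips gauge appears, pair it with that host.
      if (match($0, /"active_recips" current="[0-9]+"/)) {
        count = substr($0, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", count)
        if (host != "") print host, count
      }
    }
  ' | sort -k2,2nr
}

# Example wiring (not executed here):
#   printf "%s\n" "$statusinfo" | top_hosts | head -n 10
```

Sorting numerically on the second column puts the domains with the most queued recipients at the top, which is the view described earlier for spotting delivery problems.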

steven_geerts · 02-24-2008

Hello,

There is a nice plugin for Nagios available on NagiosExchange.

It does several checks, including monitoring the work queue.
I spent some time translating the output from Spanish to English, but now it's very useful.

The original version:
http://www.nagiosexchange.org/AddOn_Projects.22.0.html?&tx_netnagext_pi1[p_view]=1220

By the way: if you can script a little bit, you can easily add counters to the script to make your monitoring more complete.

Best regards Steven