Hello IronPort Admins,
I recently built a web-based dashboard to help me monitor key health statistics for the ESAs in our environment. I'm sharing the code here so that others might benefit. The HTML is written using the Bootstrap 3 framework, so it's easy to update the look of the page. The PHP can be modified to query the ESA API for other information as well. All you need is a web server running PHP that can reach your ESA's API port to query the statistics.
I appreciate any feedback. Please let me know if you have any questions.
The data is retrieved from the API when the page loads. No data is stored for the historical stats. Currently, the Historical page returns stats for the previous day, but could be modified to show the last 7 days, for example, by changing the API query.
The stats from the box suck. Too much consolidation.
Syslog all the status logs and use that data at 60-second intervals. (I think the interval can be changed in logconfig, but I haven't tried.)
Another approach is to use syslog and a SIEM like Splunk. This gives us all the current and historical data and also allows for better alerts.
Very nice, Marc! Thanks for sharing. Perhaps you could provide the community with some guidance on how to get Splunk set up for this?
Add Log Subscription for Status Logs to a Syslog server.
The syslog server can be Splunk itself, but Splunk advises against using Splunk as a direct syslog server. A separate syslog server avoids service issues when the Splunk Forwarder restarts for app rollouts, upgrades, etc.
Tell the Splunk Forwarder which logs belong to which host. Basing this on the host name used in the path is the easiest way to override the host name; otherwise every event will appear to come from your syslog server.
Once the logs are in Splunk, extract all the fields, either with Field Extraction or directly with rex in the search.
Pipe to a table or timechart.
| rex field=_raw "InjBytes (?<cisco_esa_inj_bytes>\S+)"
Then use the magic of Splunk for graphing and tables etc. Go as basic or extreme as you wish.
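For anyone who wants to try the host override and the field extraction outside Splunk first, here is a minimal Python sketch of both steps. The directory layout (/var/log/remote/<esa-host>/status.log), the helper names and the sample status line are assumptions made up for illustration, not taken from a real ESA. The regex is the same as the rex above, except that Python spells named groups (?P<name>...) rather than (?<name>...).

```python
import re
from pathlib import PurePosixPath

# Assumed (hypothetical) layout on the syslog server:
#   /var/log/remote/<esa-host>/status.log
# so the ESA host name is the 4th "/"-separated segment of the path.
HOST_SEGMENT = 4

# Same pattern as the rex in the post, in Python named-group syntax.
INJ_BYTES = re.compile(r"InjBytes (?P<cisco_esa_inj_bytes>\S+)")


def host_from_path(log_path, segment=HOST_SEGMENT):
    """Return the path segment that carries the ESA host name."""
    parts = PurePosixPath(log_path).parts  # parts[0] is "/" for absolute paths
    if segment >= len(parts):
        raise ValueError(f"path has no segment {segment}: {log_path}")
    return parts[segment]


def extract_inj_bytes(raw):
    """Return InjBytes as an int, or None if the field is not in the line."""
    match = INJ_BYTES.search(raw)
    return int(match.group("cisco_esa_inj_bytes")) if match else None


# Illustrative inputs only; the values are invented.
sample_path = "/var/log/remote/esa01.example.com/status.log"
sample_line = "Info: Status: CPULd 12 DskIO 3 RAMUtil 27 InjMsg 40 InjBytes 183500 RAMUsd 402653184"

print(host_from_path(sample_path), extract_inj_bytes(sample_line))
```

On the Splunk side, the path-based host override is normally done in the forwarder's input configuration rather than in code (inputs.conf has a host_segment setting for monitored paths), so treat the function above as a way to check which segment holds the host name, not as a replacement for that setting.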
To display many graphs, I use a single search to collect all events and then use post-process searches to filter the results. This means the page loads quickly and it's light on Splunk. I also use some trickery to auto-refresh every minute without real-time searching, and to dynamically expand and hide graphs based on various thresholds.
Just in case you needed it:
index=email log_source="status_logs_splunk" | dedup gateway | sort gateway | table gateway , CPULoad , RAMUtil , DiskIO , ResourceConstraint, WorkQueueLength, MMLen, WorkQueueQuarantine, CurrentInboundConnections, CurrentOutboundConnections
The fields after gateway are field extractions from the status logs. Gateway comes from a lookup table that maps IP address to gateway name.
Hope that helps. Maybe share your search queries.
My Splunk skills are very basic, so I'm probably doing this wrong, but I'm charting CPU and RAM with:
timechart avg(CPU_Total) by host
index=* RAM_Used>0 | timechart span=1h avg(RAM_Used) by host
where CPU_Total and RAM_Used are field extractions from the status log.
With Splunk, the aim is to ask the indexer to find as few events as possible to complete your task, and to have the indexer do most of the legwork before it transfers those events to the Search Head, where 'enrichment' of those events occurs. You can read up on efficient searches; after a while it becomes more natural as you design your searches.
1) So... first, specify the exact index if possible. (It is not always dedicated, and the logs may not be in an exactly known index; they aren't for me.)
2) Next, you want to focus on the status logs from the ESAs. This all depends on how they come in, but if you are picking up status logs from a specific directory, you can specify the sourcetype for those events as they come into Splunk.
If you are receiving syslog directly on port 514, then everything coming in will likely have the same sourcetype.
Check that the host on the events is the ESA host. Otherwise, you need to extract it from the path (inefficient) or do some work on the input so that the host represents the ESA (e.g. not the centralised rsyslog server).
3) You need to extract the field values at search time. You can use Field Extraction in the GUI, or just write the extraction into the search.
| rex field=_raw "RAMUsd (?<cisco_esa_RAMUsd>\S+)"
| eval RAM_Used_MB = cisco_esa_RAMUsd/1024/1024
| timechart span=1h avg(RAM_Used_MB) by host limit=0
However, that is an almighty averaging down. An hour made up of 30 minutes at maximum RAM and 30 minutes at minimum RAM would show up as 50% RAM usage for the entire hour.
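To put numbers on that, here is a small Python sketch of one hourly bucket. The samples are invented for illustration: one reading per minute, 30 minutes at 100% and 30 minutes at 0%, standing in for what a span=1h bucket would see.

```python
# Hypothetical one-minute RAM utilisation samples (%) for a single hour:
# 30 minutes pinned at the maximum, then 30 minutes at the minimum.
samples = [100] * 30 + [0] * 30

# What avg() over a one-hour span would report for this bucket.
hourly_avg = sum(samples) / len(samples)

# What max() over the same bucket would report.
hourly_max = max(samples)

print(f"avg={hourly_avg:.0f}% max={hourly_max}%")
```

The average lands on 50% even though the box spent half the hour at its limit, while the maximum for the same bucket still shows 100%.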
Ask yourself why you want to know about the stat. If it's to catch when things run out of memory, then I would go for max(RAM_Used). I always go for max, as average hides the issues. I then add more complexity to remove spikes, which applies mostly to CPU stats.
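The post does not say how the spikes are removed, so the following is only one possible approach, not the poster's method: take a rolling median over the last few samples before taking the max, so that a single-sample spike is ignored while sustained load still shows. The function name, the window of 3 and the CPU values are all assumptions for illustration.

```python
from statistics import median


def rolling_median(values, window=3):
    """Median of each sample and up to (window - 1) samples before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        median(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Hypothetical CPU load samples (%): one isolated spike (98),
# followed later by three minutes of sustained high load (85, 88, 90).
cpu = [20, 22, 98, 21, 23, 85, 88, 90, 24]

smoothed = rolling_median(cpu)

print("raw max:", max(cpu))
print("max after rolling median:", max(smoothed))
```

With these values the raw max is driven by the lone 98 spike, whereas the max after smoothing reflects the sustained 85 to 90 period. A wider window ignores longer spikes but also reacts more slowly to real load.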