Bosun is used on the Tetration cluster to monitor hundreds of metrics covering various aspects of the system. When a defined threshold is crossed, it may generate a ‘critical’ event and an email alert. The details in the email (also visible to a user with the Customer Support role on the Sentinel page under Monitoring) should tell you what happened, when, why and potentially what it means. Once the metric falls back below the defined limit for that alert, a ‘normal’ email should be generated.
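To illustrate how a ‘critical’ event pairs with a later ‘normal’ one, here is a minimal Python sketch. It is not Bosun's implementation or rule syntax. The function name, the sample values and the threshold of 90 are all made up for illustration, and it assumes an alert that goes critical when a value rises above its threshold.

```python
def transitions(samples, threshold):
    """Yield (index, state) each time the state flips between 'normal' and 'critical'.

    Hypothetical helper for illustration only; assumes 'critical' means
    the sampled value is above the threshold.
    """
    state = "normal"
    for i, value in enumerate(samples):
        new_state = "critical" if value > threshold else "normal"
        if new_state != state:
            state = new_state
            yield i, state


# Made-up samples: the metric crosses 90, stays high, then recovers.
events = list(transitions([40, 95, 97, 60], threshold=90))
print(events)
```

One notification is produced per state change, not per sample, so a metric that stays above the threshold gives a single ‘critical’ followed by a single ‘normal’ once it recovers.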
We have seen cases where alerts are sent that don’t necessarily indicate any impact on the performance of the cluster. As of the 2.1.1.31 release, many of these issues have been corrected, but we are still seeing a couple of alerts that appear to be generated when there is no noticeable impact; see, for example, CSCvg49095.
If you are trying to understand the health of the cluster, I would advise using the Cluster Status and Service Status pages, which are under Maintenance in the UI. These typically give a more accurate picture of cluster health in a nice summary view.
If you still have questions about what you are seeing, please open a TAC case.
Bryan