Regex Help to Selectively Block Messages

sam_wynens · ‎10-23-2019

Hi,

I have a customer request to block messages containing links to (.windows.net|.azurewebsites.net|.web.core.windows.net|.blob.core.windows.net) because of the recent escalation in phishing attacks originating from compromised business sites. I built a content filter that looks for this regex: only-body-contains("(?i)(\\.windows\\.net|\\.azurewebsites\\.net|\\.web\\.core\\.windows\\.net|\\.blob\\.core\\.windows\\.net)", 1)

This filter is catching legitimate business email because some of the messages have embedded images hosted on these sites. So I came up with this regex, which ignores any <img> tag that references the above sites but matches everything else.

(?i)<img[^>]*>(*SKIP)(*FAIL)|https?:\/\/.*(\.windows\.net|\.azurewebsites\.net|\.web\.core\.windows\.net|\.blob\.core\.windows\.net)\/

This works great in regex101.com, but Ironport complains with "Illegal regular expression: nothing to repeat".

Can anyone think up another way to go about this?

Thanks in advance!

Mathew Huynh · ‎10-23-2019

Hey Sam,

It's likely that the content filter regex engine doesn't support this syntax; regex101 is more flexible about what it accepts.
To construct the filter, I would generally suggest having the source email available (in .eml format), checking its contents to verify the exact string we see, and using message filters rather than content filters.
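
To illustrate the syntax limitation: the wording of the error matches what Python's re module raises when a quantifier has nothing before it, and re does not implement the PCRE verbs (*SKIP) and (*FAIL). Assuming the filter engine behaves like Python's re here (an assumption, not something confirmed in this thread), a minimal sketch reproduces the failure. The helper name compile_error is hypothetical.

```python
import re

# The pattern from the question, unchanged.
PATTERN = (
    r"(?i)<img[^>]*>(*SKIP)(*FAIL)|https?:\/\/.*"
    r"(\.windows\.net|\.azurewebsites\.net|\.web\.core\.windows\.net|\.blob\.core\.windows\.net)\/"
)


def compile_error(pattern: str):
    """Return the error message if the pattern fails to compile, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# "(*" opens a group and immediately applies "*" to nothing,
# which is what the parser rejects.
print(compile_error(PATTERN))
```

If that assumption holds, any rewrite has to avoid backtracking control verbs altogether.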

I do suspect some limitations, as we take a first-match-wins approach. The filter can be quite easy to bypass, since the body scan rule doesn't treat each URL as an individual instance.

A filter that could be written is:

Mathew_Filter_example:
if not only-body-contains("(?i)<IMG>.*(url1\\.com|url2\\.com)")
{
    if only-body-contains("(?i)(url1\\.com|url2\\.com)")
    {
        quarantine('Policy');
    }
}
.

Essentially, if we find an <IMG> tag containing url1.com, we bypass the filter to avoid a false positive.
If we do NOT find one, we look for the domain anywhere within the email.

This can be bypassed because anyone who inserts an image tag with the URL (as per your example) skips the filter entirely.
We cannot treat each URL as an individual entity using message/content filters, so we generally need to use URL filtering for that circumstance.
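
The two-step logic and its bypass can be sketched in Python. This is only a simulation of the idea, not the appliance's behavior: the function would_quarantine, the sample bodies, and the <img[^>]*...> form (an assumption about what "<IMG>" is meant to cover) are all hypothetical.

```python
import re

# Hypothetical stand-ins for the two conditions in the filter above.
IMG_WITH_DOMAIN = re.compile(r"(?i)<img[^>]*(url1\.com|url2\.com)")
ANY_DOMAIN = re.compile(r"(?i)(url1\.com|url2\.com)")


def would_quarantine(body: str) -> bool:
    """Mimic the nested if: quarantine only when no <img> tag references
    the domains and the domains still appear somewhere in the body."""
    if IMG_WITH_DOMAIN.search(body):
        return False  # outer "if not" fails, so the message passes
    return ANY_DOMAIN.search(body) is not None


link_only = '<a href="https://evil.url1.com/login">Sign in</a>'
image_only = '<img src="https://cdn.url1.com/logo.png">'
image_and_link = image_only + link_only  # the bypass case

print(would_quarantine(link_only))       # True
print(would_quarantine(image_only))      # False
print(would_quarantine(image_and_link))  # False
```

The last case is the weakness: a single matching image tag anywhere in the body exempts every other URL in the same message.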

I hope this clears up some info.

Regards,
Mathew