Forged Email Detection - Dealing with Middle Initials

jjthomas · 10-22-2019

I have a question about the executive names in the dictionary used by Forged Email Detection (FED). We have successfully implemented FED via a content filter, and it has worked pretty well so far. I did, however, tune the scoring by setting the similarity score threshold to 95. We recently encountered a spoof that was, unfortunately, successfully delivered.

This spoof, however, was different, as it included the middle initial of one of our executives.

Hence my question: what is the guidance on using middle initials in the dictionary of names? Should they be included as a separate entry or as part of the same entry?

For example, I have dictionary entries of the format FirstName LastName.

Jane Doe

Robert Jones

...

Should the dictionary now include entries as follows instead?

Jane Doe

Jane A. Doe

Robert Jones

Robert C. Jones

I suspect the issue is the similarity threshold combined with the inclusion of the middle initial alone. If so, we should include middle initials in the dictionary of executive names wherever possible.

Jason

Libin Varghese · 10-22-2019

Jason,

Initially, FED had a problem where display names containing middle names scored 100; this was corrected as part of

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvb90531/?reffering_site=dumpcr

Currently, the presence of middle names or initials results in a lower similarity score, to avoid false positives.

Since your threshold is set to 95, I assume false positives are a concern. With a threshold this high, Robert C. Jones is unlikely to match the Robert Jones entry.

It would certainly make sense to add multiple dictionary entries, with and without middle names or initials, to see what works best for you.
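
If the executive list is long, the "with and without" entries can be generated rather than typed by hand. This is a minimal Python sketch, assuming the dictionary is maintained as one name per line; `name_variants` is a hypothetical helper, not part of AsyncOS, and the no-period form ("Jane A Doe") is an extra variant you may or may not want.

```python
from typing import List, Optional


def name_variants(first: str, last: str, middle: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build dictionary entries with and without a middle name/initial."""
    variants = [f"{first} {last}"]                      # "Jane Doe"
    if middle:
        initial = middle[0].upper()
        variants.append(f"{first} {initial}. {last}")   # "Jane A. Doe"
        variants.append(f"{first} {initial} {last}")    # "Jane A Doe" (assumed extra form)
        if len(middle) > 1:
            variants.append(f"{first} {middle} {last}")  # full middle name, if known
    return variants


# Example executives taken from the thread; middle names are placeholders.
executives = [("Jane", "Doe", "A"), ("Robert", "Jones", "C")]
for first, last, middle in executives:
    for entry in name_variants(first, last, middle):
        print(entry)  # one entry per line, ready to paste into the dictionary
```

Each extra variant is another chance for an exact match, so review the generated list for names that might collide with legitimate external senders.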

Regards,

Libin

jjthomas · 10-22-2019

Hi Libin,

Your reply is very helpful. Based on that, my two options would be to either:

change our threshold to, say, 90, and monitor for false positives; or
add an entry to the dictionary that contains the middle initial for the executive in question, in addition to the current entry without a middle initial.

In either case, we would monitor the results. You are correct that I set the threshold to 95 to combat some false positives we observed after our initial deployment.

I might do some testing today to see what the score would be if the executive's dictionary entry included the middle initial. There may be a "sweet spot" I can find.

Thanks,

Jason

jjthomas · 10-22-2019

A very quick update based on some spoof testing:
If I change the dictionary entry to include the middle initial, an exact match of that name scores 100. (Splunk makes finding these scores a lot easier.)
In my testing, when the dictionary entry does not include the middle initial, the spoofed name scores 94.
I have lowered my threshold to 92, but I will continue to monitor how future messages are scored.
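
To reason about where a threshold like 95 or 92 lands, it can help to score candidate display names against the dictionary offline. Everything here is an assumption: the sketch uses Python's `difflib.SequenceMatcher` as a stand-in scorer, which is not FED's algorithm, so its numbers will not match what ESA reports (it scores "Robert C. Jones" against "Robert Jones" at about 89, not the 94 seen above). It also assumes a message is flagged when the best score is at or above the threshold. `similarity`, `best_match`, and `is_flagged` are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(display_name: str, entry: str) -> float:
    """Stand-in similarity on a 0-100 scale (NOT Cisco's FED scoring)."""
    a, b = display_name.lower().strip(), entry.lower().strip()
    return 100 * SequenceMatcher(None, a, b).ratio()


def best_match(display_name: str, dictionary: list[str]) -> tuple[str, float]:
    """Return the dictionary entry that scores highest against the display name."""
    scored = ((entry, similarity(display_name, entry)) for entry in dictionary)
    return max(scored, key=lambda pair: pair[1])


def is_flagged(display_name: str, dictionary: list[str], threshold: float) -> bool:
    """Assumed rule: flag when the best score meets or exceeds the threshold."""
    return best_match(display_name, dictionary)[1] >= threshold


spoof = "Robert C. Jones"
without_initial = ["Jane Doe", "Robert Jones"]
with_initial = without_initial + ["Robert C. Jones"]

for label, dictionary in [("without initial", without_initial), ("with initial", with_initial)]:
    entry, score = best_match(spoof, dictionary)
    for threshold in (95, 92, 90):
        print(f"{label:16} best={entry!r:18} score={score:5.1f} "
              f"threshold={threshold} flagged={is_flagged(spoof, dictionary, threshold)}")
```

Only the relative behavior carries over: the entry with the middle initial yields an exact match of 100, while the entry without it falls short of a high threshold. Tune against the real FED scores collected in Splunk, not against this stand-in.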

marc.luescherFRE · 10-22-2019

Another approach, since you already have the moving parts in place:

a) extract the friendly names from the ESA and send them to Splunk.

b) create a list of the monitored email addresses you care about in a Splunk CSV or text file.

c) create a list of known excluded email addresses, such as the CEO's private email, as a CSV or text file.

d) create some regex rules in Splunk to detect when a monitored From, Sent-By, or Reply-To address appears in the friendly From name, and create an alert/report when the address is not on your exclude list.

e) automate message removal using Splunk and PowerShell.
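
Step d can be prototyped outside Splunk before writing the real searches. This is a minimal Python sketch, assuming CSV lookups with hypothetical `name` and `address` columns and one extracted record per From, Sender, or Reply-To header; the file contents, addresses, and function names are illustrative only. The regex tolerates an optional middle name or initial, so "Robert C. Jones" still matches "Robert Jones".

```python
import csv
import io
import re

# Hypothetical lookup contents; in Marc's setup these live in Splunk CSV/text files (steps b and c).
MONITORED_CSV = """name,address
Jane Doe,jane.doe@example.com
Robert Jones,robert.jones@example.com
"""
EXCLUDED_CSV = """address
robert.jones.home@example.net
"""


def load_rows(text: str) -> list[dict]:
    """Parse a small CSV lookup into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def executive_pattern(name: str, address: str) -> re.Pattern:
    """Match the executive's name (optional middle name/initial) or address in a friendly name."""
    parts = name.split()
    first, last = re.escape(parts[0]), re.escape(parts[-1])
    name_re = rf"\b{first}\s+(?:\w+\.?\s+)?{last}\b"
    return re.compile(rf"{name_re}|{re.escape(address)}", re.IGNORECASE)


def should_alert(friendly_name: str, sender_address: str,
                 monitored: list[dict], excluded: set[str]) -> bool:
    """True when a monitored executive appears in the friendly name but the real address
    is neither that executive's own address nor on the exclude list."""
    sender = sender_address.lower()
    for row in monitored:
        if executive_pattern(row["name"], row["address"]).search(friendly_name):
            if sender != row["address"].lower() and sender not in excluded:
                return True
    return False


monitored = load_rows(MONITORED_CSV)
excluded = {row["address"].lower() for row in load_rows(EXCLUDED_CSV)}

# Hypothetical (friendly name, address) pairs as extracted from ESA logs in step a.
records = [
    ("Robert C. Jones", "robert.jones@example.com"),      # legitimate
    ("Robert C. Jones", "ceo.office@lookalike.example"),  # spoof with middle initial
    ("Robert Jones", "robert.jones.home@example.net"),    # known private address
    ("jane.doe@example.com", "attacker@example.org"),     # address stuffed into the name
]
for friendly, addr in records:
    print(f"{friendly!r:26} {addr:32} alert={should_alert(friendly, addr, monitored, excluded)}")
```

Matching on the name pattern rather than a similarity score sidesteps the middle-initial threshold problem entirely, at the cost of maintaining the monitored and exclude lists by hand.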

We never got good enough results with FED.

-Marc