Regional Language Spam

prash_ironport · 11-17-2009

Hi,
Of late we have been receiving loads of spam in Russian, Spanish and French. I have noticed that IronPort is not efficient at blocking such spam and takes too long to include it in the CASE signature updates.

Is there a way to block these messages using a content or message filter?

Cheers!
Prash

kluu_ironport · 11-20-2009

Yes, it can be done. If your company doesn't deal with those languages or other Cyrillic languages, then you can implement the following solution.

How to block Russian / Cyrillic / Ukrainian char sets

There are 2 options:

Write a filter.
Refer to a dictionary text file in a message filter.

1. You can write either a content filter or a message filter to catch these charsets if your business does not interact with Russian / Cyrillic / Ukrainian senders.

Here is an outline for a filter.

quarantine_russian_spam:

if (recv-listener == "InboundMail") AND ((body-contains("windows-1251")) OR (header("Content-Type") == "(?i)windows-1251")) {
    quarantine("Policy");
}

You may want to place this in the content filters instead: content filters run after anti-spam scanning, so messages already handled by CASE may never reach the filter. As a message filter, it would scan the body of every message on the listener for the charset string, which can be resource-expensive.
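To try the filter logic against saved sample messages before deploying it, the condition can be approximated offline. This is a minimal Python sketch, not AsyncOS code: `would_quarantine` is a hypothetical name, the listener name is passed in as a plain string, and `body-contains()` is approximated by a substring search over everything after the top-level header block (which also covers MIME part headers). It assumes `\n` line endings.

```python
import re
from email import message_from_string

# Same expression as the header() rule: case-insensitive search in Content-Type.
HEADER_RE = re.compile(r"(?i)windows-1251")


def would_quarantine(raw: str, listener: str) -> bool:
    """Approximate the quarantine_russian_spam condition for one raw message."""
    # recv-listener == "InboundMail"
    if listener != "InboundMail":
        return False
    msg = message_from_string(raw)
    # header("Content-Type") == "(?i)windows-1251"
    header_hit = HEADER_RE.search(str(msg.get("Content-Type", ""))) is not None
    # body-contains("windows-1251"): approximated as a plain substring search
    # over everything after the first blank line (assumes "\n" line endings).
    _, _, body = raw.partition("\n\n")
    body_hit = "windows-1251" in body
    return header_hit or body_hit
```

In this sketch, as in the filter text, only the header expression carries `(?i)`; if your appliance treats `body-contains()` case-sensitively, you may want to add `(?i)` there as well.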

2. Another option is to add the list of character sets to a dictionary text file and refer to that in your message filter.
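The dictionary approach boils down to keeping one charset label per line and checking whether any charset a message declares appears in that list. A minimal Python sketch of that matching logic, assuming a plain one-label-per-line text file; the function names are hypothetical, and this is not how the appliance reads dictionaries internally:

```python
from email import message_from_string


def load_charset_dictionary(text: str) -> set:
    """Parse dictionary text: one charset label per line, blank lines ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def declared_charsets(raw: str) -> set:
    """Collect the charset parameter of every MIME part.

    get_charsets() returns one entry per part, already lower-cased,
    or None for parts (such as multipart containers) without a charset.
    """
    msg = message_from_string(raw)
    return {cs for cs in msg.get_charsets() if cs}


def matches_dictionary(raw: str, terms: set) -> bool:
    """True if any declared charset is listed in the dictionary."""
    return not declared_charsets(raw).isdisjoint(terms)
```

Because every MIME part is inspected, a multipart message whose HTML part is UTF-8 but whose text part is KOI8-R is still caught.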

Below are some of the charsets you can use, depending on the language of the spam.

=========================================================

References

http://msdn.microsoft.com/en-us/library/aa752010.aspx

http://en.wikipedia.org/wiki/ISO_8859-1

28591 iso-8859-1 Western European (ISO)
28592 iso-8859-2 Central European (ISO)
28593 iso-8859-3 Latin 3 (ISO)
28594 iso-8859-4 Baltic (ISO)
28595 iso-8859-5 Cyrillic (ISO) (i.e. Eastern European, Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian)
28596 iso-8859-6 Arabic (ISO)
28597 iso-8859-7 Greek (ISO)
28598 iso-8859-8 Hebrew (ISO-Visual)
28599 iso-8859-9 Turkish (ISO)
28603 iso-8859-13 Estonian (ISO)
28605 iso-8859-15 Latin 9 (ISO)

ISO 8859-1

Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish)

ISO 8859-2

Eastern European (Albanian, Croatian, Czech, English, German, Hungarian, Latin, Polish, Romanian, Slovak, Slovenian, Serbian)

ISO 8859-3

Southeastern European (Afrikaans, Catalan, Dutch, English, Esperanto, German, Italian, Maltese, Spanish, Turkish)

ISO 8859-4

Northern European (Danish, English, Estonian, Finnish, German, Greenlandic, Latin, Latvian, Lithuanian, Norwegian, Sámi, Slovenian, Swedish)

ISO 8859-5

Eastern European (Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian)

ISO 8859-6

Arabic

ISO 8859-7

Greek

ISO 8859-8

Hebrew

ISO 8859-9

Western European (Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Finnish, French, Frisian, Galician, German, Greenlandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, Turkish)

ISO 8859-10

Northern European (Danish, English, Estonian, Faeroese, Finnish, German, Greenlandic, Icelandic, Irish Gaelic, Latin, Lithuanian, Norwegian, Sámi, Slovenian, Swedish)

ISO 8859-15

Western European (Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Frisian, Galician, German, Greenlandic, Icelandic, Irish Gaelic, Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish)

------------------------------------

United States, UK
Western European (ISO)

iso-8859-1

http://en.wikipedia.org/wiki/ISO_8859-1

-----------------------------------

For Arabic (most common: iso-8859-6)

ASMO-708
DOS-720
iso-8859-6
csISOLatinArabic
ECMA-114
ISO_8859-6
ISO_8859-6:1987
iso-ir-127
x-mac-arabic
windows-1256
cp1256

-----------------------------------

Baltic:

ibm775
iso-8859-4
windows-1257

-----------------------------------

Central European (Croatian, Czech, Hungarian) (most common: iso-8859-2)

ibm852
iso-8859-2
csISOLatin2
iso_8859-2
iso_8859-2:1987
iso8859-2
iso-ir-101
x-mac-ce
latin2_croatian_ci
latin2_czech_cs
latin2_general_ci
latin2_hungarian_ci
latin2_bin
x-cp1250

-----------------------------------

Chinese (all of the following are common)

EUC-CN
x-euc-cn
gb2312
CN-GB
csGB2312
csGB231280
csISO58GB231280
GB_2312-80
GB231280
GB2312-80
hz-gb-2312
x-mac-chinesesimp
iso-ir-58
cn-big5
x-Chinese

-----------------------------------

Russian, Ukrainian, Cyrillic (most common: windows-1251)
(i.e. Eastern European, Cyrillic-based: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian)

cp866
iso-8859-5
koi8-r
koi8-u
x-mac-cyrillic
windows-1251
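If you stay with a single header() rule rather than a dictionary, the labels above can be combined into one alternation so that any of them is caught, not just windows-1251. A minimal Python sketch that builds such a pattern and checks it with Python's `re`; `charset_pattern` is a hypothetical helper, and the appliance's regex dialect may differ, so test the generated string on the appliance before relying on it:

```python
import re

# Cyrillic labels from the list above (windows-1257 is left out: it is the Baltic code page).
CYRILLIC_CHARSETS = ["cp866", "iso-8859-5", "koi8-r", "koi8-u", "x-mac-cyrillic", "windows-1251"]


def charset_pattern(labels):
    """Build a case-insensitive regex matching a charset= parameter that names any label."""
    # re.escape protects the '-' and '_' characters inside the labels.
    alternatives = "|".join(re.escape(label) for label in labels)
    # The optional quote allows both charset="koi8-r" and charset=koi8-r;
    # (?![\w-]) stops e.g. "koi8-r" from matching inside "koi8-ru".
    return r'(?i)charset\s*=\s*"?(?:' + alternatives + r')(?![\w-])'


pattern = charset_pattern(CYRILLIC_CHARSETS)
print(pattern)
```

Anchoring on `charset=` keeps the rule from firing on a header or body that merely mentions a charset name in passing.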

-----------------------------------

German

x-IA5-German

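Sample German (English translations in parentheses)

Ich weiß es nicht (I don't know)
Da während des ganzen Mittelalters im Unterschied zu den Nachbarländern in dem Land der Teutschen stark territorial zersplitterte politische Strukturen existierten, entwickelten sich die zum Teil extrem unterschiedlichen deutschen Dialekte (deutsche Mundarten) lange parallel nebeneinander her. (Because, unlike in the neighbouring countries, strongly fragmented territorial political structures existed in the land of the Germans throughout the Middle Ages, the German dialects, which in some cases differ enormously, developed side by side for a long time.)
rächen (to avenge)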
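The sample also shows why a charset rule works for Cyrillic but not for the Spanish and French spam in the original question: German, Spanish and French text all fit in iso-8859-1, the same charset English mail uses, whereas Russian text needs a Cyrillic charset. A quick Python check; the non-German phrases are illustrative additions and `fits` is a hypothetical helper:

```python
def fits(text: str, charset: str) -> bool:
    """True if text can be encoded in the given charset without loss."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "German": "Ich weiß es nicht",
    "Spanish": "No lo sé",
    "French": "Je ne sais pas, désolé",
    "English": "I don't know",
    "Russian": "Я не знаю",
}

for language, text in samples.items():
    # Compare a Western European charset with the most common Cyrillic one.
    print(language, fits(text, "iso-8859-1"), fits(text, "windows-1251"))
```

Keep in mind that a sender can also use UTF-8 for any of these languages, and a charset rule cannot tell those messages apart at all.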

------------------------------------

Greek (most common: windows-1253)

ibm869
ibm737
iso-8859-7
x-mac-greek
windows-1253

-----------------------------------

Hebrew (most common: windows-1255)

iso-8859-8
x-mac-hebrew
windows-1255

-----------------------------------

Japanese (most common: iso-2022-jp, shift_jis)

shift_jis
x-mac-japanese
csISO2022JP
euc-jp
x-euc
x-euc-jp
iso-2022-jp

-----------------------------------

Korean (most common: iso-2022-kr, euc-kr)

ks_c_5601-1987
csKSC56011987
euc-kr
iso-ir-149
ks_c_5601
ks_c_5601_1987
ks_c_5601-1989
KSC_5601
KSC5601
csEUCKR
iso-2022-kr
csISO2022KR
x-mac-korean
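To see which of the groups above a given message falls into (for example, while deciding which labels to put in a dictionary), the lists can be turned into a lookup table. A minimal Python sketch using a subset of the labels listed above; `CHARSET_GROUPS` and `charset_groups` are hypothetical names, and only the charset parameters declared in the MIME headers are inspected:

```python
from email import message_from_string

# A subset of the labels listed above, keyed by group name (all lower-case).
CHARSET_GROUPS = {
    "Arabic": {"iso-8859-6", "windows-1256", "asmo-708", "x-mac-arabic"},
    "Chinese": {"gb2312", "hz-gb-2312", "euc-cn", "cn-big5"},
    "Cyrillic": {"cp866", "iso-8859-5", "koi8-r", "koi8-u", "x-mac-cyrillic", "windows-1251"},
    "Greek": {"iso-8859-7", "windows-1253", "x-mac-greek"},
    "Hebrew": {"iso-8859-8", "windows-1255", "x-mac-hebrew"},
    "Japanese": {"shift_jis", "iso-2022-jp", "euc-jp"},
    "Korean": {"euc-kr", "iso-2022-kr", "ks_c_5601-1987"},
}

# Invert the table once: label -> group.
LABEL_TO_GROUP = {label: group for group, labels in CHARSET_GROUPS.items() for label in labels}


def charset_groups(raw: str) -> set:
    """Return the groups whose charsets are declared by any MIME part of the message."""
    msg = message_from_string(raw)
    # get_charsets() yields one lower-cased charset (or None) per part.
    return {LABEL_TO_GROUP[cs] for cs in msg.get_charsets() if cs in LABEL_TO_GROUP}
```

Messages sent entirely as UTF-8 return an empty set here, so this only helps with spam that still declares a legacy charset.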