What are Regular Expressions and why should I read this ?

Stijn Vanveerdeghem · ‎12-14-2010

Overview

Introduction - What are Regular Expressions and why should I read this ?

Regex Basic Functions

Reading Regular Expressions

Regex for IPS Signatures

Conclusion and links

What are Regular Expressions and why should I read this ?

Regular Expressions (regex for short) provide a flexible way to match patterns of characters in data. They are written in a formal language and are used by many programming languages and tools to search for and manipulate data based on patterns.

Regular Expressions often confuse people who come across them, yet they can be an extremely powerful tool. Because of that, I brushed up on my knowledge and put together some key points about regular expressions in general and about using them for IPS signature pattern matching. This is by no means complete training on regular expressions, but I hope to pique your interest in them and show you how they can be used in custom IPS signatures.

Regex Basic Functions

Literals

A literal in a regular expression means "the actual character must be there".

For example, the regular expression "MPG" matches only one string: "MPG". This expression really means "M" followed by "P" followed by "G". It's important to note that regular expressions are case sensitive! Some characters have a special purpose in regex and need to be "escaped" with a backslash when you want to use them as a literal. These "special function" characters are called Meta Characters. I've listed them in the table below.

For example, to match the literal character "?", you need to use the expression "\?".
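As a quick illustration of literals and escaping, here is a minimal sketch using Python's `re` module. Python is only a stand-in here (an assumption on my part; the IPS sensor has its own regex engine), and the sample strings are made up for the demonstration.

```python
import re

# A literal pattern matches only the exact characters, in order.
literal = re.compile(r"MPG")
mpg_upper = literal.search("file.MPG") is not None   # True
mpg_lower = literal.search("file.mpg") is not None   # False: matching is case sensitive

# "?" is a meta character, so it must be escaped to be matched literally.
question = re.compile(r"\?")
has_question = question.search("index.php?id=1") is not None  # True

# re.escape() adds the backslashes for every meta character in a string.
escaped = re.escape("a.b?")

print(mpg_upper, mpg_lower, has_question, escaped)
```

The escaped string comes back as `a\.b\?`, which is the form you would type by hand to match "a.b?" literally.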

Meta Characters
[	{	\
|	>	^
$	(	)
<	.	*
+	?

Boolean "or"

The "or" expression can be used to match one of a number of listed options. The options are separated by a "|".

•“a|b” = “a or b”

•“blog|vlog” = “blog” or “vlog”
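The "or" operator can be tried out with a few lines of Python (used here only as a convenient test bench, not as the sensor's engine). The word list is an arbitrary sample.

```python
import re

# Either complete alternative may match.
pattern = re.compile(r"blog|vlog")

words = ["blog", "vlog", "flog"]
matched = [w for w in words if pattern.fullmatch(w)]

# The same set of strings, written with the alternation limited to one letter.
grouped = re.compile(r"(b|v)log")
matched_grouped = [w for w in words if grouped.fullmatch(w)]

print(matched, matched_grouped)
```

Both patterns accept "blog" and "vlog" and reject "flog"; the second form simply narrows the choice to the first letter.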

Repetition and quantifiers

Repetition and quantifier functions can be used if you need to match a certain number of occurrences of a character.

• “C*” = “match zero or more of the character C”

• “C+” = “match one or more of the character C”

• “C?” = “match one or none of the character C”

• “C{n}” = “match exactly n times the character C”

• “C{n,m}” = “match between n and m times the character C”

• “.*” = “match any number (zero or more) of any character”

• “(a|b)+” = “match “a” or “b” one or more times”

Here's an example. If we want to match the strings "yumy", "yummy" and "yummmy", we could use:

• "yum+y" ("y" followed by "u" followed by one or more "m" followed by "y"; this also matches longer strings such as "yummmmy")

• "yum{1,3}y" ("y" followed by "u" followed by one to three "m" followed by "y")
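The difference between the open-ended "+" and a bounded repetition is easy to see in a short Python sketch (Python's `re` module is my assumption for a test environment). Note that the bounded form is written with a comma, `{1,3}`.

```python
import re

candidates = ["yuy", "yumy", "yummy", "yummmy", "yummmmy"]

one_or_more = re.compile(r"yum+y")       # one or more "m"
one_to_three = re.compile(r"yum{1,3}y")  # between one and three "m"

plus_hits = [s for s in candidates if one_or_more.fullmatch(s)]
bounded_hits = [s for s in candidates if one_to_three.fullmatch(s)]

print(plus_hits)
print(bounded_hits)
```

The "+" version also accepts "yummmmy" with four "m", while the bounded version stops at three. Neither accepts "yuy", because at least one "m" is required.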

Ranges

Ranges can be used to match a set of characters, for example any letter or any digit. Any single character listed within the [ ] is a match.

• “[123]” = “match “1” or “2” or “3””

• “[0-9]” = “match any digit”

• “[a-z]” = “match any lowercase letter”

• “[A-Z]” = “match any uppercase letter”

• “[a-zA-Z]” = “match any letter” (do not put spaces or “|” inside the brackets, they would be treated as literal characters to match)

Example: Match "round", "hound", "sound"

• "[rhs]ound" ("r" or "h" or "s" followed by "ound")
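Here is a small Python sketch of the range example (again, Python is just an assumed test bench). It also shows why spaces and "|" do not belong inside the square brackets: every character inside the brackets is taken literally.

```python
import re

sound_like = re.compile(r"[rhs]ound")
words = ["round", "hound", "sound", "found", "Round"]
hits = [w for w in words if sound_like.fullmatch(w)]

# Any letter: list both ranges back to back, with no separator.
any_letter = re.compile(r"[a-zA-Z]")
letter_ok = any_letter.fullmatch("Q") is not None

# With " | " inside the brackets, the space and the pipe become matchable characters.
sloppy = re.compile(r"[a-z | A-Z]")
pipe_matches = sloppy.fullmatch("|") is not None

print(hits, letter_ok, pipe_matches)
```

"Round" is rejected because the class lists only lowercase "r", "h" and "s".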

Negations

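You can use the negation character to match any character except for the characters listed. The negation character is "^", and it must be the first character inside the square brackets. Every character that follows it inside the brackets is excluded from the match.

•“noth[^i]ng” = “matches “noth”, then any single character except “i”, then “ng” (for example “nothong”, but not “nothing”)”

•“h[^io]ppy” = “matches “h”, then any single character except “i” or “o”, then “ppy” (for example “happy”, but not “hippy” or “hoppy”)”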
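A negated class still consumes exactly one character, which a short Python check makes visible (Python's `re` is assumed here as a stand-in for the sensor's engine; the test words are invented).

```python
import re

pattern = re.compile(r"noth[^i]ng")

words = ["nothing", "nothong", "noth1ng", "nothng"]
hits = [w for w in words if pattern.fullmatch(w)]

happy = re.compile(r"h[^io]ppy")
happy_hits = [w for w in ["happy", "hippy", "hoppy"] if happy.fullmatch(w)]

print(hits, happy_hits)
```

"nothng" is rejected even though it contains no "i": the negated class must match one character, so a string that is one character short cannot match.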

Grouping

Grouping can be used to define both the scope and precedence of operators within the expression. It also makes the expression easier to read. Grouping will capture/match everything enclosed in the round brackets ().

• “(d|c)isco” = “matches “disco” or “cisco””

• “(abc)” = “captures “abc”; it matches the same text as the literal “abc””
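To see how grouping changes the scope of the "or" operator, here is a minimal Python sketch (Python assumed as a test bench only).

```python
import re

grouped = re.compile(r"(d|c)isco")
m = grouped.fullmatch("cisco")
captured = m.group(1)  # the text matched inside the round brackets

# Without the brackets, "|" splits the whole expression: "d" or "cisco".
ungrouped = re.compile(r"d|cisco")
disco_grouped = grouped.fullmatch("disco") is not None
disco_ungrouped = ungrouped.fullmatch("disco") is not None

print(captured, disco_grouped, disco_ungrouped)
```

With the brackets, only the first letter varies. Without them, the expression means "d" or "cisco", so "disco" no longer matches as a whole.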

Special Characters

You may want to match on something other than letters or numbers (for example a new line or white space). In that case, you can use one of the special characters listed below in your expression.

• “\s” = “matches any white space character”

• “\S” = “matches any character that is not white space”

• “\d” = “matches any digit”

• “\w” = “matches any word character (letter, digit or underscore)”

• “\W” = “matches any non-word character”

• “\xhh” = “matches the character with hexadecimal value hh”

• “.” = “matches any character except new line”

• “\n” = “matches new line”
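The special characters can be explored with Python's `re.findall` (Python is an assumption here; the sensor's engine may differ in details). The sample string is arbitrary.

```python
import re

sample = "id=42 ok\n"

digits = re.findall(r"\d", sample)        # every single digit
words = re.findall(r"\w+", sample)        # runs of word characters
spaces = re.findall(r"\s", sample)        # white space, including the new line
hex_space = re.findall(r"\x20", sample)   # 0x20 is the ASCII space

# "." matches any character except the new line.
dot_matches_newline = re.fullmatch(r".", "\n") is not None

print(digits, words, spaces, hex_space, dot_matches_newline)
```

Note that "=" is neither a word character nor white space, so it shows up in none of the lists.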

Reading Regular Expressions

In order to read and understand a regular expression, you should split the expression up into its components. Just like with mathematical formulas, this will make things a lot easier.

Example

Expression

([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})

Components

([a-z0-9_\.-]+)

@

([\da-z\.-]+)

\.

([a-z\.]{2,6})

Matches

(iloveregex)

@

(cisco)

\.

(com)
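The split into components can be checked with capture groups. Below is a minimal Python sketch (Python is my assumption for a test environment) that runs the expression against a sample address and prints what each group captured.

```python
import re

email = re.compile(r"([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})")

m = email.fullmatch("iloveregex@cisco.com")
parts = m.groups()  # one entry per pair of round brackets

# The character classes list only lowercase letters, so a mixed-case
# address matches only when case is ignored.
mixed_strict = email.fullmatch("ILoveRegex@Cisco.com") is not None
mixed_relaxed = re.fullmatch(email.pattern, "ILoveRegex@Cisco.com",
                             re.IGNORECASE) is not None

print(parts, mixed_strict, mixed_relaxed)
```

The three captured parts line up with the three bracketed components, and the "@" and "\." between them are plain literals.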

Regex for IPS Signatures

General Overview

Regular expressions are used in many existing signatures on the Cisco IPS sensors to match certain characters or patterns in traffic. Some of these expressions are very basic, others are more complex. Very often, regular expressions on the sensors use hexadecimal codes to match certain ASCII characters.

There are three main categories of how regular expressions are applied to signatures:

Regex to match string anywhere in the payload

This is used in "String TCP", "String UDP" and "String ICMP" type signatures as well as in signatures of the "Fixed UDP" or "Fixed TCP" type. Here's an example:

Regex to match string anywhere within L3 or L4 header and payload

This is used in some of the "Atomic" signatures as well as in some of the signatures that use the "Service" engine.

Regex to match string at a specific location in the application payload

This is used in some of the "Service HTTP" signatures. IPS sensors have a significant number of signatures that are specific to HTTP traffic. Because of that, with the "Service HTTP" signatures, we have a lot more options when it comes to regex matching. When creating a custom signature of this type, we can match traffic on various application layer fields specific to HTTP: "URI", "Arguments", "Header" and "Request". Because of the large number of HTTP specific signatures and regex options, we will look deeper into each of these options.

Regex in "Service HTTP" signatures

Understanding HTTP messages

In order to understand the different options we have for matching strings in HTTP packets, we first need to understand some basics about what HTTP messages look like. I would recommend having a look at the HTTP RFC, but here's an illustration of the basics we need to know to understand where our regex options apply:

HTTP Regex Matching Options

As I mentioned before, we can create regular expressions that match on several fields within the HTTP application layer data. Below is a list of the different matching options.

1. URI Regex: Regular expression to search in the URI field. The URI
field is defined as after the HTTP method (e.g. GET, POST) and before
the first CRLF (\r\n).

2. Arg Name Regex: Regular expression to search in the HTTP arguments
field (variable names within form input, for instance). This is defined
as after the '?' and in the entity body as defined by Content-Length.

3. Arg Value Regex: Regular expression to search in the HTTP arguments
field after Arg Name Regex is matched. This is searching on the value
defined by the variable name, above.

4. Header Regex: Regular expression to search in the HTTP header. The
header is defined as after the first CRLF (\r\n) but before CRLFCRLF (\r\n\r\n).

5. Request Regex: Regular expression to search in both the HTTP URI
and HTTP header.
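To make the field boundaries in the list above more concrete, here is a rough Python sketch that cuts a raw HTTP request at those boundaries. It only approximates the definitions given in the list and is not how the sensor actually parses traffic. The function name `split_request` and the sample request are hypothetical.

```python
def split_request(raw: str) -> dict:
    """Cut a raw HTTP request into the regions described in the list above."""
    # Everything before CRLFCRLF is request line plus header; the rest is the body.
    head, _, body = raw.partition("\r\n\r\n")
    # The first CRLF ends the request line; what follows is the header.
    request_line, _, header = head.partition("\r\n")
    # The URI field starts after the method and runs to the first CRLF.
    method, _, uri = request_line.partition(" ")
    # Arguments in the URI start after the "?" and end at the next space.
    query = uri.split(" ")[0].partition("?")[2]
    return {"method": method, "uri": uri, "header": header,
            "query": query, "body": body}


sample = ("GET /form.php?color=pink HTTP/1.1\r\n"
          "Host: example.com\r\n"
          "User-Agent: demo\r\n"
          "\r\n")

fields = split_request(sample)
for name, value in fields.items():
    print(name, "=", repr(value))
```

A "Request Regex" would then be searched across the `uri` and `header` regions together, while the argument regexes look at `query` and, for requests that carry one, `body`.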

Now let's look at these different options in more detail.

Service HTTP: URI Regex

URI stands for "Uniform Resource Identifier". It is a string of characters used to identify a name or a resource on the Internet. Here's an example: http://cisco.com/cgi-sys/defaultwebpage.cgi. In the actual HTTP Request (GET) packet, the URI is the field immediately after the "request method" field and before the first CRLF (\r\n). See screenshot of HTTP packet payload:

Here's an example of a regular expression we could apply to the URI. Let's say we want to match any HTTP packet with the string "badword" in the URI. The screenshots below show what we are trying to match (in the packet) and the (very simple) regular expression we can write for this:
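For readers who want to try this outside the sensor, here is a minimal Python sketch. The pattern is an assumption on my part: it is written in the bracketed, case-insensitive style used for sensor signatures, and is not necessarily the exact expression from the screenshot. The sample URIs are invented.

```python
import re

# Each letter is listed in both cases, so "badword", "BadWord" and "BADWORD" all match.
uri_regex = re.compile(r"[Bb][Aa][Dd][Ww][Oo][Rr][Dd]")

uris = [
    "/search/BadWord.html HTTP/1.1",
    "/index.html HTTP/1.1",
    "/cgi-bin/q.cgi?x=1&y=badword HTTP/1.1",
]

flagged = [u for u in uris if uri_regex.search(u)]
print(flagged)
```

Because `search` looks for the pattern anywhere in the string, the position of "badword" within the URI does not matter.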

Service HTTP: Arguments Regex

Arguments regular expressions can be used to search on the HTTP arguments field, such as variables within a form input. The HTTP arguments are defined after the "?" in the URI. See screenshot below:

And here's an example using an arguments regex. In this example, we want to match HTTP requests in which the user enters "purple" or "pink" as their favorite color.
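The same idea can be sketched in Python: first pick the argument by name, then test its value. The argument name "color", the two patterns and the helper `args_match` are assumptions for illustration, and `urllib.parse.parse_qsl` stands in for the sensor's own argument parsing.

```python
import re
from urllib.parse import parse_qsl

arg_name_regex = re.compile(r"[Cc][Oo][Ll][Oo][Rr]")
arg_value_regex = re.compile(r"purple|pink")


def args_match(query: str) -> bool:
    """True when an argument whose name matches also has a matching value."""
    return any(arg_name_regex.search(name) and arg_value_regex.search(value)
               for name, value in parse_qsl(query))


print(args_match("name=joe&color=pink"))
print(args_match("name=pink&color=green"))
```

The second query is rejected even though it contains "pink", because the value regex is only applied to the argument whose name matched.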

Service HTTP: Header Regex

The header-regex will search inside the "HTTP header" field for a matching pattern. The header is defined as "after the first CRLF (\r\n) and until CRLFCRLF (\r\n\r\n)". You can find a list of possible header fields in HTTP request/response here. Examples of fields in the HTTP header are "content-type" and "user-agent". User agent can be used to identify the browser, browser version, browser language, operating system language and more. Here's a screenshot showing the HTTP header:

This time, let's write a regular expression to match HTTP packets sent from a Firefox or Safari browser:

You may have noticed that for this example, we actually used hexadecimal codes to represent certain characters we want to match in our header. We used [\x20], which is the hex code for "white space" in ASCII, and we also used [\x0d\x0a], which is hex for "Carriage Return / New Line". Finally, we used "[^\x0d\x0a]*", which will match any number of any character except for carriage return or new line (notice the negation character "^"). This last one is used to ensure we only search on one line within the header.
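The effect of "[^\x0d\x0a]*" can be tried out in Python. The pattern below is my own reconstruction in the same style and is an assumption, not necessarily the exact expression from the screenshot. The header samples are invented.

```python
import re

# "User-Agent: " followed by anything on the same line, then a browser name.
header_regex = re.compile(
    r"[Uu][Ss][Ee][Rr][-][Aa][Gg][Ee][Nn][Tt][:][\x20][^\x0d\x0a]*"
    r"([Ff][Ii][Rr][Ee][Ff][Oo][Xx]|[Ss][Aa][Ff][Aa][Rr][Ii])"
)

firefox_header = ("Host: example.com\r\n"
                  "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:115.0) "
                  "Gecko/20100101 Firefox/115.0\r\n"
                  "Accept: */*")

# "Firefox" appears here too, but on a different header line.
other_header = ("Host: example.com\r\n"
                "User-Agent: curl/8.0\r\n"
                "X-Comment: Firefox")

firefox_hit = header_regex.search(firefox_header) is not None
other_hit = header_regex.search(other_header) is not None
print(firefox_hit, other_hit)
```

Since the negated class cannot step over a carriage return or new line, the browser name has to appear on the User-Agent line itself for the pattern to match.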

Service HTTP: Request Regex

The Request regex enables searching in both the HTTP URI field and the HTTP header. Since this is a combination of the fields discussed above, I don't have a separate example for the Request regex.

Final Exercise showing all the different HTTP regex match options.

Let's write a "service-http" custom signature to alert on an HTTP GET request matching the following conditions:

- URI includes "test.php" or "test.asp" or "test.html"

- OS-language = en-US

- Browser-language = nl-BE

- Host: guimp.com

- Argument: "name" = "John.Doe"

Solution

•URI-regex:

[Tt][Ee][Ss][Tt][\.]([Pp][Hh][Pp]|[Aa][Ss][Pp]|[Hh][Tt][Mm][Ll])

•header-regex:

([Uu][Ss][Ee][Rr][-][Aa][Gg][Ee][Nn][Tt][:][\x20][^\x0d\x0a]*([Ee][Nn]\-[Uu][Ss]))(.|\n)*

([Aa][Cc][Cc][Ee][Pp][Tt][-][Ll][Aa][Nn][Gg][Uu][Aa][Gg][Ee][:][\x20][^\x0d\x0a]*([Nn][Ll][-][Bb][Ee]))

•Request-regex :

[Hh][Oo][Ss][Tt][:]\x20[Gg][Uu][Ii][Mm][Pp][\.][Cc][Oo][Mm]

•ARG-name regex:

[Nn][Aa][Mm][Ee]

•ARG-Value regex:

[Jj][Oo][Hh][Nn][\.][Dd][Oo][Ee]
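To sanity-check the solution away from a sensor, the expressions can be exercised in Python. This is only a sketch under several assumptions: Python's `re` stands in for the sensor's engine, the two header-regex lines are treated as one expression, the request is split using the field definitions from earlier in this article, and the helper `signature_fires` and the sample request are hypothetical.

```python
import re
from urllib.parse import parse_qsl

URI_RE = re.compile(r"[Tt][Ee][Ss][Tt][\.]([Pp][Hh][Pp]|[Aa][Ss][Pp]|[Hh][Tt][Mm][Ll])")
HEADER_RE = re.compile(
    r"([Uu][Ss][Ee][Rr][-][Aa][Gg][Ee][Nn][Tt][:][\x20][^\x0d\x0a]*([Ee][Nn]\-[Uu][Ss]))(.|\n)*"
    r"([Aa][Cc][Cc][Ee][Pp][Tt][-][Ll][Aa][Nn][Gg][Uu][Aa][Gg][Ee][:][\x20][^\x0d\x0a]*([Nn][Ll][-][Bb][Ee]))"
)
REQUEST_RE = re.compile(r"[Hh][Oo][Ss][Tt][:]\x20[Gg][Uu][Ii][Mm][Pp][\.][Cc][Oo][Mm]")
ARG_NAME_RE = re.compile(r"[Nn][Aa][Mm][Ee]")
ARG_VALUE_RE = re.compile(r"[Jj][Oo][Hh][Nn][\.][Dd][Oo][Ee]")


def signature_fires(raw: str) -> bool:
    """Apply all five expressions to the matching region of a raw request."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, _, header = head.partition("\r\n")
    method, _, uri = request_line.partition(" ")
    query = uri.split(" ")[0].partition("?")[2]
    args = parse_qsl(query) + parse_qsl(body)
    args_ok = any(ARG_NAME_RE.search(n) and ARG_VALUE_RE.search(v) for n, v in args)
    return bool(method == "GET"
                and URI_RE.search(uri)
                and HEADER_RE.search(header)
                and REQUEST_RE.search(uri + "\r\n" + header)
                and args_ok)


sample = ("GET /test.php?name=John.Doe HTTP/1.1\r\n"
          "Host: guimp.com\r\n"
          "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) "
          "Gecko/20100101 Firefox/3.6\r\n"
          "Accept-Language: nl-be,nl;q=0.8\r\n"
          "\r\n")

print(signature_fires(sample))
print(signature_fires(sample.replace("guimp.com", "example.com")))
```

Note that the header expression expects the User-Agent line to come before the Accept-Language line. The "(.|\n)*" in the middle is what lets the match continue across the lines in between.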

Conclusion and Links

I hope this blog article has answered some of the questions you may have had about regular expressions in general and about how to use regular expressions for writing IPS custom signatures in particular. At the very least, I hope this has shown that regular expressions can be a very powerful tool for a lot of different applications including IPS pattern matching. If you have any questions about this, please feel free to reply to this article or contact me directly.

I found the links below very helpful:

Rubular: Regex editor and tester, very useful to quickly test your expressions

Regex Cheat Sheet

shivapd · ‎03-11-2011

Great information.

Note that there is now a whitepaper on writing custom sigs. It can be found at http://www.cisco.com/web/about/security/intelligence/ips_custom_sigs.html.

