Introduction - What are Regular Expressions and why should I read this?
Regex Basic Functions
Reading Regular Expressions
Regex for IPS Signatures
Conclusion and links
Regular expressions (regex for short) provide a flexible way to match patterns of characters in data. They are written in a formal language and are used by many programming languages and tools to search for and manipulate data based on patterns.
Regular expressions are often confusing to anyone who comes across them, yet they can be an extremely powerful tool. Because of that, I brushed up on my knowledge and put together some key points about regular expressions in general and about using them for IPS signature pattern matching. This is by no means complete training on regular expressions, but I hope to pique your interest in them and show you how they can be used in custom IPS signatures.
A literal in a regular expression means "the actual character must be there".
For example, the regular expression "MPG" matches only one string: "MPG". This expression really means "M" followed by "P" followed by "G". It's important to note that regular expressions are case sensitive! Some characters have a special purpose in regex and need to be "escaped" when you want to use them as a literal. These "special function" characters are called meta characters. I've listed them in the table below.
For example, to match the literal character "?", you need to use the expression "\?".
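If you want to try literals and escaping yourself, here's a minimal sketch in Python. Python and its standard "re" module are my choice for illustration only (the regex flavor on the sensor is not identical), and the sample strings are made up:

```python
import re

# "MPG" is three literals in a row: "M", then "P", then "G".
literal = re.compile(r"MPG")
print(bool(literal.search("file.MPG")))  # True
print(bool(literal.search("file.mpg")))  # False: matching is case sensitive

# An unescaped "?" is a meta character; "\?" matches a literal question mark.
# The "." is escaped as well, for the same reason.
question = re.compile(r"index\.php\?")
print(bool(question.search("/index.php?id=1")))  # True
print(bool(question.search("/index.php")))       # False
```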
The or expression can be used to match one of a number of listed options. The options are separated by a "|".
Repetition and quantifier functions can be used if you need to match a certain number of occurrences of a character.
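A quick sketch of the or expression in Python (used here purely for illustration; the option words are hypothetical):

```python
import re

# Three options separated by "|": any one of them is a match.
pattern = re.compile(r"cat|dog|bird")

candidates = ["cat", "dog", "fish"]
matches = [word for word in candidates if pattern.fullmatch(word)]
print(matches)  # ['cat', 'dog']
```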
• "C*" = matches zero or more of the character "C"
• "C+" = matches one or more of the character "C"
Here's an example. If we want to match the strings "yumy", "yummy" and "yummmy", we could use:
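One expression that fits these three strings is "yum+y": "yu", then one or more "m", then "y". Here's a small check in Python (my choice of language for illustration). Keep in mind that "+" has no upper limit, so this pattern also accepts longer runs of "m":

```python
import re

# "m+" means one or more "m"; "m*" would also allow zero.
plus_pattern = re.compile(r"yum+y")
star_pattern = re.compile(r"yum*y")

words = ["yumy", "yummy", "yummmy", "yuy"]
plus_results = {word: bool(plus_pattern.fullmatch(word)) for word in words}
star_results = {word: bool(star_pattern.fullmatch(word)) for word in words}

print(plus_results)  # {'yumy': True, 'yummy': True, 'yummmy': True, 'yuy': False}
print(star_results)  # {'yumy': True, 'yummy': True, 'yummmy': True, 'yuy': True}
```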
Square brackets can be used to match ranges of characters, for example any letter or any number. Any character within the [ ] is a match.
• "[123]" = matches "1" or "2" or "3"
• "[0-9]" = matches any digit
• "[a-z]" = matches any lowercase letter
• "[A-Z]" = matches any uppercase letter
• "[a-zA-Z]" = matches any letter (no "|" or spaces inside the brackets, since those would be treated as literal characters)
Example: Match "round", "hound", "sound"
• "[rhs]ound" ("r" or "h" or "s" followed by "ound")
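Here's how the bracket examples behave in practice, as a minimal Python sketch (Python is only used for illustration, and the test words are made up):

```python
import re

# "r" or "h" or "s", followed by the literal "ound".
ound = re.compile(r"[rhs]ound")
words = ["round", "hound", "sound", "found"]
matched = [word for word in words if ound.fullmatch(word)]
print(matched)  # ['round', 'hound', 'sound']

# A range inside the brackets: any single digit.
digits = re.findall(r"[0-9]", "a1b2c3")
print(digits)  # ['1', '2', '3']

# Two ranges side by side: any letter, upper or lower case.
letters = re.findall(r"[a-zA-Z]", "a1B2")
print(letters)  # ['a', 'B']
```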
You can use the negation character to match any character except for the characters listed. The negation character is "^", placed directly after the opening "[". Any character following the negation character will be a negative match.
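A short illustration of negation, sketched in Python with made-up input:

```python
import re

# "[^0-9]" matches any single character that is NOT a digit.
not_digits = re.compile(r"[^0-9]+")
chunks = not_digits.findall("abc123def")
print(chunks)  # ['abc', 'def']

# Outside of brackets, or anywhere but the first position inside them,
# "^" does not negate the class.
caret_literal = re.findall(r"[0-9^]", "2^3")
print(caret_literal)  # ['2', '^', '3']
```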
• "(d|c)isco" = matches "disco" or "cisco"
• "(abc)" = groups (captures) "abc"; matches the same text as the literal "abc"
• "\w" = matches any word character (a letter, a digit or an underscore)
• "\W" = matches any non-word character
• "\xhh" = matches the character with hexadecimal value hh
• "." = matches any character except new line
• "\n" = matches new line
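To see these in action, here's a small Python sketch (again, Python is just my choice for illustration; the inputs are made up):

```python
import re

# "\w+" grabs runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", "GET /index.html")
print(words)  # ['GET', 'index', 'html']

# "\W" is the opposite: any single non-word character.
non_words = re.findall(r"\W", "a-b c")
print(non_words)  # ['-', ' ']

# "\x41" is the hexadecimal code for the ASCII character "A".
hex_match = re.search(r"\x41", "ABC")
print(hex_match.group())  # A

# "." matches any character except new line.
dots = re.findall(r".", "a\nb")
print(dots)  # ['a', 'b']

# Parentheses group the options and capture what was matched.
grouped = re.fullmatch(r"(d|c)isco", "cisco")
print(grouped.group(1))  # c
```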
Regular expressions are used in many existing signatures on the Cisco IPS sensors to match certain characters or patterns in traffic. Some of these expressions are very basic, others are more complex. Very often, regular expressions on the sensors use hexadecimal codes to match certain ASCII characters.
There are basically 3 main categories of how regular expressions are applied to signatures:
This is used in some of the "Atomic" signatures as well as in some of the signatures that use the "Service" engine.
This is used in some of the "Service HTTP" signatures. IPS sensors have a significant number of signatures that are specific to HTTP traffic. Because of that, with the "Service HTTP" signatures, we have a lot more options when it comes to regex matching. When creating a custom signature of this type, we can match traffic on various application layer fields specific to HTTP: "URI", "Arguments", "Header" and "Request". Because of the large number of HTTP specific signatures and regex options, we will look deeper into each of these options.
In order to understand the different options we have for matching strings in HTTP packets, we first need to understand some basics about what HTTP messages look like. I would recommend having a look at the HTTP RFC, but here's an illustration of the basics we need to know to understand where our regex options apply:
As I mentioned before, we can create regular expressions that match on several fields within the HTTP application layer data. Below is a list of the different matching options.
1. URI Regex: Regular expression to search in the URI field. The URI
Now let's look at these different options in more detail.
URI stands for "Uniform Resource Identifier". It is basically a string of characters used to identify a name or resource on the Internet. Here's an example: http://cisco.com/cgi-sys/defaultwebpage.cgi. In the actual HTTP Request (GET) packet, the URI is the field immediately after the "request method" field and before the first CRLF (\r\n). See screenshot of HTTP packet payload:
Here's an example of a regular expression we could apply to the URI. Let's say we want to match any HTTP packet with the string "badword" in the URI. The screenshots below show what we are trying to match (in the packet) and the (very simple) regular expression we can write for this:
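To make the idea concrete outside of the sensor, here's a rough Python sketch. It is not the signature configuration itself: the request below is hypothetical, and pulling the URI out of the first line by splitting on spaces is my own simplification of what the sensor does internally:

```python
import re

# Hypothetical request, for illustration only.
request = "GET /shop/badword/index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

# The URI follows the request method on the first line of the request.
request_line = request.split("\r\n", 1)[0]
method, uri, version = request_line.split(" ")

# The (very simple) regex: just the literal string we are looking for.
uri_regex = re.compile(r"badword")

print(uri)                          # /shop/badword/index.html
print(bool(uri_regex.search(uri)))  # True
```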
Arguments regular expressions can be used to search on the HTTP arguments field, such as variables within a form input. The HTTP arguments are defined after the "?" in the URI. See screenshot below:
And here's an example using an arguments regex. In this example, we want to match HTTP requests in which the user enters "purple" or "pink" as their favorite color.
The header-regex will search inside the "HTTP header" field for a matching pattern. The header is defined as "after the first CRLF (\r\n) and until CRLFCRLF (\r\n\r\n)". You can find a list of possible header fields in HTTP request/response here. An example of a field in the HTTP header is "content-type" or "user-agent". User agent can be used to identify the browser, browser version, browser language, operating system language and more. Here's a screenshot showing the HTTP header:
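As a rough sketch of the same idea in Python (for illustration only): the form field name "color", the sample request lines and the helper function favorite_color are all hypothetical, and the arguments are simply taken as everything after the "?" in the URI:

```python
import re

# Hypothetical field name "color"; the or expression lists the two values.
arg_regex = re.compile(r"color=(purple|pink)")

def favorite_color(request_line):
    """Return the matched color from the arguments, or None if no match."""
    uri = request_line.split(" ")[1]
    _path, _sep, arguments = uri.partition("?")  # arguments follow the "?"
    match = arg_regex.search(arguments)
    return match.group(1) if match else None

print(favorite_color("GET /survey.cgi?name=Jane&color=purple HTTP/1.1"))  # purple
print(favorite_color("GET /survey.cgi?name=Jane&color=green HTTP/1.1"))   # None
```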
This time, let's write a regular expression to match HTTP packets sent from a Firefox or Safari browser:
You may have noticed that in this example, we actually used hexadecimal codes to represent certain characters we want to match in our header. We used [\x20], which is the hex code for "white space" in ASCII, and we also used [\x0d\x0a], which is hex for "Carriage return / New Line". Finally, we used "[^\x0d\x0a]*", which will match any number of any character except for carriage return or new line (notice the negation character "^"). This last one is basically used to ensure we just search on one line within the header.
The Request regex enables searching in both the HTTP URI field and HTTP header. Since this is a combination of the fields discussed above, I don't have a separate example for the Request regex.
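Here's a rough Python sketch that combines the three building blocks described above. The pattern is my own reconstruction for illustration and not necessarily the exact expression used in the signature, and the sample requests are made up:

```python
import re

# [\x0d\x0a]    : the CR/LF that ends the previous line
# [\x20]        : the space after "User-Agent:"
# [^\x0d\x0a]*  : anything except CR/LF, so we stay on the same header line
header_regex = re.compile(
    r"[\x0d\x0a]User-Agent:[\x20][^\x0d\x0a]*(Firefox|Safari)"
)

firefox_request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:115.0) "
    "Gecko/20100101 Firefox/115.0\r\n"
    "\r\n"
)
match = header_regex.search(firefox_request)
print(match.group(1))  # Firefox

# "Firefox" appears here too, but on a different header line, so no match.
other_request = "GET / HTTP/1.1\r\nUser-Agent: curl/8.0\r\nX-Note: Firefox\r\n\r\n"
print(header_regex.search(other_request))  # None
```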
Let's write a "service-http" custom signature to alert on an HTTP GET request matching the following conditions:
- URI includes "test.php" or "test.asp" or "test.html"
- OS-language = US-EN
- Browser-language = NL-BE
- Host: guimp.com
- Argument: "name" = "John.Doe"
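Before configuring this on a sensor, it can help to prototype the individual expressions. Below is a rough Python sketch under several assumptions of mine: the OS language is taken from the User-Agent string, the browser language from the Accept-Language header, and the sample request and the helper signature_fires are hypothetical. It only mimics the matching logic and is not the sensor configuration:

```python
import re

# One regex per field, mirroring the conditions listed above.
URI_REGEX = re.compile(r"test\.(php|asp|html)")
ARG_REGEX = re.compile(r"name=John\.Doe")
HEADER_REGEXES = [
    re.compile(r"Host:[\x20]guimp\.com"),
    # Assumption: the OS language shows up inside the User-Agent string.
    re.compile(r"User-Agent:[\x20][^\x0d\x0a]*en-US"),
    # Assumption: the browser language shows up in Accept-Language.
    re.compile(r"Accept-Language:[\x20][^\x0d\x0a]*nl-be"),
]

def signature_fires(request):
    """Return True only if every condition matches a GET request."""
    request_line, _sep, rest = request.partition("\r\n")
    parts = request_line.split(" ")
    if len(parts) != 3 or parts[0] != "GET":
        return False
    path, _sep, arguments = parts[1].partition("?")
    headers = rest.split("\r\n\r\n", 1)[0]  # header ends at CRLFCRLF
    return (
        URI_REGEX.search(path) is not None
        and ARG_REGEX.search(arguments) is not None
        and all(regex.search(headers) for regex in HEADER_REGEXES)
    )

sample = (
    "GET /test.php?name=John.Doe HTTP/1.1\r\n"
    "Host: guimp.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/3.6\r\n"
    "Accept-Language: nl-be\r\n"
    "\r\n"
)
print(signature_fires(sample))                                      # True
print(signature_fires(sample.replace("guimp.com", "example.com")))  # False
```

Note the escaped dots ("\." in "test\.php", "guimp\.com" and "John\.Doe"): without the backslash, "." would match any character.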
I hope this blog article has answered some of the questions you may have had about regular expressions in general and about how to use regular expressions for writing IPS custom signatures in particular. At the very least, I hope this has shown that regular expressions can be a very powerful tool for a lot of different applications including IPS pattern matching. If you have any questions about this, please feel free to reply to this article or contact me directly.
I found the below links very helpful: