Solved: Re: Cisco WSA Custom URL Category Regex Syntax

SHABEEB KUNHIPOCKER · ‎09-14-2022

Hello,

I am in the process of migrating from a third-party proxy to Cisco WSA. The third-party proxy has custom URL categories containing entries in the syntax "example.com/path1/path2". When I tried to enter the same entries in the custom URL category "Sites" field, WSA rejected them as invalid syntax. I guess the next available option is regex entries. Is there an online regex generation tool that can generate them? I have a lot of entries to move, and doing them manually will take a long time. I tried https://ibnuhx.com/regex-generator, but WSA does not accept the syntax it generates. For example, for the URL .bbc.com/news/business the site generates /\.bbc\.com\/news\/business/, which WSA rejects. Kindly help.

The sample sites for which we need to create regexes are:

.bbc.com/news/business

.epayment.www.gov.qa/eGovPaymentWeb/SendPaymentAction

www.cnbc.com/world/?region=world

Thanks and Regards

amojarra · ‎09-14-2022

Hi @SHABEEB KUNHIPOCKER

WSA uses the Flex regular expression analyzer.

You can use this URL for testing: flex lint - Regex Tester/Debugger

Here is the list of escaped characters:

Escaped characters
\. \* \\	escaped special characters
\t \n \r	tab, linefeed, carriage return
\u00A9	unicode escaped

For .bbc.com/news/business you can use \.bbc\.com/news/business

Please note that URL categories are evaluated in top-down order, so if you have bbc.com in the top category and the above sample in custom URL category 10, the URL .bbc.com/news/business will match the first category.
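The top-down behaviour described above can be pictured with a small Python sketch. It is only a model of "first matching category wins": the category names, the `first_matching_category` helper, and the use of Python's `re` module are assumptions for illustration, not how WSA is implemented.

```python
import re

# Hypothetical categories, listed in evaluation order (top first).
categories = [
    ("Category 1", [r"bbc\.com"]),
    ("Category 10", [r"\.bbc\.com/news/business"]),
]

def first_matching_category(url, ordered_categories):
    """Return the name of the first category with a pattern that matches."""
    for name, patterns in ordered_categories:
        if any(re.search(p, url) for p in patterns):
            return name
    return None

url = "www.bbc.com/news/business"
print(first_matching_category(url, categories))                  # Category 1
print(first_matching_category(url, list(reversed(categories))))  # Category 10
```

In this model, the more specific entry only takes effect when its category sits above the broader one.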

Not escaping these characters will affect performance and slow down web browsing. This is because the pattern-matching engine will go through thousands or millions of possibilities until it matches the correct entry.
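As a rough illustration of why the escape matters, the sketch below compares the escaped and unescaped forms. Python's `re` module stands in for the WSA engine here, which is an assumption, and the `hits` helper and the look-alike string are made up for the example. It shows the extra matches an unescaped dot allows; it does not measure performance.

```python
import re

# Illustration only: Python's "re" stands in for the WSA regex engine.
unescaped = re.compile(r".bbc.com/news/business")
escaped = re.compile(r"\.bbc\.com/news/business")

def hits(pattern, url):
    """True if the pattern matches anywhere in the URL string."""
    return pattern.search(url) is not None

real = "www.bbc.com/news/business"
lookalike = "wwwxbbcxcom/news/business"  # no dots at all

print(hits(unescaped, real), hits(unescaped, lookalike))  # True True
print(hits(escaped, real), hits(escaped, lookalike))      # True False
```

An unescaped dot matches any single character, so the first pattern accepts strings the second one rejects.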

+++++++++++++++++++++++++++++++++++++++++++++++++++

++++ If you find this answer helpful, please rate it as such ++++

+++++++++++++++++++++++++++++++++++++++++++++++++++

Regards,
Amirhossein Mojarrad

Konstantinos9 · ‎09-14-2022

Hi Shabeeb,

I just tested the URLs mentioned above and they work fine. The issue appears to be that the tool you used tries to escape the "/", which is not needed, at least not for WSA.

It is very important, though, to escape the dots (\.) to avoid any performance impact. The following URL was accepted in my WSA:

\.bbc\.com/news/business

You can simply copy and paste your list of URLs, one per line; just make sure you escape the dots where needed.
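For a long list, the escaping could be scripted rather than done by hand. The following is a minimal Python sketch, not a Cisco tool: `to_wsa_regex` and `convert_entries` are hypothetical helper names, and the set of characters it escapes (including the "?" in the third sample) is an assumption. It leaves "/" unescaped, as in the accepted example above.

```python
import re

# Assumed set of regex metacharacters to escape. "/" is left alone,
# matching the accepted example \.bbc\.com/news/business.
_META = re.compile(r"([.*+?()\[\]{}|^$\\])")

def to_wsa_regex(entry: str) -> str:
    """Backslash-escape regex metacharacters in one plain URL entry."""
    return _META.sub(r"\\\1", entry.strip())

def convert_entries(lines):
    """Convert every non-empty line, one URL entry per line."""
    return [to_wsa_regex(line) for line in lines if line.strip()]

if __name__ == "__main__":
    samples = [
        ".bbc.com/news/business",
        ".epayment.www.gov.qa/eGovPaymentWeb/SendPaymentAction",
        "www.cnbc.com/world/?region=world",
    ]
    for regex in convert_entries(samples):
        print(regex)
```

It is worth pasting a few converted entries into WSA first to confirm they are accepted before importing the full list.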

Hope that helps. Let me know if you have more questions.

Kind regards,

Konstantinos

SHABEEB KUNHIPOCKER · ‎09-21-2022

Hi Guys,

Thanks a lot for your responses. I managed to configure and test the regex syntax mentioned above. For example, to add the entry .cnn.com/business/news, I use the regex entry \.cnn\.com/business/news

Now I have a use case where I need a regex for the entry "autodiscover.*", which should match autodiscover.com, autodiscover.in, etc. Kindly help.

Thanks

amojarra · ‎09-21-2022

Hi

If you want to block autodiscover anywhere in a URL, such as

URL.COM/SOMETHING/autodiscover/OTHERTHINGS
AUTODISCOVER.COM

simply put autodiscover in the regex.

There is no need for any dots or stars.

Regards,
Amirhossein Mojarrad

SHABEEB KUNHIPOCKER · ‎09-22-2022

Hello,

I need to block all URLs of the form autodiscover.*, that is, URLs like autodiscover.com, autodiscover.in, etc. If I put just autodiscover in the regex, it will also block example.com/autodiscover, which I don't want.

Thanks

Ken Stieers · ‎09-22-2022

Try
\/\/autodiscover\.

That should match on //autodiscover.
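A quick way to sanity-check the suggested pattern offline is sketched below. It uses Python's `re` module as a stand-in for the WSA engine and assumes the regex is matched against the full URL including the scheme (for example "http://"), which the "//" prefix relies on. The `is_blocked` helper and the test URLs are illustrative only.

```python
import re

# Pattern copied from the suggestion above; Python's "re" accepts "\/" as "/".
pattern = re.compile(r"\/\/autodiscover\.")

def is_blocked(url):
    """True if the pattern matches anywhere in the URL string."""
    return pattern.search(url) is not None

print(is_blocked("http://autodiscover.com/"))         # True
print(is_blocked("https://autodiscover.in/owa"))      # True
print(is_blocked("http://example.com/autodiscover"))  # False
```

Because the match is anchored on "//", a path segment such as example.com/autodiscover is left alone, while a host that starts with "autodiscover." is caught.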

SHABEEB KUNHIPOCKER · ‎09-22-2022

Hi,

I will try and update you ASAP.

Thanks