Using memoization to combat ReDoS

Image for post
Image for post

Introduction

So a lot of big words were used in the title. What does memoization mean? What does ReDoS mean? Fear not, young padawan, for everything will be revealed in the next couple of minutes.

Problematic Regexes

A problematic regex is a regex that leads to an epsilon loop if we regard the regex as an NFA. Okay, so we’re back to fancy words, so let’s first talk about what an NFA is. An an example, picture 4 locations. Let’s say you have your house, nearby restaurant, mall and pub chosen as places and there is only one route to each place from the previous place as shown in the undirected graph.

Image for post
Image for post
Undirected graph of places
Image for post
Image for post
Regex NFA for a|b
Image for post
Image for post
Regex NFA for (a|a)*

Memoization

Firstly let’s define memoization, it is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. By using this technique we can reduce the cost of traversing the whole NFA (yip, those two edges) to a single return. We must keep track of the inputs and their corresponding outputs, so this will cost us some space. Sadly I am not referring to the place with pretty stars, but more to the fancy word computer scientists use to describe memory usage. (No one knows why they use all these fancy words).

Image for post
Image for post
Image for post
Image for post

Conclusion

Regexes are very powerful and can be used for a lot of good. Unfortunately, one needs to careful of how you use it since it can lead to catastrophic damages through Denial-of-Service. Okay, maybe catastrophic is a bit hyperbolic, but if you’re running a business which has to validate emails and passwords for new users but due to ReDoS they have have to wait minutes for validation then they could quickly dislike your website. Luckily, memoization is here to the rescue from this conundrum by effectively reducing the strain that problematic regexes can have on a server, machine or program. Now that you’ve learned some ways as how to memoization can implemented, the world is you oyster!

Written by

A young computer scientist with interests in almost all fields of CS. Okay I may have lied, I really dislike HTML and CSS, but the rest is interesting.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store