Note to myself: Regular Expressions performance
This post is mostly as a reminder for myself, to not loose these important links again. But said that, it’s probably interesting for you too, when you care about performance and the .NET behaviour around regular expressions (Regex).
The common things
In the .NET world, some things are very common. First is, you are advised to use a StringBuilder whenever you concatenate some strings. Second is: If a Regex is slow, use RegexOptions.Compiled to fix it. Well… Now, in fact there are reasons for this sort of advise. String concatenation IS slow, for various, commonly known reasons. But still a StringBuilder has some overhead and there are situations where using it imposes an unwanted overhead.
The very same goes for RegexOptions.Compiled, and Jeff Atwood, aka Coding Horror, wrote a very good article about that a few years ago: To compile or not to compile (Jeff Atwood).
In one of the comments another article from MSDN (BCL Blog) is referenced, where the different caching behaviour of Regex in .NET 1.1 vs. .NET 2.0 is explained: Regex Class Caching Changes between .NET Framework 1.1 and .NET Framework 2.0 (Josh Free).
The not-so-common things
There is only a single thing that is true for each and every kind of performance optimization. And it’s the simple two words: “It depends.”.
With regular expressions, the first thing any performance issue depends on is, if you really need a regular expression for the task. Of course, if you really know regular expressions, what they can do and what they can’t, and for what they are the correct tool, you are very likely to not run into those kinds of problems. But when you just learned about the power of Regexes (all you have is a hammer) everything starts to look as a string desperatly waiting to get matched (everything is a nail). What I want to say is: Not everything that could be solved with a Regex also should be solved by one. Again, I have a link for me and you to keep in your Regex link collection: Regular Expressions: Now You Have Two Problems (Jeff Atwood).
Now, finally to performance optimization links.
There is a good blog article series on the MSDN BCL Blog (like the one above) that goes very deep into how the Regex class performs in different scenarios. You find them here:
- Optimizing Regular Expression Performance, Part I: Working with the Regex Class and Regex Objects (Ron Petrusha)
- Optimizing Regular Expression Performance, Part II: Taking Charge of Backtracking (Ron Petrusha)
- Optimizing Regex Performance, Part 3 (Ron Petrusha) [this article is commonly NOT in lists referring to the first two articles, probably because of the somewhat inconsistent naming ;-)]
And, besides those, once again a nice article on “catastrophic backtracking” from Jeff: Regex Performance (Jeff Atwood).
One more thing
There are three articles, that are not really available anymore. Three very good articles from Mike, that you can only retrieve from the wayback machine. I’m really thinking hard about providing a mirror for these articles on my blog too. But until then, here are the links: