C# Implementation for Balanced Matches

Saturday, December 24, 2005 12:00:00 AM (Central Standard Time, UTC-06:00)
What is Balanced Matching?  Balanced matching can be used to parse expressions embedded into one another, or can be used to match an open token to its closing counterpart.  Examples of this would be parenthesis matching in expressions, HTML Tag matching and pairing tokens for a template engine.  For example, lets say you had the following expression:
 
(z(a + b(x + y)))
 
when parse it should look something like this:
 
x + y
a + b(x + y)
z(a + b(x + y))
 
 
After spending sometime trying to get a balanced match regular expression to support embedded expressions N levels deep, I ended up writing my own C# implementation. If you are interested in pursuing the Regular Expression implementation, these two articles reference an approach, but didn't give me the solution I was looking for (support for multiple levels of embedding).  You can find a pretty good article on Balanced Matching and regular expressions at Wes' Puzzle Blog, it looks like it was based on this article found on the BCL Team log. 
 
 
The approach I used was to mix Regex and a Stack class from the System.Collections namespace.  The basic logic is as follows:
 
  1. Use Regex to capture all of the "Start" tokens and "End" tokens.
  2. Sort them by the position they are found in the text. 
  3. Loop through all of the tokens:
    a) If the token is a start token, push it onto the stack.
    b) else pop it off of the stack, and use as the matching end token
  4. At the end if the stack is not empty, there is a token mismatch (all start tokens must have a matching end token).
 
You can download BalancedMatching.zip to see the implementation.
 
Note: You can also use regular expressions as Start and End tokens.