Getting caught up on archived posts

Thursday, December 29, 2005 12:00:00 AM (Central Standard Time, UTC-06:00)
I have a bunch of partial posts (around 65) that I have been archiving over the last 6 months with no time to complete them.  The Balanced Match problem was one of those, though not a perfect solution, it solves about 90% of the problem and I felt that was good enough at this point.  If I continue to stock pile these posts, trying to perfect them, either they will loose their timely relevance <sidenote>I had a couple of comments on the DSL Toolkit for Beta 2, but being almost two months past RTM its probably to late to post it </sidenote> or they will never get published.  This has been a struggle for me for sometime in that I want to provide technically accurate and detailed posts.  I thought I would have more time to spend on writing and speaking when I went independent 18 months ago, but my time constraints have only gotten tighter.   The point is that I am going to publish more posts at 80%-90% in order to get more volume on the blog.   I think it is more important to get content out for others to see rather them spend the time perfecting it.  Not saying that it shouldn't be accurate, rather it should be a starting point for others to extend or learn from.  I think that is one of the biggest advantages of many blogs.
 
 

C# Implementation for Balanced Matches

Saturday, December 24, 2005 12:00:00 AM (Central Standard Time, UTC-06:00)
What is Balanced Matching?  Balanced matching can be used to parse expressions embedded into one another, or can be used to match an open token to its closing counterpart.  Examples of this would be parenthesis matching in expressions, HTML Tag matching and pairing tokens for a template engine.  For example, lets say you had the following expression:
 
(z(a + b(x + y)))
 
when parse it should look something like this:
 
x + y
a + b(x + y)
z(a + b(x + y))
 
 
After spending sometime trying to get a balanced match regular expression to support embedded expressions N levels deep, I ended up writing my own C# implementation. If you are interested in pursuing the Regular Expression implementation, these two articles reference an approach, but didn't give me the solution I was looking for (support for multiple levels of embedding).  You can find a pretty good article on Balanced Matching and regular expressions at Wes' Puzzle Blog, it looks like it was based on this article found on the BCL Team log. 
 
 
The approach I used was to mix Regex and a Stack class from the System.Collections namespace.  The basic logic is as follows:
 
  1. Use Regex to capture all of the "Start" tokens and "End" tokens.
  2. Sort them by the position they are found in the text. 
  3. Loop through all of the tokens:
    a) If the token is a start token, push it onto the stack.
    b) else pop it off of the stack, and use as the matching end token
  4. At the end if the stack is not empty, there is a token mismatch (all start tokens must have a matching end token).
 
You can download BalancedMatching.zip to see the implementation.
 
Note: You can also use regular expressions as Start and End tokens.