[DGD] parse_string()

Fri Jun 15 06:27:33 CEST 2001

More parse_string stuff from me, I'm afraid.

Here's how I understand how parse_string works.  Please correct me if I'm 
wrong.  I suspect I have some fundamental misconceptions.

You have a string to parse, some token rules, and some production rules.
Is the string tokenized in full according to the token rules before the 
production rules are applied?  The concept in my head is that the string is 
translated into some sequence of tokens, and then the production rules get 
used to try to recreate the same token sequence.  If the production rules 
can create a match, then the string may be parsed according to the grammar.  
I'm starting to feel as if I have it totally backwards though.

I have a more practical question that I've tried to solve on and off for a 
few months now.  I'm trying to write a function that will explode a string 
with a delimiter specified by a regular expression.  My initial thought was 
to simply take the regular expression I wanted to use as a delimiter, use a 
catchall token to catch everything else (like /.+/) and it would be cake.  
Initially I assumed that there was a way to write the production rules to 
get around the longest match rule, but since the tokenizing of the string 
happens prior to the workings of the production rules (err...right?) this is 
impossible.
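
To make the longest-match problem concrete, here is a minimal sketch in 
Python (not LPC, and not DGD's actual tokenizer).  It assumes a tokenizer 
that, at each position, tries every token rule and keeps the longest match.  
The function name tokenize and the rule names "delim" and "other" are made 
up for illustration.

```python
import re

def tokenize(text, rules):
    """Longest-match tokenizer sketch: at each position, try every rule
    and keep the longest match (the earlier rule wins ties)."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError("no token matches at position %d" % pos)
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

# Hypothetical rules: a comma delimiter plus a /.+/ catchall.
rules = [("delim", r",\s*"), ("other", r".+")]
catchall_tokens = tokenize("a, b,c", rules)
# The catchall swallows the whole string, delimiters included:
# [("other", "a, b,c")]
```

Under that assumption the delimiter rule never gets a chance to match, 
whatever the production rules say.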

Next, I figured I could use /./ instead of /.+/, and just use some LPC 
functions in the production rule to craft the array.  But that's insane of 
course, because you are generating an ungodly number of tokens, since the 
tokenizing occurs prior to the production rules (err... right?).  So while 
it would kinda work, it's incredibly inefficient.
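
A rough Python sketch of the /./ approach, to show where the cost comes 
from.  It only simulates the idea: every step yields either one delimiter 
token or one single-character token, and the pieces are glued back together 
afterwards.  The name explode_per_char is hypothetical, and preferring the 
delimiter over the single character is an assumption.

```python
import re

def explode_per_char(text, delim_pattern):
    """Split text on delim_pattern, counting one token per delimiter
    match and one token per ordinary character."""
    delim = re.compile(delim_pattern)
    pieces, current = [], []
    token_count = 0
    pos = 0
    while pos < len(text):
        m = delim.match(text, pos)
        if m and m.end() > pos:
            # Delimiter token: close off the piece collected so far.
            pieces.append("".join(current))
            current = []
            pos = m.end()
        else:
            # Single-character token (the /./ rule).
            current.append(text[pos])
            pos += 1
        token_count += 1
    pieces.append("".join(current))
    return pieces, token_count

pieces, token_count = explode_per_char("alpha, beta,gamma", r",\s*")
# pieces is ["alpha", "beta", "gamma"], built from 16 tokens
```

Three pieces cost sixteen tokens here, so the token count grows with the 
length of the string rather than with the number of pieces.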

So I started wondering if every regular expression A had some inverse 
regular expression B, such that everything A matches, B doesn't, and 
everything A doesn't match, B does.  If I could generate B from A, then I 
could easily write 
the production rules and limit the number of tokens to the bare minimum 
required...

Where I'm currently stuck is how to express the inverse of the concatenation 
of two regular expressions, a and b.  Is there some trick to it?  You can 
assume for this that I am able to express the inverse of a (!a) and the 
inverse of b (!b).
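
For reference, the textbook route to an inverse does not go through !a and 
!b at all: build a complete deterministic automaton for the concatenation 
and swap its accepting and non-accepting states.  The Python sketch below 
does this by hand for a made-up example, a = /x+/ and b = /y/ over the 
alphabet {x, y}.  The state numbering and names are illustrative only, and 
turning the flipped automaton back into a regular expression is a separate 
step not shown here.

```python
# Complete DFA for the concatenation /x+y/ over the alphabet {"x", "y"}.
ALPHABET = "xy"
START = 0
DEAD = 3  # trap state: no way back to acceptance
STATES = {0, 1, 2, DEAD}
ACCEPTING = {2}
TRANSITIONS = {
    (0, "x"): 1,       (0, "y"): DEAD,
    (1, "x"): 1,       (1, "y"): 2,
    (2, "x"): DEAD,    (2, "y"): DEAD,
    (DEAD, "x"): DEAD, (DEAD, "y"): DEAD,
}

def accepts(word, accepting):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = START
    for ch in word:
        if ch not in ALPHABET:
            raise ValueError("symbol %r is outside the alphabet" % ch)
        state = TRANSITIONS[(state, ch)]
    return state in accepting

# Complement: same states and transitions, accepting set flipped.
COMPLEMENT_ACCEPTING = STATES - ACCEPTING

in_ab = accepts("xxy", ACCEPTING)                 # True
in_not_ab = accepts("xxy", COMPLEMENT_ACCEPTING)  # False
```

Note that this inverse is about whole strings: "xyx" is accepted by the 
flipped automaton even though it contains a match for /x+y/, so it is not 
automatically the "text between delimiters" token.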

I apologize for this long rambling post, but I felt I should post my entire 
train of thought so people who know better can more easily see where my 
misconceptions lie.

If what I'm asking here is for more help than is appropriate, I apologize.

Thanks in advance,
--Steve
