[DGD] parse_string()
S. Foley
s_d_foley at hotmail.com
Fri Jun 15 06:27:33 CEST 2001
More parse_string stuff from me, I'm afraid.
Here's my understanding of how parse_string works. Please correct me if I'm
wrong. I suspect I have some fundamental misconceptions.
You have a string to parse, some token rules, and some production rules.
Is the string tokenized according to the token rules before the production
rules are applied? The concept in my head is that the string is first
translated into a sequence of tokens, and then the production rules are
used to try to derive that same token sequence. If the production rules
can produce a match, then the string can be parsed according to the grammar.
I'm starting to feel as if I have it totally backwards though.
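To make the two-stage model described above concrete, here is a minimal
sketch in Python (not LPC). It only illustrates the idea of "tokenize first
with longest match, then run productions over the token sequence"; it is not
a statement of how DGD implements parse_string. The token rules, the toy
production, and the names tokenize and parse_sentence are all hypothetical.

```python
import re

# Hypothetical token rules as (name, pattern) pairs. This is NOT DGD
# grammar syntax, just an illustration of the two-stage model.
TOKEN_RULES = [
    ("word", r"[a-z]+"),
    ("number", r"[0-9]+"),
    ("space", r" +"),
]


def tokenize(text):
    """Stage one: split text into (name, lexeme) pairs by longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_match = None, None
        for name, pattern in TOKEN_RULES:
            m = re.compile(pattern).match(text, pos)
            # Keep the rule whose match at this position is longest.
            if m and m.end() > pos:
                if best_match is None or m.end() > best_match.end():
                    best_name, best_match = name, m
        if best_match is None:
            raise ValueError("no token rule matches at position %d" % pos)
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens


def parse_sentence(tokens):
    """Stage two: toy production 'sentence: word (space word)*'.

    Works only on token names; returns the words, or None on failure.
    """
    names = [name for name, _ in tokens]
    if not names or names[0] != "word":
        return None
    i = 1
    while i < len(names):
        if names[i:i + 2] != ["space", "word"]:
            return None
        i += 2
    return [lexeme for name, lexeme in tokens if name == "word"]
```

In this model the production stage never sees the raw characters, so it has
no way to influence where one token ends and the next begins.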
I have a more practical question that I've tried to solve on and off for a
few months now. I'm trying to write a function that will explode a string
with a delimiter specified by a regular expression. My initial thought was
to simply take the regular expression I wanted to use as a delimiter, use a
catchall token to catch everything else (like /.+/) and it would be cake.
Initially I assumed that there was a way to write the production rules to
get around the longest match rule, but since the tokenizing of the string
happens prior to the workings of the production rules (err...right?) this is
impossible.
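For reference, this is the behaviour I am after, sketched in Python rather
than LPC, using Python's re module instead of parse_string. The helper names
explode and longest_match are made up for illustration. The second helper
shows why the catchall fails under a longest-match rule: at the start of the
string, /.+/ always matches more than the delimiter does.

```python
import re


def explode(text, delimiter):
    """Split text on every non-empty match of the delimiter pattern."""
    pieces = []
    start = 0
    for m in re.finditer(delimiter, text):
        if m.end() == m.start():
            continue  # ignore empty matches so they never split the text
        pieces.append(text[start:m.start()])
        start = m.end()
    pieces.append(text[start:])
    return pieces


def longest_match(text, patterns):
    """Return (pattern, length) for the longest match at position 0."""
    best_pattern, best_len = None, 0
    for p in patterns:
        m = re.match(p, text)
        if m and len(m.group()) > best_len:
            best_pattern, best_len = p, len(m.group())
    return best_pattern, best_len
```

With a delimiter of r", *" and the catchall r".+", longest_match picks the
catchall for the whole input "a, b,c", so a tokenizer working this way would
never emit a delimiter token at all.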
Next, I figured I could use /./ instead of /.+/, and just use some LPC
functions in the production rule to craft the array. But that's insane of
course, because you are generating an ungodly number of tokens, since the
tokenizing occurs prior to the production rules (err... right?). So while
it would kinda work, it's incredibly inefficient.
So I started wondering if every regular expression A has some inverse
regular expression B, such that everything A matches, B doesn't, and
everything A doesn't match, B does. If I could generate B from A, then I
could easily write the production rules and limit the number of tokens to
the bare minimum required...
Where I'm currently stuck is how to express the inverse of the concatenation
of two regular expressions, a and b. Is there some trick to it? You can
assume for this that I am able to express the inverse of a (!a) and the
inverse of b (!b).
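One thing that is certain is that simply concatenating the inverses does not
work: with a = x and b = y, !a matches the empty string and !b matches "xy",
so !a!b matches "xy", which is exactly what ab matches. The textbook route to
a complement goes through automata instead: build a complete deterministic
automaton for the expression, then swap accepting and non-accepting states.
Below is a minimal Python sketch of that idea for the tiny case a = x, b = y
over the alphabet {x, y}. The transition table is written by hand for this
example and the names are hypothetical; turning the resulting automaton back
into a regular expression is a separate step not shown here.

```python
# Complete DFA for the concatenation of the patterns x and y, i.e. the
# language containing only the string "xy". State 3 is a dead state, so
# every state has a transition for every symbol (required for complement).
TRANSITIONS = {
    0: {"x": 1, "y": 3},
    1: {"x": 3, "y": 2},
    2: {"x": 3, "y": 3},
    3: {"x": 3, "y": 3},
}
START = 0
ACCEPTING = {2}


def accepts(text, accepting):
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = START
    for ch in text:
        state = TRANSITIONS[state][ch]
    return state in accepting


def complement(accepting):
    """Accepting set of the complement DFA: every state not accepting before."""
    return set(TRANSITIONS) - set(accepting)
```

Using complement(ACCEPTING) as the accepting set gives an automaton that
accepts every string over {x, y} except "xy", including the empty string.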
I apologize for this long rambling post, but I felt I should post my entire
train of thought so people who know better can more easily see where my
misconceptions lie.
If what I'm asking here is for more help than is appropriate, I apologize.
Thanks in advance,
--Steve