[DGD] Re: RFC: parse_string()
Felix A. Croes
felix at dworkin.nl
Mon Oct 13 14:41:17 CEST 1997
Richard Braakman wrote:
> > token: 'regular_expression'
>
> Why ' ? The slash is the usual delimiter for regexps, and anything
> that looks like /this/ is immediately recognisable as a regexp. That
> also avoids the punctuation clash:
>
> > 'xxx' in a rule specifies the literal string xxx, not
> > a regular expression.
This is a good idea.
> Or you could do like lex and not have explicit delimiters at all.
> That avoids the need to escape the delimiter if it occurs inside the
> expression (which is awkward when machine-generating a grammar),
> but leading whitespace still has to be escaped.
Not just leading whitespace: trailing whitespace would have to be
escaped as well. I prefer delimiters over lex's solution because,
unlike lex, parse_string() won't separate rules with newlines.
> My second question is about the implementation. Will parse_string()
> be efficient enough to be invoked with a fairly complex grammar for
> every command line? I would expect that to bring up the same problems
> as the regexp packages try to handle, i.e. some way to store or cache
> a compiled version of the grammar. Do you have any plans for that?
parse_string() will use lazy construction for both the regular expression
DFA and the parser PDA. Information will be cached between successive
calls, but only for a single grammar, so it makes sense to reserve an
object for each different grammar.
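
To illustrate what lazy construction with caching means in general, here
is a minimal sketch in Python. It is not the DGD implementation; the class
name LazyDFA, the NFA table and the sample expression are all hypothetical.
A DFA transition is derived from the NFA only the first time it is needed,
and the table survives between calls on the same instance, much like
keeping one object per grammar.

```python
class LazyDFA:
    """DFA whose transitions are derived from an NFA only when first needed."""

    def __init__(self, nfa, start, accepting):
        self.nfa = nfa                    # {(nfa_state, char): {nfa_state, ...}}
        self.start = frozenset([start])   # a DFA state is a set of NFA states
        self.accepting = set(accepting)
        self.cache = {}                   # {(dfa_state, char): dfa_state}, kept between calls

    def step(self, state, char):
        key = (state, char)
        if key not in self.cache:         # build this transition on first use only
            targets = set()
            for s in state:
                targets |= self.nfa.get((s, char), set())
            self.cache[key] = frozenset(targets)
        return self.cache[key]

    def matches(self, text):
        state = self.start
        for char in text:
            state = self.step(state, char)
        return bool(state & self.accepting)


# Hypothetical NFA (no epsilon moves) for the regular expression (a|b)*ab
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}

dfa = LazyDFA(NFA, start=0, accepting={2})
```

Calling dfa.matches() repeatedly on the same instance reuses the
transitions already built; a second grammar would get its own instance
and its own table.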
> Perhaps I'll address more high-level issues after I've tried to
> construct a grammar for LPC :)
The grammar part of dgd/src/comp/parser.y can be used for this. I do
intend to make parse_string() general enough to make pre-parsing of
LPC code (and perhaps dealing with language extensions) feasible.
Dworkin