[DGD] Re: RFC: parse_string()

Mon Oct 13 14:41:17 CEST 1997

Richard Braakman wrote:
> >     token:	'regular_expression'
>
> Why ' ?  The slash is the usual delimiter for regexps, and anything
> that looks like /this/ is immediately recognisable as a regexp.  That
> also avoids the punctuation clash:
>
> >    'xxx' in a rule specifies the literal string xxx, not
> > a regular expression.

This is a good idea.

> Or you could do like lex and not have explicit delimiters at all.
> That avoids the need to escape the delimiter if it occurs inside the
> expression (which is awkward when machine-generating a grammar),
> but leading whitespace still has to be escaped.

Not just leading, but also trailing whitespace.  I prefer delimiters
over lex's solution, because unlike lex, parse_string() won't separate
rules with newlines.
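
To illustrate, here is a minimal sketch in Python (not LPC) of how a
machine-generated grammar might be put together with slash delimiters.
The names quote_token() and make_grammar(), the backslash escape and
the "name: /regexp/" rule layout are assumptions made for this example
only; none of them is part of the proposal.

```python
def quote_token(regexp, delim="/"):
    """Wrap a regexp in delimiters, escaping any delimiter inside it.

    Assumes (hypothetically) that a backslash escapes the delimiter and
    that the incoming regexp has not been escaped already.
    """
    return delim + regexp.replace(delim, "\\" + delim) + delim


def make_grammar(tokens):
    # Rules are joined with single spaces, not newlines: the delimiters
    # alone show where each regexp starts and ends, so leading and
    # trailing whitespace inside a regexp needs no escaping.
    return " ".join(name + ": " + quote_token(rx) for name, rx in tokens)


GRAMMAR = make_grammar([("ws", " +"), ("path", "[a-z]+/[a-z]+")])
```

The only character the generator has to worry about is the delimiter
itself; the whitespace in the "ws" token passes through untouched.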

> My second question is about the implementation.  Will parse_string()
> be efficient enough to be invoked with a fairly complex grammar for
> every command line?  I would expect that to bring up the same problems
> as the regexp packages try to handle, i.e. some way to store or cache
> a compiled version of the grammar.  Do you have any plans for that?

parse_string() will use lazy construction for both the regular expression
DFA and the parser PDA.  Information will be cached between successive
calls, but each object will cache only a single grammar, so it makes
sense to reserve an object for each different grammar.
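
As a rough illustration of the lazy part, the following Python sketch
builds DFA states from an NFA only when the input first reaches them,
and keeps them for later calls.  It covers the DFA only, not the PDA,
and it is not the planned implementation: the class names LazyDFA and
GrammarObject, the NFA encoding and the "one cached grammar per holder"
policy are assumptions made for the example.

```python
class LazyDFA:
    """Subset construction on demand: a DFA transition is computed the
    first time the input needs it, then reused."""

    def __init__(self, nfa, start, accepting):
        self.nfa = nfa                    # {state: {char: {next states}}}
        self.start = frozenset([start])
        self.accepting = set(accepting)
        self.table = {}                   # (dfa_state, char) -> dfa_state
        self.built = 0                    # transitions constructed so far

    def step(self, state, ch):
        key = (state, ch)
        if key not in self.table:         # not built yet: construct it now
            nxt = set()
            for s in state:
                nxt |= self.nfa.get(s, {}).get(ch, set())
            self.table[key] = frozenset(nxt)
            self.built += 1
        return self.table[key]

    def matches(self, text):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
            if not state:                 # dead state: no match possible
                return False
        return bool(state & self.accepting)


class GrammarObject:
    """Hypothetical stand-in for an object that caches one grammar."""

    def __init__(self):
        self.grammar = None
        self.dfa = None

    def parse(self, grammar, text):
        if grammar != self.grammar:       # different grammar: drop the cache
            nfa, start, accepting = grammar
            self.grammar = grammar
            self.dfa = LazyDFA(nfa, start, accepting)
        return self.dfa.matches(text)


# NFA for (a|b)*abb, states 0..3, state 3 accepting.
NFA = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}
```

Calling parse() repeatedly with the same grammar reuses the transitions
built so far, while alternating between two grammars on one holder
would rebuild them each time, which is why one object per grammar is
the sensible arrangement.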

> Perhaps I'll address more high-level issues after I've tried to
> construct a grammar for LPC :)

The grammar part of dgd/src/comp/parser.y can be used for this.  I do
intend to make parse_string() general enough to make pre-parsing of
LPC code (and perhaps dealing with language extensions) feasible.

Dworkin