[DGD] RFC: parse_string()

Tue Oct 7 18:36:29 CEST 1997

This is a request for comments in the literal sense: please comment.

    mixed *parse_string(string grammar, string sentence)

A grammar is a single string made up of rules.  It can look like this:

    token:	'regular_expression'
    whitespace:	'regular_expression'
    FOO =	BAR '+' GNU
    FOO =	token FOO ( BAR ) '+' GNU ? lpc_function_name

A rule starts with either `identifier :' or `identifier ='.

A `:' rule specifies a class of tokens in a regular expression.  As a
special case, `whitespace :' specifies token-separating white space.

A `=' rule specifies a grammar rule.  All the usual stuff about context-free
grammars applies.  'xxx' in a rule specifies the literal string xxx, not
a regular expression.  Alternatives for a rule are not specified with
`|'; instead, additional rules must be given for the same non-terminal.
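
As an illustration of the last point, where a yacc-style grammar would
write  Item: 'sword' | 'shield'  the notation described here would
presumably repeat the left-hand side (Item is a made-up name, used only
for this example):

    Item =	'sword'
    Item =	'shield'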

Optionally, a rule may end with `? function', which means that
application of this rule will be verified with a call to LPC.  The
function, which is called in the object that does parse_string(),
receives as its argument all the words from the input that match the
current rule, as an array of strings.  If the function returns zero,
the rule is assumed not to match the input.
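
A rough sketch of such a verification function, to make the calling
convention concrete.  The names `name', `Greet' and `known_name' are
invented for this example, and the regular expression syntax is only an
assumption, since it is not specified here.  Given the rules

    name:	'[a-z]+'
    Greet =	'hello' name ? known_name

the object that calls parse_string() might define:

    /*
     * Hypothetical check for the Greet rule.  For the input
     * "hello guest", words would presumably be ({ "hello", "guest" }).
     */
    int known_name(string *words)
    {
	/* a zero return makes the rule fail to match */
	return (words[1] == "guest");
    }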

Optionally, `(' and `)' may be used in rules, in balanced pairs, to
indicate structuring of the input.  For example, if the rules

    FOO = 'first' ( BAR 'last' ) ? function
    BAR = 'one' 'two'

match the input, then the LPC function will be called with

    ({ "first", ({ "one", "two", "last" }) })

The return value of parse_string() is a parse tree of the input structured
as specified by use of `( )'.  If the input cannot be parsed, the value 0
is returned.
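
A sketch of what a call might look like, reusing the FOO/BAR rules from
above without the `? function' part.  Several things here are assumptions
rather than part of the proposal: that rules are separated by newlines,
that the first grammar rule is the start symbol, that ' +' is an
acceptable regular expression for white space, and the function name
try_parse().

    void try_parse()
    {
	mixed *tree;

	tree = parse_string("whitespace: ' +'\n" +
			    "FOO = 'first' ( BAR 'last' )\n" +
			    "BAR = 'one' 'two'\n",
			    "first one two last");
	if (!tree) {
	    /* the input could not be parsed */
	    return;
	}
	/*
	 * Otherwise tree should be structured like the argument in
	 * the earlier example:
	 *     ({ "first", ({ "one", "two", "last" }) })
	 */
    }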

---

I haven't decided what type of grammar (LL, LR, LALR, other) to accept yet.

Dworkin