[MUD-Dev] string parsing

Thu Oct 30 18:57:53 CET 1997

Ola Fosheim Grøstad <olag at ifi.uio.no> wrote:
>[...]
> Hi Felix, I salute your effort in trying to introduce more intelligent
> grammarbased parsing to LPC. (hopefully you take advantage of the
> freely available libraries for regexp's etc)

Um, why do you hope this?  I usually ignore existing code (as opposed
to algorithms) because I enjoy programming.

> I am currently sketching ways of using grammars for *generating*
> output, as opposed to the more traditional parsing of input. Have you
> considered to include this in your work?  As you are working with a
> textbased system I think this could be useful for programming NPC
> behaviour or at the very least to aid the simulation meaningful
> talking.

Hmm.  You could write a grammar for the language of annotated BNF
grammars, and use it with a string parser for syntax-directed
translation of an output grammar into code for a meaningfully talking
NPC.

Note that I am not working with a text-based system per se.  It is
not just text that has to be parsed.

> >    Thing: Noun
> >    Thing: ( Adjs ) Noun
> >    Noun: word
> >    Adjs: word Adjs
> >    Adjs: word
> >
> >parse_string(grammar, "bottle") will return
> >
> >    ({ "bottle" }),
>
> Uhm, shouldn't it return that "bottle" is a *Thing* ?  Isn't that the point
> with parsing? Sortof?  I guess I missed you somewhere...

If I had wanted it to be known as a Thing, I would have made a rule
like the following:

    Thing: Noun		? its_a_thing

with this LPC function:

    mixed *its_a_thing(string *rule)
    {
	/*
	 * the argument is an array of a single word, make clear that
	 * it is a thing
	 */
	return ({ THING_TAG, rule[0] });
    }

and the result, generalized, would have been

    ({
	...
	({ THING_TAG, "bottle" })
	...
    })

> >breadth-first LR(k) parsing algorithm.  Next, any LPC functions specified
> >for rules are called (bottom-up), and the intermediate parse trees for
> >those rules are replaced by the return values of those functions.  As
> >a special case, if the LPC function returns zero, the entire parse tree
> >that this intermediate parse tree is part of is discarded.  Finally,
> >the resulting parse tree(s) are returned.
>
> So your functioncalls are a mixture of a "generating rule" and a
> "production rule"?  Won't you "loose" some information in the process
> in the case of user errors, or is that the responsibility of the
> function being called?  I'm probably misinterpreting you, parserdesign
> is not my primary interst although I have some interests in the field.

If you want to avoid information loss, you can let the function return
an array, the first element of which is the original subtree, and the
second element being the meaning that the function gives to that subtree.
You can also use alternative grammar rules to catch user errors, for
instance:

    GiveCommand: 'give' Object 'to' Living		? do_give
    GiveCommand: 'give' Object 'to' ZeroOrMoreWords	? give_to_whom
    GiveCommand: 'give' Object				? give_to_whom
    GiveCommand: 'give' ZeroOrMoreWords			? give_what

The part that I didn't specify, of course, is which of these
production rules is eventually picked: a valid give command
matches all of them.  My options seem to be to either select
the first matching one, or to sort the resulting parse trees
by the order of the production rules that were used to generate
them (resulting in the "most valid" parsing being first).  I
am still undecided about this.

Felix Croes