[MUD-Dev] string parsing

Wed Oct 29 22:33:43 CET 1997

[Felix C:]

:What follows is a description of something I am currently implementing
:for my server.  I am providing it here for others to comment on.  Also,
:I'd like to hear of other approaches to the same problem.

In general, I think such a facility is a very good idea. That's why I have
one in my system! I've found it to be useful in a couple of places, such
as "sublanguages" for my online building commands, etc.

One of my main concerns, however, is efficiency. LPC isn't as fast as
native code for string handling, so anything you do on a per-character
basis is going to be expensive. In my system, I wrote a number of builtin
(native) functions to do most of the core work. You can probably do that
too.

Another efficiency issue is that you should probably "compile" the
grammars into an internal form, especially if a grammar is likely to
be used more than once or twice. Traversing structures is more
efficient than going back and forth in strings. Note that both lex and
yacc essentially "compile" into a more efficient representation. I
believe there are a number of regexp packages out there that compile the
regexp, thus making the matches faster (up to an order of magnitude).
Of course, I'm not at all sure how you would adapt one of them to your
purposes! (I don't use regexps myself - just some custom code.)
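
As a rough illustration of the "compile once, match many times" idea
above, here is a minimal sketch in Python rather than LPC. The pattern
syntax ("put <obj> in <obj>") and the names compile_pattern and match
are invented for this example; they are not taken from either server
discussed in this thread.

```python
# Hypothetical sketch: compile a pattern string once into a list of
# (kind, value) pairs, then reuse that list for every input line.

def compile_pattern(pattern):
    """Turn 'put <obj> in <obj>' into [('word', 'put'), ('slot', 'obj'), ...]."""
    compiled = []
    for piece in pattern.split():
        if piece.startswith("<") and piece.endswith(">"):
            compiled.append(("slot", piece[1:-1]))
        else:
            compiled.append(("word", piece.lower()))
    return compiled


def match(compiled, words):
    """Match a list of input words against a compiled pattern.

    Returns a list of (slot_name, words) pairs, or None if there is no match.
    A slot takes at least one word and stops at the next literal keyword.
    """
    slots = []
    pos = 0
    for i, (kind, value) in enumerate(compiled):
        if kind == "word":
            if pos >= len(words) or words[pos].lower() != value:
                return None
            pos += 1
            continue
        start = pos
        if start >= len(words):
            return None                      # nothing left to fill the slot
        nxt = compiled[i + 1] if i + 1 < len(compiled) else None
        if nxt is None:
            pos = len(words)                 # last element: take the rest
        elif nxt[0] == "word":
            pos = start + 1
            while pos < len(words) and words[pos].lower() != nxt[1]:
                pos += 1                     # extend up to the keyword
        else:
            pos = start + 1                  # slot followed by slot: one word
        slots.append((value, words[start:pos]))
    return slots if pos == len(words) else None


PUT_IN = compile_pattern("put <obj> in <obj>")   # done once, at load time
```

The per-line work is then just a walk over the compiled list, e.g.
match(PUT_IN, "put the red ball in the sack".split()), with no
re-scanning of the pattern string itself.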

Some other issues I've bumped into relating to this kind of parsing:

- some kind of "escape" command is very useful. For example, an input
    line starting with a quote is usually speech, regardless of what the
    rest of the line contains. You might be able to handle that OK with
    your grammars, or you could do preprocessing on input lines. The
    same goes for command aliases.

- some kind of quoting mechanism is useful. For you, perhaps it can be
    put into the lexical definition of a 'word'. This is useful when
    the normal whitespace rules do something undesirable, or for the
    exceptional cases that always seem to come up.

- interpretation can sometimes be ambiguous. I note that you seem to
    intend that commands often have keywords within them that separate
    the various pieces (e.g. put <obj> IN <obj>). They also end up
    distinguishing various forms of input (versus put <obj> DOWN). Those
    keywords will occasionally also be verbs themselves (like DOWN). I
    don't see any problems for you here, since you can put all of the
    various forms in alternative productions.

- the resolution of a noun-phrase string to a MUD-world object can be
    context dependent on the verb in the command. E.g. 'pick up the rock'
    can resolve to a different rock than 'take the rock from the sack'.
    I handle this by not doing the resolution until I am inside the code
    for the specific verb. Perhaps you could pass the verb to your
    'find_obj' and 'find_liv' routines, but that doesn't result in very
    modular verbs.
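
The first two points (escapes and aliases handled by preprocessing,
and a quoting mechanism at the word level) can be sketched roughly as
follows. This is Python, not LPC, and it is only one possible
arrangement: the ALIASES table, the use of "say" as the speech verb,
and the function names are all assumptions made for the example.

```python
ALIASES = {"n": "go north", "i": "inventory"}    # hypothetical alias table


def tokenize(line):
    """Split on whitespace, but keep a "double quoted run" as one word."""
    words, current = [], []
    in_quote = False
    started = False                  # True once the current word has begun
    for ch in line:
        if ch == '"':
            in_quote = not in_quote
            started = True           # so that "" yields an empty word
        elif ch.isspace() and not in_quote:
            if started:
                words.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
    if started:
        words.append("".join(current))
    return words


def preprocess(line):
    """Handle the speech escape and aliases before any grammar sees the line."""
    line = line.strip()
    if line.startswith('"'):
        # Leading quote: the rest is speech, whatever it contains.
        return ["say", line[1:].strip()]
    first, _, rest = line.partition(" ")
    if first in ALIASES:
        line = (ALIASES[first] + " " + rest).strip()
    return tokenize(line)
```

With this arrangement the grammar never sees the leading quote or the
alias; for instance preprocess('write "hello world" on sign') hands
the parser four words, the second of which contains a space.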

My current system is less like your proposal than an earlier one I did.
I don't quite remember why I changed! I do recall, however, that I ended
up with a *lot* of rules in the grammar, to handle the various ways that
the player could give a command. I didn't do the lexical stuff, just the
syntactic level. The lexical stuff was hard-coded.

In your last "give sword to dwarf" example, what would parse_string return
if find_obj couldn't find a sword? An array containing the words of the
noun-phrase, so that higher-level code could properly complain about not
being able to find a sword?
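
One possible shape for that, sketched in Python rather than LPC: the
lookup either yields the object or hands back the original words, and
the verb code decides what to say. The find_obj here is a trivial
stand-in, not Felix's actual routine, and the tagged-tuple return
value is just an assumption for the example.

```python
def find_obj(words, candidates):
    """Stand-in lookup: return the first candidate named among the words."""
    for name in candidates:
        if name in words:
            return name
    return None


def resolve_or_words(words, candidates):
    """Return ('obj', item) on success, else ('unresolved', words)."""
    item = find_obj(words, candidates)
    if item is None:
        return ("unresolved", list(words))
    return ("obj", item)


def give(noun_words, inventory):
    """Verb code: do the lookup here and complain with the player's own words."""
    kind, value = resolve_or_words(noun_words, inventory)
    if kind == "unresolved":
        return "There is no '" + " ".join(value) + "' here."
    return "You give the " + value + "."
```

Because the lookup happens inside the verb, each verb is also free to
choose which candidates to search, which fits the context-dependent
resolution mentioned earlier.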

--
Chris Gray   cg at ami-cg.GraySage.Edmonton.AB.CA