[DGD] Re: Verb Parsing Ideas

Jason Cone jcone at cs.tamu.edu
Tue May 12 00:30:26 CEST 1998


-----Original Message-----
From: Felix A. Croes <felix at xs1.simplex.nl>
To: dgd at list.imaginary.com <dgd at list.imaginary.com>
Date: Monday, May 11, 1998 3:54 PM
Subject: [DGD] Re: Verb Parsing Ideas




First off, thanks to Dan and Dworkin for the replies regarding mudlib
distribution.  They made a whole lot of sense and many good points were
raised.  I'll release the source once it's finished.

>How closely are you simulating the MudOS parser?

The extent to which I can answer this question is limited by my knowledge of
the particulars of MudOS' parser - specifically, how far it will go to find
a matching object when needed.  Syntactically speaking, though, I believe
mine follows it quite closely.

Verbs themselves exist as separate objects.  For example, a 'test' verb
would look something like the following:

#include <inherits.h>
inherit VERB_I;

static void create()
{
    ::create();
    add_rules(({ "", "OBJ" }));
    set_error_message("You can not test that.\n");
}

int can_test()
{
    return 1;
}

int can_test_obj(object oObject)
{
    return 1;
}

int do_test()
{
    write("You do a simple test.\n");
    return 1;
}

int do_test_obj(object oObject)
{
    write("You test " + object_name(oObject) + "\n");
    return 1;
}


For those who are familiar with Lima's implementation of the MudOS verb
parser, this should look familiar.  For those who aren't, here's a brief
summary of how mine works:

* add_rules() defines the grammar rules against which the input the player
types is evaluated.  The general composition of the grammar rules is the
same as in MudOS, but the types of tokens used are different - explained
later in this email.
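
As a sketch of registering several rules at once (the "OBJI in OBJE" rule
string and the 'put' verb itself are my own illustration, not taken from the
actual library; the OBJI/OBJE tokens are explained below):

static void create()
{
    ::create();
    /* Accept either a bare object, or an inventory object
       plus a container in the environment. */
    add_rules(({ "OBJ", "OBJI in OBJE" }));
    set_error_message("Put what where?\n");
}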

* The function can_<lowercase rule>() is called with the appropriate
parameters and gives the verb a chance to verify whether or not the current
player can successfully use the verb, given whatever requirements the verb
coder wishes to impose.  More often than not, it will simply return 1.
In fact, if the "can" function doesn't exist, it is assumed that the verb
can be used; returning 1 and not having the function at all are one and the
same.  It should be noted that the "can" function will only be called if the
parser (parse_string()) is able to match the input with the given rule.  If
the "can" function returns 0, all parsing on the given input stops.

* The function do_<lowercase rule>() is called with the same parameters as
the matching "can" function.  If this function is called, the parser matched
the input with the rule and the "can" function returned 1 (or didn't exist).
It is here in the "do" function that the action really takes place.  If the
"do" function returns 1, both verb parsing and command parsing (%%) stop.
If the "do" function returns 0, verb parsing stops, but command parsing
continues.
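
To illustrate the return-value semantics (this is my own sketch; the guard
condition is purely hypothetical):

int do_test_obj(object oObject)
{
    if (!oObject) {
        /* 0: stop verb parsing, but let command parsing (%%)
           continue down the chain. */
        return 0;
    }
    write("You test " + object_name(oObject) + "\n");
    /* 1: the verb handled the input; both verb parsing and
       command parsing stop here. */
    return 1;
}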


>In what areas have you made improvements?


Since I don't know MudOS' internals for sure, I will assume that the
following differ (and are good/better things) with respect to MudOS.

* The types of tokens used are different for the most part.  The ones that
are currently supported are: OBJ, OBJI, OBJE, OBJA, LIV, and DIR.  OBJ will
match an object that exists in the player's immediate inventory or immediate
environment.  OBJI will match an object that exists only in the player's
immediate inventory.  OBJE will match an object that exists only in the
player's immediate environment.  OBJA will match any phrase that matches an
ambiguous noun.  The object doesn't need to be a loaded physical object -
it just needs to have the grammatical syntax of one.  LIV will match an
object in the
player's immediate environment that is considered to be living (NPC/monster,
player).  DIR will match a direction.  It is relatively simple to expand the
rule parser to use new tokens - you just need to understand how the
parse_string() kfun works as the actual grammar strings are formed
dynamically.
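
As a sketch of how these tokens might combine (the rule string and the
resulting function name are my guesses, extrapolated from the 'test' verb
above), a 'give' verb could look like:

#include <inherits.h>
inherit VERB_I;

static void create()
{
    ::create();
    /* OBJI: an object in the player's inventory;
       LIV: a living object in the player's environment. */
    add_rules(({ "OBJI to LIV" }));
    set_error_message("Give what to whom?\n");
}

int do_give_obji_to_liv(object oItem, object oTarget)
{
    write("You give " + object_name(oItem) + " to " +
        object_name(oTarget) + ".\n");
    return 1;
}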

* I consider an acceptable object syntax to be the following:

  <Optional Preposition> <Optional Article> <Optional Adjectives> <Noun>
  <Optional Numerical Identifier>

The OBJA token will return an array with the more important bits of the
above information in the form:

({ ({ <adjective string array> }), <noun string>, <numerical id integer> })
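
For instance, if a player typed "test the large red ball 2", the OBJA token
would (by the format above) produce:

({ ({ "large", "red" }), "ball", 2 })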

Before the parse_string() kfun was functional, I had a similar system that
employed sscanf() and some rather heinous algorithms to parse grammars.  But
it also accepted numerical identifiers before the noun in word form ("first
ball" as opposed to "ball 1", "sixty-eighth axe" as opposed to "axe 68",
etc.).  Word-form identifiers are not yet implemented in the new system, but
will be soon.

* These same grammars can be used with add_action()-like functions called
triggers.  Those who are familiar with add_action() will recognize the
following:

void init()
{
    add_action("test_function", "test");
}

int test_function(string sInput)
{
    write(sInput + " was input after \"test\"\n");
    return 1;
}

The same thing can be accomplished with my system:

void init()
{
    add_trigger("test");
}

int do_test(string sInput)
{
    write(sInput + " was input after \"test\"\n");
    return 1;
}

The advantage of triggers comes into play with the following example:

void init()
{
    add_trigger("test", ({ "OBJA" }));
}

int can_test_obja(mixed *mpObjPacket)
{
    write("Number of adjectives for Ambiguous Object: " +
        sizeof(mpObjPacket[0]) + "\n");
    write("Name of Ambiguous Object: " + mpObjPacket[1] + "\n");
    write("Which Ambiguous Object: " + mpObjPacket[2] + "\n");

    return 1;
}

In other words, triggers <can> be (but don't have to be) object-defined
verbs.  Of course, any number/combination of rules can be used with verbs
and triggers.  Oh, almost forgot.  Each rule is assigned to a 'verbrule'
object (where the parsing is actually done) to take advantage of the
driver-level caching of parse information for a given grammar.  If the rule
"OBJA on OBJE" is used in 3 different verbs and in 4 triggers, there will
exist only one 'verbrule' object for that rule, not 7.

In short, I think the real advantages of this system over MudOS' are:

* Ability to parse ambiguous objects.
* Ability to use parsing functionality on an object level via triggers.
* It's written in LPC, so anyone who understands how parse_string() grammars
are formed can alter/add/remove to their liking.


I hope that made some sort of sense. :)  Specifically, I would like to
solicit suggestions on new token types.  One I'm considering right now is
really just a modification to the OBJ token.  I think it would be useful to
specify an optional search depth via OBJ:2 (which would search the immediate
environment, immediate inventory, and the respective environments/inventories
of each of those).  I don't know, though.  I could see it being useful (as I
said), but
I could also see it being a huge performance issue.  It also complicates
rule caching as the number is part of the rule, but isn't used in the
grammar composition - multiple depths should still use one 'verbrule'
object.

(%%) Command parsing in CSLib is done in this order: alias conversion
(player only) -> movement -> verb -> bin (global, then personal) (player
only) -> channel (player only) -> soul -> triggers.

--
  Jason H. Cone
  Dept. Computer Science
  Texas A&M University
  jcone at cs.tamu.edu




