[DGD] Working with parse_string()

Mon Jul 20 00:02:52 CEST 2009

Hello again,
First, I'd like to thank you all for the helpful replies in the
"Shadowing" thread. I never imagined there were so many active people
on this list; that's great :)

(A small warning: this is a long and insane post about trying to parse
an insane language, so if you don't really like this stuff, I warned
you ;)

While learning and trying to use parse_string I ran into a rather
serious problem, and I have already spent hours trying to resolve it
myself. My mudlib is meant to be used with the Polish language, which
is very (and I mean very) complex when it comes to processing it by
computer, parsing it etc. It has tons of strange forms, tons of
exceptions, and so on. Most of these things can be avoided (or
partially avoided), but I ran into serious problems when using
parse_string to parse "simple" verb commands. To explain what these
problems usually look like, let's consider this example:

In English:   give [something] to [someone]

In Polish:     give [something, Accusative form] [someone, Dative form]

So the grammar here is really ambiguous for the parser. As you can see,
there is no separator to help with parsing; we're left with two
different object names possibly sitting next to each other in one
phrase. The first problem, or rather difficulty (it can be avoided but
requires a lot more writing), is that Polish has cases, which means 7
different forms of a word depending on context. In the example above,
the context requires the item being given to be in the Accusative and
the person receiving it to be in the Dative. This information should
be passed to the LPC function that finds these objects, so it knows
which case of the object name to compare (when we search for the
object to give, we check its name in the Accusative). A very helpful
addition would therefore be some kind of additional arguments passed
down the parse tree to every function call. In the example above it
could be:

VERB: give OBJI{case=accusative} LIV{case=dative}

OBJI: OBJ         ? find_object_in_inventory
LIV: OBJ           ? find_living

OBJ: ADJ NOUN
ADJ: ADJ word
ADJ: word
NOUN: word

Then, when the parser calls "find_object_in_inventory", it calls it
with the usual array argument PLUS whatever additional argument we
specified in the rules above, so: find_object_in_inventory(string *arr,
mixed noun_case), with noun_case equal to "accusative" when called for
OBJI (and find_living getting "dative" for LIV). It could just as well
be a mapping of attributes, so instead of an unknown number of
arguments we would have (string *arr, mapping args), with
args = ([ "case": "accusative" ]) etc.
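A minimal sketch of what such a finder could look like, written in Python rather than LPC because this is not an existing parse_string feature. The object layout, the `matches` helper, the `candidates` parameter (standing in for the inventory or environment) and the two-case word table are illustrative assumptions, not mudlib code. For "daj ostry miecz wysokiemu mężczyźnie" the two phrases would be ["ostry", "miecz"] (Accusative) and ["wysokiemu", "mężczyźnie"] (Dative):

```python
# Hypothetical objects: noun and adjectives are stored per grammatical case
# (only the two cases used by "give" are filled in here).
SWORD = {"noun": {"accusative": "miecz", "dative": "mieczowi"},
         "adjs": {"accusative": {"ostry"}, "dative": {"ostremu"}}}
MAN = {"noun": {"accusative": "mężczyznę", "dative": "mężczyźnie"},
       "adjs": {"accusative": {"wysokiego"}, "dative": {"wysokiemu"}}}


def matches(ob, tokens, noun_case):
    """Last token must be the noun, the others known adjectives, all in noun_case."""
    *adjs, noun = tokens
    return ob["noun"][noun_case] == noun and set(adjs) <= ob["adjs"][noun_case]


def find_object(tokens, args, candidates):
    """One finder for every case: the per-rule attributes arrive in `args`."""
    noun_case = args["case"]
    found = [ob for ob in candidates if matches(ob, tokens, noun_case)]
    return found or None  # None plays the role of LPC nil
```

The point is that a single finder serves every case; only the args mapping differs between OBJI{case=accusative} and LIV{case=dative}.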

I don't really know if such a thing is possible, but it would make
parsing more "stateful", because right now the LPC functions that get
called know nothing except the array of tokens that have been parsed.
Another workaround (simpler, and requiring no modifications) would be
to write a lot more grammar and use the rule names to indicate which
case should be used to find objects:

VERB: give OBJI_ACC LIV_DAT

OBJI_ACC: OBJ         ? find_object_in_inventory_accusative
OBJI_DAT: OBJ         ? find_object_in_inventory_dative
[ and so on for all cases ]

LIV_ACC: OBJ           ? find_living_accusative
LIV_DAT: OBJ           ? find_living_dative
[ and so on for all cases ]

[ and the same for every other finder, like OBJE and OBJC, and their plural versions ]

OBJ: ADJ NOUN
ADJ: ADJ word
ADJ: word
NOUN: word

So it's not that bad, but it's really ugly (it requires multiple
functions and grammar rules to cover all cases for every possible
object search), so I'm wondering whether such an extension to
parse_string is even possible, or whether it goes against some design
decision. I don't really know C much; I'm more into object-oriented
languages like Ruby, Smalltalk, Python and of course LPC, so I can't
even try to write such an extension on my own. I'm curious whether
you've ever had a similar problem and found a "cleaner" solution than
the one described above.
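Since the grammar handed to parse_string is just a string, the repetitive per-case rules could at least be generated instead of typed by hand. A small Python sketch; the helper name `case_rules` and the three-letter suffixes are assumptions for illustration, and it produces only the rule lines, not the token definitions (such as `word`) that a complete grammar also needs:

```python
# The seven Polish cases; the three-letter suffixes are an arbitrary naming choice.
CASES = ["nominative", "genitive", "dative", "accusative",
         "instrumental", "locative", "vocative"]


def case_rules(rule, finder, cases=CASES):
    """Return one grammar line per case, e.g. 'OBJI_ACC: OBJ ? find_..._accusative'."""
    lines = []
    for name in cases:
        suffix = name[:3].upper()
        lines.append(f"{rule}_{suffix}: OBJ ? {finder}_{name}")
    return "\n".join(lines)


grammar_rules = "\n".join([
    "VERB: give OBJI_ACC LIV_DAT",
    case_rules("OBJI", "find_object_in_inventory"),
    case_rules("LIV", "find_living"),
    "OBJ: ADJ NOUN",
    "ADJ: ADJ word",
    "ADJ: word",
    "NOUN: word",
])
```

Each generated rule still needs its finder function, but those can be one-line wrappers that call a single generic finder with the case as an argument.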

That's the first problem. The second one is far more complex and
involves parsing two object names that are not separated by any word
like "to". I will again use the example presented above, but let's
ignore cases for now and just assume the finder function magically
knows which case to look for. So we start with:

VERB: give OBJI LIV

OBJI: OBJ         ? find_object_in_inventory
LIV: OBJ           ? find_living

OBJ: ADJ NOUN
ADJ: word ADJ
ADJ: word
NOUN: word

This is just a sample grammar, but it is similar to what I use. It
means we can have the following possibilities:

give OBJECT[noun] LIVING[noun]
give OBJECT[noun] LIVING[adj noun]
give OBJECT[noun] LIVING[adj adj noun]
give OBJECT[adj noun] LIVING[noun]
give OBJECT[adj noun] LIVING[adj noun]
give OBJECT[adj noun] LIVING[adj adj noun]
give OBJECT[adj adj noun] LIVING[noun]
give OBJECT[adj adj noun] LIVING[adj noun]
give OBJECT[adj adj noun] LIVING[adj adj noun]
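For any concrete command these nine shapes collapse quite a bit: both phrases are contiguous and non-empty, so n words after the verb can only be split in n - 1 ways. A short Python illustration (the `splits` helper is hypothetical):

```python
def splits(tokens):
    """Every way to cut the words after the verb into a non-empty object
    phrase followed by a non-empty living phrase (word order is kept)."""
    return [(tokens[:i], tokens[i:]) for i in range(1, len(tokens))]
```

For "give sword tall man" that leaves only (["sword"], ["tall", "man"]) and (["sword", "tall"], ["man"]) to try.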

You probably think I'm crazy by now :) Somehow I managed to parse this
successfully by using "return nil" in the LPC functions that act as
object finders (find_*). I'm not sure whether this use is allowed or
just a very bad habit, but this is how I made it work:

mixed *find_object_in_inventory(mixed *arr)
{
    object ob;

    /* try to find an object in the inventory matching the tokens,
       treating the last element as the noun and the rest as adjectives */
    ob = FIND_IN_INVENTORY(arr);
    if (ob) {
        return ({ ob });
    }
    return nil;     /* reject this parse so another token split is tried */
}

mixed *find_living(mixed *arr)
{
    object ob;

    /* try to find a living object in the environment matching the tokens,
       treating the last element as the noun and the rest as adjectives */
    ob = FIND_LIVING(arr);
    if (ob) {
        return ({ ob });
    }
    return nil;     /* reject this parse so another token split is tried */
}

This approach works well if we assume all input refers to existing
objects. Returning nil when no matching object could be found made the
parser move on to another split of the phrase, until it ended with a
single word to check. Once it had found the object to give, it looked
for the living using all the tokens that were left. Here are a few
examples of how this worked for both existing and non-existing items:

1. give sword tall man
Both sword and tall man exist.

    ({ "sword", "tall" })  => find_object_in_inventory()
       FAIL, the LPC function returns nil, so the parser tries a new token set
    ({ "sword" })       => find_object_in_inventory()
       SUCCESS!
    ({ "tall", "man" })    => find_living()
       SUCCESS!

 so the parser returned ({ give, ({ SWORD_OB }), ({ MAN_OB }) })

2. give sword weird man
The sword exists but the weird man doesn't; the parser tried it like this:

     ({ "sword" })   => find_object_in_inventory()
         SUCCESS
     ({ "weird", "man" }) => find_living()
         FAIL, returns nil, but the whole tree gets discarded and there
are no more tokens to try, so parse_string returns nil, leaving me
unable to process the result and form a meaningful message for the
user (other than something like "Give [what] [whom]?", without telling
the player which object name was wrong: the item to give, or the
target?)

So in the example above the parser returns nil instead of something
like ({ give, ({ SWORD_OB }), ({ }) }), which is what I hoped to get :(
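The behaviour in examples 1 and 2 can be modelled in a few lines of Python. This is a model of the behaviour described above, not of DGD's actual parsing algorithm; the toy object lists, the `match` helper and the longest-object-phrase-first order (chosen to mirror example 1) are assumptions:

```python
def match(tokens, noun, adjs):
    """Last token is the noun; every other token must be a known adjective."""
    *given_adjs, given_noun = tokens
    return given_noun == noun and set(given_adjs) <= adjs


# Hypothetical world: (object, noun, adjectives).
INVENTORY = [("SWORD_OB", "sword", set())]
LIVINGS = [("MAN_OB", "man", {"tall"})]


def find_object_in_inventory(tokens):
    found = [ob for ob, noun, adjs in INVENTORY if match(tokens, noun, adjs)]
    return found or None  # None stands in for LPC nil


def find_living(tokens):
    found = [ob for ob, noun, adjs in LIVINGS if match(tokens, noun, adjs)]
    return found or None


def parse_give(tokens):
    """A nil from either finder discards that candidate parse and the next
    split is tried; if every split is discarded, the whole result is nil."""
    for i in range(len(tokens) - 1, 0, -1):  # longest object phrase first
        obj = find_object_in_inventory(tokens[:i])
        if obj is None:
            continue
        liv = find_living(tokens[i:])
        if liv is None:
            continue
        return ["give", obj, liv]
    return None
```

In this model, "sword weird man" finds the sword, fails on "weird man", runs out of splits and returns None, so the information that only the target was missing is lost.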

Because I want to pass this array later to a function like
do_give(object, target), I'd like to give the player a meaningful
message when one or more arguments can't be resolved to objects, but
if the whole parse_string call fails with nil, I don't even know what
went wrong. I tried playing with the LPC functions and the values they
return, but there seem to be only two options. An array doesn't help:
once a finder returns an array, parse_string considers that part
successfully parsed and proceeds, so with an empty array it moved on
to the next tokens for the LIV rule instead of trying a few more token
sets for the OBJI rule. Nil discards the whole tree, which works nicely
for finding objects if all of them exist, but if the target or the
object can't be found, the result of the whole parse is just nil.

So when I changed return nil to return ({ }) in the cases where no
object matching the given name tokens could be found, this is what
happened:

3. give sword tall man
Both sword and tall man exist.

   ({ "sword", "tall" })  =>  find_object_in_inventory()
      FAIL, returns ({ }), but the parser treats this as success and
considers "sword" & "tall" to match the pattern, so we're left with
   ({ "man" })             => find_living()
      SUCCESS, but really a FAIL: even though we find someone matching
"man", it's completely inaccurate, because we didn't find "sword" and
we should be looking for "tall man", not just "man"
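The same kind of toy model, with finders that return an empty array instead of nil, shows why example 3 goes wrong: an empty array is still an array, so the first split tried is accepted and never revisited. Again a Python sketch under assumptions (toy objects, the `match` helper, longest-object-phrase-first order), not DGD internals:

```python
def match(tokens, noun, adjs):
    """Last token is the noun; every other token must be a known adjective."""
    *given_adjs, given_noun = tokens
    return given_noun == noun and set(given_adjs) <= adjs


# Hypothetical world: (object, noun, adjectives).
INVENTORY = [("SWORD_OB", "sword", set())]
LIVINGS = [("MAN_OB", "man", {"tall"})]


def find_object_in_inventory(tokens):
    # Returns ({ }) instead of nil when nothing matches.
    return [ob for ob, noun, adjs in INVENTORY if match(tokens, noun, adjs)]


def find_living(tokens):
    return [ob for ob, noun, adjs in LIVINGS if match(tokens, noun, adjs)]


def parse_give(tokens):
    """The finders never return nil, so no candidate is ever rejected and
    the first split tried becomes the result."""
    for i in range(len(tokens) - 1, 0, -1):  # longest object phrase first
        obj = find_object_in_inventory(tokens[:i])
        liv = find_living(tokens[i:])
        if obj is not None and liv is not None:  # an empty list is not None
            return ["give", obj, liv]
    return None
```

For "sword tall man" this gives ["give", [], ["MAN_OB"]]: "sword tall" is swallowed as an unresolved object and "man" alone matches the living.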

I know the main problem here is the lack of any separator that would
make the whole thing less ambiguous, but that's just not possible in
this language, so all I can do is find the exact objects, provided the
parser allows some "trial and error". And it does if I return nil, but
then I run into the problem of the whole parse failing whenever the
parser can't find an object matching one of the required arguments. I
know it may not even be a parser problem, since it's probably not meant
to work in such a trial mode, but if this is allowed, is there any way
to make it try a few rules and, if it gets down to one word and still
can't find anything, just produce an empty array instead of discarding
the whole parse? The only way to parse something as crazy as OBJI and
LIV next to each other is trial and error, resolving objects on the fly
while parsing. It would be impossible without looking objects up while
parsing, because there can be far too many combinations of adjectives
and nouns for the parser to ever tell them apart.
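One possible way around this, sketched in Python under assumptions: let parse_string only collect the words after the verb (no `?` finder on those rules), then do the trial and error afterwards in ordinary code, where a failed lookup can be reported instead of discarding everything. `resolve_give`, the toy object lists and the shape of the returned problem tuple are hypothetical:

```python
def match(tokens, noun, adjs):
    """Last token is the noun; every other token must be a known adjective."""
    *given_adjs, given_noun = tokens
    return given_noun == noun and set(given_adjs) <= adjs


# Hypothetical world: (object, noun, adjectives).
ITEMS = [("SWORD_OB", "sword", set())]
LIVINGS = [("MAN_OB", "man", {"tall"})]


def find(tokens, objects):
    return [ob for ob, noun, adjs in objects if match(tokens, noun, adjs)]


def resolve_give(tokens):
    """Return (item, target, problem). problem is None on success, otherwise
    ("item", words) or ("target", words), naming the phrase that failed."""
    problem = ("item", tokens)  # default: no split resolved the item
    for i in range(len(tokens) - 1, 0, -1):  # longest item phrase first
        item = find(tokens[:i], ITEMS)
        if not item:
            continue  # here an empty result means "try the next split"
        target = find(tokens[i:], LIVINGS)
        if target:
            return item, target, None
        if problem[0] == "item":  # remember the first split that found the item
            problem = ("target", tokens[i:])
    return None, None, problem
```

For example 2 this yields ("target", ["weird", "man"]), which is enough to tell the player that the target, not the item, could not be found. The same loop should translate to LPC with array ranges such as arr[0 .. i - 1].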

I don't think I explained it well enough for anyone to know what I'm
talking about at this point, but maybe someone has had similar
problems parsing with parse_string and can give me a hint; maybe I'm
just doing it all wrong :) This language is really hell to parse. It
has been done once for a MUD, but that was a CD mudlib, and it required
its parse functions to be recoded in C to allow such complex parsing.
I just wonder whether this is possible with parse_string or whether
I'm just banging my head against the wall.

If anyone has read this far, thanks for your patience. If anyone is
interested in helping but what I described is not clear enough, I can
try again later and rephrase it, give better examples etc.

Best regards,
KN