[DGD] Re: parse string difficulties

Sun Mar 28 22:13:13 CEST 2004

>From: Erwin Harte <harte at is-here.com>
>I like a challenge like that and did some experimenting.  This is the
>grammar I came up with:
>
>     string query_grammar()
>     {
>	return
>	    "whitespace = /[\b\r\t ]+/\n" +
>	    "newline    = /\n/\n" +
>	    "word       = /[a-zA-Z0-9]+/\n" +
>	    "operator   = /[\\.\\+\\=\\-]+/\n" +
>
>	    "SENTENCE   : OPERATION          ? fun_a\n" +
>	    "SENTENCE   : SENTENCE OPERATION ? fun_b\n" +
>
>	    "OPERATION  : word operator word newline ? fun_1\n" +
>	    "OPERATION  : word operator      newline ? fun_2\n" +
>	    "OPERATION  :      operator word newline ? fun_3\n";
>     }
>
>You need to double-escape the ., +, = and - so that the parse_string()
>kfun actually _sees_ \. while "\." is identical to "." (hope that made
>sense).  You didn't include digits in your original word regexp.
>
>I took the newline out of the whitespace regexp so that it could be
>used separately and avoid grammar confusion between
>
>   word operator word
>   operator word
>
>and
>
>   word operator
>   word operator word
>
>which would otherwise be impossible to distinguish reliably.
>
>The fun_a and fun_b functions create and append to lists of
>word/operator/word combinations.
>
>     static mixed *fun_a(mixed *tree)
>     {
>	return ({ tree });
>     }
>
>     static mixed *fun_b(mixed *tree)
>     {
>	return ({ tree[0] + ({ tree[1] }) });
>     }
>
>The fun_1, fun_2 and fun_3 functions fill in the blanks (nils) where
>appropriate and create 3-tuples (3-sized arrays) in the order you
>wanted.
>
>     static mixed *fun_1(mixed *tree)
>     {
>	return ({ ({ tree[1], tree[0], tree[2] }) });
>     }
>
>     static mixed *fun_2(mixed *tree)
>     {
>	return ({ ({ tree[1], tree[0], nil }) });
>     }
>
>     static mixed *fun_3(mixed *tree)
>     {
>	return ({ ({ tree[0], nil, tree[1] }) });
>     }
>
>Throwing something like ".food\nweight=8\n.chocolate\n" at it, I get
>back:
>
>   ({ ({ ({ ".", nil, "food" }),
>         ({ "=", "weight", "8" }),
>         ({ ".", nil, "chocolate" }) }) })
>
>In general:
>
>     static mixed *parse_text(string text)
>     {
>         mixed result;
>
>         result = parse_string(query_grammar(), text);
>         return result ? result[0] : nil;
>     }

This is great; I'm glad you've made something that actually works. I've 
pored over it for a good half hour and some parts still confuse me. I 
understand the token rules, but the production rules have me baffled. Here 
are the bits I'm uncertain about, along with my best guess at what they do:

	    "SENTENCE   : OPERATION          ? fun_a\n" +
	    "SENTENCE   : SENTENCE OPERATION ? fun_b\n" +

Presumably that first line is where it all begins, and the entire string 
is regarded as an 'OPERATION'. Of course, the entire string is composed of 
several operations, so the second line repeatedly acts to break the string 
down into many smaller operations. Is this correct? So "op1\nop2\nop3\n", 
when the first rule is applied to it, becomes ({ OPERATION }) and then 
the second rule... no wait, I don't get it; I really am still guessing here. 
Could you explain exactly what happens on these two lines, and why the 
functions need to be there? I know you already explained what the functions 
are for, but I don't see how that relates to these two rules. It really 
is confusing me.
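The two SENTENCE rules are easier to follow when you watch them fire. Below is a small Python model that replays the reductions for this grammar. It is my own sketch, not DGD's parse_string(); Python is used only because it runs standalone. It assumes, as the quoted fun_b implies, that a rule function receives the results of its child symbols spliced together into one array. LPC arrays are modelled as lists and nil as None. The tokenizer, reduce_operation() and the trace list are hypothetical helpers added for illustration.

```python
import re

# A hand-written model of the reductions parse_string() would perform for
# the quoted grammar; it is NOT DGD's parser. LPC arrays are modelled as
# Python lists and nil as None. Assumes every input line is well formed.

WORD = re.compile(r"[a-zA-Z0-9]+")
TOKEN = re.compile(r"[a-zA-Z0-9]+|[.+=\-]+|\n")  # spaces/tabs are skipped


def fun_a(tree):
    # SENTENCE : OPERATION. tree is the first OPERATION's result
    # (a one-element list holding its 3-tuple); it becomes the list so far.
    return [tree]


def fun_b(tree):
    # SENTENCE : SENTENCE OPERATION. tree[0] is the list built so far,
    # tree[1] is the newest 3-tuple; append it and return the longer list.
    return [tree[0] + [tree[1]]]


def fun_1(tree):  # word operator word newline
    return [[tree[1], tree[0], tree[2]]]


def fun_2(tree):  # word operator newline
    return [[tree[1], tree[0], None]]


def fun_3(tree):  # operator word newline
    return [[tree[0], None, tree[1]]]


def reduce_operation(tokens):
    # Hypothetical helper: pick the OPERATION rule by the line's shape.
    if len(tokens) == 4:
        return fun_1(tokens)
    if WORD.fullmatch(tokens[0]):
        return fun_2(tokens)
    return fun_3(tokens)


def parse_text(text):
    trace = []        # which SENTENCE function fired, in order
    sentence = None   # nothing has been reduced to SENTENCE yet
    pending = []      # tokens of the OPERATION currently being read
    for tok in TOKEN.findall(text):
        pending.append(tok)
        if tok != "\n":
            continue
        op = reduce_operation(pending)
        pending = []
        if sentence is None:
            # Leftmost OPERATION: only SENTENCE : OPERATION can apply.
            sentence = fun_a(op)
            trace.append("fun_a")
        else:
            # The SENTENCE already built plus the new OPERATION reduce to
            # a bigger SENTENCE. Their results are spliced into one tree,
            # which is why fun_b sees ({ list, tuple }).
            sentence = fun_b(sentence + op)
            trace.append("fun_b")
    if pending or sentence is None:
        return None, trace  # unterminated line or empty input: no parse
    return sentence[0], trace
```

Running parse_text(".food\nweight=8\n.chocolate\n") gives the same three tuples as in the quoted output, with the trace ['fun_a', 'fun_b', 'fun_b']. Under these assumptions, the string is never treated as one big OPERATION that gets broken down. The parser builds up from the left instead. fun_a fires exactly once, for the first line, and turns its 3-tuple into a one-element list. Every later line fires fun_b, which appends its tuple to that list. Because SENTENCE appears first on the right-hand side of its own rule (left recursion), a SENTENCE always means "all the operations read so far". As far as I know, without fun_a and fun_b parse_string() would simply hand back one flat array of the matched pieces rather than a list of lists.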

_________________________________________________________________
List config page:  http://list.imaginary.com/mailman/listinfo/dgd