[DGD] Re: parse string difficulties
Robert Forshaw
iouswuoibev at hotmail.com
Sun Mar 28 22:13:13 CEST 2004
>From: Erwin Harte <harte at is-here.com>
>I like a challenge like that and did some experimenting. This is the
>grammar I came up with:
>
> string query_grammar()
> {
>     return
>         "whitespace = /[\b\r\t ]+/\n" +
>         "newline = /\n/\n" +
>         "word = /[a-zA-Z0-9]+/\n" +
>         "operator = /[\\.\\+\\=\\-]+/\n" +
>
>         "SENTENCE : OPERATION ? fun_a\n" +
>         "SENTENCE : SENTENCE OPERATION ? fun_b\n" +
>
>         "OPERATION : word operator word newline ? fun_1\n" +
>         "OPERATION : word operator newline ? fun_2\n" +
>         "OPERATION : operator word newline ? fun_3\n";
> }
>
>You need to double-escape the ., +, = and - so that the parse_string()
>kfun actually _sees_ \. while "\." is identical to "." (hope that made
>sense). You didn't include digits in your original word regexp.
>
>I took the newline out of the whitespace regexp so that it could be
>used separately and avoid grammar confusion between
>
> word operator word
> operator word
>
>and
>
> word operator
> word operator word
>
>which would otherwise be impossible to distinguish reliably.
>
>The fun_a and fun_b functions create and append to lists of
>word/operator/word combinations.
>
> static mixed *fun_a(mixed *tree)
> {
>     return ({ tree });
> }
>
> static mixed *fun_b(mixed *tree)
> {
>     return ({ tree[0] + ({ tree[1] }) });
> }
>
>The fun_1, fun_2 and fun_3 functions fill in the blanks (nils) where
>appropriate and create 3-tuples (3-sized arrays) in the order you
>wanted.
>
> static mixed *fun_1(mixed *tree)
> {
>     return ({ ({ tree[1], tree[0], tree[2] }) });
> }
>
> static mixed *fun_2(mixed *tree)
> {
>     return ({ ({ tree[1], tree[0], nil }) });
> }
>
> static mixed *fun_3(mixed *tree)
> {
>     return ({ ({ tree[0], nil, tree[1] }) });
> }
>
>Throwing something like ".food\nweight=8\n.chocolate\n" at it, it
>returns to me with:
>
> ({ ({ ({ ".", nil, "food" }),
>       ({ "=", "weight", "8" }),
>       ({ ".", nil, "chocolate" }) }) })
>
>In general:
>
> static mixed *parse_text(string text)
> {
>     mixed result;
>
>     result = parse_string(query_grammar(), text);
>     return result ? result[0] : nil;
> }
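To make the token rules and the three OPERATION rules concrete on their own, here is a rough Python model of them. It is not DGD code and not how parse_string() works internally. It assumes longest-match tokenizing, with the "whitespace" token discarded, and handles each newline-terminated line separately, building the 3-tuple that fun_1, fun_2 or fun_3 would build. The names `tokenize`, `operation` and `operations` are hypothetical, and Python's None stands in for nil.

```python
import re

# Token rules from query_grammar(). The raw strings hand the regex engine
# "\." etc. directly, which is what the LPC double escape "\\." achieves.
TOKENS = [
    ("whitespace", re.compile(r"[\b\r\t ]+")),  # \b means backspace inside [...]
    ("newline", re.compile(r"\n")),
    ("word", re.compile(r"[a-zA-Z0-9]+")),
    ("operator", re.compile(r"[\.\+\=\-]+")),
]


def tokenize(text):
    """Return (kind, value) pairs, taking the longest match at each offset
    (the earlier rule wins a tie) and dropping whitespace tokens."""
    pos, out = 0, []
    while pos < len(text):
        best = None
        for kind, rx in TOKENS:
            m = rx.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        kind, m = best
        if kind != "whitespace":
            out.append((kind, m.group()))
        pos = m.end()
    return out


def operation(line):
    """Match one newline-terminated line against the OPERATION rules and
    build the same 3-tuple the corresponding fun_N builds."""
    kinds = tuple(kind for kind, _ in line)
    vals = [value for _, value in line]
    if kinds == ("word", "operator", "word", "newline"):  # fun_1
        return [vals[1], vals[0], vals[2]]
    if kinds == ("word", "operator", "newline"):          # fun_2
        return [vals[1], vals[0], None]
    if kinds == ("operator", "word", "newline"):          # fun_3
        return [vals[0], None, vals[1]]
    raise ValueError(f"no OPERATION rule matches {kinds}")


def operations(text):
    """Cut the token stream after each newline token and convert each line."""
    result, line = [], []
    for tok in tokenize(text):
        line.append(tok)
        if tok[0] == "newline":
            result.append(operation(line))
            line = []
    if line:
        raise ValueError("input must end with a newline")
    return result


print(operations(".food\nweight=8\n.chocolate\n"))
# [['.', None, 'food'], ['=', 'weight', '8'], ['.', None, 'chocolate']]
```

Because newline is a token of its own, every line has to fit one of the three shapes by itself. In this model a leftover line such as "8\n" (from "weight=\n8\n") raises an error, just as the grammar has no rule for a bare word followed by a newline.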
This is great, I'm glad you've made something that actually works. I've
pored over it for a good half hour and there are still some parts that
confuse me. I understand the token rules, but the production rules still
have me baffled. There are a few bits I'm still uncertain about, and I'd
like to take a stab at guessing what they do:
"SENTENCE : OPERATION ? fun_a\n" +
"SENTENCE : SENTENCE OPERATION ? fun_b\n" +
Now presumably that first line is where it all begins, and the entire
string is regarded as an 'OPERATION'. Of course, the entire string is
composed of several operations, so the second line repeatedly acts to break
the string down into many smaller operations. Is this correct? So
"op1\nop2\nop3\n", when the first rule is applied to it, becomes
({ OPERATION }), and then the second rule... no wait, I don't get it. I
really am still guessing here. Could you explain exactly what happens on
these two lines, and why the functions need to be there? I know you already
explained what the functions are for, but I don't understand how they
relate to these two rules. It really is confusing me.
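The two SENTENCE rules say that a SENTENCE is either a single OPERATION (fun_a) or a shorter SENTENCE followed by one more OPERATION (fun_b). Because the second rule is left-recursive, three operations parse as SENTENCE(SENTENCE(SENTENCE(OP1) OP2) OP3). The innermost SENTENCE is built first, and each enclosing one adds one OPERATION. The Python sketch below models only that nesting. It assumes, as the quoted fun_N functions suggest, that each rule function receives a list of the values its rule matched and returns a list whose elements replace them in the enclosing rule. fun_a and fun_b are transcribed from the quoted LPC. `reduce_sentence` and its trace are hypothetical helpers for illustration.

```python
# fun_a and fun_b transcribed from the quoted LPC; lists stand in for arrays.
def fun_a(tree):
    # SENTENCE : OPERATION  -- tree is [tuple]; wrap it so this SENTENCE
    # contributes a single value: the list of tuples seen so far.
    return [tree]


def fun_b(tree):
    # SENTENCE : SENTENCE OPERATION  -- tree is [list_so_far, tuple];
    # append the new tuple and again contribute a single value.
    return [tree[0] + [tree[1]]]


def reduce_sentence(ops):
    """Hypothetical helper: apply the SENTENCE rules innermost-first to a
    list of OPERATION 3-tuples and record the SENTENCE value after each step.
    Returns (final_result, trace); final_result mirrors the outer array that
    parse_string() hands back, which is why parse_text() takes element [0]."""
    sentence, trace = None, []
    for op in ops:
        if sentence is None:
            (sentence,) = fun_a([op])            # SENTENCE : OPERATION
        else:
            (sentence,) = fun_b([sentence, op])  # SENTENCE : SENTENCE OPERATION
        trace.append(sentence)
    if sentence is None:
        return None, trace  # no OPERATION at all: the grammar cannot match
    return [sentence], trace


ops = [[".", None, "food"], ["=", "weight", "8"], [".", None, "chocolate"]]
result, trace = reduce_sentence(ops)
for step in trace:
    print(step)
# [['.', None, 'food']]
# [['.', None, 'food'], ['=', 'weight', '8']]
# [['.', None, 'food'], ['=', 'weight', '8'], ['.', None, 'chocolate']]
```

Read this way, neither rule breaks the string down. fun_a starts a one-element list from the first OPERATION, and each fun_b call returns the previous list with one more tuple appended. Under the assumption above, the extra ({ }) around each return value keeps the growing list as one element of the enclosing rule instead of having its tuples spliced in separately.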
List config page: http://list.imaginary.com/mailman/listinfo/dgd
More information about the DGD mailing list