[DGD] Re: parse string difficulties

Erwin Harte harte at is-here.com
Sun Mar 28 22:31:15 CEST 2004


On Sun, Mar 28, 2004 at 08:13:13PM +0000, Robert Forshaw wrote:
> >From: Erwin Harte <harte at is-here.com>
> >I like a challenge like that and did some experimenting.  This is the
> >grammar I came up with:
> >
> >    string query_grammar()
> >    {
> >	return
> >	    "whitespace = /[\b\r\t ]+/\n" +
> >	    "newline    = /\n/\n" +
> >	    "word       = /[a-zA-Z0-9]+/\n" +
> >	    "operator   = /[\\.\\+\\=\\-]+/\n" +
> >
> >	    "SENTENCE   : OPERATION          ? fun_a\n" +
> >	    "SENTENCE   : SENTENCE OPERATION ? fun_b\n" +
> >
> >	    "OPERATION  : word operator word newline ? fun_1\n" +
> >	    "OPERATION  : word operator      newline ? fun_2\n" +
> >	    "OPERATION  :      operator word newline ? fun_3\n";
> >    }
> >
[...]
> 
> This is great, I'm glad you've made something that actually works. I've 
> pored over it for a good half hour and there are still some parts that 
> confuse me. I understand the token rules but the production rules still 
> have me baffled. There are a few bits I'm still uncertain about and I'd 
> like to take a stab at guessing what they do:
> 
> 	    "SENTENCE   : OPERATION          ? fun_a\n" +
> 	    "SENTENCE   : SENTENCE OPERATION ? fun_b\n" +
> 
> Now presumably that first line is where it all begins. And the entire 
> string is regarded as an 'OPERATION'. Of course, the entire string is 
> composed of several operations, so the second line repeatedly acts to break 
> the string down into many smaller operations. Is this correct?

Turning the input into tokens and juggling the tokens around to match
the production rules are two entirely separate steps.  It's not like
parse_string() thinks "Ok, I've got the first OPERATION match, let's
see what comes next in this input string", if you see what I mean.

So, given something like ".food\nweight=8\n.chocolate\n" it'll break
that up into the following tokens no matter what:

    <operator> <word> <newline> <word> <operator> <word>
    <newline> <operator> <word> <newline>
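
To see the tokenizing step in isolation, here is a rough sketch in
Python rather than LPC.  It is not how parse_string() is implemented;
it only mirrors the four token rules from the grammar.  It assumes that
tokens named "whitespace" are dropped, and it relies on the four
character sets not overlapping, so the order of the alternatives does
not matter.  The names TOKEN_RULES and tokenize are made up for this
example.

```python
import re

# Token rules mirroring the grammar's four token definitions.
# Inside a character class, Python's \b means backspace, as in the grammar.
TOKEN_RULES = [
    ("whitespace", r"[\b\r\t ]+"),
    ("newline", r"\n"),
    ("word", r"[a-zA-Z0-9]+"),
    ("operator", r"[.+=\-]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))


def tokenize(text):
    """Return (kind, lexeme) pairs; whitespace tokens are skipped (assumption)."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"no token rule matches at offset {pos}")
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize(".food\nweight=8\n.chocolate\n")
kinds = [kind for kind, _ in tokens]
```

Here kinds comes out as the ten token types listed above, in the same
order, without any production rule having been looked at.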

Then the production rules come into play, basically working from the
bottom up.

The parse_string() code might for instance say "Hey, that set of
<operator> <word> <newline> at the end, I can match that to an
OPERATION rule, let's see what happens if I use that."

That means that fun_3 is presented with a 'mixed *tree' parameter whose
value is

    ({ ".", "chocolate", "\n" })

and it replaces that with:

    ({ ({ ".", nil, "chocolate" }) })

In other words, the list of three tokens is replaced by a single
three-element array (wrapped in an outer array).  If the combination
of tokens matches one of the other patterns, something similar happens.
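
As an illustration only, the fun_3 replacement described above could be
written like this in Python, with a list standing in for an LPC array
and None standing in for nil.  The real function would be LPC and take
a 'mixed *tree'; this sketch just reproduces the one input/output pair
shown above.

```python
def fun_3(tree):
    """Reduce [operator, word, newline] to one [operator, None, word] entry."""
    operator, word, _newline = tree
    # None stands in for LPC's nil: this form has no word before the operator.
    # The outer list matters: the three tokens become ONE element.
    return [[operator, None, word]]


result = fun_3([".", "chocolate", "\n"])
```

The newline token is consumed by the rule and does not appear in the
result.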

On a higher level, we have the 'SENTENCE' production rule that says
that, well, a SENTENCE can either be a single OPERATION, or a SENTENCE
followed by an OPERATION.  That's parse_string()ese for saying that a
SENTENCE is "one or more OPERATIONs".
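
The "one or more" reading can be sketched as a left-to-right fold, again
in Python and again only as an illustration.  The bodies of fun_a and
fun_b are not shown in this thread, so here they are assumed to return
their tree unchanged, and it is assumed that fun_b receives the results
of SENTENCE and OPERATION joined into one flat list.  reduce_sentence
is a made-up helper that plays the role of the parser applying the two
rules.

```python
def fun_a(tree):
    # SENTENCE : OPERATION (assumed body: keep the single operation as-is)
    return list(tree)


def fun_b(tree):
    # SENTENCE : SENTENCE OPERATION (assumed body: keep everything as-is)
    return list(tree)


def reduce_sentence(operations):
    """Fold already-reduced OPERATION results into one SENTENCE."""
    if not operations:
        # Neither rule matches zero OPERATIONs.
        raise ValueError("a SENTENCE needs at least one OPERATION")
    sentence = fun_a(operations[0])
    for operation in operations[1:]:
        # The SENTENCE built so far, followed by one more OPERATION.
        sentence = fun_b(sentence + operation)
    return sentence


sentence = reduce_sentence([["op1"], ["op2"], ["op3"]])
```

With three operations the first rule is used once and the second rule
twice; with a single operation only the first rule is used.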

Hope this gets you a few steps further in the right direction.  I'm
not going to write a full-blown introduction because I think there are
other resources out there that explain grammars better than I can. :-)

Cheers,

Erwin.
-- 
Erwin Harte <harte at is-here.com>
_________________________________________________________________
List config page:  http://list.imaginary.com/mailman/listinfo/dgd


