[DGD] Parse String
S. Foley
s_d_foley at hotmail.com
Thu Jun 7 03:49:13 CEST 2001
I apologize in advance if I am working off of an outdated parse_string
help file.
My first question concerns the kind of 'operators' (I don't know what
else to call them) that can be used in token rules. The parse_string
help file indicates that the following operators are available:
>and, with regular expressions "a" and "b":
>
> a* zero or more occurrences of a (highest precedence)
> a+ one or more occurrences of a
> ab the concatenation of a and b
> a|b a or b
> (a) a (lowest precedence)
Yet in an example grammar written by Mr. Croes, I see the following:
>FLOAT_CONST = /[0-9]+\\.[0-9]*([eE][-+]?[0-9]+)/ \
>FLOAT_CONST = /[0-9]*\\.[0-9]+([eE][-+]?[0-9]+)/ \
As I understand it, '?' is commonly used in regular expressions to
indicate zero or one occurrence of whatever precedes it. Is that what it
is being used for here? If so, are there other common regular expression
'operators' like this that can be used in token rules but are
undocumented (assuming my help file for parse_string is up to date)?
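A minimal sketch of what '?' conventionally means, using Python's re module as a stand-in. This is an assumption: DGD's token matcher is its own implementation, and the sketch does not settle whether parse_string accepts '?'. It takes the exponent part of FLOAT_CONST, [eE][-+]?[0-9]+, and a version written only with the documented operators (the two cases "sign absent" and "sign present" joined with '|'), which describe the same set of strings, and checks them against a few samples. The names WITH_OPTIONAL, WITHOUT_OPTIONAL, SAMPLES and accepts are hypothetical.

```python
import re

# Exponent part of FLOAT_CONST, written with '?' (zero or one sign).
WITH_OPTIONAL = r"[eE][-+]?[0-9]+"

# The same set of strings using only concatenation and '|':
# either there is no sign, or there is exactly one.
WITHOUT_OPTIONAL = r"[eE][0-9]+|[eE][-+][0-9]+"

SAMPLES = ["e10", "E+5", "e-07", "e", "e+", "+5", "e+-5", "x10"]


def accepts(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern (Python re semantics)."""
    return re.fullmatch(pattern, text) is not None


for s in SAMPLES:
    a = accepts(WITH_OPTIONAL, s)
    b = accepts(WITHOUT_OPTIONAL, s)
    assert a == b, s  # both spellings agree on every sample
    print(f"{s!r:8} {a}")
```

The rewrite works here because the optional part sits between fixed pieces; in general, a pattern of the form x a? y can be rewritten as x y|x a y using only the documented operators.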
I also have a question about precedence. The parse_string
help file states the following:
>For any regular expression, the longest possible token will be
>matched. The name "whitespace" is reserved for defining a special
>token, which is simply skipped. More than one rule may be specified for
>each token, including whitespace.
>If a string matches more than one token, the token for which the rule
>appears first in the grammar is selected. If a string does not match
>any token, it is rejected and parsing fails.
My question is this: suppose I define the following tokens:
token1 = /[a-z]+/
token2 = /.*/
Even though the rule for token1 precedes the rule for token2, is it
true that nothing will ever be matched as token1, because the
longest-match rule takes precedence over rule order? My experiments with
the function, and my reading on regular expressions, seem to indicate
that this is the case.
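A minimal sketch of the two selection rules as the help file states them (longest match first, then earliest rule on a tie), using Python's re for the individual patterns. That substitution is an assumption; DGD's matcher is its own. The functions next_token and tokenize are hypothetical names. Under these rules token2 wins whenever it matches more characters, but when both rules match the same length (for example, when the rest of the input is nothing but lowercase letters), the earlier rule, token1, is selected.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule list mirroring the example: (token name, pattern),
# kept in grammar order.
RULES: List[Tuple[str, str]] = [
    ("token1", r"[a-z]+"),
    ("token2", r".*"),
]


def next_token(rules: List[Tuple[str, str]], text: str, pos: int) -> Tuple[str, str]:
    """Return (name, lexeme) at pos: longest match wins, ties go to the earlier rule."""
    best_name: Optional[str] = None
    best_len = -1
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' means a later rule must be strictly longer to win,
        # so on equal lengths the rule that appears first is kept.
        if m is not None and m.end() - pos > best_len:
            best_name, best_len = name, m.end() - pos
    if best_name is None or best_len == 0:
        raise ValueError(f"no non-empty token matches at position {pos}")
    return best_name, text[pos:pos + best_len]


def tokenize(rules: List[Tuple[str, str]], text: str) -> List[Tuple[str, str]]:
    """Split text into (name, lexeme) pairs by calling next_token repeatedly."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = next_token(rules, text, pos)
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


print(tokenize(RULES, "abc"))      # both rules match 3 characters: token1 wins the tie
print(tokenize(RULES, "abc def"))  # token2 matches 7 characters, token1 only 3
```

Python's '.' does not match a newline, so in this sketch token2 stops at the end of a line; whether '.' behaves the same way in DGD token rules is not something the sketch establishes.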
I have one last, theoretical question. I've been told that for every
regular expression it is possible to write an 'inverse' (complementary)
regular expression. That is, given a regular expression A, I can write a
regular expression B that matches exactly the strings A does not match,
and matches none of the strings A does match. How difficult would it be
to implement an 'operator' (like * or +) for token rules that means "any
string that does not match the preceding regular expression"? Would it
be possible to write a front end to parse_string, using parse_string
itself, that generates B from A?
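What the author was told does hold: the strings matched by regular expressions (regular languages) are closed under complement. A minimal sketch of the usual construction, which works on a deterministic finite automaton (DFA) rather than on the regular expression text: make the DFA complete (every state has a move on every symbol, using a "dead" state where needed), then swap accepting and non-accepting states. The DFA class, the three-symbol alphabet, and the hand-built automaton for [a-z]+ (shrunk to the letters a and b) are hypothetical illustrations in Python, not anything provided by DGD.

```python
from itertools import product
from typing import Dict, Hashable, Iterable, Tuple

State = Hashable


class DFA:
    """A complete DFA over a finite alphabet: every (state, symbol) has a move."""

    def __init__(self, alphabet: Iterable[str],
                 delta: Dict[Tuple[State, str], State],
                 start: State, accepting: Iterable[State]):
        self.alphabet = frozenset(alphabet)
        self.delta = dict(delta)
        self.start = start
        self.accepting = frozenset(accepting)
        self.states = {start} | {q for q, _ in self.delta} | set(self.delta.values())
        # Flipping accept states only yields the complement if the
        # automaton is complete, so check that here.
        for q in self.states:
            for c in self.alphabet:
                if (q, c) not in self.delta:
                    raise ValueError(f"missing move from {q!r} on {c!r}")

    def accepts(self, text: str) -> bool:
        q = self.start
        for c in text:
            if c not in self.alphabet:
                raise ValueError(f"{c!r} is outside the alphabet")
            q = self.delta[(q, c)]
        return q in self.accepting

    def complement(self) -> "DFA":
        """Same states and moves; accepting and non-accepting states swap."""
        return DFA(self.alphabet, self.delta, self.start,
                   self.states - self.accepting)


# Hypothetical DFA for [a-z]+, shrunk to a three-symbol alphabet:
# the letters "a" and "b", plus "1" standing in for every non-letter.
ALPHABET = "ab1"
LETTERS = "ab"
delta: Dict[Tuple[State, str], State] = {}
for c in ALPHABET:
    delta[("start", c)] = "word" if c in LETTERS else "dead"
    delta[("word", c)] = "word" if c in LETTERS else "dead"
    delta[("dead", c)] = "dead"  # after a non-letter, A can never accept

A = DFA(ALPHABET, delta, "start", {"word"})
B = A.complement()

# Every string of length 0 to 3 is accepted by exactly one of A and B.
for n in range(4):
    for chars in product(ALPHABET, repeat=n):
        s = "".join(chars)
        assert A.accepts(s) != B.accepts(s), s
```

Converting A into a DFA and the complemented DFA back into a regular expression (for example by subset construction and state elimination) are standard steps omitted here; the resulting B can be much larger than A, which is one practical cost of such an operator.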
I have almost no background in computer science, so I apologize
if any of these questions are trivial.
Thanks in advance,
S. D. Foley
List config page: http://list.imaginary.com/mailman/listinfo/dgd
More information about the DGD mailing list