[DGD]parse_string
Ludger Merkens
balduin at uni-paderborn.de
Fri Jan 28 13:31:05 CET 2000
Hello,
I currently try to use parse_string(a,b) to do some preprocessing of
html files. Have a look at my first version, it should separate
tags "<asd dsa>" from text. As well as stripping some chars like
!,-,[] and so on.
grammar = " \
whitespace=/[ \t\n\r\v\f]+/ \ /* strip some chars */
whitespace=/[!-\\[\\]|]+/ \ /* some more */
tag=/<[^<]*>/ \ /* tags start with <, end with >
term =/[a-zA-Z]+'s/ \ /* e.g. GNU's */
term =/\\(*[a-zA-Z`']+\\)*/ \ /* Hello or 'Hello' */
eol =/<[^<>]*/ \ /* < but neither < nor > follows. */
s : tag s \
s : term s \
s : term \
s : eol \ /* broken tag at end of chunk */
s : tag \
";
First of all, I want to know how the whitespace rules are used, e.g. I
stumbled over a tag containing \n in spite of the first whitespace rule.
Second I need some details about efficiency of parsing. If I use this
grammar on a string containing html with length of 1000 chars, I have
a cost of about 25240 ticks. Whereas I get for 2000 chars 181664 ticks
and for 3000 even 591632.
- Which execution costs do I have to expect O(n^2), O(n^3) ... (be n the
length of the string)
- In how far is the cost dependend on the grammar, are there rules which
are parsed faster than others? Are there some good books, url's how to
build a well formed (fast parsed) grammar (kfg)
Do you have a better solution for this problem ?
Ludger Merkens
(balduin at uni-paderborn.de)
List config page: http://list.imaginary.com/mailman/listinfo/dgd
More information about the DGD
mailing list