[DGD] parse_string question

Noah Gibbs noah_gibbs at yahoo.com
Tue Jan 6 09:39:36 CET 2004


Despite the earlier question, I'm not parsing LPC :-) 
I'm using DGD 1.2.69, if that matters.

The enclosed grammar is big.  I apologize for that. 
I'm mainly just hoping that somebody knows what my
tokenizing problem could be.  If so, I'll look silly,
and then slink off and fix the problem :-)  If I can't
fix it myself soon and nobody here knows offhand, I'll
see how small I can get a test case.

I'm reworking Phantasmal's command syntax, using a
parse_string grammar.  Since string literals
automatically override regexp-defined tokens in the same
rule (so says the documentation), I'm making sure to
parse every word as a regexp-defined token first.  To
make sure we can have ambiguity in all the right
places, I'm sorting all words into categories by what
parts of speech they can represent.  I'm faking the
verbs and adverbs for now, but using the nouns and
adjectives from all the objects everywhere in
Phantasmal.
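
To make the categorizing step concrete, here is a rough sketch of
the idea in Python.  It is not Phantasmal's actual code: the
function names, the sample word lists and the assumption that no
word contains a regexp metacharacter are all made up for
illustration.  Each word goes into the category named after every
part of speech it can be, and each category becomes one token rule.

```python
from collections import defaultdict

# Order of the parts of speech inside a category name, e.g. "av_iv_tv_n_ad".
POS_ORDER = ["av", "iv", "tv", "n", "ad"]

# Hypothetical sample input: part of speech -> words that can play that role.
SAMPLE = {
    "iv": ["look", "who", "north"],
    "tv": ["look", "get"],
    "n": ["north", "lamp", "road"],
    "ad": ["lamp", "old"],
}

def categorize(words_by_pos):
    """Map each category name (e.g. "iv_tv") to the sorted words in it."""
    pos_of_word = defaultdict(set)
    for pos, words in words_by_pos.items():
        for word in words:
            pos_of_word[word].add(pos)
    categories = defaultdict(list)
    for word, pos_set in pos_of_word.items():
        name = "_".join(p for p in POS_ORDER if p in pos_set)
        categories[name].append(word)
    return {name: sorted(words) for name, words in categories.items()}

def token_rule(name, words):
    """Format one token rule; assumes the words need no regexp escaping."""
    return "%s = /%s/" % (name, "|".join("(%s)" % w for w in words))

if __name__ == "__main__":
    for name, words in sorted(categorize(SAMPLE).items()):
        print(token_rule(name, words))
```

With this sample, "look" lands in iv_tv, "north" in iv_n and "lamp"
in n_ad, so every word belongs to exactly one token rule.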

All that is working.

But I've hit a weird problem.  I know that
parse_string has to fully tokenize your input, and it
does so with no ambiguity -- there is only *one*
tokenization.  That's fine.  And tokenizing is
resolved in favor of the earliest rule in the file --
if more than one rule matches, the first one is what
the token is parsed as.  My tokenizing is (to begin
with) unambiguous, so no sweat.

Again, so far, so good.

I made a grammar that accepts trivial little commands
using this system, and it parses intransitive verbs
and simple transitive verbs with no problems.

Then I added a 'bad token' rule.  The rule looks like
this:

bad_token = /.*/

It comes after all other token rules in the file.  So
it shouldn't affect any input that already parses,
but it should make a bunch of inputs stop giving
"Invalid token at offset XXX" errors and start just
returning nil.

No dice.  It turns out that adding the bad token rule
suddenly makes simple transitive verbs ("get lamp",
"look road") stop parsing.  I have no idea why.  It
seems like that shouldn't be possible since everything
that gets tokenized at all without the bad_token rule
should be tokenized the same way after it gets added. 
It *really* seems like intransitive verbs and
transitive verbs shouldn't be treated differently from
each other.
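
One way to probe this outside DGD is a toy tokenizer in Python.
This is only a sketch and not DGD's implementation: the cut-down
rule list, the "ws" whitespace rule and the prefer_longest switch
are all assumptions made for illustration, and I have not confirmed
which policy parse_string really uses.  It compares "first matching
rule wins" against "longest match wins, rule order only breaks
ties".

```python
import re

# Hypothetical, cut-down token rules in file order.  The "ws" rule is an
# assumption; it is not part of the grammar enclosed in this message.
RULES = [
    ("tv", r"drop|get|take"),
    ("iv_tv", r"look|glance"),
    ("n", r"road|room"),
    ("n_ad", r"lamp|light"),
    ("ws", r" +"),
    ("bad_token", r".*"),
]

def tokenize(text, rules, prefer_longest):
    """Return a list of (rule_name, lexeme) pairs, or None if stuck."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        # Collect every rule that matches a non-empty string at this offset.
        candidates = []
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m and m.end() > pos:
                candidates.append((name, m.group()))
        if not candidates:
            return None
        if prefer_longest:
            # max() keeps the first maximal item, so ties go to the
            # earliest rule in the list.
            name, lexeme = max(candidates, key=lambda c: len(c[1]))
        else:
            name, lexeme = candidates[0]
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

if __name__ == "__main__":
    for policy in (False, True):
        print(policy, tokenize("get lamp", RULES, policy))
        print(policy, tokenize("look", RULES, policy))
```

Under the first-rule policy the catch-all changes nothing.  Under
the longest-match policy /.*/ swallows "get lamp" as a single
bad_token, while a one-word input like "look" still comes out as
iv_tv because the tie goes to the earlier rule.  That would at
least be consistent with the symptom described above.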

Anyway, I'm enclosing my postprocessed grammar,
including the bad_token rule.  To see it work, remove
the bad_token rule.

-------------------------------------------------

av = /(fast)|(quickly)|(really)|(slowly)/
iv = /(bug)|(idea)|(inv)|(inventory)|(typo)|(users)|(who)/
iv = /(whoami)/
av_iv = /@@1@@/
tv = /(drop)|(get)|(grab)|(place)|(put)|(remove)|(take)/
av_tv = /@@2@@/
iv_tv = /(exa)|(examine)|(glance)|(look)/
av_iv_tv = /@@3@@/
n = /(admin)|(ball)|(blemish)|(block)|(box)|(breadbox)/
n = /(building)|(carpet)|(chair)|(chairs)|(checkers)|(clearing)/
n = /(clumps)|(coffin)|(couch)|(couches)|(dagger)|(dog)/
n = /(dorms)|(dust)|(furniture)|(gaslamp)|(ghak'la)|(ghakla)/
n = /(grass)|(grasses)|(grime)|(ground)|(hallway)|(in)/
n = /(insignia)|(knife)|(label)|(lamppost)|(lightpost)/
n = /(locker)|(logo)|(lounge)|(mouth)|(northeast)|(northwest)/
n = /(out)|(outline)|(picture)|(pineapple)|(polish)|(post)/
n = /(road)|(room)|(rug)|(sarcophagus)|(sculpture)|(shed)/
n = /(sofa)|(sofas)|(southeast)|(southwest)|(spot)|(stair)/
n = /(stairs)|(statue)|(step)|(steps)|(street)|(streets)/
n = /(table)|(tables)|(threads)|(tiles)|(tomb)|(tower)/
n = /(turnip)|(uvula)|(vegetable)|(walls)|(weasel)|(windows)/
n = /(wombat)/
av_n = /@@4@@/
iv_n = /(east)|(north)|(south)|(west)/
av_iv_n = /@@5@@/
tv_n = /@@6@@/
av_tv_n = /@@7@@/
iv_tv_n = /@@8@@/
av_iv_tv_n = /@@9@@/
ad = /(academic)|(angular)|(arkham)|(around)|(battered)/
ad = /(bread)|(brick)|(carpeted)|(ceremonial)|(checkered)/
ad = /(curved)|(decayed)|(decaying)|(decrepit)|(distant)/
ad = /(dry)|(dull)|(dusty)|(economics)|(engineering)|(english)/
ad = /(faded)|(faint)|(frayed)|(fraying)|(front)|(gardener's)/
ad = /(gas)|(gently)|(gently-vibrating)|(gouged)|(grassy)/
ad = /(gray)|(grey)|(grimy)|(hard)|(heavy)|(hideous)|(history)/
ad = /(hostel's)|(humanities)|(immobile)|(inside)|(jagged)/
ad = /(juicy)|(languages)|(large)|(largish)|(lifeless)|(limp)/
ad = /(literature)|(long)|(maintained)|(math)|(mathematics)/
ad = /(meat)|(merritt)|(messy)|(mobile)|(modern)|(moving)/
ad = /(nicked)|(not)|(not-quite-spiral)|(odd)|(old)|(orange)/
ad = /(painted)|(peaceful)|(pickford)|(plump)|(polished)/
ad = /(quite)|(red)|(rubber)|(school)|(science)|(sciences)/
ad = /(scratched)|(small)|(spiral)|(stained)|(sticky)|(stuffed)/
ad = /(surrounding)|(swirly)|(threadbare)|(tiled)|(tool)/
ad = /(tufted)|(ugly)|(vibrating)|(weathered)|(well-maintained)/
ad = /(wood)|(wooden)|(worn)|(wrap)|(wrap-around)|(wraparound)/
av_ad = /@@10@@/
iv_ad = /@@11@@/
av_iv_ad = /@@12@@/
tv_ad = /@@13@@/
av_tv_ad = /@@14@@/
iv_tv_ad = /@@15@@/
av_iv_tv_ad = /@@16@@/
n_ad = /(checker)|(checkerboard)|(city)|(double-u)|(doubleyou)/
n_ad = /(dubya)|(floor)|(green)|(hall)|(halls)|(hostel)|(lamp)/
n_ad = /(lantern)|(light)|(marsh)|(peabody)|(porch)|(telman)/
n_ad = /(test)|(tile)|(tufts)|(w)/
av_n_ad = /@@17@@/
iv_n_ad = /@@18@@/
av_iv_n_ad = /@@19@@/
tv_n_ad = /@@20@@/
av_tv_n_ad = /@@21@@/
iv_tv_n_ad = /@@22@@/
av_iv_tv_n_ad = /@@23@@/
punctuation = /[\.;!:?]/
article = /((a)|(the))/
bad_token = /.*/
line: 
iverb: iv
iverb: iv_n
iverb: iv_n_ad
iverb: iv_ad
iverb: iv_tv
iverb: iv_tv_n
iverb: iv_tv_n_ad
iverb: iv_tv_ad
iverb: av_iv
iverb: av_iv_n
iverb: av_iv_n_ad
iverb: av_iv_ad
iverb: av_iv_tv
iverb: av_iv_tv_n
iverb: av_iv_tv_n_ad
iverb: av_iv_tv_ad
tverb: tv
tverb: tv_n 
tverb: tv_n_ad
tverb: tv_ad
tverb: iv_tv
tverb: iv_tv_n
tverb: iv_tv_n_ad
tverb: iv_tv_ad
tverb: av_tv
tverb: av_tv_n
tverb: av_tv_n_ad
tverb: av_tv_ad
tverb: av_iv_tv
tverb: av_iv_tv_n
tverb: av_iv_tv_n_ad
tverb: av_iv_tv_ad
noun: n
noun: n_ad
noun: iv_n
noun: iv_n_ad
noun: tv_n
noun: tv_n_ad
noun: iv_tv_n
noun: iv_tv_n_ad
noun: av_n
noun: av_n_ad
noun: av_iv_n
noun: av_iv_n_ad
noun: av_tv_n
noun: av_tv_n_ad
noun: av_iv_tv_n
noun: av_iv_tv_n_ad
adject: ad
adject: n_ad
adject: iv_ad 
adject: iv_n_ad 
adject: tv_ad
adject: tv_n_ad
adject: iv_tv_ad
adject: iv_tv_n_ad
adject: av_ad
adject: av_n_ad
adject: av_iv_ad 
adject: av_iv_n_ad 
adject: av_tv_ad
adject: av_tv_n_ad
adject: av_iv_tv_ad
adject: av_iv_tv_n_ad
line: sentence 
line: sub_line 
line: sub_line sentence 
sub_line: sentence punctuation 
sub_line: sub_line sentence punctuation 
sentence: indep_clause
indep_clause: iverb
indep_clause: tverb noun
npr: adj_noun
npr: det adj_noun
adj_noun: adjp noun
np: npr
adjp:
adjp: adjp adject
det: article


=====
------
noah_gibbs at yahoo.com
