[MUD-Dev] Parser engines

Fri Mar 12 09:18:41 CET 2004

Malcolm W. Tester II wrote:

> is the type of person who codes it.  And in general, it seems
> Americans are lazier about it.  I am an American, so I have a
> right to call us lazy :) Both of the parsers in these muds were
> written by other nationalitys.  One was by a German guy, the other
> was by a Swedish guy.  Both are based off the original lp 2.4.5.
> Could they

Speaking of other languages, does anyone have experience with GOOD
parsers that work with languages whose grammar/origin is very
different from English? (Has anyone ever tried?)

For example: Japanese (3 character sets, no spaces, verb at end,
enter text with an IME), Chinese (2 character sets, no spaces, enter
text with an IME), Arabic (non-Roman character set, enter text with
an IME, ???), or even Finnish (which I've been told likes to combine
verbs and nouns, or something of the sort). The Inform Designer's
Manual discusses porting to English's cousins like French and German,
but not the more distant language groups.

I posted on rec.arts.int-fiction and received a helpful response
about some of the oddities about Japanese. No one else seems to have
any info.

I am interested in localization, so one concern I have (but which
most people won't) is that if I spend a huge amount of time writing
an awesome English parser that understands "aggressively attack the
second orc to the right of the tree" (to exaggerate), then:

  a) No sane person will be able to localize the same functionality
  into other languages. (I don't know enough other languages to do
  localization myself.)

  b) All the wonderful parsing will be unseen by most users because
  they don't need it, even in an IF-oriented environment. Users on
  rec.arts.int-fiction don't seem too keen on going beyond anything
  in TADS or Inform, which implies to me that maybe it's not
  necessary.

I have been thinking about parsing lately. If you intend to write
one, here are some other things to think about:

  a) You might eventually want to connect speech recognition to your
  parser so users can speak the commands. For this to work you need
  to compile your entire grammar into a CFG (context-free grammar),
  in BNF or a similar format.  If you start with a CFG model, speech
  recognition will be easy to add at some point in the future. If
  you use another approach then it will be very difficult to bolt
  speech recognition on top.

  b) It seems to me that the parser for commands is different from
  the parser you'll need for modelling conversations with NPCs. I
  haven't thought enough about this yet, but conversations will
  probably need some sort of probabilistic heuristic. "Tell me about
  King Leopold?", "What do you know about the king?", "Who is lord
  Leopold?", and "Does the local monarch have anything interesting
  in his castle?" all mean the same thing to an NPC, but a CFG won't
  cope well with that much variation.

  c) If you talk to a professional linguist they'll probably tell
  you the first step is to determine each word's part of speech
  (POS), which is a non-trivial problem. I have some links for
  possible solutions to this problem. Once you have the POS tags you
  can generate a parse tree based on the word-order rules of the
  language (adjective before noun in English, etc.). This then lets
  you identify the verb phrase, subject, object, etc. You can also
  use WordNet and/or a thesaurus for synonyms. If you talk to a
  professional linguist you'll spend the rest of your life writing
  your parser.

  d) The more "correct" your parsing solution is, the more parts
  you'll be able to use when going from concept to sentence, such as
  subject/verb agreement in "<name> is too big to fit in your bag." If
  <name> is "the piano" your text is OK, but if <name> is "the gold
  ingots" you need to change "is" to "are". Other languages have it
  far worse. The more NLP information your app has around, the
  better it can resolve these issues.
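
To make point (a) a little more concrete, here is a minimal sketch
of a command grammar kept as plain data, so that the same table can
drive the text parser and be dumped as BNF for some other tool. The
grammar, the word lists and the function names (match, accepts,
to_bnf) are all made up for illustration; a real speech engine will
want its own grammar file format, which this does not attempt to
produce.

```python
# Hypothetical toy grammar: each nonterminal maps to a list of
# alternatives; any symbol that is not a key is a literal word.
GRAMMAR = {
    "COMMAND": [["VERB", "NP"], ["VERB", "NP", "PREP", "NP"]],
    "VERB": [["attack"], ["take"], ["put"]],
    "NP": [["DET", "NOUN"], ["DET", "ADJ", "NOUN"], ["NOUN"]],
    "DET": [["the"], ["a"]],
    "ADJ": [["second"], ["gold"]],
    "NOUN": [["orc"], ["tree"], ["ingot"], ["bag"]],
    "PREP": [["in"], ["near"]],
}


def match(symbol, words, start):
    """Return every position where `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:  # literal word
        if start < len(words) and words[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        for part in production:
            # Advance every surviving position through the next symbol.
            positions = {e for p in positions for e in match(part, words, p)}
        ends |= positions
    return ends


def accepts(text):
    """True if the whole input can be derived from COMMAND."""
    words = text.lower().split()
    return len(words) in match("COMMAND", words, 0)


def to_bnf():
    """Dump the same grammar table as BNF-style text."""
    lines = []
    for name, productions in GRAMMAR.items():
        alternatives = " | ".join(
            " ".join(f"<{p}>" if p in GRAMMAR else f'"{p}"' for p in production)
            for production in productions
        )
        lines.append(f"<{name}> ::= {alternatives}")
    return "\n".join(lines)
```

This simple top-down matcher only works because the toy grammar has
no left recursion; a grammar of realistic size would need a proper
chart parser or a parser generator.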
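
For point (b), the crudest possible heuristic is weighted keyword
scoring: each topic an NPC knows about gets a bag of weighted words,
and the topic with the highest score wins if it clears a threshold.
This is only a sketch and not a real probabilistic model; the topic
names, weights, threshold and the best_topic function are invented
for the example.

```python
import re

# Hypothetical topics an NPC knows about, with hand-picked keyword weights.
TOPICS = {
    "king_leopold": {"king": 1.0, "leopold": 1.0, "monarch": 1.0,
                     "lord": 0.5, "castle": 0.5},
    "blacksmith": {"blacksmith": 1.0, "forge": 1.0, "sword": 0.5},
}


def best_topic(utterance, threshold=0.5):
    """Return the best-scoring topic name, or None if nothing scores enough."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {
        topic: sum(weight for keyword, weight in keywords.items()
                   if keyword in words)
        for topic, keywords in TOPICS.items()
    }
    topic = max(scores, key=scores.get)
    return topic if scores[topic] >= threshold else None
```

With these weights all four of the example questions in (b) land on
the same topic, even though they share almost no sentence structure.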
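
The agreement problem in (d) can be sketched as follows, assuming
each object carries a little grammatical information along with its
display name. The NounPhrase class, its plural flag and the
too_big_message function are hypothetical names for this example,
and it handles English only.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    """Display text for an object plus the grammatical number it needs."""
    text: str
    plural: bool = False


def too_big_message(item):
    """Build the message with the verb agreeing with the item's number."""
    verb = "are" if item.plural else "is"
    sentence = f"{item.text} {verb} too big to fit in your bag."
    # Capitalise the first letter only, leaving the rest of the name alone.
    return sentence[0].upper() + sentence[1:]
```

For example, too_big_message(NounPhrase("the gold ingots",
plural=True)) uses "are", while NounPhrase("the piano") gets "is".
A language with gender or case would need more fields than a single
plural flag.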

Mike Rozak
http://www.mxac.com.au