[MUD-Dev] Spellchecking

Richard Woolcock KaVir at dial.pipex.com
Wed Sep 23 00:43:41 CEST 1998


Although I've not been coding much lately, I've still been thinking over
various concepts I am (at least) considering implementing within my mud.

One of these is something I had thought about a long time ago, but never
really got around to coding - mob speech.  This area has been
discussed briefly in the past (primarily by Matt Chatterley, I believe).

The reason I mention it is because - while thinking about mob speech - I
realised that such a system might well serve to differentiate players and 
mobs, which is something I am trying to avoid.  Mob speech would do this 
in many ways (primarily by the quality of the speech itself), but the most 
immediate giveaway in many situations would be typos made by players -
typos that mobs wouldn't make.

At first I thought about adding the occasional typo in mob speech, but 
quickly discarded that idea.  Instead, I considered spell-checking every 
bit of speech made by players.  If kept purely to the in-character methods 
of communication (rather than the OOC global channel), the impact upon the
processing power of the machine would be minimal.

Thinking further, I realised that - because all of the in-character names
people can choose are stored in lists (male forenames, female forenames
and surnames) - I could add every possible player name to the list of words 
as well.  This would allow correct capitalisation of speech for in-character 
conversations.  By storing the names this way, I could also incorporate them
into the same text-speech recognition code as used for mobs, allowing names
to be recognised as references to people they might know (and my mobs *can*
'know' certain players or mobs), thus:

   say bubba Do you like Richard?
   You say to Bubba 'Do you like Richard?'

From Bubba's point of view:

   "Do"      : He's asking a question.
   "you"     : He's talking about me.
   "like"    : He's asking my opinion about something.
   "Richard" : He's discussing a male with the forename 'Richard'.

Then, depending on various factors, you might get something like:

   "I hate him."
   "Which Richard?"
   "I don't know anyone called Richard!"

It wouldn't be particularly difficult for me to add all object types
to the list of words as well.  This would allow even more detail to
be added to the system.

Right now I'm not really worrying about the speech - I'm more
interested in throwing together a lightning-fast spellchecker,
which can check through a dictionary of several thousand words.

As far as implementation of the spellchecker is concerned, I am
considering using either a hash-table or a sorted binary tree.  A
while ago I wrote a piece of code to turn a single text-word into
a (most likely) unique 32-bit integer - this would give me a good
means to scan through the list of words (it's faster than checking
strings, anyway).  Basically the idea is to take each word in the
string, convert it into a number, then sort through the list of 
words looking for a match.  Once a match is found, I could THEN do
a strcmp to ensure that the words did match completely.
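
As a rough illustration of the hash-table route, here is a minimal
sketch in C.  Everything in it is an assumption made for the example:
the key function is plain FNV-1a standing in for the word-to-integer
code mentioned above (which isn't shown here), and the names
word_key, dict_add, dict_contains and BUCKETS are hypothetical.  It
compares the 32-bit keys first and only falls back to a full
(case-insensitive) string comparison when the keys match.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 32768u  /* assumed size: a power of two above 10-30k words */

struct entry {
    uint32_t key;        /* 32-bit key of the lower-cased word */
    const char *word;    /* the stored word (caller keeps it alive) */
    struct entry *next;  /* chain of words that share a bucket */
};

static struct entry *table[BUCKETS];

/* FNV-1a over the lower-cased word: a stand-in, not the post's own code. */
static uint32_t word_key(const char *w)
{
    uint32_t h = 2166136261u;
    for (; *w; w++) {
        h ^= (uint32_t)tolower((unsigned char)*w);
        h *= 16777619u;
    }
    return h;
}

/* Case-insensitive equality: the final "strcmp" confirmation step. */
static int same_word(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Add a word to the dictionary; returns 0 if allocation fails. */
int dict_add(const char *word)
{
    struct entry *e = malloc(sizeof *e);
    if (!e)
        return 0;
    e->key = word_key(word);
    e->word = word;
    e->next = table[e->key % BUCKETS];
    table[e->key % BUCKETS] = e;
    return 1;
}

/* Cheap integer compare first, full string compare only on a key match. */
int dict_contains(const char *word)
{
    uint32_t key = word_key(word);
    struct entry *e;
    for (e = table[key % BUCKETS]; e; e = e->next)
        if (e->key == key && same_word(e->word, word))
            return 1;
    return 0;
}
```

With "richard" added via dict_add, dict_contains("Richard") succeeds
because both the key and the final comparison ignore case, which
would leave the capitalisation fix-up to a later step.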

What would people recommend as the fastest sorting method, assuming
roughly 10-30k words?
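
For comparison with the hash-table idea, one option at this size
might be a flat array sorted once with the standard qsort() and then
searched with bsearch().  The sketch below is only that, a sketch:
make_key is again FNV-1a used as a placeholder for the real
word-to-integer code, and keyed_word, wordlist_build and
wordlist_contains are hypothetical names.  The comparison function
orders by the 32-bit key and falls back to strcmp only when two keys
are equal.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct keyed_word {
    uint32_t key;      /* 32-bit key computed from the word */
    const char *word;  /* the word itself, for the tie-breaking strcmp */
};

/* Placeholder key function (FNV-1a), assumed for this example only. */
static uint32_t make_key(const char *w)
{
    uint32_t h = 2166136261u;
    for (; *w; w++) {
        h ^= (uint32_t)(unsigned char)*w;
        h *= 16777619u;
    }
    return h;
}

/* Order by key; compare the strings only when the keys are equal. */
static int cmp_keyed(const void *a, const void *b)
{
    const struct keyed_word *x = a;
    const struct keyed_word *y = b;
    if (x->key != y->key)
        return x->key < y->key ? -1 : 1;
    return strcmp(x->word, y->word);
}

/* Fill list[0..n-1] from words[] and sort it once, e.g. at boot time. */
void wordlist_build(struct keyed_word *list, const char **words, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        list[i].key = make_key(words[i]);
        list[i].word = words[i];
    }
    qsort(list, n, sizeof *list, cmp_keyed);
}

/* Binary search of the sorted list; case-sensitive in this sketch. */
int wordlist_contains(const struct keyed_word *list, size_t n,
                      const char *word)
{
    struct keyed_word probe;
    probe.key = make_key(word);
    probe.word = word;
    return bsearch(&probe, list, n, sizeof *list, cmp_keyed) != NULL;
}
```

A binary search over 30k entries needs roughly 15 comparisons per
word, nearly all of them integer compares, so the sort itself only
matters once when the list is loaded.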

Does anyone know where I can get such a list, short of copying them
from a dictionary?

Can anyone see any obvious problems with such a system?

KaVir.

...

say heh laf d00d, this mud is KEWL for rp man!!!!
The following words were not recognised: heh, laf, d00d, KEWL, rp.
Please do not over-use exclamation marks.

:)



