[MUD-Dev] Naming and Directories?

Tue Mar 16 13:37:55 CET 1999

Mik Clarke writes:
> 
> Mark Gritter wrote:
> > Hans Staerfeldt suggest "search tries" in his response to my original e-mail.
> > The idea is that each node stores a name and 26 pointers to child nodes.  The
> > tree is searched by using successive letters as an index to the array of
> > pointers.  You can search for abbreviations easily in this scheme--- if
> > you run out of letters and the node has only one child, you've found the
> > full name.
> 
> Ummm. Yick. 26 4 byte pointers by, say 16 characters long just to index 100-200 names?
> Even with a sensible implementation you're going to end up wasting 90% of the storage you
> allocate (and a ten letter name means you have to allocate 1K pointers, each different
> name then requires another 900 bytes or so).
> 

There are, of course, more efficient implementations of tries than what
I described above.  :)  For sparse tries, using a hash table or linked list
to store the child pointers might still be a win.  (Aho, Hopcroft, and Ullman's
algorithms book devotes some time to ways of making tries more efficient.)

One big improvement is to not "expand" the tree to the full depth in parts 
where there's only one leaf.

> You might have more luck with a binary tree, which just has a 2 way split (or even a
> binary search of a sorted list).  The Diku search actually does a linear search, but it
> compares first characters only first.  If they match (about 1 in 20) it goes on to
> compare the whole string.
> 

Hmm... this seems unnecessary.  Wouldn't strcmp() quit as soon as there
was a difference in the strings anyway?  You'd only be saving the
function call overhead, which could be eliminated by inlining.

Binary search is certainly going to be good enough in the 100-200 name range.
The nice feature of tries, though, is that search time scales with the length 
of names rather than with the number of names.  (Similar algorithms have been 
proposed--- and I think are currently in use--- for IP routing, where the
number of entries are in the 55-60k prefix range)

If, for example, all 10000+ mobiles in a game had distinct names, there
might be a better case for using a smarter data structure.  OTOH, at large
scales, other methods of scaling (like breaking up into regions) start
making sense, too.  But even in that case, I think more directory service
support is going to be called for.  I'm not trying to explore the benefits
of a specific data structure so much as of making indexes and naming explicit.

Mark Gritter
mgritter at cs.stanford.edu

_______________________________________________
MUD-Dev maillist  -  MUD-Dev at kanga.nu
http://www.kanga.nu/lists/listinfo/mud-dev