[MUD-Dev] Naming and Directories?

Fri Mar 19 01:44:31 CET 1999

Mark Gritter wrote:
> 
> Ola Fosheim =?iso-8859-1?Q?Gr=F8stad?= writes:
> >
> > "Jon A. Lambert" wrote:
> > > But you have changed your requirements above from speed to memory.
> > > No fair. ;)
> >
> > fragmented memory access == slow => memory == speed
> >
> 
> True.  But almost _any_ large-scale random-access data structure is gonna
> have bad memory locality, so in general, the fewer accesses, the better.
> I and a couple of my research group members were talking about this, this
> afternoon.
> 
> Although a trie could easily use 10x or more memory than a binary tree,
> the number of cache lines you're accessing is likely to be much smaller
> during a lookup, and that's generally a big win.  (Assuming you have enough
> memory so you don't have to page, of course.)

Uhm... But a hashtable would only access like... 3-5 cachelines?  (I
don't think trees and tries qualifies as random-access structures
either)  The benefit of the binary tree is that it is efficient to
traverse... So we agree, we need to know what the datastructure is going
to be used for. For instance, if you want to display a sorted "all
players" list then a binary tree (or a balancing variant) would do ok.
*shrug*

> Binary-tree cache behavior could be improved significantly by limiting the 
> key length and adding some space overhead to the node:

> struct node {
>  char key[24];
>  struct node *left, *right;
> };

sizeof(struct node) == 36... Maybe it would be better if one clustered
those variables whose bytes are most likely to be accessed? ;^)

struct node {
   struct node *left, *right;
   char key[ANY_LENGTH];  // will usually only access the first few bytes
   object *obj;           // almost never accessed
}

Of course, if you keep thinking like this then you will never get a MUD
up and running ;) With 1000 names then you would only access about 10
nodes (cachelines) per lookup.. Not too bad?

Another option would be to only store the first 4 bytes in the node, and
revert to the object itself if there is a match (sizeof(node)==16 :).

Sketch:
  struct node {struct node *l,*r; uint32 key; object *obj;}

  object *findobj(char *key,node *n){
     uint32 k; int i=0;
     do{k<<=8;k|=key[i++];}while((i&3)&&(!(k&255)));
     if(i||(!key[4])){  //strlen(key)<=4
        i=4-i;while(i--)k<<=8;
        do{
          if(k<n->key){n=n->l; continue;}
          if(k>n->key){n=n->r; continue;}
          return n->obj;
       }while(n);
       return NULL;
     }
     do{
       	if(k<n->key){n=n->l; continue;}
        if(k>n->key){n=n->r; continue;}
        i=strcmp(key+4,n->obj->key+4);
       	if(i<0){n=n->l; continue;} //is this right??
        if(i>0){n=n->r; continue;}
        return n->obj;
     }while(n);
     return NULL;
  }

Don't complain about readability! :)

Oh yeah, while we are at it, why not limit names to 12 characters and
allow only 30 unique symbols (0 used for padding, 1 used for illegal
characters), then we could bitcompress it and fit the key in 60 bits...
Would be nice for client-server protocols as well...

inline long long convert(char *s){
  long long k=0;int i=0;
  do{
    k<<=5; 
    k|=translationtable[s[i++]];
    if(!(k&0x1e)){
      if(k&1) throw "illegal character";
      i=12-i;
      return k<<((i<<2)+i);
    }
  }while(i<12);
  if(!s[12])throw "string too long";
  return k;
}

object* findobj(char *key,node *n){
  long long k=convert(key);
  for(;;){
    if(k<n->key){n=n->l; if(n)continue; return 0;}
    if(k>n->key){n=n->r; if(n)continue; return 0;}
    return n->obj;
  }
  assert(1);
}

I don't think this would be too slow?  Hmm.. I might use that...
--
Ola Fosheim Groestad,Norway      http://www.stud.ifi.uio.no/~olag/

_______________________________________________
MUD-Dev maillist  -  MUD-Dev at kanga.nu
http://www.kanga.nu/lists/listinfo/mud-dev