[DGD] Persistent Users

bart at wotf.org
Thu Sep 22 20:00:50 CEST 2016


On Thu, 22 Sep 2016 16:00:35 +0100, Gary wrote

<snip> 

> Thanks for the reply, this is exactly the information I was after.
> 
> Do mappings suffer the same performance concerns with size?

Yes.

> 
> My dilemma at the moment is that if I move users over to a persistent
> object rather than saving password and associated account info out to
> disk, the USERD mapping/array that holds user objects/names turns 
> from a limit on the number of online/link dead users, to a limit on 
> the maximum number of user accounts.

The number of online users is also limited by the number of connections DGD
can handle.

> 
> As I'd prefer to avoid editing the kernel library, that makes my options

I can imagine. For the record, I'm not using the kernel library; I created my
own alternative for it, based in part on bits and pieces I wrote for gurbalib,
or in the past for way of the force. It's certainly inspired by the kernel
lib, and borrows from it here and there.

> 
> (I'm open to additional ideas) increase the array_size gradually to keep
> up with user accounts (assuming user account limits ever becomes a real
> issue) or tweak user.c and add an ACCOUNTD.
> 
> The user object would be created/destroyed as usual (I guess you 
> could think of its purpose now like the session object you mentioned 
> in a prior post), the password and all other account information 
> would be moved over to the ACCOUNTD.

That is essentially what I did, though account_d in my case is an LPC database.
That 'LPC database' may need some explanation; see below.

> 
> The ACCOUNTD can then make use of a mapping of arrays, or abstract away
> the splitting of storage between multiple objects. That would allow a
> naive accountd implementation of a single mapping or array yet retain
> the option to migrate to separate objects that accountd manages and
> avoid changes to the kernel library as USERD will once more only 
> track online and link dead users.

Sounds like a viable approach to me. You could opt to use an LWO with operator
overloading to simulate indices on a mapping, and use that from within USER_D
to talk to your account daemon. That will let you write most of the code as if
it is just a mapping, and abstract away whatever you do to distribute the data
over multiple mappings or arrays or objects.
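
For what it's worth, a minimal sketch of that LWO idea (just a sketch I have
not compiled, assuming DGD's operator overloading for light-weight objects;
account_d and its fetch()/store() functions are placeholders for whatever your
account daemon ends up providing):

/* accounts.c - LWO that behaves like a mapping of name : account data */
private object account_d;   /* the daemon that actually holds the data */

void set_daemon( object d ) {
   account_d = d;
}

static mixed operator[] ( string name ) {
   /* read access: data = accounts["gary"] */
   return account_d->fetch( name );
}

static void operator[]= ( string name, mixed data ) {
   /* write access: accounts["gary"] = data */
   account_d->store( name, data );
}

USER_D would create one of these with new_object() and from then on index it
like a mapping, while account_d stays free to shuffle the real storage around.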

> 
> > 
> > Additionally, making very big arrays also makes the object containing
> > that array big. This makes swapping less efficient, so after some
> > point, it really pays to distribute those multiple arrays over
> > multiple objects.
> 
> Just as a check on my understanding of swapping. An object with an array
> of objects, when swapped in, would not cause the objects within its
> array to also be swapped in? iiuc, only attempting to access those
> objects would swap them in (assuming they're swapped out to begin 
> with).

Your understanding is correct, as far as I know.

> 
> So the size issue would be the size of data the object holds and however
> many bytes it takes to store the ID/reference to each object in the array?

Yes.

But let's look at this some more. Let's say you want to support 1M users. That
means you may still end up with 1M object IDs, which results in an object of a
few MB. Each time this object needs to be swapped in or out, that means
swapping a few MB in or out. Not yet deadly for performance, but don't overdo it.
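
Quick arithmetic: a few MB for 1,000,000 object IDs comes down to only a
handful of bytes per entry, so anything extra you keep per user in that same
object (names as keys, per-user data) multiplies the swapping cost quickly.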

> 
> [snip]
> > Splitting up and distributing largish data sets is really the only
> > good solution here, and when data sets get really large, splitting up
> > means more than a nested array or mapping; it means spreading the data
> > over multiple objects.
> >
> 
> How is the control object for your database splitting the items it
> manages amongst objects?

There is an entire subsystem behind this which handles it in a generic way, not
just for account information, but basically for any data in the system that
isn't just 'runtime state'.

Exactly how it works goes a bit beyond what I can describe in a mail, but the
basic idea isn't very difficult, and was directly inspired by how some
database engines can split a table over multiple partitions.

An important consideration for me was that I wanted to have a somewhat even
distribution of data over all the data objects involved, without all the
hassle of trees and having to rebalance them.

Anyway, what some database engines do is simply run the key for a table
through some hash function, and derive a numeric index from that. The numeric
index tells you in which partition (object) the data for that key (user name)
resides.

In LPC terms, a very simple implementation of that idea would do something
like this:

object *storage;   /* allocated with allocate(256) in create() */

object storage_cell( string key ) {
   int cell;

   cell = hash_crc16( key ) & 0xff; /* 256 storage cells */
   if( !storage[cell] ) {
      storage[cell] = clone_object( STORAGE_OB );
   }
   return storage[cell];
}

void store_user( mapping data ) {
   object cell;

   cell = storage_cell( data["name"] );
   cell->store( data );
}

As long as you have a name, you can find the object that stores the data for
that name. Additionally, the hash function will give you a reasonably even
distribution of data over the 'cells'.
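
Looking an entry up goes through the same function; simplified again, and
assuming the storage object has a fetch() to match store():

mapping fetch_user( string name ) {
   object cell;

   cell = storage_cell( name );
   return cell->fetch( name );
}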

Of course this is an extremely simplified example; there is a lot more behind
it, for example to ensure I don't keep empty storage objects around or create
objects for nonexistent data.

> 
> For example rather than a mapping of arrays all in a single object, have
> you gone with a mapping of objects with each object then holding an
> array of items?

I actually have a mapping of mappings of objects, each of which contains a
mapping with data. In theory this would scale to 2^35 entries, but in reality
it is a bit less than that, because you do not get a perfect distribution of
data over objects (and I do not use crc16, but crc16 makes it very easy to see
how this works).
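
To illustrate the multi-level idea (this is not literally what my code does,
just the crc16 example from above stretched over two levels), you can split
the hash into two indices and use an outer mapping of inner mappings of
storage objects:

/* two-level variation: 256 x 256 cells */
mapping storage;   /* initialized to ([ ]) in create() */

object storage_cell( string key ) {
   int h, hi, lo;

   h = hash_crc16( key );
   hi = ( h >> 8 ) & 0xff;   /* index into the outer mapping */
   lo = h & 0xff;            /* index into the inner mapping */
   if( !storage[hi] ) {
      storage[hi] = ([ ]);
   }
   if( !storage[hi][lo] ) {
      storage[hi][lo] = clone_object( STORAGE_OB );
   }
   return storage[hi][lo];
}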

> 
> > Making things big like that introduces more potential for issues,
> > even when you do not have a malicious guest programmer.
> 
> I had not considered the issue of array reallocation which is kind of
> obvious now you've mentioned it.

Also, consider the following situation. This one happened to me simply because
I didn't think carefully about what I was trying to do.

I was reading a file, splitting it into 8 KB strings, and filling an array with
those strings. This was a somewhat largish file, and after reading some 2k+
strings, I ended up with an object which could no longer be restored from the
state dump, and could not be restored with save/restore_object either (both
issues have since been fixed). The resulting object had grown to nearly 20 MB.

Both the strings and the array are well within the default limits of DGD.
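
To make the arithmetic explicit: roughly 2,500 strings of 8 KB each already
adds up to about 20 MB, even though each individual string and the array
length are comfortably within those limits.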

> 
> [snip]
> 
> > 
> > Regarding making things scalable.. don't overdo it, but also don't
> > count on things always staying small.
> 
> I realistically don't expect this to ever be a problem. There are so
> many hurdles between now and having a running mud, let alone one that
> would have the issue of too many user accounts ;)

That sounds quite reasonable, I'm afraid.

> 
> That said, I'd like to know where the limits are for any decision I make
> and have an idea of alternatives I can switch to should it become an
> issue for myself or another.

Imo a good idea.

But let me add something.
The lib I'm building right now doesn't actually have a purpose, as in, I'm not
intending to run a game with it; it doesn't even support any kind of 'game'
feature. It did, however, start out with a purpose: to provide a simple lib I
can release together with my I3 router code.

So... I started out with the same questions as you are asking, and decided I
needed a better user/account system than I had at that moment, one that did
scale well beyond a few thousand entries.

I wrote something that could be called an LPC variation on things like bdb or
gnudb: a simple key:value database that scales beyond the size of a mapping.

Once I had this, I realized it would be pretty trivial to use it as storage
for a column in a table, so... I revamped the code a little bit, and ended up
with something which very much behaves like a table in a relational database. 

From there... I added a database layer so I can have multiple tables and
create relations between them...

What I have now is, as Felix called it, a relational database interface running
on top of DGD's object database.

Though it supports relational data, it can also deal with non-relational models.

At any rate, I've been moving nearly everything into that database, and found
it is really nice to have some kind of data storage which is not restricted by
the size limits of mappings and arrays, and doesn't cause insanely large
objects. It turns out to be usable for so many things one might want to do on
a mud or talker or other similar environment, from storing the history of your
comms channels, tells and all that, to session logs, to descriptions or
whatever you want.

One of the things I have now is a database of some 200k words, with references
to where they appear in a big collection of text files. On top of that I have
a database with almost 700k substrings with references to every word the
system knows of in which those substrings appear.

The result? I can find every text containing any combination of words and
substrings of words in a few milliseconds, and in the case of searching for
only a few words, in less than 1 millisecond.

The point of this?

When you create such an interface for storing account information, it seems
like a good idea to create a very generic interface for storing values for a
huge number of keys. It has uses way beyond the account system.
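
The interface doesn't have to be much more than something like this (again
just a sketch, not literally the interface of my database):

/* generic key:value storage daemon */
void  store( string key, mixed value );  /* add or replace the value for key */
mixed fetch( string key );               /* nil if the key does not exist    */
void  remove( string key );              /* drop the key and its value       */

Whether a single mapping or a set of hashed storage cells sits behind those
three functions is then invisible to the code that uses them.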

Bart.
> 
> Regards,
> 
> Gary
> 


--
http://www.flickr.com/photos/mrobjective/
http://www.om-d.org/



