[DGD] Persistent Users

Gary gary at mups.co.uk
Thu Sep 22 17:00:35 CEST 2016


On 22/09/16 14:33, bart at wotf.org wrote:
[snip]
> 
> But anyway, there is a bit more to this. Making arrays very large
> makes them very inefficient, and generally it is a good idea to keep
> arrays smaller than 512 items, or at least smaller than 1024 items
> for performance reasons. First of all, modifying arrays
> (adding/removing elements) may result in reallocations, which simply
> become more expensive and difficult when arrays get very large.
> 
> It is much better to look for a solution where you distribute this
> data over multiple arrays when it becomes too big to reasonably fit
> in a single array.

Bart,

Thanks for the reply, this is exactly the information I was after.

Do mappings suffer the same performance concerns with size?

My dilemma at the moment is that if I move users over to a persistent
object, rather than saving the password and associated account info out
to disk, the USERD mapping/array that holds user objects/names turns
from a limit on the number of online/link-dead users into a limit on
the maximum number of user accounts.

As I'd prefer to avoid editing the kernel library, that leaves my
options (I'm open to additional ideas) as either increasing array_size
gradually to keep up with user accounts (assuming the account limit
ever becomes a real issue), or tweaking user.c and adding an ACCOUNTD.

The user object would be created/destroyed as usual (I guess you could
think of its purpose now as being like the session object you mentioned
in a prior post), while the password and all other account information
would be moved over to the ACCOUNTD.

The ACCOUNTD could then make use of a mapping of arrays, or abstract
away the splitting of storage between multiple objects. That would
allow a naive ACCOUNTD implementation with a single mapping or array,
while retaining the option to migrate to separate objects that ACCOUNTD
manages. It would also avoid changes to the kernel library, as USERD
would once more only track online and link-dead users.
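As a rough sketch of the naive version (the function names and layout
here are mine, not anything from the kernel library), the point being
that callers only ever see set_account/query_account, so the storage
behind them could later be split across bucket objects without touching
the callers:

    # include <kernel/kernel.h>

    mapping accounts;   /* account name -> account info */

    static void create()
    {
        accounts = ([ ]);
    }

    void set_account(string name, mixed *info)
    {
        accounts[name] = info;
    }

    mixed *query_account(string name)
    {
        return accounts[name];
    }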

> 
> Additionally, making very big arrays also makes the object containing
> that array big. This makes swapping less efficient, so after some
> point, it really pays to distribute those multiple arrays over
> multiple objects.

Just as a check on my understanding of swapping: an object with an
array of objects, when swapped in, would not cause the objects within
its array to also be swapped in? If I understand correctly, only
attempting to access those objects would swap them in (assuming they're
swapped out to begin with).

So the size issue would be the size of the data the object holds, plus
however many bytes it takes to store the ID/reference to each object in
the array?

[snip]
> Splitting up and distributing largish data sets is really the only
> good solution here, and when data sets get really large, splitting up
> means more
> then a nested array or mapping, it means spreading the data over
> multiple objects.
>

How is the control object for your database splitting the items it
manages amongst objects?

For example, rather than a mapping of arrays all in a single object,
have you gone with a mapping of objects, with each object then holding
an array of items?
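To make that second layout concrete, I'm picturing something along
these lines (BUCKET and its add function are made up for illustration):

    # define BUCKET "/usr/System/obj/bucket"  /* hypothetical clonable */

    mapping buckets;    /* key -> bucket object holding an item array */

    static void create()
    {
        buckets = ([ ]);
    }

    void add_item(string key, mixed item)
    {
        object bucket;

        bucket = buckets[key];
        if (!bucket) {
            bucket = buckets[key] = clone_object(BUCKET);
        }
        bucket->add(item);  /* bucket appends to its internal array */
    }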

> Making things big like that introduces more potential for issues,
> even when you do not have a malicious guest programmer.

I had not considered the issue of array reallocation, which is kind of
obvious now that you've mentioned it.

[snip]

> 
> Regarding making things scalable.. don't overdo it, but also don't
> count on things always staying small.

I realistically don't expect this to ever be a problem. There are so
many hurdles between now and having a running mud, let alone one that
would have the issue of too many user accounts ;)

That said, I'd like to know where the limits are for any decision I
make, and have an idea of alternatives I can switch to should it become
an issue for myself or others.

Regards,

Gary




