[MUD-Dev] [TECH] Distributed MUD
J C Lawrence
claw at 2wire.com
Wed Apr 25 17:39:49 CEST 2001
On Wed, 25 Apr 2001 00:50:58 -0600
Kwon Ekstrom <justice at softhome.net> wrote:
> There hasn't been much posted here about distributing by role.
Sort of. That's one corner of what I've been referring to as
predictive segmentation and distribution.
> What little there has been was generally on a side note.
Yup.
> IMHO, to distribute by role, you would need at least 3 basic types
> of server. A front end user handler, backend database, and
> backend handler.
Yes and no. At the functional level the basic block diagram
certainly seems to look something like:
   [OutOfBandServer]<=================================++
       ^|                                             ||
       ||                                             ||
       |v                                             |v
 ===>(PlayerTraffic)<===>[ConnectionServer]<===>[Authentication/Billing]
                                ^|                    ^|
                                ||                    ||
                                |v                    |v
                     [Command/EventProcessor]<===>[BackingStore]
                                ^|                    ^|
                                ||                    ||
                                |v                    |v
                        [AutomationSystem]   [ReplicationServers]
One of the interesting bits of this is that the distribution model
doesn't need to follow the block diagram. While there's a certain
architectural simplicity in making the backing store a discrete
system, it encourages a significant IPC overhead and various sorts
of cache contention problems. One of the more interesting
approaches to me is distributing the Command/EventProcessor boxes
across N systems, and having the backing store be a distributed
autonomous database across the same set of systems, where the
cluster as a whole manages cache locality against a shared-write
physical store (e.g. an ObjectStore a la Panasas).
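To make the colocation idea concrete, here is a rough Python sketch
(the node names and the hash choice are mine, and this is a static
approximation only) of pinning each ObjectID to the node that holds
both its slice of the store and its event processing:

    import hashlib

    # Hypothetical cluster of combined CEP/store nodes.
    NODES = ["cep0", "cep1", "cep2", "cep3"]

    def home_node(object_id: str) -> str:
        """Map an ObjectID to the node owning both its storage slice
        and its event processing, so the common case stays local."""
        digest = hashlib.sha1(object_id.encode("utf-8")).digest()
        return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

A static hash like this ignores the locality *between* objects, which
is exactly what the dynamic scheme below tries to recover.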
What I've been thinking about this morning is a method for
dynamically distributing load across an arbitrary cluster of
processor boxes. Doing it "properly" is expensive and complex. My
interest is in whether a moderately cheap approximation can do well
enough for the interesting cases.
Consider the above diagram with the following change:
   [OutOfBandServer]<=================================++
       ^|                                             ||
       ||                                             ||
       |v                                             |v
 ===>(PlayerTraffic)<===>[ConnectionServer]<===>[Authentication/Billing]
                                ^|                    ^|
                                ||                    ||
                                ||                    ||
                                |v                    ||
   [AutomationSystem]<===>[Distributor]               ||
                                ^|                    ||
                                ||                    ||
                                ||                    ||
                                |v                    |v
                     [Command/EventProcessor]<===>[BackingStore]
                                                      ^|
                                                      ||
                                                      |v
                                             [ReplicationServers]
What happens is:
Commands come in off the Connection server. Automation "commands"
come in off the Automation system. Both go through a clearing
house system I'll call the "Distributor". Each item that is sent
to the Distributor has a concept of an "Owner" (likely the
ObjectID that "owns" the command).
The Distributor farms the resulting "processes" out to machines in
the Command/EventProcessor cluster in whatever manner it sees fit.
Upon completion of a task the CEP nodes report back to the
Distributor the list of modified objects and the list of
referenced objects in the given transaction. In this manner the
Distributor can maintain an ongoing definition of the current
working set for each "owner" that it is processing items for.
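In code, the Distributor's bookkeeping might look something like this
Python sketch (the weights and names are my own invention, not a spec):

    from collections import defaultdict

    class Distributor:
        # Assumed weighting: modified objects count for more than
        # merely referenced ones when defining the working set.
        MODIFIED_WEIGHT = 4
        REFERENCED_WEIGHT = 1

        def __init__(self):
            # owner ObjectID -> {object ObjectID: accumulated weight}
            self.working_sets = defaultdict(lambda: defaultdict(int))

        def report_completion(self, owner, modified, referenced):
            """Called by a CEP node when it finishes a task for owner."""
            ws = self.working_sets[owner]
            for obj in modified:
                ws[obj] += self.MODIFIED_WEIGHT
            for obj in referenced:
                ws[obj] += self.REFERENCED_WEIGHT

A real version would presumably also age old entries out, so the
working set stays a *current* definition rather than an all-time union.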
The Distributor can now build a graph of the owners against the
working sets, with modified working set objects given priority
over merely referenced objects. Relatively simple graph traversal
(think about it) will allow "blobs" in the graph to be located,
and a crude segmentation of the great pool of working sets into
N pools, where N is the number of nodes in the CEP cluster, along
lines of minimum intersection.
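A sketch of that graph, built from the working sets above (the edge
weighting is my guess at how the "priority" might be expressed):

    from itertools import combinations

    def build_owner_graph(working_sets):
        """working_sets: {owner: {object: weight}}, as accumulated by
        the Distributor.  Returns {(owner_a, owner_b): weight}, where
        weight is the combined weight of the objects both owners'
        working sets touch."""
        edges = {}
        for a, b in combinations(working_sets, 2):
            shared = working_sets[a].keys() & working_sets[b].keys()
            weight = sum(working_sets[a][o] + working_sets[b][o]
                         for o in shared)
            if weight:
                edges[(a, b)] = weight
        return edges

Owners joined by heavy edges clump into the "blobs"; a cut along
light edges is a line of minimum intersection.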
Getting perfect graph segmentation appears hard at first blush.
The same first blush suggests that a simple walk across the
aggregate space spotting valleys (areas of low intersection)
should be comparatively cheap, and simply raising the threshold
when the space between two valleys grows too large (a very large,
highly intersecting collection of working sets) will likely give
a Good Enough approximation of reasonable division lines.
Perhaps.
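For what it's worth, a first cut at that walk in Python (the linear
ordering of owners, the thresholds, and the pool-size limit are all
assumptions of mine):

    MAX_THRESHOLD = 64   # arbitrary cap so the recursion terminates

    def _w(edges, a, b):
        """Symmetric edge-weight lookup into the owner graph."""
        return edges.get((a, b), edges.get((b, a), 0))

    def segment(order, edges, threshold=1, max_size=50):
        """Walk a linear ordering of owners, cutting at "valleys"
        where the intersection with the previous owner falls below
        the threshold."""
        pools, current = [], [order[0]]
        for prev, owner in zip(order, order[1:]):
            if _w(edges, prev, owner) < threshold:  # a valley: cut
                pools.append(current)
                current = []
            current.append(owner)
        pools.append(current)
        # A stretch with no internal valley may still be too big for
        # one CEP node: raise the threshold and re-walk just that pool.
        out = []
        for pool in pools:
            if len(pool) > max_size and threshold < MAX_THRESHOLD:
                out.extend(segment(pool, edges, threshold * 2, max_size))
            else:
                out.append(pool)
        return out

How good the resulting pools are depends almost entirely on how good
the linear ordering is; that part is the hand-wave.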
The idea is to not get involved in the complexity of doing a full
CC:NUMA architecture, especially in regard to proper cache
management and cache line exchange, and instead to take a Merely
Good Enough stab at a second-order approximation of cache management to
dynamically build a Probably Good Enough dynamic cluster allocation
system.
> I'm not sure how you'd setup the database, but it should be
> written specifically to handle the objects being used by the
> server. The problem comes with synchronizing data, and managing
> bandwidth.
Bandwidth is the minor problem -- 10Gig-E is already available.
Cache coherency and lock contention/synchronisation are what scare
me.
--
J C Lawrence claw at kanga.nu
---------(*) http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev