Fwd: [MUD-Dev] Distributed State Systems

Tue Sep 14 10:22:36 CEST 2004

On Sun, 2004-09-12 at 12:49 +0300, Alex Arnon wrote:

> So are we looking at a cluster with N+1 (or N+M) redundancy here?
> Tough problem for some domains, though possibly not for MUDs.
> Would a system which checkpoints the world state periodically, but
> which has no redundancy whatsoever, be considered viable for a
> large scale MUD/MMO? If so, what is the frequency of checkpointing
> you would desire? Or in other words, how much game time (not
> downtime) would players be willing to lose due to server failure
> before they start leaving: 1 minute, 5, 10, 30 minutes?

> If such a solution is acceptable, it solves some technological
> problems - you could tune your architecture for performance, with
> the addition of checkpointing logic (i.e., occasionally the DB
> server would message the cluster nodes to start sending in their
> changes to objects).

Well, obviously we'd like to keep downtime to zero if possible :).
To answer your question though, it is an N + M cluster, with M
chosen based on server load.  N + 1 isn't really sufficient if for
some reason two nodes fail at the same time.
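
To make the N + 1 versus N + M point concrete, here is a minimal
sketch.  The function name and the node counts are made up for
illustration; the only assumption is that the cluster stays up as
long as at least N healthy nodes remain.

```python
def can_serve(n_required: int, m_spares: int, failed: int) -> bool:
    """Return True if enough healthy nodes remain to fill the N active slots.

    Hypothetical model: the cluster has n_required + m_spares nodes in
    total, and any healthy spare can take over for a failed node.
    """
    healthy = n_required + m_spares - failed
    return healthy >= n_required


# N + 1 survives a single failure but not two simultaneous ones;
# N + 2 survives both cases.
print(can_serve(4, 1, 1))  # True
print(can_serve(4, 1, 2))  # False
print(can_serve(4, 2, 2))  # True
```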

As far as the DB cache goes, that was my plan :).  I did some
thinking about it, and realized that really the only thing that
needs access to the data is the commands.  Player commands don't
take as much time to execute as other portions of the MMO might
(such as updating the world, spawning NPCs, etc.).  Given that, and
the fact that a DB bottleneck could cause a problem, I decided to go
with a three-tier approach.

The first tier is the main entry point into the MMO, as was decided
at the beginning of this expedition.  The second tier is the N + M
nodes, set to handle connection I/O polling, and I've also decided
to delegate world update mechanisms to the second-tier nodes.  The
DB node (we'll call it D for now) sends notice to whichever node N
is responsible for the current update, N performs the update, and
sends the updated data back to D.  This is where the third tier
comes in.  Another set of, say, N/x nodes (where x is a number
dependent on load testing I haven't performed just yet) handles
commands.  These
are the only nodes which really need to vigorously keep the world
state synchronized.
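
The D-to-N update round trip described above could be sketched
roughly as follows.  This is only an in-process illustration, not
the actual design: the class names, the "tick" field, and the
CRC-based choice of the responsible node are all assumptions, and
real nodes would talk over the network rather than by method call.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class UpdateNode:
    """A second-tier node that performs world updates (hypothetical)."""
    name: str

    def perform_update(self, region: str, data: dict) -> dict:
        # Placeholder update rule: advance a tick counter for the region.
        updated = dict(data)
        updated["tick"] = updated.get("tick", 0) + 1
        updated["updated_by"] = self.name
        return updated


@dataclass
class DBNode:
    """The node called D: holds world state and hands out update work."""
    nodes: list
    world: dict = field(default_factory=dict)

    def responsible_node(self, region: str) -> UpdateNode:
        # Assumed assignment scheme: a stable hash of the region name.
        idx = zlib.crc32(region.encode("utf-8")) % len(self.nodes)
        return self.nodes[idx]

    def run_update(self, region: str) -> dict:
        # D notifies the responsible node, then stores what it sends back.
        node = self.responsible_node(region)
        self.world[region] = node.perform_update(region, self.world.get(region, {}))
        return self.world[region]


d = DBNode(nodes=[UpdateNode("n0"), UpdateNode("n1")])
d.run_update("forest")
d.run_update("forest")
print(d.world["forest"]["tick"])  # 2
```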

I realize that the second-level N nodes open another possibility for
I/O bottleneck since they have to communicate with the third-tier,
but this can be alleviated by compacting the data from the player
before sending it to the third-tier.  This requires some knowledge
of commands (since they're essentially compiled), but lightens
cluster load.  Another way to decrease the bottleneck is to pass a
reference to an outgoing socket which the command node can send
output directly to, although some form of inline compression, like
bz, could help if the command relays through the second-tier.
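
As a rough illustration of both ideas, the sketch below packs a
player's command line into a small binary packet before it goes to a
command node, and compresses relayed output.  The opcode table, the
packet layout, and the function names are invented for the example,
and "bz" is taken to mean bzip2 (Python's standard bz2 module).

```python
import bz2
import struct

# Hypothetical opcode table; a real server would generate this from its
# command set, which is the "knowledge of commands" the second tier needs.
OPCODES = {"look": 1, "say": 2, "north": 3}

# Assumed header layout, network byte order:
# 4-byte player id, 1-byte opcode, 2-byte argument length.
HEADER = struct.Struct("!IBH")


def pack_command(player_id: int, line: str) -> bytes:
    """Turn a raw command line into a compact packet for a command node."""
    verb, _, args = line.strip().partition(" ")
    payload = args.encode("utf-8")
    return HEADER.pack(player_id, OPCODES[verb], len(payload)) + payload


def unpack_command(packet: bytes):
    """Inverse of pack_command, as run on the command node."""
    player_id, opcode, length = HEADER.unpack(packet[:HEADER.size])
    args = packet[HEADER.size:HEADER.size + length].decode("utf-8")
    return player_id, opcode, args


packet = pack_command(42, "say hello there")
print(unpack_command(packet))  # (42, 2, 'hello there')

# Output relayed back through the second tier can be compressed in bulk.
output = b"The orc hits you.\n" * 200
compressed = bz2.compress(output)
assert bz2.decompress(compressed) == output
```

One caveat on the compression side: bzip2 carries a fixed overhead
per stream, so it is likely to pay off only when output is batched
rather than compressed one short line at a time.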

Right now it looks like a pretty solid system, and I look forward to
running some stress tests on it once it's in a state to do so.  I
appreciate everyone's input on the matter.

Mike Tindal