[DGD] just out of curiosity

Tue Sep 11 18:27:58 CEST 2012

You probably want to start with one or more of the existing tools for this purpose.  I'm thinking of, say, ZooKeeper.

You'll need to use that *with* one or more game servers, and DGD would probably be fine for that.  But in this day and age, if you're looking at implementing your own Paxos (or any of several similar consensus algorithms), you're probably doing it wrong.  If you don't know what Paxos is, or why you'd need it for large-scale, distributed, fault-tolerant computation, you should do a fair bit of educating yourself before you start.
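As a rough aside on what those tools buy you: Paxos, and the similar majority-quorum protocol ZooKeeper uses, only makes progress while a strict majority of replicas is alive, because any two majorities overlap in at least one replica. A group of n replicas therefore survives (n - 1) // 2 crash failures. The Python sketch below only does that arithmetic. It is not a consensus implementation, and the function names are hypothetical.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of n replicas."""
    if n < 1:
        raise ValueError("need at least one replica")
    return n // 2 + 1


def max_tolerated_failures(n: int) -> int:
    """Crash failures n replicas can absorb while a majority is still alive.

    Any two majorities of n replicas share at least one member, which is
    what lets majority-quorum protocols keep a single agreed history.
    """
    return n - quorum_size(n)  # always equal to (n - 1) // 2


for n in (3, 4, 5, 7):
    print(f"n={n} quorum={quorum_size(n)} tolerated={max_tolerated_failures(n)}")
```

This is why such ensembles are usually sized at 3, 5, or 7: going from 3 to 4 replicas raises the quorum from 2 to 3 without tolerating a single extra failure.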

A lot of this stuff already exists, and now that we know the approximate parameters of the problem, it turns out to be a very hard one.

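Felix's pointer to the CAP theorem is a good one: a distributed system can't guarantee consistency, availability, and tolerance of network partitions all at once.  You're going to need to figure out where you can compromise, and then choose tools on that basis.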

________________________________
From: Ragnar Lonn <prl at gatorhole.se>
To: dgd at dworkin.nl
Sent: Tuesday, September 11, 2012 7:11 AM
Subject: Re: [DGD] just out of curiosity

On 09/11/2012 03:32 PM, Felix A. Croes wrote:
> Ragnar Lonn <prl at gatorhole.se> wrote:
> 
>> [...]
>> The problem with DGD/Hydra, for this particular application, is that it
>> is not meant to be run in a distributed environment. Any system that is
>> not distributed will not have enough CPU cycles for anything but a small
>> world with few players. You can get away with some sharding maybe, or
>> transferring objects between different state machines, but it will be
>> messy.
> This isn't true anymore: DGD & Hydra now explicitly support outbound
> connections.  The problem of efficiently distributed servers is still
> unsolved, but DGD/Hydra can be part of the solution.

I guess the overall question is: is DGD/Hydra a good starting point for building a massively scalable, distributed state machine, or would it be easier to start with something else, or completely from scratch?

When you mention outbound connections, I guess you mean that state distribution should be done in LPC. Would that be fast enough?

I want:

1. Huge scalability: up to hundreds of thousands of physical nodes, each supporting hundreds of thousands of objects.
2. A seamless world, where interaction between objects is always reasonably fast from a user's point of view.
3. Reliability: a failed physical node must not cause service interruptions (multiple-copy state redundancy), and multiple concurrent failures can at most cause temporary interruptions (state backups/snapshots on physical media). Some data loss can happen but is kept to a minimum, and consistency is never compromised.

  /Ragnar

___________________________________________
https://mail.dworkin.nl/mailman/listinfo/dgd