[DGD] just out of curiosity

Tue Sep 11 19:45:54 CEST 2012

Thanks, and you're right that I need to read more. So far I've only 
covered one book on distributed systems (and it usually takes me more 
than one read for things to stick), but I have this suspicion that half 
the battle lies in properly identifying the problem and, like you write, 
figuring out which compromises can be made. If you make the right 
compromises regarding reliability, performance, scalability and 
consistency, you can vastly simplify implementation. Or that is my 
belief/hope anyway. (Belief and hope are one and the same to an 
entrepreneur.)

Just look at Second Life - they made some pretty huge compromises when 
they designed their system. They basically have a sharded system where a 
physical server handles a certain amount of geographical terrain. If the 
server goes down, that area in Second Life just disappears while the server 
is offline. Movable objects (like players) are transferred between 
servers as they move. Maybe this is an acceptable level of availability 
(although their performance is not good enough when objects are migrated 
between servers). Some parts of the world will be down at any one time. 
The actual requirements of an online world might not be so bad, and many 
limiting factors can be worked around through smart content creation 
(i.e. create content that is suitable for the platform. Set rules for 
content that *makes* it suit the platform).
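
To make that model concrete, here is a minimal Python sketch of this 
kind of geographic sharding. It only reflects my description above, 
not Second Life's actual implementation. The names (World, 
RegionServer, REGION_SIZE) and the 256-unit region size are 
assumptions made up for illustration.

```python
from dataclasses import dataclass, field

REGION_SIZE = 256  # assumed side length of one region, in world units


@dataclass
class RegionServer:
    """Hypothetical server that owns one square region of terrain."""
    region: tuple
    online: bool = True
    objects: dict = field(default_factory=dict)  # object id -> (x, y)


class World:
    def __init__(self):
        self.servers = {}  # region coordinate -> RegionServer

    @staticmethod
    def region_of(x, y):
        # Floor division maps a world position to its region coordinate.
        return (int(x // REGION_SIZE), int(y // REGION_SIZE))

    def server_for(self, x, y):
        region = self.region_of(x, y)
        return self.servers.setdefault(region, RegionServer(region))

    def place(self, obj_id, x, y):
        server = self.server_for(x, y)
        if not server.online:
            return False  # that part of the world is simply unavailable
        server.objects[obj_id] = (x, y)
        return True

    def move(self, obj_id, old_pos, new_pos):
        """Move an object; hand it off if it crosses a region boundary.

        Returns False and leaves the object in place when the source or
        destination server is offline.
        """
        src = self.server_for(*old_pos)
        dst = self.server_for(*new_pos)
        if not (src.online and dst.online) or obj_id not in src.objects:
            return False
        del src.objects[obj_id]
        dst.objects[obj_id] = new_pos
        return True
```

The point of the sketch is the compromise itself: there is no 
replication, so an offline server takes its region with it, and the 
only cross-server operation is the handoff in move().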

As for me, I'm just trying to figure out a viable path. If and when I 
find it, I'm not going to write the code, because over the past ten 
years or so I've finally understood that there is an inverse relation 
between the amount of code I check in and the quality of the code in 
our software repository...  So maybe I don't need to know *exactly* how to do it, 
just have an overall idea and be fairly sure it *can* be done. That's 
why I'm asking you here - some of you have already thought about these 
things for a while, and know much more about distributed systems than I do.

   /Ragnar

On 09/11/2012 06:27 PM, Noah Gibbs wrote:
> You probably want to start with one or more of the existing tools for 
> this purpose.  I'm thinking of, say, ZooKeeper.
>
> You'll need to use that *with* one or more game servers, and DGD would 
> probably be fine for that.  But in this day and age if you're looking 
> at building your own Paxos algorithm (or any of several similar ones) 
> you're probably doing it wrong.  If you don't know what Paxos is and 
> you don't know why you'd need it for the large-scale distributed 
> fault-tolerant computations, you should probably do a fair bit of 
> educating yourself before you start.
>
> A lot of this stuff now exists, and it turns out it's a very hard 
> problem, now that we know its approximate parameters.
>
> Felix's pointer on the CAP theorem is a good one.  You're going to 
> need to figure out where you can compromise, and then probably choose 
> tools on that basis.
>
>
> ________________________________
>   From: Ragnar Lonn <prl at gatorhole.se>
> To: dgd at dworkin.nl
> Sent: Tuesday, September 11, 2012 7:11 AM
> Subject: Re: [DGD] just out of curiosity
>   On 09/11/2012 03:32 PM, Felix A. Croes wrote:
>> Ragnar Lonn <prl at gatorhole.se> wrote:
>>
>>> [...]
>>> The problem with DGD/Hydra, for this particular application, is that it
>>> is not meant to be run in a distributed environment. Any system that is
> >>> not distributed will not have enough CPU cycles for anything but a small
>>> world with few players. You can get away with some sharding maybe, or
>>> transferring objects between different state machines, but it will be
>>> messy.
>> This isn't true anymore, DGD & Hydra now explicitly support outbound
>> connections.  The problem of efficiently distributed servers is still
>> unsolved, but DGD/Hydra can be part of the solution.
> I guess the overall question is: is DGD/Hydra a good starting point 
> for building a massively scalable, distributed state machine, or would 
> it be easier to start with something else, or completely from scratch?
>
> When you mention outbound connections, I guess you mean that state 
> distribution should be done in LPC. Would that be fast enough?
>
> I want:
>
> 1. huge scalability. Up to hundreds of thousands of physical nodes 
> where each node supports hundreds of thousands of objects
> 2. a seamless world, where interaction between objects is always 
> reasonably fast from a user's point of view
> 3. reliability. A failed physical node will not cause service 
> interruptions (multiple-copy state redundancy). Multiple concurrent 
> failures can at most cause temporary interruptions (physical media 
> state backups/snapshots). Loss of data can happen, but is kept to a 
> minimum and consistency is not compromised.
>
>    /Ragnar
>
>
> ___________________________________________
> https://mail.dworkin.nl/mailman/listinfo/dgd