[DGD] DGD/MP & OpenMosix revisited?

Felix A. Croes felix at dworkin.nl
Mon Feb 9 20:54:23 CET 2004


Greg Lewis <glewis at eyesbeyond.com> wrote:

>[...]
> Your argument regarding maintenance assumes low-quality, generic-brand
> parts.  You'll find that most decent cluster manufacturers will
> use good quality components.  MTBF is important not only to the
> customer buying the cluster but also to the company selling it (which
> will have to pay for shipping and replacement parts when the customer
> RMAs a faulty node).  I'd argue that a good-quality cluster from a
> reputable company now sees MTBF numbers similar to those of large
> machines of comparable size, at a fraction of the cost.

I was assuming high quality PC-level components.  I had no idea that
they were as reliable as you say.  I take it that's mean time between
failures for the cluster as a whole, not for each machine individually --
that makes each component in a cluster machine several times as reliable
as each component in a large machine, assuming that the latter has
fewer components overall.  That's impressive.
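
A rough back-of-the-envelope version of that reasoning, as a Python
sketch.  It assumes constant failure rates and a series model (any one
failed component counts as a failure of the whole system), so failure
rates simply add.  The component counts and target MTBF below are
made-up illustrative numbers, not figures from this thread:

```python
# Series-system reliability sketch.
# Assumptions: constant (exponential) failure rates, identical components,
# and any single component failure counts as a system failure.

def system_mtbf(component_mtbf_hours, n_components):
    """MTBF of n identical components in series: the failure rates add."""
    failure_rate = n_components / component_mtbf_hours
    return 1.0 / failure_rate

def required_component_mtbf(target_system_mtbf, n_components):
    """Per-component MTBF needed for n components to reach a target system MTBF."""
    return target_system_mtbf * n_components

# Hypothetical numbers, for illustration only.
large_machine_parts = 200    # components in one large machine (assumed)
cluster_parts = 32 * 40      # 32 nodes x 40 components each (assumed)
target = 10_000.0            # same system MTBF in hours for both (assumed)

big = required_component_mtbf(target, large_machine_parts)
small = required_component_mtbf(target, cluster_parts)
print(f"large machine component MTBF: {big:.0f} h")
print(f"cluster component MTBF:       {small:.0f} h")
print(f"ratio: {small / big:.1f}x")
```

With these assumed numbers the cluster has 6.4 times as many components,
so each of them must last 6.4 times as long for the cluster as a whole
to match the large machine's MTBF.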


> > Then there is the software maintenance cost.  All of these machines
> > have an operating system which must be installed and kept up to date.
> > Any OS problem is potentially multiplied by the number of machines.
> > The hardware could perhaps be managed by a single person; the software
> > cannot.
>
> Sorry, but I have to disagree with this entirely.  In a cluster it is
> customary to manage the operating system for the nodes as images.  In
> a MUD cluster I'd expect there to be two production images, one for the
> standard compute nodes which are running the game and one for some
> I/O nodes which are managing state dumps/database transactions.  You
> do _not_ manage the OS for each individual node unless you're a masochist.
>
> State of the art is to clone these images onto the relevant nodes over
> a management network (avoiding congestion on the main interconnects)
> using multicast.  The nodes themselves have network bootable NICs using
> PXE or have something like etherboot flashed onto the NIC ROMs.

That's installation and upgrading taken care of.  I have my
doubts about other maintenance, but I'll just cede the point.


> > Add to this the cost of designing your MUD to be distributed (which is
> > not trivial), the cost to the MUD itself of being distributed (some
> > features that you'd really like to have are just not going to be
> > possible), the cost of making sure that the MUD software keeps working
> > as intended in this distributed environment, the cost of synchronizing
> > MUD software updates across all machines, and probably quite a few more
> > things that I didn't think of -- at this point, you are paying a lot of
> > people to manage your overhead of cheap hardware on a fast network.
>
> Designing a distributed MUD is certainly a challenge, at least we agree
> on this one :).  However, this is where a transparent process migration
> technology like OpenMosix or bproc can help simplify things.

This point I won't cede.  Some things just don't scale well using
clusters, and in my opinion that includes MUDs.

I hope to take this out of the realm of argument, and into the
realm of demonstration fairly soon.  Should DGD/MP turn out to be
a miserable failure, feel free to call me a fool. :)

Regards,
Dworkin
_________________________________________________________________
List config page:  http://list.imaginary.com/mailman/listinfo/dgd