[DGD] DGD/MP & OpenMosix revisited?
Felix A. Croes
felix at dworkin.nl
Mon Feb 9 20:54:23 CET 2004
Greg Lewis <glewis at eyesbeyond.com> wrote:
>[...]
> Your argument regarding maintenance assumes low quality generic
> brand parts. You'll find that most decent cluster manufacturers will
> use good quality components. MTBF is important not only to the
> customer buying the cluster but also to the company selling it (which
> will have to pay for shipping and replacement parts when the customer
> RMAs a faulty node). I'd argue that a good quality cluster from a
> reputable company is now seeing similar MTBF numbers to similar
> size large machines, at a fraction of the cost.
I was assuming high quality PC-level components. I had no idea that
they were as reliable as you say. I take it that's mean time between
failures for the cluster as a whole, not for each machine individually --
that makes each component in a cluster machine several times as reliable
as each component in a large machine, assuming that the latter has
fewer components overall. That's impressive.
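To make that arithmetic concrete, here is a minimal sketch. It assumes a simple series model: identical, independent components with constant failure rates, where the system fails as soon as any one component fails. The component counts and the target MTBF are made-up illustrative numbers, not figures from this thread.

```python
def system_mtbf(component_mtbf_hours: float, n_components: int) -> float:
    """MTBF of a system that fails when any one of n identical,
    independent components fails (failure rates add up)."""
    return component_mtbf_hours / n_components


def required_component_mtbf(target_system_mtbf_hours: float,
                            n_components: int) -> float:
    """Per-component MTBF needed to reach a target system MTBF."""
    return target_system_mtbf_hours * n_components


# Hypothetical numbers, for illustration only.
large_machine_parts = 200    # assumed component count of one large machine
cluster_parts = 1000         # assumed component count of a whole cluster
target_hours = 5_000.0       # same system-level MTBF for both

large_part_mtbf = required_component_mtbf(target_hours, large_machine_parts)
cluster_part_mtbf = required_component_mtbf(target_hours, cluster_parts)

# How many times more reliable each cluster component has to be.
ratio = cluster_part_mtbf / large_part_mtbf
print(ratio)  # 5.0
```

Under this model the ratio is just the ratio of component counts, which is the "several times as reliable" above. Real machines have redundancy and non-identical parts, so this is only a rough bound.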
> > Then there is the software maintenance cost. All of these machines
> > have an operating system which must be installed and kept up to date.
> > Any OS problem is potentially multiplied by the number of machines.
> > The hardware could perhaps be managed by a single person, the software
> > can not.
>
> Sorry, but I have to disagree with this entirely. In a cluster it is
> customary to manage the operating system for the nodes as images. In a
> MUD cluster I'd expect there to be two production images, one for the
> standard compute nodes which are running the game and one for some
> I/O nodes which are managing state dumps/database transactions. You
> do _not_ manage the OS for each individual node unless you're a masochist.
>
> State of the art is to clone these images onto the relevant nodes over
> a management network (avoiding congestion on the main interconnects)
> using multicast. The nodes themselves have network bootable NICs using
> PXE or have something like etherboot flashed onto the NIC ROMs.
That's installation and upgrading taken care of. I have my
doubts about other maintenance, but I'll just cede the point.
> > Add to this the cost of designing your MUD to be distributed (which is
> > not trivial), the cost to the MUD itself of being distributed (some
> > features that you'd really like to have are just not going to be
> > possible), the cost of making sure that the MUD software keeps working
> > as intended in this distributed environment, the cost of synchronizing
> > MUD software updates across all machines, and probably quite a few more
> > things that I didn't think of -- at this point, you are paying a lot of
> > people to manage your overhead of cheap hardware on a fast network.
>
> Designing a distributed MUD is certainly a challenge, at least we agree
> on this one :). However, this is where a transparent process migration
> technology like OpenMosix or bproc can help simplify things.
This point I won't cede. Some things just don't scale well using
clusters, and in my opinion that includes MUDs.
I hope to take this out of the realm of argument, and into the
realm of demonstration fairly soon. Should DGD/MP turn out to be
a miserable failure, feel free to call me a fool. :)
Regards,
Dworkin
_________________________________________________________________
List config page: http://list.imaginary.com/mailman/listinfo/dgd