[DGD] DGD/MP & OpenMosix revisited?

Greg Lewis glewis at eyesbeyond.com
Tue Feb 10 00:06:14 CET 2004


On Mon, Feb 09, 2004 at 12:21:39PM -0800, Noah Gibbs wrote:
>   It's not always true, either.  While Greg makes the (valid) point
> that the company will have to pay shipping and replacement on an RMA'd
> part, as you point out, the parts would have to be several times as
> reliable as the equivalent for a server.  That's true when you buy a
> "clustering solution", which is absolutely *not* a fraction of the
> price of a big server.

I would disagree.  A big selling point of Linux clusters in comparison
to traditional "big iron" has been the price.  I'm pretty sure we won't
be able to get prices out of IBM, Sun or HP over the phone, but my money
is still on a clustering solution being a fraction of the cost.

> Essentially, if you're willing to pay for the
> company's clustering software and carefully-selected hardware, you'll
> get higher reliability.  That usually involves software failover,
> though.  For that matter, large servers tend to be the same way these
> days -- you need software failover because large servers tend to
> integrate redundant parts to prevent catastrophic failure if, say, the
> power supply goes bad.
> 
> > > > Then there is the software maintenance cost.  All
> > > > of these machines
> > > > have an operating system which must be installed and
> > > > kept up to date.
> > > > Any OS problem is potentially multiplied by the number
> > > > of machines.
> > > > The hardware could perhaps be managed by a single
> > > > person, the software can not.
> > >
> > > Sorry, but I have to disagree with this entirely.  In a
> > > cluster it is
> > > customary to manage the operating system for the nodes as
> > > images.
> 
>   Have you checked into the cost of good-quality professional software
> to do this for you?  You're welcome to say "use open source", but the
> OSS solutions are simply not equal (yet?).

Yes, I have checked the cost :).  You could use OSS software, but it's
about a generation behind.  You'll still get cloning, but it may happen
serially rather than via multicast.  It probably won't have a very
polished GUI management suite associated with it either.

> > That's installation and upgrading taken care of.  I have my
> > doubts about other maintenance, but I'll just cede the point.
> 
>   You're correct that it's a nontrivial undertaking, for the record. 
> Clustering is a great idea, and for simple highly-parallel problems
> it's a really good idea.  However, it works best for the same things
> that a SETI at home solution excels at, and for the same reasons.  As you
> say, problems that are not highly interconnected.

However, in this case the cluster is running a single application.  As
I stated before, there should be only two types of nodes.  Your maintenance
should happen on those two images, not on individual nodes.  Alternatively,
you can perform the maintenance on a single node from a group and use it
to update the other nodes in that group.
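
To illustrate what I mean, here is a minimal sketch (Python; the node
names, the path and the choice of rsync over ssh are just assumptions
for illustration, not any particular product's method) of maintaining
one node by hand and pushing the result out to the rest of its group:

    #!/usr/bin/env python
    # Sketch only: push a freshly maintained image from the node you work
    # on out to the other nodes of its group, one at a time (the "serial"
    # style mentioned above).
    import subprocess

    SOURCE = "/export/images/worker/"          # local copy on this node
    SIBLINGS = ["node02", "node03", "node04"]  # hypothetical node names

    for host in SIBLINGS:
        # rsync -a preserves permissions and timestamps; --delete keeps
        # the replicas identical to the source image.
        subprocess.check_call([
            "rsync", "-a", "--delete",
            SOURCE,
            "%s:%s" % (host, SOURCE),
        ])

The commercial suites do the same job with multicast and a GUI on top,
but the amount of per-node hand maintenance is the same: none.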

> > > Designing a distributed MUD is certainly a challenge, at
> > > least we agree
> > > on this one :).  However, this is where a transparent
> > > process migration
> > > technology like OpenMosix or bproc can help simplify things.
> > 
> > This point I won't cede.  Some things just don't scale well using
> > clusters, and in my opinion that includes MUDs.
> 
>   Yup.  I have to agree with Felix here.  Let's assume that
> geographical boundaries and things like that are already taken care of
> (a highly nontrivial thing).  You still have certain features that are
> very difficult.  Nonlocal effects (a switch turns on a light halfway
> across the MUD) have a necessary propagation time since you've got
> network latency.  Certain events which need to happen simultaneously
> don't necessarily happen that way -- for instance, if you have two
> gates and no more than one is open at any given time...  Well, more
> than one is open at certain given times, generally speaking.  It's very
> hard to enforce certain constraints when there's a delay there.

And in an MP situation you have bus latency.  Yes, that's quite a lot less
than network latency, but the gap is closing.

What I'm getting at is that each situation carries similar problems and
I'm wondering why the clustering situation is seen as insurmountably
harder than the MP situation.

Just think of component failure.  A CPU in a cluster node can fail, and so
can a CPU in a massive MP system.  Even with hot-swappable CPUs, the
data on that CPU just went west in either case.  So why is the MP case
any easier to deal with?
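
To make the two-gates example concrete: in either setup the usual answer
is to give each such constraint a single authority that serializes the
decisions, so the invariant holds no matter what the latency is.  A toy
sketch (Python; the names and the gate example itself are purely
illustrative) of such an authority:

    # Sketch only: one object owns the "at most one gate open" invariant
    # and serializes all requests against it.  Whether those requests
    # arrive over a memory bus or a network changes the latency, not the
    # logic.
    class GateAuthority:
        def __init__(self):
            self.open_gate = None        # name of the gate currently open

        def request_open(self, gate):
            # Granted only if nothing else is open; because requests are
            # handled one at a time the invariant can never be violated.
            if self.open_gate is None:
                self.open_gate = gate
                return True
            return False

        def notify_closed(self, gate):
            if self.open_gate == gate:
                self.open_gate = None

    authority = GateAuthority()
    print(authority.request_open("north"))   # True
    print(authority.request_open("south"))   # False, north is still open
    authority.notify_closed("north")
    print(authority.request_open("south"))   # True

The cost of asking the authority is higher over a network than over a
bus, but the structure of the solution is identical.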

>   If you run your MUD over standard protocols like TCP/IP, the delay
> will occasionally, and unpredictably, be *very* large, ten seconds or
> more.  That may not seem like much, but when you're going for
> "instantaneous", ten seconds can be an eternity.  What if your CD
> player suddenly took that long to start playing a song, or respond to
> your buttonpress in any way?

If the delay between nodes in a cluster is ten seconds, then either the
cluster is poorly architected or the network is saturated.  This should
never happen in a properly set up cluster.
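
For what it's worth, checking that is trivial to script.  A rough sketch
(Python; the peer names, port and threshold are assumptions for
illustration) that flags any node whose round trip is even close to
human-noticeable:

    # Sketch only: time a plain TCP connect to each peer node and complain
    # if it is anywhere near noticeable by a player.
    import socket
    import time

    PEERS = ["node02", "node03"]
    PORT = 7070
    THRESHOLD = 0.050    # 50 ms is already generous for a LAN

    for host in PEERS:
        start = time.time()
        conn = socket.create_connection((host, PORT), timeout=2.0)
        conn.close()
        elapsed = time.time() - start
        if elapsed > THRESHOLD:
            print("WARNING: %s took %.3fs" % (host, elapsed))

If that warning ever fires on a dedicated cluster interconnect, the
problem is the setup, not clustering as such.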

>   Yes, there are non-TCP protocols.  You could retool your MUD to use
> them (speaking of "nontrivial").  But your builders would also have to
> know about these constraints.  In practice, that means *lots* of subtle
> bugs.

No, your builders shouldn't need to know this at all.  I doubt they
know, or should care, that you're using TCP/IP at the moment.
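
The transport belongs behind an abstraction the builders never see.  A
minimal sketch (Python; the class and function names are hypothetical,
not DGD's actual API) of that separation:

    # Sketch only: builder-level code talks to an abstract transport; the
    # wire protocol underneath can be swapped without builders noticing.
    class Transport:
        def send(self, node, message):
            raise NotImplementedError

    class TcpTransport(Transport):
        def send(self, node, message):
            pass    # ...open or reuse a TCP connection, write the message...

    class ClusterTransport(Transport):
        def send(self, node, message):
            pass    # ...e.g. a reliable-UDP or interconnect-specific protocol...

    def trigger_remote_event(transport, node, event):
        # The only call builder code ever sees; it neither knows nor cares
        # which Transport implementation is underneath.
        transport.send(node, event)

Swap the implementation and the builders' code, and their mental model,
stays exactly the same.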

-- 
Greg Lewis                          Email   : glewis at eyesbeyond.com
Eyes Beyond                         Web     : http://www.eyesbeyond.com
Information Technology              FreeBSD : glewis at FreeBSD.org

