[MUD-Dev] Complexities of MMOG Servers WAS Retention Without Addiction

Fri Dec 13 12:09:23 CET 2002

On Thu, Dec 12, 2002 at 01:50:36AM -0500, Amanda Walker wrote:
> On 12/11/02 8:57 PM, bradley newton haug <brad at faithanddisease.com> wrote:

>> While not disagreeing that idiots will mess anything up, with
>> remoting (com+ and .NET, corba, pickyerpoison) and modern
>> languages it's possible to write apps that run the same way no
>> matter how many machines they are spread over.

> Sure.  This doesn't mean that they will run well when spread over
> many machines, however.

We dumped CORBA (despite the massive advantages of such a well tried
and tested platform): it's too slow, by a considerable factor (for
our usages).

>> While this sounds like an oversimplification, it's not, my server
>> can be spread by threads or instanced objects over any number of
>> machines or processors, and it requires about one extra line for
>> every 20 (there's some initial setup but a snap after that).

> You also have to make sure that you don't hold on to locks (or, if
> possible, use locks at all), don't rely on a single global state
> being consistent, watch out for priority inversion, deal with
> redundancy and failover if a machine crashes or has a hardware

Hmm... admittedly you end up describing some of the easiest problems,
as opposed to the really hard ones. Those things you highlight are
mainly difficult in the same way that your first sorting algorithm
is difficult: they require you to take on board new knowledge, but
after that they are, frankly, straightforward. Priority inversion is
the only one that sounds remotely interesting - but I've seen
hundreds of research papers on it; what do you think happens in
every single pre-emptive OS the first time they get a decent
scheduler going?  (hint: certain programs stop working entirely, due
to being starved of CPU by some subtle priority inversion). It's a
really well documented problem! Unless I'm misunderstanding what
you're referring to...
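For anyone who hasn't met it, the starvation case can be sketched in
a few lines. This is a toy tick-based model, not any real scheduler:
three tasks under strict priority scheduling, where the low-priority
task holds a lock the high-priority task needs and a medium-priority
task keeps the CPU busy. The function name, the tick counts and the
`inherit` flag (priority inheritance, the standard textbook remedy,
which nobody in this thread has mentioned) are all assumptions made
for illustration.

```python
def ticks_until_high_finishes(medium_work, inherit, max_ticks=1000):
    """Toy strict-priority scheduler, one task runs per tick.

    low    (prio 1): holds the lock, needs 3 more ticks to release it
    medium (prio 2): pure CPU work, never touches the lock
    high   (prio 3): needs the lock, then 2 ticks of work

    Returns the tick on which `high` finishes, or None if it is
    still starved after max_ticks.
    """
    low_left = 3            # ticks low still needs inside the lock
    high_left = 2           # ticks high needs once it owns the lock
    med_left = medium_work  # ticks of unrelated medium-priority work

    for tick in range(1, max_ticks + 1):
        lock_held = low_left > 0
        if not lock_held:
            # high is runnable and outranks everything else
            high_left -= 1
            if high_left == 0:
                return tick
        elif inherit:
            # low temporarily runs at high's priority
            low_left -= 1
        elif med_left > 0:
            # medium outranks low, so the lock holder never runs
            med_left -= 1
        else:
            low_left -= 1
    return None


print(ticks_until_high_finishes(50, inherit=False))      # 55
print(ticks_until_high_finishes(50, inherit=True))       # 5
print(ticks_until_high_finishes(10_000, inherit=False))  # None
```

With enough medium-priority work, the high-priority task never runs
inside the window at all, which is the "stops working entirely" case.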

> failure, etc.  Managing concurrency on a large scale is a whole
> different ball game than basic client/server setups, and
> concurrency problems (particularly emergent ones like priority
> inversion) can be very hard to spot by looking at the code.
> There's a reason that everyone's big MMORPG servers roll over and
> die the first few times they throw thousands of clients at them
> simultaneously.

...but you are mainly glossing over what these problems might be,
with statements like "There's a reason that ...(but I'm not going to
tell you what it is, possibly because I don't understand
myself)...".

I think Brad is equally glossing over too many details: "Whilst this
sounds like an oversimplification, it's not". Well, yes it is,
actually :).  When you're talking about a topic where all the facts
are buried in the details (at a high level, one could just say:
Well, I can do a server with more than one client, so why not one
with 1 million clients using the same approach?), it does no good to
omit the details themselves ;).

And I, at least, find it rather hard to follow the conversation
properly when no-one's giving enough detail to make it clear to me
what facts they are basing their ideas on.

> What's the #1 complaint of any MMP game?  Lag.  What's lag?  Lag
> happens any time you don't provide plausible feedback to the
> player in under 50ms.  What's the biggest cause of lag?  Server
> timing and concurrency problems.  Network congestion is actually
> farther down the list.

The basis for a very good point; but you neglect to give an example
of why 50ms might be difficult to meet. Today's commodity CPUs are
2GHz+.  Therefore, in 50ms they can execute on the order of 100
million instructions (at roughly one instruction per cycle). That's
an awful lot of work for an awful lot of players, before you even
get close to having spent 50ms on it!
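The arithmetic, as a quick sketch. It counts clock cycles, which only
equals instructions if you assume about one instruction per cycle;
the 10,000-player figure is a made-up number to show the per-player
share, not a measurement from any game.

```python
def cycles_in_window(clock_hz: int, window_ms: int) -> int:
    """Clock cycles available on one CPU within a time window."""
    return clock_hz * window_ms // 1000


def per_player_budget(clock_hz: int, window_ms: int, players: int) -> int:
    """Even share of that cycle budget per player (hypothetical split)."""
    return cycles_in_window(clock_hz, window_ms) // players


# 2GHz CPU, 50ms window
print(cycles_in_window(2_000_000_000, 50))           # 100000000
# shared evenly between 10,000 players (illustrative count)
print(per_player_budget(2_000_000_000, 50, 10_000))  # 10000
```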

I'd suggest the reason the lag problem becomes *really* tricky is
due to long critical paths and long pipelines. (Pipeline = several
units/modules/etc. of code that each evaluate part of an algorithm,
and pass the partial result on to the next one in the
pipeline. There is absolutely no benefit to getting something only
partway through the pipeline - you have to do every step or
none. Critical path = "longest" route through the system, i.e. the
"worst-case scenario").
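To make the two definitions concrete, here is a minimal sketch that
treats the system as a dependency graph of stages and finds the
costliest route through it. The stage names, the costs in
milliseconds and the `critical_path` helper are all invented for
illustration; they are not taken from any real engine.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(cost_ms, deps):
    """Return (total_ms, [stages]) for the costliest route through a DAG.

    cost_ms: stage name -> cost of that stage
    deps:    stage name -> stages that must finish before it starts
    """
    finish = {}  # earliest possible finish time of each stage
    prev = {}    # predecessor that determines that finish time
    graph = {stage: deps.get(stage, ()) for stage in cost_ms}
    for stage in TopologicalSorter(graph).static_order():
        slowest = max(deps.get(stage, ()), key=lambda d: finish[d],
                      default=None)
        prev[stage] = slowest
        start = finish[slowest] if slowest is not None else 0
        finish[stage] = start + cost_ms[stage]

    # Walk back from the stage that finishes last.
    stage = max(finish, key=finish.get)
    total = finish[stage]
    path = []
    while stage is not None:
        path.append(stage)
        stage = prev[stage]
    return total, path[::-1]


# Hypothetical per-request stages and costs (ms).
stage_cost_ms = {"recv": 2, "physics": 20, "ai": 8, "combat": 15, "send": 3}
stage_deps = {
    "physics": ("recv",),
    "ai": ("recv",),
    "combat": ("physics", "ai"),
    "send": ("combat",),
}

print(critical_path(stage_cost_ms, stage_deps))
# (40, ['recv', 'physics', 'combat', 'send'])
```

Note that "ai" does not appear in the result: it runs alongside
"physics" and finishes sooner, so it is off the critical path.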

If you have a 50ms critical path in your system, you can spend 1000
years improving the speed of all the other parts of the system, but
you'll still have a 50ms critical path (potentially; now I'm
oversimplifying?  :). The problem is that you don't get to choose
which algorithms to improve: you *have* to do the critical path;
until you reduce that, nothing else matters (unless your critical
path only happens infrequently).
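A small sketch of that point, under the simplifying assumption that a
request fans out into independent branches that run in parallel, so
its latency is that of the slowest branch. The branch names and
millisecond figures are hypothetical.

```python
def request_latency_ms(branches):
    """Latency when branches run in parallel: the slowest branch wins."""
    return max(sum(stages) for stages in branches.values())


# Hypothetical stage costs (ms) per branch; "movement" is critical.
baseline = {
    "movement": [5, 40, 5],
    "chat": [5, 10],
    "inventory": [5, 15, 5],
}

# Heavily optimise everything EXCEPT the critical path.
tuned_others = {
    "movement": [5, 40, 5],
    "chat": [1, 1],
    "inventory": [1, 2, 1],
}

# Halve the one expensive stage ON the critical path instead.
tuned_critical = {
    "movement": [5, 20, 5],
    "chat": [5, 10],
    "inventory": [5, 15, 5],
}

print(request_latency_ms(baseline))        # 50
print(request_latency_ms(tuned_others))    # 50
print(request_latency_ms(tuned_critical))  # 30
```

All the effort spent on "chat" and "inventory" leaves the 50ms
untouched; one change on the critical path is the only thing that
moves it.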

In a distributed system, you typically do a lot of pipelining (which
spreads the CPU/etc workload among physical servers), and so you can
end up with rather long critical paths. Because the GrexEngine is a
highly distributed system, we've had to work particularly hard on
critical-path reduction. If you create an MMOG system with little or
no distribution, then by comparison you'll have practically no crit
path problems, and I'd hazard a guess that lag is really not a
problem for you.
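One way to picture why distribution lengthens the critical path is
sketched below. The model is an assumption of mine and not a
description of the GrexEngine: every hand-off between pipeline stages
on different machines costs a fixed amount on top of the compute
time, and a single-machine server pays nothing for the same
hand-offs. The stage costs and the 5ms hop cost are illustrative
numbers only.

```python
def pipeline_latency_ms(stage_ms, hop_ms):
    """End-to-end latency of a linear pipeline.

    stage_ms: compute cost of each stage, in order
    hop_ms:   assumed fixed cost per hand-off between stages
              (0 when all stages share one process)
    """
    hops = max(len(stage_ms) - 1, 0)
    return sum(stage_ms) + hops * hop_ms


stages = [4, 6, 5, 3, 2]  # hypothetical compute cost per stage (ms)

print(pipeline_latency_ms(stages, hop_ms=0))  # 20, single machine
print(pipeline_latency_ms(stages, hop_ms=5))  # 40, one stage per server
```

Under these assumptions the compute work is identical in both cases;
only the hand-offs differ, and they are all on the critical path.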

But I've not spent time on single-system development (doesn't
scale. at all ;), so I wouldn't really know. The few single-system
MMOGs that I've seen the source code for (and seen them running
under heavy load) don't seem to have a problem with lag at all, but
YMMV.

>> reach a decent one.  Of course the last statement probably shows
>> me to be incredibly stupid and naïve.  =D

> Overoptimistic, perhaps :-).

> Client/server is one thing.  10K clients / 100 servers is a bit of
> another.

10K clients is not very hard (considerably more work than 0-2000,
but not very hard). 100k clients is starting to get really nasty.
With the GrexEngine, our target is closer to 500k -> 1m; (and I'm
not going to be any less vague on numbers until we've got solid
examples to prove it :) but then we're using lots of technologies
and patented stuff that doesn't come close to anything so far
discussed on this list, so (whether or not it works :) our approach
is radically different.

We'll be publishing papers on some aspects of our technology next
year with the first two public releases of products. (I'll of course
post them to MUD-Dev too at the time :).

In the meantime, we're looking for some more development partners
right now who would benefit from early access to the technology,
documentation, APIs etc.  If that sounds like you, feel free to get
in touch with me on this address. We're looking for commercial and
non-commercial developers, although we're most interested in
original projects (where our technology can make the very hard task
of developing something very new a lot easier) or very large player
projects (where you might spend years working on your game and never
quite manage to get the performance good enough, without our help).

Adam M 

_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev