[MUD-Dev] Server stability (was: Player statistics)

J C Lawrence claw at kanga.nu
Sun Jan 14 17:12:22 CET 2001


On Sun, 14 Jan 2001 12:35:05 -0800 
Par Winzell <zell at skotos.net> wrote:

> Koster, Raph writes:
>> From: Bruce 

>>> Cold hasn't got this 'problem', as it is a disk-based system,
>>> and demand load objects from the on-disk database.
>> 
>> The concern is speed. Any perceptible delay in accessing this
>> data is unacceptable in most cases, and our databases get very
>> large. Everyone I know in the commercial industry is trying to do
>> this for the next generation of games, but nobody knows how it'll
>> perform in a live environment. EQ tried it prior to release, and
>> couldn't get it fast enough.

> I assume you're saying that everyone you know in the commercial
> industry is trying to switch to a system where objects can reside
> on disk?

This is a standard concern in database design, be it in-memory, on
disk, or network accessed.  Even with in-memory systems (given
enough system RAM to store the entire image), having it in RAM is
not everything.  Item size, CPU cache line size, UP vs SMP, rate of
cache flushes and cache flush avoidance, data locality, data
processing locality, data migration across clusters, etc all become
operative concerns.

Summary: This is not a trivial problem where "on disk" or "in RAM"
are complete solutions, especially as cache management (data and
CPU) can have significant effects in both cases.
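
As an illustration of the "disk-based with demand loading" half of
that trade-off, here's a minimal sketch of an object cache that
loads on a miss and evicts the least recently used entry.  The names
(ObjectId, Object, load_from_disk) are hypothetical stand-ins, not
Cold's actual API, and the "disk read" is faked:

  // Minimal sketch of a demand-load object cache with LRU eviction.
  #include <cstddef>
  #include <cstdio>
  #include <list>
  #include <string>
  #include <unordered_map>

  using ObjectId = unsigned long;

  struct Object {
      ObjectId id;
      std::string data;   // stand-in for the real object state
  };

  class ObjectCache {
  public:
      explicit ObjectCache(std::size_t capacity) : capacity_(capacity) {}

      // Fetch an object, demand-loading it from "disk" on a cache miss.
      Object& get(ObjectId id) {
          auto it = index_.find(id);
          if (it != index_.end()) {
              // Hit: move to the front of the LRU list.
              lru_.splice(lru_.begin(), lru_, it->second);
              return *it->second;
          }
          // Miss: load, evicting the least recently used entry if full.
          if (lru_.size() >= capacity_) {
              index_.erase(lru_.back().id);  // a real server would flush dirty state first
              lru_.pop_back();
          }
          lru_.push_front(load_from_disk(id));
          index_[id] = lru_.begin();
          return lru_.front();
      }

  private:
      // Placeholder for the on-disk database read.
      static Object load_from_disk(ObjectId id) {
          return Object{id, "loaded object " + std::to_string(id)};
      }

      std::size_t capacity_;
      std::list<Object> lru_;                                    // front = most recent
      std::unordered_map<ObjectId, std::list<Object>::iterator> index_;
  };

  int main() {
      ObjectCache cache(2);
      cache.get(1);
      cache.get(2);
      cache.get(1);        // hit
      cache.get(3);        // evicts object 2
      std::printf("%s\n", cache.get(1).data.c_str());
      return 0;
  }

Even in a toy like this, the interesting questions are the ones
above: how big the cache is relative to the working set, and how
often get() misses under a realistic access pattern.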

The base metric is "average latency".  Typically calculations (and
resulting optimisations) are done not only for the general case, but
for specific object types and for known common object access
patterns, as it's not uncommon for index access patterns and index
locking to be a critical contention point, especially on systems
with comparatively small CPU-local caches.
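
One way to collect that metric is to bucket wall-clock cost by
object type and access pattern rather than keeping a single global
average.  A rough sketch follows; the timed_access() helper and the
bucket names are made up for illustration:

  // Per-bucket latency accounting: average access cost broken down by
  // object type / access pattern instead of one global number.
  #include <chrono>
  #include <cstdio>
  #include <map>
  #include <string>

  struct LatencyStats {
      double total_us = 0.0;
      long   count    = 0;
      double average_us() const { return count ? total_us / count : 0.0; }
  };

  static std::map<std::string, LatencyStats> stats;

  // Wrap an object access and charge its wall-clock cost to a bucket
  // such as "room:indexed-lookup" or "player:sequential-scan".
  template <typename Fn>
  auto timed_access(const std::string& bucket, Fn&& fn) {
      auto start = std::chrono::steady_clock::now();
      auto result = fn();
      auto end = std::chrono::steady_clock::now();
      auto& s = stats[bucket];
      s.total_us += std::chrono::duration<double, std::micro>(end - start).count();
      s.count += 1;
      return result;
  }

  int main() {
      // Pretend lookups; in a real server fn() would hit the object database.
      timed_access("room:indexed-lookup", [] { return 42; });
      timed_access("room:indexed-lookup", [] { return 43; });
      for (const auto& [bucket, s] : stats)
          std::printf("%-22s avg %.2f us over %ld accesses\n",
                      bucket.c_str(), s.average_us(), s.count);
      return 0;
  }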

> That said, however, 'disk-based' is a misnomer; in actuality this
> system is obviously a superset of a purely RAM-based one -- just
> set the swap rate infinitely low and everything stays in memory.

The base concerns:

  Working set size, defined as the total data set, code and data,
  that is being processed by the system at a given point in time.
  This is an averaged value for a known system loading.

  Working set migration rate: The rate at which data moves into the
  working set and other data is flushed or removed from the current
  working set.  Again this is an averaged value for known loadings.

  Working set re-use rates.  The rate at which identified data sets
  of various sizes move in and out of the working set across various
  averaged times, for known system loads.  This value is critical to
  proper cache selection and tuning.
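
The first two of these can be estimated directly from an object
access trace.  A rough sketch, assuming a toy trace format and a
fixed one-second window (both of which are assumptions chosen for
illustration, not anything the server actually emits):

  // Estimate working set size and migration rate per time window from
  // an (ordered) object access trace.
  #include <cstddef>
  #include <cstdio>
  #include <set>
  #include <vector>

  struct Access {
      double   time_s;     // seconds since trace start
      unsigned object_id;
  };

  int main() {
      // A toy trace; a real one would come from server instrumentation.
      std::vector<Access> trace = {
          {0.1, 1}, {0.4, 2}, {0.9, 1}, {1.2, 3}, {1.8, 3}, {2.3, 4}, {2.7, 1},
      };
      const double window_s = 1.0;

      std::set<unsigned> previous, current;
      double window_end = window_s;
      auto report = [&](double end) {
          // Migration: objects that entered the working set this window.
          std::size_t entered = 0;
          for (unsigned id : current)
              if (!previous.count(id)) ++entered;
          std::printf("window ending %.1fs: working set %zu objects, %zu newly entered\n",
                      end, current.size(), entered);
          previous = current;
          current.clear();
      };

      for (const Access& a : trace) {
          while (a.time_s >= window_end) {
              report(window_end);
              window_end += window_s;
          }
          current.insert(a.object_id);
      }
      report(window_end);
      return 0;
  }

Re-use rates fall out of the same trace by tracking how long each
object stays out of the working set before it returns.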

Next you need to graph all the above against your data set and map
access rates to it, both as a sum total and at the unit level, as
curves across not just different system load patterns but different
transaction rates (that sounds funny, but you'd be surprised how
much varying transaction rates can warp the curves and reveal
previously unknown sweet/horror spots).

Expect to end up with a *LOT* of raw data if you do it right.  Also
expect to spend a fair bit of time on analysis and tuning as a
result.  
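
To see why transaction rate by itself warps the curves, it's enough
to simulate a single-threaded server with a fixed per-access cost
and random (Poisson) arrivals: average latency climbs steeply as the
offered rate approaches capacity, even though nothing about the
per-object cost changes.  The service time and rates below are
made-up numbers, and the single-server queue is my illustration
rather than anything described above:

  // Toy single-server queue: average access latency vs offered
  // transaction rate, with a constant 2ms cost per access.
  #include <algorithm>
  #include <cstdio>
  #include <random>

  int main() {
      const double service_ms = 2.0;       // fixed cost per object access
      const int    requests   = 100000;
      std::mt19937 rng(42);

      for (double rate_per_s : {100.0, 300.0, 450.0, 490.0}) {
          // Exponential inter-arrival gaps, in milliseconds.
          std::exponential_distribution<double> gap(rate_per_s / 1000.0);
          double arrival = 0.0, server_free_at = 0.0, total_latency = 0.0;

          for (int i = 0; i < requests; ++i) {
              arrival += gap(rng);
              double start  = std::max(arrival, server_free_at);
              double finish = start + service_ms;
              total_latency += finish - arrival;   // queueing delay + service
              server_free_at = finish;
          }
          std::printf("%6.0f tx/s -> average latency %.2f ms\n",
                      rate_per_s, total_latency / requests);
      }
      return 0;
  }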

> This perceptible access delay, is it the overhead of checking if
an object is in-memory or needs to be swapped in? Is it the time
> consumed by actually swapping in long-unused objects? By swapping
> them out, perhaps? Do you know?

This is where the graphs above of unit level access patterns get
interesting.  You can often get this data with a decent profiler
(important as that gives you call tree expense for specific
accesses, and you can then map the delta expenses of initial access
versus subsequent access, versus cache flush against your various
curves).
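
The cold-versus-warm delta in particular is easy to measure directly
if you can wrap the object fetch path.  A small sketch, where the
loader is a fake that just sleeps to stand in for disk I/O (an
assumption, not a measurement of any real database):

  // Separate first-access (cold, swap-in) cost from repeat-access
  // (warm, in-memory) cost for the same object.
  #include <chrono>
  #include <cstdio>
  #include <string>
  #include <thread>
  #include <unordered_map>

  static std::unordered_map<unsigned, std::string> memory_cache;

  // Pretend disk load: sleep to simulate I/O latency on a miss.
  static const std::string& fetch(unsigned id) {
      auto it = memory_cache.find(id);
      if (it == memory_cache.end()) {
          std::this_thread::sleep_for(std::chrono::milliseconds(5));
          it = memory_cache.emplace(id, "object " + std::to_string(id)).first;
      }
      return it->second;
  }

  static double time_fetch_us(unsigned id) {
      auto start = std::chrono::steady_clock::now();
      fetch(id);
      auto end = std::chrono::steady_clock::now();
      return std::chrono::duration<double, std::micro>(end - start).count();
  }

  int main() {
      double cold = time_fetch_us(7);   // includes the simulated swap-in
      double warm = time_fetch_us(7);   // pure in-memory hit
      std::printf("cold %.1f us, warm %.1f us, delta %.1f us\n",
                  cold, warm, cold - warm);
      return 0;
  }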

It gets hairy.  On smaller systems (which Skotos with its generally
short-story/vignette oriented approach would seem to fit) it's very
tempting to cut corners and work off the fact that you "know" the
system, and that's often valid for such smaller systems as you don't
get many manifestations of it being a complex system.  On larger
systems, in which class I'd probably put the UO's and EQ's of the
world, and most definitely the new heavy world servers under
development now, you are going to be into complex system behaviours
that are just not penetrable by examination.  You have to model
observed behaviour to know you're getting it right (and you'll
usually/always find you're not (quite)).

<<Why do I feel like a crotchety old S/390 engineer waving my stack
of punched cards at them young rapscallion microcomputer punks and
their toys?>>

> We're just reaching a 200-meg database on our Castle of Marrach so
> it is much too early to make any kind of comparisons against the
> large graphical games. 

What is your working set size etc?

> That kind of comparison is probably of minor interest anyway; I
> expect a larger subset of objects need to be swapped in for a
> horizon-revealing game like EQ anyway?

This really depends on object density within the working set of a
given player (eg 5,000 fiddly objects in a well decorated room, or
5,000 objects in an army marching toward a castle), player locality
(and therefore sharing of working sets among players), average
object size, access rates etc etc etc.  Once you get beyond the hand
waving level is where it gets hairy/interesting.

> I expect in a year we'll have 3-4 gig databases... we'll see how
> that goes.

You're being saved by CPU capabilities, cache sizes, and system RAM
levels.  Not that this is a Bad Thing, far from it.  It makes your
jobs easier and faster, which is worth a considerable amount.  Pity
more the poor folks attempting to make bleeding edge games which
will match the top of the curve N years from now.  It's a scary
crack-pipe-passing question as to how hard to gamble, or not.

Aside: Typical consumer PCs these days seem to ship with 128Meg
RAM.  Under a year ago that average was closer to 64Meg RAM.  Even
now 128Meg is looking tired except that there's little that home
computer users use that demands more.  The box I'm typing on now at
home has 512Meg RAM, as does Kanga.Nu and most of the other systems
about here (such as alice.kanga.nu, which you'll hopefully all be hearing
more about soon (would be now except I ran out of bloody IP
addresses)).  Why?  That's where the knee in system RAM expense is
-- four 128Meg SDRAMs offer the best price/size point, and 4 slot
MB's are common.

I wouldn't bet on the knee being in the same place 18 months from
now.  I also wouldn't be surprised if home user loads suddenly
appeared that required more than 120 Meg for reasonable performance
(part of the problem of a disposable economy).

--
J C Lawrence                                       claw at kanga.nu
---------(*)                          http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


