[MUD-Dev] Server stability (was: Player statistics)

Bruce bruce at puremagic.com
Fri Jan 12 17:42:39 CET 2001


David Kennerly wrote:

> By 1998, Doomvas was remarkably stable.  When I operated Dark Ages,
> I set a personal record of 3 or 4 months operation without a single
> server crash.  I'm guessing this is remarkable.  I don't know what
> other game server engines average.  Players tell me the EQ, UO,
> etc. have a game server crash about once a week.  I'm skeptical of
> many things I'm told.

While not in the realm of 16000 users/server at all, MOO and Cold are
both very stable.

Both have servers that have run for a year or so under real usage
without crashing or any other problems.  The Eternal City
(http://www.eternal-city.com/) uses Cold and averages fairly decent
user-loads for a text mud.  We don't have crashes that frequently, and
they only happen when the machine itself runs out of memory.

Did you have any particular techniques that you used to get better
stability?

In my opinion, the most valuable things that can be done are:
 * Test test test!  Everything should have tests in the
   test suite.  This should be run often.
 * Run Purify or some other similar tool.  Run the tests
   under this tool.
 * Have code reviews of all new code in the server itself.
 * Instrument the server and relevant code and keep logs.
   It is tough to know what's going on sometimes unless
   you've got data to let you know what was happening
   internally.  In this line of thinking, we've recently
   added a lot of new debugging facilities to the Cold
   server to help us make sure that TEC only increases
   in stability and performance as it grows with things
   like:
     * Log the ColdC task stacks (backtraces) for
       large memory allocations.
     * Log the ColdC task stacks (backtraces) for
       failed memory allocations.
     * Log data about object cache evictions.
     * Log the ColdC task stack that causes an object
       to be marked as dirty. (If we expect an
       object to be read only, but we see that
       it is getting flagged as dirty and therefore
       has to be written back to disk, this can
       be harmful.)
     * You're getting the point now I'm sure.
 * Isolate high level code from lower level code in some
   fashion.  In Cold, everything is pretty much written
   in ColdC, and the server/interpreter/DB is written in
   C.  This means that you're able to allow most anyone
   to write code that won't crash the server within
   certain guidelines (usually involving memory
   consumption), and reduce the amount of code likely
   to have problems, making it easier to test, test,
   test.
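
To make the instrumentation items above a bit more concrete, here is a
rough sketch of logging object cache evictions and the call stack that
marks a supposedly read-only object dirty.  It is written in Python
purely for illustration, not in the C of the Cold server, and every
name in it (InstrumentedCache, mark_dirty, the "cache" logger) is
hypothetical rather than anything taken from Cold itself.

```python
import logging
import traceback
from collections import OrderedDict

log = logging.getLogger("cache")  # hypothetical logger name


class InstrumentedCache:
    """Tiny LRU object cache that logs evictions and dirty marks.

    Illustrative only; this is not the Cold server's actual cache.
    """

    def __init__(self, capacity, read_only=()):
        self.capacity = capacity
        self.read_only = set(read_only)  # names we expect never to be written
        self.objects = OrderedDict()     # name -> value, least recent first
        self.dirty = set()               # names that would need write-back

    def put(self, name, value):
        self.objects[name] = value
        self.objects.move_to_end(name)   # mark as most recently used
        while len(self.objects) > self.capacity:
            victim, _ = self.objects.popitem(last=False)
            # A real server would write a dirty victim back to disk here.
            log.info("evict %s (dirty=%s)", victim, victim in self.dirty)
            self.dirty.discard(victim)

    def mark_dirty(self, name):
        if name in self.read_only:
            # Record who dirtied an object we believed was read only.
            stack = "".join(traceback.format_stack(limit=5))
            log.warning("read-only object %s marked dirty by:\n%s",
                        name, stack)
        self.dirty.add(name)
```

With logging configured (for example logging.basicConfig(level=logging.INFO)),
each eviction and each unexpected dirty mark leaves a line in the log
that can be examined after the fact.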

I'm sure that I could be doing a better job at keeping things stable
as we do have crashes, so I'd love to hear what other people are doing
to keep (make?) their systems stable.

Or the reverse ... are there people who feel that a server that
crashes once/week is fine, that the customers will still be happy and
life will go on, and so the invested effort (since time is money) is
overkill and doesn't pay off from a business perspective?  Maybe
server stability really doesn't matter that much, so long as data
isn't lost often and the players can be kept happy.

 - Bruce
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


