[MUD-Dev] Server stability (was: Player statistics)

Raph Koster
Sun Jan 14 07:46:56 CET 2001


> -----Original Message-----
> From: mud-dev-admin at kanga.nu 
> [mailto:mud-dev-admin at kanga.nu]On Behalf Of
> Bruce
> Sent: Sunday, January 14, 2001 12:28 AM
> To: mud-dev at kanga.nu
> Subject: Re: [MUD-Dev] Server stability (was: Player statistics)
 
> Having to restart a server just to add new content or install
> bugfixes seems to be a movement in opposition to the text-based MUD
> community.  From my perspective, the move within the text-based
> community has been towards less downtime and systems that can be
> modified or have content changes done at runtime.
 
> Am I leading -that- sheltered an existence by being used to making
> almost any sort of change without needing to restart my server?

Not at all. Nobody said that the commercial mud developers were
cutting edge. Yeah, there are areas in which they go further (such as
scale), but also plenty where they do not.

There's a fairly complex, staged publishing process that EQ, UO, and
AC use to get a change onto their servers, which is why there's
downtime associated with it. The objectives of the process are to push
changes at predictable times, and only after both in-house testing and
player testing. Generally speaking, you never want to make changes at
runtime on the live service. In UO's case, at least, you CAN make many
changes at runtime, but some that the tech DOES support (runtime map
changes, for example) are explicitly disallowed in the production
environment. The following applies to UO; EQ and AC are similar, but
the details vary.

1. Developer designs new system (code or game, whatever). His work
environment is some form of single-server game environment, more like
a regular mud, with dummy data (usually a subset of the true world
data). This keeps it small enough to run on one machine (several such
servers can even share a single machine), unlike the actual game,
which runs several machines per game instance. This small server
probably has numerous abilities to update at runtime, including the
map.

2. Developer puts it in the queue to be posted to an internal server
complex. This staging server is meant to be static, so most of the
runtime update abilities are disabled; that way QA gets stable
versions to test. If errors are found, return to step 1. Depending on
the nature of the changes, the full dataset of a production
environment may be mirrored to the test server complex (e.g., copying
all houses, players, items, etc., to the QA server).

3. The internal test server's entire code & new game data are mirrored
to a publicly available server complex so that players can take whacks
at it, and players are given a checklist of what changes are going in,
for comment and debate, and for testing purposes. Again, for the sake
of clean testing, changes are not updated piecemeal: if a change is
required, all code & game data are mirrored again. If players find
errors or if the debate over upcoming features reveals problems, the
whole thing goes back to step 1.

4. The entire public test server's data and code are mirrored to all
production server complexes. Servers are brought down in order by
timezone, during their trough (lowest) usage hours, for this publish.
All code and game data are published, including client-side data
patches (which MUST be received before connecting to the production
server, or else client crashes are likely). One server is generally
brought up in advance of the others and allowed to run for one hour
to catch problems with lower incidence (e.g., errors, bugs, exploits,
or simply unexpected insights that only surface with five times the
userbase). If there is a failure here, you revert the production
server (including player state, if necessary) as soon as it becomes
apparent, go back to the internal test server to attempt to replicate
the problem, then go back to step 1.
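A rough Python sketch of the promotion flow above. It assumes a build is one indivisible bundle of code plus game data, and that a failure at any stage sends it back to step 1; all names (Stage, Build, promote, rollout_order, the shard names) are hypothetical, not UO's actual tooling, and the step 4 rollback of player state isn't modeled. rollout_order is only one guess at what "in order by timezone during trough usage hours" means: sort shards by when their local low-usage hour falls in UTC.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Stage(Enum):
    DEV = 1          # step 1: single-server dev box with dummy data
    INTERNAL_QA = 2  # step 2: static internal staging complex
    PUBLIC_TEST = 3  # step 3: public test server complex
    PRODUCTION = 4   # step 4: all production server complexes


PIPELINE = [Stage.DEV, Stage.INTERNAL_QA, Stage.PUBLIC_TEST, Stage.PRODUCTION]


@dataclass
class Build:
    """A complete snapshot of code plus game data, promoted as one unit."""
    version: str
    stage: Stage = Stage.DEV
    history: List[str] = field(default_factory=list)


def promote(build: Build, passes: Callable[[Build, Stage], bool]) -> Build:
    """Mirror the whole build to each later stage in turn.

    Nothing is patched piecemeal: if a stage finds a problem, the build goes
    back to DEV and must come through every stage again.
    """
    for stage in PIPELINE[1:]:
        build.stage = stage
        if not passes(build, stage):
            build.history.append(f"failed in {stage.name}")
            build.stage = Stage.DEV
            return build
        build.history.append(f"passed {stage.name}")
    return build


def rollout_order(shards: List[Tuple[str, int]], trough_local_hour: int) -> List[str]:
    """Order shards by when their local low-usage hour occurs in UTC.

    shards holds hypothetical (name, utc_offset_hours) pairs. The first shard
    in the result could be the one brought up early and watched for an hour.
    """
    def trough_in_utc(shard: Tuple[str, int]) -> int:
        _, offset = shard
        return (trough_local_hour - offset) % 24

    return [name for name, _ in sorted(shards, key=trough_in_utc)]
```

For example, with a 4am local trough, shards at UTC+1, UTC-5, and UTC-8 would be published in that order.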

Now, with how much rigor this is followed, well, that's another
story. :)

The thing is, the philosophy among the big services is that making a
change is an opportunity to bring the whole thing crashing down around
your ears.  So changes are always made in a controlled environment as
much as is possible. Making changes during runtime runs counter to
that philosophy. There are significant losses from that, yes: some
tools that would be ideal for events and the like are disabled.
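The environment-dependent runtime update abilities described above could be expressed as a per-environment policy table, so the same server code runs everywhere and only configuration differs. A minimal sketch: the capability names ("map", "code_reload", "data_edit") are hypothetical, and which live edits production permits is an assumption; the post only says some are allowed and map changes are not.

```python
from enum import Enum
from typing import Callable, Dict, FrozenSet


class Env(Enum):
    DEV = "dev"                # step 1: developer's single-server box
    STAGING = "staging"        # steps 2-3: static test complexes
    PRODUCTION = "production"  # live service


# Hypothetical capability names. Dev allows everything, staging is frozen
# so testers see a stable build, and production disallows map edits even
# though the server code supports them. Allowing "data_edit" on
# production is an assumption for illustration.
RUNTIME_POLICY: Dict[Env, FrozenSet[str]] = {
    Env.DEV: frozenset({"map", "code_reload", "data_edit"}),
    Env.STAGING: frozenset(),
    Env.PRODUCTION: frozenset({"data_edit"}),
}


class RuntimeUpdateDenied(Exception):
    """Raised when an environment's policy forbids a kind of live update."""


def apply_runtime_update(env: Env, kind: str, apply: Callable[[], None]) -> None:
    """Run apply() only if the environment's policy allows this kind of update."""
    if kind not in RUNTIME_POLICY[env]:
        raise RuntimeUpdateDenied(f"{kind!r} updates are disabled in {env.value}")
    apply()
```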

>> Now, if it doesn't, customers get very unhappy. Especially in the
>> case of something like UO, which, like a MOO/MUSH, saves full world
>> state for *everything* in the game. That comes out at (last I
>> recall) 17 million objects per server, or in real time terms, a 3
>> hour boot time to read it off of disk. :)

> Ouch.  Sounds like your DB guys looked at MOO or something. :)

UO keeps everything in memory for speed, and dumps the entire world
state to disk periodically. EQ also keeps everything in memory for
speed, and dumps individual characters on a rolling basis to disk.
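The two persistence strategies can be contrasted in a short Python sketch: a UO-style periodic dump of the whole in-memory world, and an EQ-style rolling save that writes a few characters per tick. The class names, the JSON format, and the write-then-rename step are assumptions for illustration, not what either game actually does.

```python
import json
import os
import tempfile
from typing import Dict, List


class World:
    """UO-style: all world objects live in memory for speed."""

    def __init__(self) -> None:
        self.objects: Dict[int, dict] = {}

    def dump_full(self, path: str) -> None:
        """Write the entire world state to disk in one periodic dump.

        Assumption: write to a temp file, then rename over the old snapshot,
        so a crash mid-dump leaves the previous snapshot intact.
        """
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(self.objects, f)
        os.replace(tmp_path, path)


class RollingSaver:
    """EQ-style: save a few characters per tick, cycling through all of them."""

    def __init__(self, characters: Dict[str, dict], save_dir: str, per_tick: int) -> None:
        self.characters = characters
        self.save_dir = save_dir
        self.per_tick = per_tick
        self._cursor = 0  # position in the sorted list of character names

    def tick(self) -> List[str]:
        """Save the next batch of characters and return their names."""
        names = sorted(self.characters)
        saved: List[str] = []
        for _ in range(min(self.per_tick, len(names))):
            name = names[self._cursor % len(names)]
            self._cursor += 1
            path = os.path.join(self.save_dir, f"{name}.json")
            with open(path, "w") as f:
                json.dump(self.characters[name], f)
            saved.append(name)
        return saved
```

The trade-off: a full dump captures the whole world at one instant but grows with it (17 million objects per server, per the figure above), while rolling saves spread the I/O out but never capture everything at once.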

> Cold hasn't got this 'problem', as it is a disk-based system, and
> demand-loads objects from the on-disk database.

The concern is speed. Any perceptible delay in accessing this data is
unacceptable in most cases, and our databases get very large. Everyone
I know in the commercial industry is trying to move to disk-based,
demand-loaded storage for the next generation of games, but nobody
knows how it'll perform in a live environment. EQ tried it prior to
release, and couldn't get it fast enough.
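For comparison, a disk-based store in the style described for Cold might keep only an index in memory and demand-load objects, with an LRU cache so hot objects avoid repeat disk reads. This is a hypothetical sketch: the file layout, DemandLoadedStore, and cache_size are assumptions, not Cold's actual design. The first access to any object still pays a disk read, which is exactly the latency concern above.

```python
import json
import os
from collections import OrderedDict
from typing import Dict, Optional, Tuple


class DemandLoadedStore:
    """Objects live on disk; only recently used ones are kept in memory.

    Assumed layout: one append-only data file of JSON records, plus an
    in-memory index mapping object id -> (byte offset, byte length).
    """

    def __init__(self, data_path: str, cache_size: int) -> None:
        self.data_path = data_path
        self.cache_size = cache_size
        self.index: Dict[int, Tuple[int, int]] = {}
        self.cache: "OrderedDict[int, dict]" = OrderedDict()
        self.disk_reads = 0  # counts cache misses that hit the disk

    def append(self, obj_id: int, obj: dict) -> None:
        """Write a new version of an object and point the index at it."""
        data = json.dumps(obj).encode("utf-8")
        with open(self.data_path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
        self.index[obj_id] = (offset, len(data))
        self.cache.pop(obj_id, None)  # drop any stale cached copy

    def get(self, obj_id: int) -> Optional[dict]:
        """Return an object, loading it from disk on first access."""
        if obj_id in self.cache:
            self.cache.move_to_end(obj_id)  # mark as most recently used
            return self.cache[obj_id]
        if obj_id not in self.index:
            return None
        offset, length = self.index[obj_id]
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            obj = json.loads(f.read(length))
        self.disk_reads += 1
        self.cache[obj_id] = obj
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return obj
```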

> On The Eternal City, our database of objects is around 1.3 gigs,
> while the object index is around 192M.

UO's is... more. :) Lots more.

-Raph 
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


