[MUD-Dev] NWN gets more MUD-like (again)

Smith Smith
Tue Feb 4 13:11:10 CET 2003


On Friday, January 31, 2003, Sean wrote:

> As for some technical background -- the DB system is using a
> parent process to run the game engine in debug mode. They're
> allocating a couple large blocks of memory in script and then
> watching for changes in those blocks. This relies on the fact that
> the engine won't reallocate a memory block unless more space is
> needed to store a larger chunk of data. I had concerns about
> performance degradation but apparently there isn't any to speak
> of.

From the information in discussion boards on their site, I'd guess
something like the following is happening:

1. The wrapper process starts the server, probably with the flags
   CREATE_SUSPENDED and DEBUG_PROCESS

2. The wrapper process sets up some state information using the
   handles to the server process, and eventually calls ResumeThread,
   probably then calling WaitForInputIdle to allow the server process
   to initialize.

3. The wrapper process must now "discover" the memory segment used by
   the server process for the "large blocks of memory" allocated by
   the script. I can see two ways to do this, each with attendant
   advantages and disadvantages.

     a. The wrapping process waits for special output to be written
        to the server's log file. This method has the advantage of
        using the only supported output method available, but it is
        potentially slow and subject to the frequency of stream
        flushing (although Bioware appears to do a good job of
        keeping the writes synchronized). The wrapping process then
        suspends the server process (thread-at-a-time, or by calling
        DebugBreakProcess to cause the process to report a
        break-event), and scans (using ReadProcessMemory) for a
        special byte pattern to establish the location of the memory
        segments; this is best done page-at-a-time, and is only done
        once, or as an error-recovery path if the segments are
        re-allocated (more later). After finding the memory segment,
        the wrapping process calls ContinueDebugEvent, or otherwise
        un-suspends all threads in the server process, and maintains
        the memory offset.

     b. (most likely--rather, what I would try) The wrapping process
        calls WaitForDebugEvent, dispatching the events generated by
        the server process internally. The script, after allocating
        and writing the memory, does something to raise a continuable
        exception (integer overflow, perhaps?). Mind you, this can be
        an exception _handled_ as an exception by the NWN scripting
        engine, provided that it's handled by a catch-type mechanism.
        No data on that at the moment; if the script interpreter
        catches the attempt to cause an exception before it is
        raised, this will not work. If it does work, the exception is
        raised as a debug event, tripping the wrapping process. When
        the system notifies the debugger of a debugging event, it
        also suspends all threads in the affected process. The
        wrapping process then scans for a special byte pattern to
        establish the location of the memory segments (this is only
        done once, or as an error-recovery path if the segments are
        re-allocated, more later). After finding the memory segment,
        the wrapping process calls ContinueDebugEvent, un-suspending
        all threads in the server process, and maintaining the memory
        offset.
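
        Taken together, steps 1 through 3.b would look roughly like
        the following Windows-specific event loop. Everything here
        is my guess at the shape, not Bioware's code: the server
        binary name, the use of EXCEPTION_INT_OVERFLOW as the
        script's signal, and where the scan happens are all
        assumptions.

```c
/* Windows-only sketch of steps 1-3.b: launch the server under the
 * debugger, let it initialize, and wait for the script's deliberate
 * exception.  "nwserver.exe" and EXCEPTION_INT_OVERFLOW are
 * assumptions, not observed behaviour. */
#include <windows.h>

int main(void)
{
    STARTUPINFO si = { sizeof si };
    PROCESS_INFORMATION pi;

    /* 1. Start the server suspended, with us as its debugger. */
    if (!CreateProcess(NULL, "nwserver.exe", NULL, NULL, FALSE,
                       CREATE_SUSPENDED | DEBUG_PROCESS,
                       NULL, NULL, &si, &pi))
        return 1;

    /* 2. Release the initial thread and let the server initialize. */
    ResumeThread(pi.hThread);
    WaitForInputIdle(pi.hProcess, INFINITE);

    /* 3.b. Dispatch debug events.  While we sit inside a handler,
     * the system has already suspended every server thread. */
    for (;;) {
        DEBUG_EVENT ev;
        DWORD cont = DBG_CONTINUE;
        if (!WaitForDebugEvent(&ev, INFINITE))
            break;
        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
            if (ev.u.Exception.ExceptionRecord.ExceptionCode ==
                    EXCEPTION_INT_OVERFLOW) {
                /* The script's signal: scan with ReadProcessMemory
                 * on first sight, then service the request. */
            } else {
                /* Not ours: let the server's own handlers see it. */
                cont = DBG_EXCEPTION_NOT_HANDLED;
            }
        } else if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT) {
            break;
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, cont);
    }
    return 0;
}
```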

4. Execution continues. When the server process wants to perform a
   database query, the query is written, with some header/footer
   special-byte-sequence patterns, to the memory allocated by the
   script in 3. Then, according to the synchronization mechanism
   employed (3.a or 3.b), the server process signals the wrapper
   process. The wrapper process suspends the server process and
   reads the request, verifying that it is prefixed/suffixed with
   the byte sequences. If it is not, an exhaustive memory scan could
   be undertaken as an error-recovery path, since a missing marker
   would indicate that the reserved memory has been reallocated;
   error detection would be simpler/less time-consuming than error
   correction, and an appropriate path would be resumption of the
   server process, followed by an orderly shutdown/logging/restart.
   After the request is successfully read, the wrapping process
   services the request against the database, and places the results
   into the server process's memory space using WriteProcessMemory.
   The server process is then resumed using ContinueDebugEvent.

5. Loop at 4.

Discussion:

I hope that 3.b is a usable mechanism, as it induces a lower
constant overhead. Almost certainly, the wrapping process is using
the operating system to suspend all threads in the server process as
a side effect of a debug event being dispatched. Since that overhead
is already present, the only remaining cost is how the debug event
gets caused: either the wrapping process must _cause_ it in the
server process (via a call to DebugBreakProcess, as in 3.a), or the
already-executing scripting engine causes it itself (via some
continuable exception-raising mechanism like integer
overflow--assuming, and possibly making an ass out of myself, that
integer overflow is a continuable exception). The latter is cheaper.

However, the mechanism likely used to suspend the server has the
following easily identified drawback: all threads in the server are
suspended. This _does_ guarantee that no code will execute while the
wrapping process handles the database query; however, it may be
overkill. Instead, it should be possible to use method 3.b above,
retrieving a handle to _only_the_thread_that_raised_the_debug_event.
This thread is _almost_certainly_ the only thread executing within
the script interpreter (aside from my vast doubts that Bioware wrote
a multithread-capable scripting language, note the empirical
evidence: when a script is taking _far_ too long to execute, the
entire server will eventually feel the lag). With that thread
handle, immediately increment the "script interpreter thread"'s
suspend count by calling SuspendThread, then immediately call
ContinueDebugEvent; the remaining application threads should be
released, while the "script interpreter thread" remains suspended
(it still has a suspend count > 0). The wrapping process should then
service the database request with all possible speed, and upon
completion, call ResumeThread to decrement the "script interpreter
thread"'s suspend count. The script interpreter should then continue
execution.

Anticipated Questions:

  Q: Won't the entire server eventually block while the "script
  interpreter thread" is suspended, even if you release the other
  threads?

    A: Yes, but hopefully not _immediately_ as would be the case if
    the other threads weren't released.

  Q: Why not just record the "script interpreter thread"'s thread
  ID and use it continuously, along with method 3.a, instead of
  relying on the silly exception-throwing mechanism?

    A1: Method 3.a (scanning the logfile for synchronization) may
    not be fast enough unless a wait of some variety is introduced
    in the script; i.e., the synchronization may fail, and this
    would be bad.

    A2: Bioware may (unlikely, but may) be using a thread pool or
    some other mechanism to dispatch script execution, so the thread
    ID may change during runtime. When using the exception
    mechanism, the thread raising the exception/debug event is
    recorded by the system immediately prior to dispatch, so no
    error is possible.

  Q: Is it really necessary to argue about half-seconds here in
  "constant" overhead for each database hit?

    A: Absolutely. If every persistence call in every script is
    going to cause every thread to block, even for a few seconds,
    the results versus latency and throughput could _easily_ cause
    packets to back up, and eventually be lost if they can't be
    serviced fast enough when the server is resumed from
    suspension. Lost packets cause some, if not all, of the visible
    signs of lag, as well as probably being behind many of the
    annoying portal bugs some PWs have experienced. This is a good
    argument for allowing the other threads to resume until they
    internally require suspension (as though a script were just
    very complex and taking a long time to execute). Additionally,
    since module developers (driven by players) will become addicted
    to persistence in design/implementation, the total overhead
    incurred will grow as something like O(n*s*C), where n is the
    number of players, s is some measure of the world size/number of
    scripts using persistence, and C is the per-database-hit
    constant overhead. This would quickly limit scalability.

  Q: Isn't scalability a bit of an argumentative cop-out?

    A: Maybe, but a few hours of programmer time investigating a
    potentially better solution is likely cheaper than more capable
    hardware. From both personal and second-hand experience on the
    NWN PW using/developing this solution, it appears that limiting
    effects on their hardware, with the current design and the
    module currently running there (which does not appear to be
    hugely--yes, I know, an entirely subjective and relative
    term--using persistence), occur at on the order of 30-50
    players.

  Q: Isn't 30-50 players a huge world?

    A: (shameless plug) Not at all, drop by #Landsoflore on
    irc.landsoflore.org, or http://www.landsoflore.org to find a
    world utilizing neither of the aforementioned forms of
    persistence, and averaging near 50 online players (minimax
    probably 25+/75+).

Thanks for reading this, I know it's annoyingly long.

-Dave

_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


