[MUD-Dev] NWN gets more MUD-like (again)
Smith
Tue Feb 4 13:11:10 CET 2003
On Friday, January 31, 2003, Sean wrote:
> As for some technical background -- the DB system is using a
> parent process to run the game engine in debug mode. They're
> allocating a couple large blocks of memory in script and then
> watching for changes in those blocks. This relies on the fact that
> the engine won't reallocate a memory block unless more space is
> needed to store a larger chunk of data. I had concerns about
> performance degradation but apparently there isn't any to speak
> of.
From the information in discussion boards on their site, I'd guess
something like the following is happening:
1. The wrapper process starts the server, probably with the flags
CREATE_SUSPENDED and DEBUG_PROCESS.
2. The wrapper process sets up some state information using the
handles to the server process, and eventually calls ResumeThread,
probably then calling WaitForInputIdle to allow the server process
to initialize.
3. The wrapper process must now "discover" the memory segment used by
the server process for the "large blocks of memory" allocated by
the script. I can see two ways to do this, both with attendant
advantages and disadvantages.
a. The wrapping process waits for special output to be written
to the server's log file. This method has the advantage of
using the only supported output method available, but it is
potentially somewhat slow and subject to the frequency of
stream flushing (although Bioware appears to do a good job of
keeping the writes synchronized). After seeing the output, the
wrapping process suspends the server process (thread-at-a-time,
or by calling DebugBreakProcess to make the process report a
break event). The wrapping process then scans (using
ReadProcessMemory) for a special byte pattern to establish the
location of the memory segments; this is best done page-at-a-
time, and is only done once, or as an error-recovery path if
the segments are re-allocated (more later). After finding the
memory segment, the wrapping process calls ContinueDebugEvent,
or otherwise un-suspends all threads in the server process,
and maintains the memory offset.
b. (most likely--rather, what I would try) The wrapping process
calls WaitForDebugEvent, dispatching the events generated by
the server process internally. The script, after allocating
and writing the memory, does something to raise a continuable
exception (integer overflow, perhaps?). Mind you, this can be
an exception _handled_ as an exception by the NWN scripting
engine, provided that it's handled by a catch-type mechanism.
I have no data on that at the moment; if the script interpreter
catches the attempt before the exception is actually raised,
this will not work. If it does work, the exception arrives as
a debug event, tripping the wrapping process. When the system
notifies the debugger of a debugging event, it also suspends
all threads in the affected process. The wrapping process then
scans for a special byte pattern to establish the location of
the memory segments (again, this is only done once, or as an
error-recovery path if the segments are re-allocated; more
later). After finding the memory segment, the wrapping process
calls ContinueDebugEvent, un-suspending all threads in the
server process, and maintaining the memory offset.
4. Execution continues. When the server process wants to perform a
database query, the query is written, wrapped in some special
header/footer byte-sequence patterns, into the memory allocated by
the script in 3. Then, according to the synchronization mechanism
employed (3.a or 3.b), the server process signals the wrapper
process. The wrapper process suspends the server process and reads
the request, checking that it is prefixed/suffixed with the byte
sequences. If it is not, the reserved memory has probably been
reallocated; an exhaustive memory scan could be undertaken as an
error-recovery path, but error detection is simpler and less
time-consuming than error correction, so a more appropriate path
would be to resume the server process, followed by an orderly
shutdown/logging/restart. After the request is successfully read,
the wrapping process services the request against the database and
places the results into the server process's memory space using
WriteProcessMemory. The server process is then resumed using
ContinueDebugEvent.
5. Loop at 4. (A C sketch of this whole loop follows below.)
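To make the shape of that loop concrete, here is a minimal sketch in
C against the Win32 debug API. To be clear about what is guesswork:
the marker bytes ("NWDB"), the command line, the query format, and
service_query() are all my own inventions for illustration--I have
no knowledge of the actual patterns their system uses--and error
handling is mostly elided.

  #include <windows.h>
  #include <stdio.h>
  #include <string.h>

  #define BUF_SZ 4096

  /* Stand-in for the real database call; entirely hypothetical. */
  static void service_query(const char *query, char *reply)
  {
      sprintf(reply, "NWDB%s", "<result rows here>");
  }

  /* Scan the server's committed, readable pages for the marker.
     (Simplification: a marker straddling a BUF_SZ boundary is
     missed.) */
  static LPVOID find_marker(HANDLE hProc, const char *marker,
                            SIZE_T mlen)
  {
      MEMORY_BASIC_INFORMATION mbi;
      unsigned char *addr = 0, buf[BUF_SZ];
      while (VirtualQueryEx(hProc, addr, &mbi, sizeof(mbi))
                 == sizeof(mbi)) {
          if (mbi.State == MEM_COMMIT &&
              !(mbi.Protect & PAGE_NOACCESS)) {
              SIZE_T off, got, i;
              for (off = 0; off < mbi.RegionSize; off += BUF_SZ) {
                  SIZE_T want = mbi.RegionSize - off;
                  if (want > BUF_SZ) want = BUF_SZ;
                  if (!ReadProcessMemory(hProc,
                          (char *)mbi.BaseAddress + off,
                          buf, want, &got))
                      break;
                  for (i = 0; i + mlen <= got; i++)
                      if (memcmp(buf + i, marker, mlen) == 0)
                          return (char *)mbi.BaseAddress + off + i;
              }
          }
          addr = (unsigned char *)mbi.BaseAddress + mbi.RegionSize;
      }
      return NULL;
  }

  int main(void)
  {
      STARTUPINFO si = { sizeof(si) };
      PROCESS_INFORMATION pi;
      char cmd[] = "nwserver.exe";   /* hypothetical command line */
      LPVOID reqAddr = NULL;

      /* Steps 1-2: start the server suspended, as its debugger.
         Note: with DEBUG_PROCESS the child cannot get past its
         first debug events until we pump them, so a
         WaitForInputIdle here would only stall; go straight to
         the loop. */
      if (!CreateProcess(NULL, cmd, NULL, NULL, FALSE,
                         DEBUG_PROCESS | CREATE_SUSPENDED,
                         NULL, NULL, &si, &pi))
          return 1;
      ResumeThread(pi.hThread);

      /* Steps 3.b-5: every debug event arrives with all server
         threads already suspended by the system. */
      for (;;) {
          DEBUG_EVENT de;
          char req[BUF_SZ], reply[BUF_SZ];
          SIZE_T got;

          WaitForDebugEvent(&de, INFINITE);
          if (de.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
              break;
          if (de.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
              de.u.Exception.ExceptionRecord.ExceptionCode
                  == EXCEPTION_INT_OVERFLOW) {
              /* The script's deliberate overflow: a query waits. */
              if (!reqAddr)            /* discover once (step 3) */
                  reqAddr = find_marker(pi.hProcess, "NWDB", 4);
              if (reqAddr &&
                  ReadProcessMemory(pi.hProcess, reqAddr, req,
                                    BUF_SZ, &got) &&
                  got > 4 && memcmp(req, "NWDB", 4) == 0) {
                  req[BUF_SZ - 1] = '\0';
                  service_query(req + 4, reply); /* step 4: DB hit */
                  WriteProcessMemory(pi.hProcess, reqAddr, reply,
                                     strlen(reply) + 1, &got);
              }
              /* else: marker missing -- the segment moved; the
                 sane recovery is resume, log, orderly restart. */
              ContinueDebugEvent(de.dwProcessId, de.dwThreadId,
                                 DBG_CONTINUE);
          } else {
              /* Pass everything else through; breakpoints (e.g.
                 the loader's initial one) must get DBG_CONTINUE. */
              DWORD cont = DBG_CONTINUE;
              if (de.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
                  de.u.Exception.ExceptionRecord.ExceptionCode
                      != EXCEPTION_BREAKPOINT)
                  cont = DBG_EXCEPTION_NOT_HANDLED;
              ContinueDebugEvent(de.dwProcessId, de.dwThreadId,
                                 cont);
          }
      }
      return 0;
  }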
Discussion:
I hope that 3.b is a usable mechanism, as it induces a lower
constant overhead. Almost certainly, the wrapping process is using
the operating system to suspend all threads in the server process as
a side effect of a debug event being dispatched. Since that overhead
is present either way, the real question is whether the wrapping
process must _cause_ the debug event in the server process (via a
call to DebugBreakProcess, as in 3.a), or whether the
already-executing scripting engine can cause the debug event itself
(via some continuable exception-raising mechanism like integer
overflow--assuming, and possibly making an ass out of myself, that
integer overflow is a continuable exception); the latter saves the
wrapper a cross-process call per query.
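For comparison, the 3.a trigger path would look something like the
fragment below; watch_log_for_marker() and logPath are stand-ins of
my own, and the rest is as in the sketch above. Note the extra
cross-process call per query:

  /* Method 3.a: wait for the script's magic line in the server
     log, then force a break into the debug loop ourselves. */
  if (watch_log_for_marker(logPath)) {  /* blocks tailing the log */
      DebugBreakProcess(pi.hProcess);   /* inject a breakpoint    */
      /* ...which surfaces in WaitForDebugEvent as an
         EXCEPTION_BREAKPOINT with all threads suspended; proceed
         as in the main loop, at the cost of this extra call. */
  }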
However, the mechanism likely used to suspend the server has the
following easily identified drawback: all threads in the server are
suspended. This _does_ guarantee that no code will execute while the
wrapping process handles the database query; however, it may be
overkill. Instead, it should be possible to use method 3.b above,
retrieving a handle to _only_the_thread_that_raised_the_debug_event.
This thread is _almost_certainly_ the only thread executing within
the script interpreter (aside from my vast doubts that Bioware wrote
a multithread-capable scripting language, note the empirical
evidence: when a script takes _far_ too long to execute, the entire
server eventually feels the lag). With that thread handle,
immediately increment the "script interpreter thread"'s suspend
count by calling SuspendThread, then immediately call
ContinueDebugEvent. The remaining application threads should be
released, while the "script interpreter thread" remains suspended
(it still has a suspend count > 0). The wrapping process should then
service the database request with all possible speed and, upon
completion, call ResumeThread to decrement the "script interpreter
thread"'s suspend count. The script interpreter should then continue
execution.
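In code, this narrower suspension is a small change to the loop
sketched earlier; service_query_via_rw() below is my stand-in for
the ReadProcessMemory/service/WriteProcessMemory sequence already
shown, and the other names are as in that sketch:

  /* On the script's debug event: freeze only the raising thread,
     release the rest of the server, service the query, then let
     the script thread go. */
  static void handle_query_event(HANDLE hProc, const DEBUG_EVENT *de)
  {
      /* The system recorded which thread raised the event for us. */
      HANDLE hThread = OpenThread(THREAD_SUSPEND_RESUME, FALSE,
                                  de->dwThreadId);
      SuspendThread(hThread);   /* suspend count is now > 0 */
      ContinueDebugEvent(de->dwProcessId, de->dwThreadId,
                         DBG_CONTINUE);
      /* Every other server thread is running again; only the
         script interpreter's thread is still frozen. */
      service_query_via_rw(hProc); /* Read/WriteProcessMemory */
      ResumeThread(hThread);    /* count back to 0: script resumes */
      CloseHandle(hThread);
  }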
Anticipated Questions:
Q: Won't the entire server eventually block while the "script
interpreter thread" is suspended, even if you release the other
threads?
A: Yes, but hopefully not _immediately_ as would be the case if
the other threads weren't released.
Q: Why not just record the "script interpreter thread"'s thread
ID and use it continuously, along with method 3.a, instead of
relying on the silly exception-throwing mechanism?
A1: Method 3.a (scanning the logfile for synchronization) may
not be fast enough unless a wait of some variety is introduced
in the script; i.e., the synchronization may fail, and that
would be bad.
A2: Bioware may (unlikely, but may) be using a thread pool or
some other mechanism to dispatch script execution, so the thread
ID may change during runtime. With the exception mechanism, the
thread raising the exception/debug event is recorded by the
system immediately prior to dispatch, so no error is possible.
Q: Is it really necessary to argue about half-seconds of
"constant" overhead for each database hit?
A: Absolutely. If every persistence call in every script is
going to cause every thread to block, even for a few seconds,
the effects on latency and throughput could _easily_ cause
packets to back up, and eventually be lost if they can't be
serviced fast enough when the server is resumed from
suspension. Lost packets cause some, if not all, of the visible
signs of lag, and are probably behind many of the annoying
portal bugs some PWs have experienced. This is a good argument
for allowing the other threads to resume until they internally
require suspension (as though a script were just "very complex,
and taking a long time to execute"). Additionally, since module
developers (driven by players) will become addicted to
persistence in design/implementation, the total overhead from
hits that all incur this "constant" cost will grow as something
like O(n*s*C), where n is the number of players, s is some
measure of the world size/number of scripts using persistence,
and C is the per-database-hit constant overhead. This would
quickly limit scalability.
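To put purely hypothetical numbers on that (mine, not measured):
with n = 50 players each triggering s = 1 persistence call per
second, and C = 20ms of whole-process suspension per call, the
server spends 50 * 1 * 0.02 = 1 full second of every second
suspended--saturated before any game logic runs at all. Shrinking
C, or suspending less than the whole process, is the only way to
flatten that curve.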
Q: Isn't scalability a bit of an argumentative cop-out?
A: Maybe, but a few hours of programmer time investigating a
potentially better solution is likely cheaper than more capable
hardware. From both personal and second-hand experience on the
NWN PW using/developing this solution, it appears that limiting
effects on their hardware, with the current design and the
module currently running there (which does not appear to be
using persistence hugely--yes, I know, an entirely subjective
and relative term), occur at on the order of 30-50 players.
Q: Isn't 30-50 players a huge world?
A: (shameless plug) Not at all; drop by #Landsoflore on
irc.landsoflore.org, or http://www.landsoflore.org, to find a
world utilizing neither of the aforementioned forms of
persistence and averaging near 50 online players (min/max
probably 25+/75+).
Thanks for reading this, I know it's annoyingly long.
-Dave