[MUD-Dev] lurker emerges

James Wilson jwilson at rochester.rr.com
Sun Aug 9 02:06:56 CEST 1998


Hello, I've been perusing some of your fascinating archives for a little
while and thought I'd add my two cents (or maybe more than that). I sent a
similar message a day or so ago, not noticing the big bold message in the
list policy that told me I couldn't post yet, so that message is lost in
/dev/null. Duh. Fortunately I have had the opportunity to do a little more
research on what I want to talk about. I have been puttering around with mud
development for years, and keep going around and around on the same
questions. Here are two of them, in a hopefully-not-too-blathery degree of
verbosity.

My first issue is basic server control flow: select() vs. threading vs...?
There are some startling numbers at the thttpd web site (<url:
http://www.acme.com/software/thttpd>) showing the vast difference between
single-threaded, select()-based http servers and servers based on other
models (using one-thread-per-connection or a thread pool). This was a bit of
a shock to me as I am quite enamored of threading (probably more because of
its challenges than because of its necessity, I am forced to admit). I am
curious as to what approaches the list readers feel are tenable, and what
their pros and cons are.

As I see it, not having gone the whole nine and tested it out, using a
single, select()ing thread to do all your stuff would work fine if each
operation is bounded by some small amount of time. That is, if you spend too
much time in processing the received action, your responsiveness to pending
requests goes down. I'm not sure how or if
this issue is solved in the select()-based http servers, and am looking at
the source code to try to suss it out. How do they deal with a request for a
bigass file? Do all the ripe sockets wait to be select()ed while the bigass
file is sent on its merry way? News at 11.
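
One way a single select()ing thread can avoid stalling on a big file is to
make the sockets non-blocking and hand the kernel only a bounded chunk each
time a socket is reported writable. The sketch below illustrates that idea
with Python's selectors module. It is an assumption about how such a server
could be built, not a description of what thttpd actually does, and the names
Outbox, serve_once, demo and CHUNK are made up for illustration.

```python
import selectors
import socket

CHUNK = 4096  # bytes offered to the kernel per writable event (arbitrary)


class Outbox:
    """Pending output for one connection, drained one chunk at a time."""

    def __init__(self, payload: bytes):
        self.view = memoryview(payload)
        self.offset = 0

    def pump(self, sock: socket.socket) -> bool:
        """Send at most CHUNK bytes; return True once everything is sent."""
        try:
            sent = sock.send(self.view[self.offset:self.offset + CHUNK])
        except BlockingIOError:
            return False  # kernel buffer is full, try again on a later pass
        self.offset += sent
        return self.offset >= len(self.view)


def serve_once(sel: selectors.BaseSelector, timeout: float = 0.1) -> None:
    """One pass of the loop: each ready socket gets a bounded slice of work."""
    if not sel.get_map():
        return  # nothing registered, nothing to wait for
    for key, events in sel.select(timeout):
        sock, outbox = key.fileobj, key.data
        if events & selectors.EVENT_WRITE and outbox.pump(sock):
            sel.unregister(sock)  # transfer finished


def demo(payload: bytes) -> bytes:
    """Push `payload` through a local socket pair using the loop above."""
    server, client = socket.socketpair()
    server.setblocking(False)
    client.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(server, selectors.EVENT_WRITE, Outbox(payload))
    received = bytearray()
    while len(received) < len(payload):
        serve_once(sel, timeout=0.01)
        try:
            received += client.recv(65536)
        except BlockingIOError:
            pass  # nothing to read yet
    sel.close()
    server.close()
    client.close()
    return bytes(received)
```

Because pump() never sends more than CHUNK bytes per pass, other ripe
sockets registered with the same selector would get their turn between the
chunks of a large transfer.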

One compromise which occurs to me is to make a sort of hybrid thread pool,
where all the input comes through one thread and gets thrown at a group of
worker threads that do the actual processing and send responses directly back
to the user. That way, one can allow the input thread to go directly back to
listening while processing of possibly indefinite duration goes on in other
threads. In a similar way
one could factor out the database magic, and so on. I'm not sure whether
such variations would be any better than the standard thread-pool server.
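
The hybrid described above might look roughly like the following sketch. The
function run_hybrid and its parameters are hypothetical; the reader thread
stands in for the select()ing input thread and a results queue stands in for
writing a response back to the user's socket.

```python
import queue
import threading


def run_hybrid(commands, handle, workers=4):
    """Feed `commands` through one input thread into `workers` threads.

    `commands` is an iterable of (user, line) pairs and `handle` turns a
    line into a response. Returns the (user, response) pairs produced.
    """
    inbox = queue.Queue()
    results = queue.Queue()
    stop = object()  # sentinel telling a worker to exit

    def reader():
        # Stand-in for the input thread: it only enqueues, never processes.
        for user, line in commands:
            inbox.put((user, line))
        for _ in range(workers):
            inbox.put(stop)

    def worker():
        while True:
            item = inbox.get()
            if item is stop:
                break
            user, line = item
            # Possibly slow processing happens here, off the input thread.
            results.put((user, handle(line)))

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

One thing the sketch makes visible: with more than one worker, responses can
come back in a different order than the commands arrived, so two commands from
the same user may be processed out of sequence unless something else prevents
it.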

The second big issue I am interested in getting some expert opinions on is
the mechanics of interaction between the threading system (assuming one is
used), the heap (or database), and the internal logic of the mud, perhaps
including user-scripted behaviors. That is, when one introduces threading
into a persistent database system, a number of issues present themselves.
Garbage collection, for instance, becomes a fascinating problem (see <URL:
http://www.cs.utexas.edu/users/oops/papers.html#allocsrv>, at the UTexas
site which also contains the Texas persistent store), while avoiding race
conditions and deadlocks in code-that-used-to-work-dammit becomes a
hairpuller. On top of those problems (soluble given a careful programming
technique) one must protect the users from the horrors of synchronization,
for their own sanity and for the stability of the whole system. However,
removing synchronization from user-scripting implies that
user scripts are either serialized or their correctness is otherwise assured
using some other, more subtle mechanism.

The simplest way of serializing mud processes would be to lock the database
and heap of in-memory mud objects for every transaction. It seems to me this
would reduce the system to one essentially identical to the single-threaded
select() server. The lockless system described in the FAQ could be an
improvement on this. In this system, access to the database is atomic, while
in-memory objects are thread-local and competing modifications get resolved
with a repeat of the modifying event. This pushes the serialization to
different places, namely the point at which the database is read to generate
the local copies and the point at which the database is written and checked
for discrepancies. This is still an improvement because the lockless system
allows concurrent processing once objects have been read in from the
database. (I am still a little fuzzy on some details - how is the collection
of objects to 'clone' determined?  Does the cloning thread save two clones
of the object, one 'as it was read in', and one 'production copy'? If not,
how does it know the difference between the object's state-at-snapshot and
the object's current database state (which might reflect modifications by
other threads)? If a thread grabs an object and runs for a long time, will
it see modifications to that object, or work with the old copy?)
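
To make the read-copy, modify, check-and-retry cycle concrete, here is a
minimal sketch. It is not the design from the FAQ: it assumes each stored
object carries a version number, which is one possible alternative to keeping
a second 'as it was read in' clone, and it uses a short lock only to stand in
for the atomic database access. Store, run_event and their methods are
hypothetical names.

```python
import copy
import threading


class Store:
    """Toy database: atomic snapshot and commit, versioned per object."""

    def __init__(self):
        self._lock = threading.Lock()  # held only for the brief atomic steps
        self._data = {}                # key -> (version, value)

    def put(self, key, value):
        with self._lock:
            version = self._data.get(key, (0, None))[0]
            self._data[key] = (version + 1, value)

    def snapshot(self, key):
        """Return (version, private copy) of the object as of right now."""
        with self._lock:
            version, value = self._data[key]
            return version, copy.deepcopy(value)

    def commit(self, key, seen_version, new_value):
        """Write back only if nobody else committed since our snapshot."""
        with self._lock:
            if self._data[key][0] != seen_version:
                return False
            self._data[key] = (seen_version + 1, new_value)
            return True


def run_event(store, key, modify):
    """Apply `modify` to a thread-local copy; repeat the event on conflict."""
    attempts = 0
    while True:
        attempts += 1
        version, working = store.snapshot(key)
        modify(working)  # runs with no lock held
        if store.commit(key, version, working):
            return attempts
```

In this variant a long-running thread keeps working with its old copy and only
learns about other threads' modifications when its commit is rejected, at
which point the whole event is repeated against a fresh snapshot.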

Another option which has occurred to me is to serialize groups of related
objects which have a high probability of being accessed together, such as
geographically proximate locations. Threads could then enjoy unrestricted,
unlocked access to those 'local' objects. Access to non-local objects would
have to be proxied through an event-passing mechanism; that is, in order to
access object O in locale D, one would have to figure out that O lives in D
and send a transaction T to D which could then be executed in a thread with
exclusive access to D.  The result of T could be the transmission of a
followup transaction back to T's sender, and so on. All this overhead could
bring a server to its knees if done improperly, and if non-local operations
are frequent. Moreover, objects would have to be disciplined so they don't
hold onto references to non-local objects, and even a reference to a local
object could become illegal if that object moved to another locale. This
seems problematic.
Access to local objects could of course be proxied as well through
transactions on the locale, but I find it doubtful that such a system could
be efficient; you'd need a transaction for every read of every field. And if
one uses transactions for everything, why not simply move to a
single-threaded system that simulates threads using self-regenerating
events?
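
The locale scheme above can be sketched as one mailbox and one thread per
locale, with non-local access expressed as a transaction posted to the owning
locale and the result coming back as a followup transaction. Everything named
here (Locale, post, take_gold, give_gold, demo) is hypothetical, and objects
are passed by id rather than by reference to respect the no-non-local-
references discipline.

```python
import queue
import threading


class Locale:
    """A group of objects that only this locale's own thread may touch."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, transaction, *args):
        """Ask this locale to run transaction(locale, *args) in its thread."""
        self.mailbox.put((transaction, args))

    def _run(self):
        while True:
            transaction, args = self.mailbox.get()
            try:
                if transaction is None:  # shutdown request
                    return
                transaction(self, *args)
            finally:
                self.mailbox.task_done()

    def stop(self):
        self.mailbox.put((None, ()))
        self._thread.join()


def take_gold(locale, obj_id, amount, reply_to, receiver_id):
    """Runs inside `locale`; replies with a followup transaction."""
    purse = locale.objects[obj_id]
    taken = min(amount, purse["gold"])
    purse["gold"] -= taken
    reply_to.post(give_gold, receiver_id, taken)


def give_gold(locale, obj_id, amount):
    locale.objects[obj_id]["gold"] += amount


def demo():
    town, forest = Locale("town"), Locale("forest")
    town.objects["alice"] = {"gold": 0}
    forest.objects["chest"] = {"gold": 10}
    # Alice (in town) takes gold from a chest that lives in the forest.
    forest.post(take_gold, "chest", 7, town, "alice")
    forest.mailbox.join()  # take_gold has run, so give_gold is queued
    town.mailbox.join()    # give_gold has run
    town.stop()
    forest.stop()
    return town.objects["alice"]["gold"], forest.objects["chest"]["gold"]
```

Even this small transfer costs two queue hops, which gives a feel for the
overhead if non-local operations turn out to be frequent.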

James





