[MUD-Dev] C&C and Event Rescheduling

clawrenc at cup.hp.com clawrenc at cup.hp.com
Fri Jul 18 14:59:23 CEST 1997


In <33CFCE6E.41C67EA6 at iname.com>, on 07/18/97 
   at 01:22 PM, Shawn Halpenny <malachai at iname.com> said:

>clawrenc at cup.hp.com wrote:

>> In <33C69FC6.41C67EA6 at iname.com>, on 07/11/97
>>    at 02:07 PM, Shawn Halpenny <malachai at iname.com> said:

...model ellided...

>> This is pretty well identical to my model.  Concerns:
>> 
>>   Most events modify more than one object.  eg Bubba (object X) moves
>> from room A to room B.  Minimally objects A, B, and X will be modified
>> by this event (containment pointers/lists).
>> 
>>   How do you intend to handle the cases of an event requesting object
>> Q just to check the value of a data member (ie no modification), where
>> that data member has then been changed by the time the event commits
>> (see the dragon's dinner example, and look at the door)?

>Yep, a slight change is necessary.  Now (and again, roughly): 
>1.  A client C requests object O from the database. 
>2.  Add C to list of clients using O. 
>3.  Return OC (a client-only copy of O) to C. 
>4.  A client D (which may or may not be the same as C, but is on the
>    using-list for O) returns OC'. 
>5.  If O has not changed since D's request for it (a different client
>    may have committed changes to O while D was processing), OC' is
>    atomically committed to the database, replacing O.  D is removed 
>    from O's using-list.
>6.  If O has changed since D's request for it, OC' is discarded and D
>    receives notification that OC' couldn't commit.
>7.  Clients in the using-list for O are asynchronously notified that
>    O has changed. 

Given that this basic pattern extend to all the objects which comprise
a given event or transaction, this is identical in principle to my
model.

Notes:

  Re: #2.  I call this the "interested parties list".  To me it
communicates a bit better that this is a list of things that care
about the state of the item.  A moot point of course.

  Re: #3.  I do some folding here, largely due to the fact that I
haven't seperated my clients from my DB as thoroughly as you.  For me,
upon a READ request for an object X I return a reference to the
read-only copy of that object that's already in the DB cache.  If a
transaction attempts to modify object X (remember, all it has is a
read-only reference), then I make a writable transaction-specific copy
(ie unique to that specific transaction) for the changes made by that
transaction, and a seperate read-only copy of the original value of
object X.  Then, when the transaction attempts to commit, the original
copy is compared to the current copy, and if they match, the changed
copy is committed.

  There is reason behind all this madness:  It reduces the number of
data copies needed for transactions.  By using this sort of layered
scheme I manage to delay making a copy of the object until I
absolutely need one (ie someone has attempted to modify the object). 
I find that the majority of objects referenced by MUD events are
actually only read, and not modified (all the flag, can-I-do-this, and
state checks).  Comparitively only a fraction of the total number of
objects referenced by a typical events are actually modified by the
event.  

  Re: #5.  How do you determine that the object has changed during the
execution of the event?  I do this by always keeping an original copy
of the object to compare to the copy current in the DB and then do a
bit-wise comparison of the two objects.  The original copy is the
second copy made above for a modification.

  Aside: I'm actually getting *slightly* neater than this as the
current plan encludes only making a copy of the object
attribute/member which is changed, and then only comparing that at
C&C.

  Re: #5 & #6. I call this stage the "Compare and Commit", or C&C or
short.  (I think I owe this tag to AlexO here on the list)

  Re: #7.  Similarly I post messages (flags really) to all the members
of the interested parties lists for all the objects that a
successfully committed transaction modified.  The effect of those
flags is that the next time those events traverse a block boundary
(essentially any piece of code wrapped in {braces}, such as entering
or leaving an IF, or WHILE etc), the flag gets checked and if TRUE,
the event aborts to reschedule.

Concern:

  An event generated IO.  How do you handle when it reschedules? 
Think about things like SAY, TELL, LOOK, etc for example cases.

>This with the intent that all clients (usually events, in this case)
>will reschedule upon notification that their changes couldn't commit.

Ack.

>Step 7 is not entirely necessary, since after any change to O, all
>clients using O will (when they terminate their processing and return
>their OC' 's for commit) receive notification that what they did is
>no longer applicable.  Step 7 just lets them know about it sooner.

The advantage of such early warning is that it allows a doomed event
to recycle and possibly compleat sooner than if it had had to attempt
to compleat, fail C&C, and reschedule.  It adds a certain level of
responsiveness to the system.

>>   Do you have a concept of a "transaction" where a transaction is
>> effectively synonymous with an event, and involves the possible
>> requesting of read-only copies of N objects, and the possible
>> modification of M objects where M is a subset of N?  What I'm looking
>> for here is the idea that a transaction or event can only commit if
>> all of its component objects resolve correctly to their commit state
>> (ie everything compares right).

>This is now the case:  if one of those M objects has changed since an
>event's beginning, reschedule that event since the data it has
>already made a decision on may have changed (i.e. the door is shut
>now so the dragon, who was moving through it, now has to reevaluate
>its movement).

Bingo.

>I think that fits the above criteria.

Yep.

>>   What about the case of an object with two attributes, say object A,
>> with attributes A::X, and A::Y.  Two events, Q and R, then occur.  Q
>> modifies the value of A::X.  R modifies the value of A::Y.  Both
>> events run simultaneously and commit in arbitrary order.  Should one
>> event fail and be rescheduled?

>Yes.  The attributes that are read/written in A from Q and R are not
>monitored by the DB, since the DB has no way of knowing what
>attributes an event is going to access next.

I'm partially thru splitting this as mentioned above so that the C&C
only checks the specific object attributes and members which were
referenced by the event in question.

  Reason:  Unlike most standard (ie non-game real world) DB
environments, MUD worlds tend to have many events all contending for a
small number of objects while the rest of the DB/world remains
unreferenced.  This sort of C&C DB model is not optimised for this
case.  

Consider the pessimal case of 50 players all standing in a single room
A.  All 50 decide to move west into the next room, B.  Of a sudden,
the objects for room A, room B, and the exits between A and B (if you
have exits as objects), are all under heavy contention.  Ever event is
attempting to modify all the objects, and each one needs a clean cut
at all of them to C&C.  The result is that almost all the events are
going to end up failing C&C at least once (one will make it thru the
door and the rest will fail as they note that the current room, and
target room are now changed as they have different contents lists),
and rescheduling various numbers of times until the finally make it
thru C&C.  The result is that the events get done in slow motion as
they start, fail/reschedule/fail/reschedule/fail...and finally
compleat.

Okay, 50 players in a room all moving west is unlikely.  How about 50
players in or near a room, some moving out the various doors, some
coming back into the room having left, others picking up or dropping
various objects (changes to the contents lists), Every single event is
now compeating for the chance to get a single unmodified copy of that
room when it C&C's.  

Now that I've cast doom and gloom.  This need not be a huge problem. 
The fix is to change the manner in which you write
events/transactions.  The requirement is to split events into the
smallest, fastest executing,  logically consistant units you can.  The
idea is that by making the runtime of an individual event as short oas
possible the chance of collision by other events is minimised. 
Further, by making the runtime short, the probability is that what a
given event sequence contends with on its first step will not be
contended for on its next step.  eg:

  Going back to the case of the two rooms A and B, and the 50 players
moving between them.

  If you can, split your room/motion model so that moving between
rooms requires two events:

  #1 Move out of current room.

  #2 After sucessfully doing #1, move into new room.

The question of course is "where" the player is having compleated #1,
and processing on #2.  This is also not a wonderful example to work
this area with.  The TELL verb gives a good case:

  Worst possible TELL implementation:  

    The TELL verb attempts to put IO to every single player object in
the game.  It's almost never going to work.  At least one of the
players in the entire list of players on the game is going to have
changed during the execution of the verb, the verb will fail, and it
will have to reschedule and try again. and fail, retry etc.

  Possibly decent TELL implementation:

    The TELL verb sends a message to a root "TELL" object with the IO
for the TELL.

    The root "TELL" object spawns events, perhaps one per player, or
one per small group or list of players (there are advantages to both
models), where each event passes on the message to its target
player(s) only.

Note: I dug into this TELL implemenation a while back on the list,
along with details on how to handle channels, SAY, WHISPER etc
efficiently in this sort of event driven C&C model.  If you want I'll
try and dig it up.
  
>[ how to prevent events from starving ]

>> I don't have a pure solution.  I (rough memory) follow the following
>> course:
>> 
>>   A rescheduled event is given a higher than default priority.  All
>> this means is that it gets first crack at getting a thread to execute
>> on as versus a "normal" event which will have to wait until all
>> higher-priority events have threads.
>> 
>>   A 5*rescheduled event forces the Executor to cease assigning events
>> to threads until the 5*rescheduled event terminates.
>> 
>>   A 10*rescheduled event (I think its 10) forces the Executor to cease
>> assigning events to threads until the 10*rescheduled event
>> successfully terminates (ie it heads toward and finally goes into
>> single-tasking mode).
>> 
>>   A 20*rescheduled event (have no idea what the real number is)
>> signals all other executing events to block until this event has
>> terminated.  This just shuts down the multi-threading until the event
>> has run.

>Ahh.  I had settled on using execution priorities as well, but not
>quite like this.  I don't think I'll keep track of the number of
>times an event has been rescheduled.  

I don't actoually keep track of the number of reschedules directly. 
The priority level keeps incrementing, and that in turn (reasonably
well) maps directly to the number of reschedules.  

  Note: Almost all events come in at a standardised low priority.  A
very few events are authorised to come in at a higher priority.  As
such the priority level is not an exact measure of the reschedule
count, but in practice I don't care.  The fact that a high priority
event has problems C&C'ing is just as significant as an originally low
priority event now at the same priority level thru rescheduling having
C&C problems.

>Instead, I've a fixed number of
>possible priorities, with the top one reserved for internal use.  An
>event will have its priority increased each time it is rescheduled.
>If it reaches the top priority, it is run to completion before any
>further event processing.

If you think about it, this is almost exactly the same model.

>> Note: Events have a very short maximum time they must execute within.
>> Event's which take longer than that time are killed and not
>> rescheduled.  This maximum time is reduced as events are rescheduled
>> and ascend the above priority list (ie I give them more chance to get
>> their job done, but less time to do it).

>Not so for me.  Am I missing something while standing here at the
>drawing board? :)

Look at it this way:

  50 events are simultaneously executing on your server.  Under those
conditions a new event X (say Bubba moving from room A to room B) can
expect to compleat in T real world clock ticks.

  Due to rescheduling, priority levels, or just a paucity of events,
there are no __NO__ events executing on your server.  Under those
conditions a new event X (say Bubba moving from room A to room B) can
expect to compleat in U real world clock ticks where U<T.

Part of my attention here is that I don't want events which are
borderline on their time values managing to C&C only because they
throw the system into single-threaded mode and then squeak thru
because they have the whole damn CPU to themselves.

Note: A lot of this is obviated by hanging the way you measure an
event's execution time.  I originally crafted the above schema when I
was measuring event time in real world CPU clock ticks.  I have slated
to change this to some sort of measurement of internal processing
time, but have yet to come up with a decent system.

--
J C Lawrence                           Internet: claw at null.net
(Contractor)                           Internet: coder at ibm.net
---------------(*)               Internet: clawrenc at cup.hp.com
...Honorary Member Clan McFUD -- Teamer's Avenging Monolith...




More information about the mud-dev-archive mailing list