[MUD-Dev] C&C and Event Rescheduling

clawrenc at cup.hp.com clawrenc at cup.hp.com
Fri Aug 1 16:00:57 CEST 1997


In <199707311608.SAA00450 at mare.zesoi.fer.hr>, on 07/31/97 
   at 07:12 PM, Miroslav Silovic <silovic at mare.zesoi.fer.hr> said:

>> In <199707261742.TAA00453 at regoc.srce.hr>, on 07/26/97 
>>    at 10:53 AM, silovic at srce.hr (Miroslav Silovic) said:

>> commit as I do.  This could allow a failed event to die and reschedule
>> much sooner than if it had to wait for the other contender to commit

>This was my intention.

To an extent however it could reduce responsiveness:

  Very long event X starts processing.
  Near the end of X's run: very short event Y starts processing.
  Y locks resource A.
  X requests A and dies.
  
Didn't you have some sort of age/priority scheme in there to counter
this?  I seem to remember something such.

>> It also changes the character fo the DB accesses.  In the lockless
>> model events compete to C&C successfully.  Events with smaller
>> workings have an advantage at C&C time, as do events with shorter run
>> times.  Its a question of who is in and out first.  Neither of those
>> points are true for your model.  For you, more or less, the first to
>> reach the base has it, holds it, and is guaranteed to commit
>> successfully (ignoring questions of needing to access an object which
>> another already has locked, which is not a safe assumption as below). 

>I consider this a good thing. Remember, you have change_parents
>function in Cold. When you call it, it guarantees that the database
>will be consistant after termination. 

FWLIW I have a similar call, but I don't guarantee consistancy. 
Instead I rebuild the inheritance tree when an outdated object is
detected.  This doesn't change the workload much -- it just spreads it
out over time.

>The problem is that it means
>that constructor/destructor pairs get called on /all/ descendants of
>the affected object. 

Translation: You have an event with a HUGE working set which is almost
guaranteed to fail to C&C due to the large number of objects involved. 
This is precisely why I don't rebuild everything in one swell foop. 
Note: My approach also allow optimising for those cases where two
different inheritance changes affect the same object -- which while
perhaps not the most common case, it sorta cute.

>Adding a parent to the user object, in other
>words, will touch all the users. This means that such task has to run
>with VERY high priority (and in my scheme, I intend to knock the
>locks off the lower-priority tasks, precisely because of this). Now,
>this takes about 20-30 seconds on a LARGE db (say, 5,000 users), if
>the parent change is significant. This means that if lower-priority
>tasks could kill this at will, it'd take 5-10 minutes for it to grow
>high enough in priority to actually complete, according to your
>scheme. 

Were I to do this sort of mass munging of the DB by a single event,
I'd have the event start at one of the higher priority levels --
potentially even at single-threaded mode.  I can do that.

>In my scheme, it'd get max priority, and all threads in its
>way would restart, but it'd complete in reasonable time without
>affecting the game too much.

Which I can do as well by short-circuiting the priority level scheme. 
I don't see a net win/lose where.

>IMO, the events roughly fall into two categories: database
>recompilation (change_parents is one such), and game events. I think
>that game events roughy take equal ammount of time, and giving
>priority to the shorter event is actually big potential problem: If
>events take 20 and 200 ms respectively, total processing, when 30 ms
>even is rescheduled, is 20+2*200 = 420 ms, instead of 220ms. On the
>other hand, database recompilations are so unpredictable that they're
>best run with high priority or atomically, simply because that way
>the probability that something will break is the lowest.

True.  The problem with the lockless model is that performance
degrades almost geometrically as contention rises.

>> Another potential problem is that your locking model opens itself to
>> infinite loops about the Dining Philosophers problem.  Without some
>> very interesting deadlock detection it is quite possible to have (say)
>> a minimal set of 3 events each of which attempts to lock various
>> objects and all of which get killed/rolled back because one of the
>> other two events already has the requested resource.  Consider:
>> 
>>   Objects A, B and C,
>> 
>>   Events X, Y and Z.
>> 
>>   X locks A.
>> 
>>   Y locks B.
>>   
>>   Z locks C
>> 
>>   X attempts to lock B and dies.
>> 
>>   Y attempts to lock C and dies.
>>   
>>   X restarts and locks A.
>> 
>>   Z attempts to lock A and dies.
>> 
>>   Y restarts and locks B.
>> 
>>   X attempts to lock B and dies.
>> 
>>   Z restarts and locks C.
>> 
>>   Y attempts to lock C and dies.
>> 
>>   ...etc.
>> 
>> This is a very simplistic case.  Add many more events, and have each
>> event attempt to access more than one object (5?  10?) and it becomes
>> almost probable.

>Actually I intend to reschedule the tasks with random delays,
>precisely because of this (I believe this solution is standard).
>Deadlock is improbable, then, because sooner or later things will
>schedule in the right order, with proper pause between each.

Random rescheduling is a standard approach, however I wouldn't label
it a solution.  You might like to do a simulation here.  I suspect
you'll be surprised by the results.  The problem of this approach is
that its incredibly sensitive to both the load and to range of the
random number generated.  

Specific case:  If the random generation varies by a value which is
generally far smaller than the specific contending events in question,
then they'll continue to contend for a long while.  There are other
nasty cases (a simulation should show you most of the nasty ones).  

I wonder if you're not going to need to profile the run-length of most
of your verbs and tailor (perhaps dynamically at runtime) your random
rescheduling to suit.

>> Aside: Its not distributable, well, not in any sane manner, whereas
>> the lockless model rolls out there pretty easily, especially if you
>> allow objects to migrate between DB servers.

>My method would support RPC model of distribution, with remote
>proxies...

This sounds a lot like Cool's model.

>...(Cold already supports proxy objects, but they are not all
>that useful without proper thread safety). For that matter, I'm still
>split between RPC and replication in choosing the right model for
>distribution. Replication bombs the bandwidth with flinging the
>objects around, while RPC needs to interchange several remote-call
>packets to do anything. On the other hand, replication would fare
>REALLY badly if two servers do intense manipulation of the same
>object for any length of time.

Unter originally took the replication model.  Latter design
discussions that I've heard of spoke of doing RPC until a threshhold
value (balance of local calls to remote calls from specific target) at
which point the object would be moved to the preferential server.

--
J C Lawrence                           Internet: claw at null.net
(Contractor)                           Internet: coder at ibm.net
---------------(*)               Internet: clawrenc at cup.hp.com
...Honorary Member Clan McFUD -- Teamer's Avenging Monolith...




More information about the mud-dev-archive mailing list