[DGD] Kernel library question

bart at wotf.org bart at wotf.org
Thu Aug 23 19:12:37 CEST 2018


On Thu, 23 Aug 2018 15:10:40 +0200, Felix A. Croes wrote

<snip> 

> The basic problem I was trying to solve with suspended callouts was a
> Hydra issue: how can I perform a task which changes a lot of objects,
> making it susceptible to rollback, but which, when broken up into smaller
> tasks, runs the risk of allowing other intervening tasks to run with
> partially changed data or code?  The prime example being a global
> recompile which changes an API.

Ah yes, Hydra might have this condition even when not using a chain of
call_outs. Of course wotf running on DGD won't suffer from rollbacks, but
calling outdated code after an API change does occur there (due to the
call_out chain).

> 
> A global recompile is expensive,

Yes, I can see that. On wotf, doing this in one run was causing big
trouble: it tried to pull in all the 'blueprints' for the objects being
recompiled without getting a chance to swap anything out in between,
and most objects on that mud are unique objects rather than clones.

I had to split it into a 'critical' and a 'non-critical' part. Roughly, the
lib itself is critical in its entirety, while any 'area code' is considered
non-critical.
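
A minimal sketch of such a split, written as a toy Python model rather than
LPC. The path prefixes, the batch size and the function name are assumptions
made for illustration only; they are not taken from wotf or the kernel library.

```python
def plan_recompile(paths, critical_prefixes, batch_size):
    """Split object paths into one critical group and small non-critical batches.

    The critical group is meant to be recompiled in a single run; the
    batches can be spread over later runs so that objects get a chance
    to be swapped out in between.
    """
    prefixes = tuple(critical_prefixes)
    critical = [p for p in paths if p.startswith(prefixes)]
    rest = [p for p in paths if not p.startswith(prefixes)]
    # Cut the non-critical remainder into fixed-size batches.
    batches = [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]
    return critical, batches


# Hypothetical layout: "/lib/" holds the mudlib, "/areas/" holds area code.
critical, batches = plan_recompile(
    ["/lib/a", "/areas/x", "/lib/b", "/areas/y", "/areas/z"],
    critical_prefixes=["/lib/"],
    batch_size=2,
)
```

The smaller the batches, the more often outdated area code can still be
called between runs, which is the tradeoff described above.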


> and a global recompile which is rolled
> back several times is even more expensive.  

Ha, I bet.

> So the idea was to 
> suspend callouts, do the recompile in a single task which will not 
> be rolled back because there are no competing tasks running, and 
> then release the callout suspension.

Makes sense at first glance.

> 
> There were several problems with the implementation.  There were 
> bugs, especially with saving callouts that were triggered during suspension;

Interesting, I have not seen any issues with this in my current implementation.

> there was extra overhead for callout management even when callouts
> were not suspended; 

Yes, I can see that. In the case of wotf it involves a call to the driver
object to check for call_out suspension.

> callouts which were triggered while suspended would
> still run a task in the object, and could thereby still prevent a 
> task which modifies that object along with many others from completing
> without rollback; 

This is something I tried to prevent by not having any central
administration of call_outs (i.e., I never tried the 'call_outs as a
manageable resource' approach either).

It's also what I'm trying to prevent by using the proxy lwo.
But... I suppose rescheduling the call_out, even if it does not change any
LPC-visible data in the object, will still result in a change to the object?
(seeing how it changes the call_out array in status() for the object)

> and suspended callouts were saved in a central object, 
> which would also be used by any other callouts triggered during 
> suspension, meaning that when there are thousands of callouts trying 
> to run and getting suspended, a large number of them will be rolled 
> back.  The cure was starting to look worse, or at least no better,
> than the disease.

Hehehe.

Well, I do think you can avoid needing such a central object for this, and
indeed any kind of administration beyond what can be stored in the
arguments of the saved/rescheduled call_out.
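
To make that concrete, here is a toy Python model (not LPC, and not DGD's
actual call_out API) of a callout that is simply requeued while suspension is
active, carrying all of its bookkeeping in its own arguments. The Scheduler,
Proxy and Room classes, the one-tick retry delay and the retry counter are
all assumptions made for illustration.

```python
import heapq
from itertools import count


class Scheduler:
    """Toy stand-in for a driver's callout queue."""

    def __init__(self):
        self.now = 0
        self.suspended = False
        self._queue = []
        self._seq = count()  # tie-breaker so heap entries never compare objects

    def call_out(self, obj, func, delay, *args):
        heapq.heappush(self._queue,
                       (self.now + delay, next(self._seq), obj, func, args))

    def run_until(self, t):
        # Fire every callout that is due at or before time t.
        while self._queue and self._queue[0][0] <= t:
            due, _, obj, func, args = heapq.heappop(self._queue)
            self.now = due
            getattr(obj, func)(*args)
        self.now = t


class Proxy:
    """Receives the callout on behalf of the target object."""

    def __init__(self, sched, target):
        self.sched = sched
        self.target = target

    def start(self, func, delay, *args):
        self.sched.call_out(self, "fire", delay, func, args, 0)

    def fire(self, func, args, retries):
        if self.sched.suspended:
            # No central list: the pending call lives only in these arguments.
            self.sched.call_out(self, "fire", 1, func, args, retries + 1)
            return
        getattr(self.target, func)(*args, retries)


class Room:
    def __init__(self):
        self.log = []

    def heartbeat(self, label, retries):
        self.log.append((label, retries))
```

In this model the target object is never touched while suspension is active;
only the proxy's own pending callout changes. Whether that is enough to avoid
a visible change to the real object is exactly the open question raised
earlier about the call_out array in status().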

> 
> Furthermore, as I gained insight into the issue, I also found different
> ways to prevent rollbacks within Hydra, for cases where I had assumed
> it to be inevitable.  Also, it is quite simple to start a task that
> must modify many objects with an action that will cause an immediate
> rollback if the completion of the task is not already guaranteed, for
> example by writing to a file.  Even though this does not prevent a
> rollback, the rollback will occur before the expensive global 
> recompile, rather than after it.

That is interesting. While I don't use Hydra, it is something I've been
keeping in mind, as I hoped, expected even, that performing such an action
early would be possible and helpful in such situations.
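
A toy Python model of why the ordering matters. This is not Hydra's
implementation; it assumes, purely for illustration, that every speculative
attempt ends in a rollback when it tries to commit, that only the last
attempt is guaranteed to complete, and that an irreversible action (such as
a file write) aborts a non-guaranteed attempt on the spot. All names are
hypothetical.

```python
class Rollback(Exception):
    """Raised when a speculative attempt has to be abandoned and rerun."""


class TaskContext:
    def __init__(self, guaranteed):
        self.guaranteed = guaranteed
        self.work_done = 0  # units of expensive work done in this attempt

    def irreversible_action(self):
        # Models e.g. a file write: it cannot be undone, so an attempt
        # whose completion is not guaranteed is rolled back right here.
        if not self.guaranteed:
            raise Rollback

    def recompile(self, units):
        self.work_done += units

    def commit(self):
        # Assumption: a speculative attempt always conflicts at commit time.
        if not self.guaranteed:
            raise Rollback


def run(task, speculative_attempts):
    """Run task until it completes; return the work lost to rollbacks."""
    wasted = 0
    for attempt in range(speculative_attempts + 1):
        ctx = TaskContext(guaranteed=(attempt == speculative_attempts))
        try:
            task(ctx)
            ctx.commit()
            return wasted
        except Rollback:
            wasted += ctx.work_done


def recompile_only(ctx):
    ctx.recompile(1000)


def barrier_first(ctx):
    ctx.irreversible_action()  # cheap, and forces any rollback to happen now
    ctx.recompile(1000)
```

With two speculative attempts, recompile_only throws away two full
recompiles before it succeeds, while barrier_first is rolled back just as
often but loses no recompile work.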

Even though I don't use Hydra myself, I find the implementation and
requirements extremely interesting, and always try to keep them in mind.

> 
> In the end, this was a major factor in stopping development of the
> kernel library, taking a snapshot of it, and altering that radically
> in backward-incompatible ways to better suit Hydra, as part of the
> Cloud Server library.

Understandable.

Bart
--
https://www.bartsplace.net/
https://wotf.org/
https://www.flickr.com/photos/mrobjective/



