[DGD] Hydra

Felix A. Croes felix at dworkin.nl
Tue Aug 21 00:08:14 CEST 2018


bart at wotf.org wrote:

> This is the biggest (in cores) system I have currently. It's not a very new
> system, but a fair bit newer than your 10-year-old one still.

Nehalem, one generation newer than my Penryn.  It will have expensive
Meltdown mitigation.  Only 8 cores show up in /proc/cpuinfo; did you
disable hyperthreading?

Thanks a lot for running the tests.  Since the numbers won't mean much
to you, here is a quick explanation:

There are 10000 objects.  100 of these function as senders, and all of
them function as receivers.  Each sender sends messages to (starts
callouts in) up to 100 receiver objects, chosen at random by that
sender.  This is repeated 10 times by each sender, after which the
number of receivers per sender is increased by one and the senders
choose a new set of objects to receive their messages.

For test-6, the number 1.730 for 100 receivers means that 100 * 100 * 10
callouts were run.  That's 100000 tasks, or transactions, in 1.730 seconds,
or roughly 57800 tasks per second.

As I mentioned, this is a synthetic benchmark.  The message received is
just an integer value, and the receiver object merely assigns this value
to a global variable.  It's about the thinnest implementation for which
you can claim that any information was actually received.
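
To give an idea of what this looks like in LPC, here is a minimal
sketch of a receiver and a sender; the names (receive, start_receive,
broadcast) and details are made up for illustration, they are not the
actual benchmark code:

    /* receiver: about the thinnest possible implementation */
    int last_value;                 /* global variable in the dataspace */

    void receive(int value)
    {
        last_value = value;         /* runs later, in the receiver's own task */
    }

    void start_receive(int value)
    {
        /* called by a sender: only schedules the callout, touches no globals */
        call_out("receive", 0, value);
    }

    /* sender: start a callout in each of the chosen receivers */
    void broadcast(object *receivers, int value)
    {
        int i;

        for (i = sizeof(receivers); --i >= 0; ) {
            receivers[i]->start_receive(value);
        }
    }

Note that start_receive() only schedules the callout; the actual
assignment happens later, when receive() runs in its own task.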

The point of the benchmark is that it is written in such a way that
rollbacks can almost entirely be avoided, so that locking is the main
constraint on scalability.  There are two types of locks that matter.
The first is the lock on the object table, which is used whenever a
task first accesses an object, and when the changes made by a task are
committed.  The second is a per-object lock which is used when a
callout is added or removed.

Starting a callout in 100 objects is not a common occurrence in normal
code, but it will happen occasionally, because starting a callout in an
object does not count as a modification of that object, as long as the
object's dataspace is not accessed when the callout is added.  That
makes it a genuine way to pass information to, say, 10000 user objects
without having to modify all of those objects in a single task, which
would be very likely to fail to commit, resulting in a very expensive
rollback.
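
To make that concrete, here is a hypothetical sketch of such a
broadcast, assuming the user objects implement a callout-starting
function like start_receive() above; again, this is not the actual
benchmark code:

    /*
     * hypothetical daemon function: pass a value to every user object
     * in a single task; each receiver only gets a callout added, so
     * none of them should count as modified
     */
    void notify_users(int value)
    {
        object *targets;
        int i;

        targets = users();          /* kfun: the current user objects */
        for (i = sizeof(targets); --i >= 0; ) {
            targets[i]->start_receive(value);
        }
    }

Since start_receive() touches no global variables, the broadcasting
task never accesses the receivers' dataspaces and should commit
without conflict.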

With only 1 LPC thread, there is no lock contention.  With 2 LPC threads
there is, and there are not enough tasks running in parallel to
compensate for that.  With 6 LPC threads, the gain from running
more tasks outweighs the increased lock contention.  I am not sure
how to interpret the case with 8 LPC threads yet, but at least it shows
that the ultimate limit imposed by lock contention on scalability has
not been reached.  I would like to see tests on machines with more
cores.

In normal code, there will be fewer objects accessed in each task,
and the constraint on concurrency will shift from lock contention to
rollbacks.

You can run the same test with DGD to get an idea of the overhead all
of this imposes.

Regards,
Felix Croes


