[DGD] Hydra

Sun Sep 16 17:45:51 CEST 2018

I have gathered some useful results from the feedback I received on
this thread.

The purpose of the benchmark was to see how Hydra performs in a worst-
case scenario.  It is not a realistic benchmark; Hydra could perform
badly on it, and still do well in a real-world scenario.

Of all valid replies, only one was for a server CPU.  The others were
for desktop and laptop CPUs.  Unfortunately the server CPU was also
the oldest one; I would have liked to see the result for a modern
server CPU with a lot of cores.

    http://www.dworkin.nl/bm002/i7-6700K-4-windows.png

CPU: Core i7 6700K, 4 cores, 4.0 GHz, 8 MB cache, released 2015

This CPU has 4 cores, so only the data for 1 and 2 cores is relevant
for this benchmark; Hydra will have other threads running along with
the LPC threads, and the OS will schedule other tasks to run, as well.
The graph shows two mostly straight lines, which is important.  If
the line with the highest core count starts to bend upward, it is a
sign that there is too much locking going on, and the implementation
no longer scales well for this extreme test.  So far so good.
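A minimal sketch of the "does the line bend upward" check, assuming
each plotted line is a series of (load, elapsed time) samples.  The
function names, the split into thirds and the 1.5 threshold are
hypothetical choices for illustration only; they are not part of Hydra
or of the benchmark itself.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs (needs 2+ distinct xs)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def bends_upward(xs, ys, factor=1.5):
    """Hypothetical test for superlinear growth.

    Compares the slope over the last third of the samples with the
    slope over the first third.  A straight line gives equal slopes;
    an upward bend (as would be expected from lock contention) gives
    a late slope more than `factor` times the early one.
    """
    k = max(2, len(xs) // 3)
    early = slope(xs[:k], ys[:k])
    late = slope(xs[-k:], ys[-k:])
    return late > factor * early


xs = list(range(10, 110, 10))
print(bends_upward(xs, [2 * x for x in xs]))       # straight line
print(bends_upward(xs, [x * x / 10 for x in xs]))  # bends upward
```

With linear growth the early and late slopes are equal (2.0 each) and
the check returns False; with quadratic growth the late slope (18.0) is
several times the early one (4.0) and it returns True.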

    http://www.dworkin.nl/bm002/i7-7700HQ-4-windows.png

CPU: Core i7 7700HQ, 4 cores, 2.8 GHz, 6 MB cache, released 2017

Similar results.  The lines are a bit more wobbly, especially the one
for 1 core, but the overall result is still good.  Note that this CPU,
which is more recent, is slower than the previous one, although not
by a lot.

    http://www.dworkin.nl/bm002/i7-8750H-6-windows.png

CPU: Core i7 8750H, 6 cores, 2.2 GHz, 9 MB cache, released 2018

This CPU has 6 cores, so the relevant results are with 1, 2 and 4
LPC threads running.  The result is interesting but not worrying;
the OS obviously decided that it had more important things to do
during part of the test, but you can still see the original path
from which the results deviate.  The line for 4 cores is straight,
fortunately.

    http://www.dworkin.nl/bm002/xeon-E5520-2x4-linux.png

CPU: dual Xeon E5520, 2 x 4 cores, 2.26 GHz, 8 MB cache, released 2009

The oldest CPU in this test, and also the only server CPU and the only
dual-CPU system (2 x 4 cores).  The lines plotted are for 1, 2, 4 and
6 cores.

This is the most interesting result, partly because it is the only
test performed on Linux.  In my own testing, I saw that Hydra scales
linearly on Windows and Solaris, but when running Linux on the same
hardware, it does much better with lower core counts, while still
performing similarly to Windows and Solaris with high core counts.
That is also what we see here.  Is this due to the OS, to the old
hardware, or to it being a dual-CPU system, with a lot of copying back
and forth between caches?  I am not sure, because I did my own testing
with various operating systems on a slightly older but similar system,
so I could not separate these factors there either.

The line for 1 core is mostly straight.  The line for 6 cores is
too, though it could be starting to bend upward at the top.  The
line for 2 cores fluctuates between better and worse than the line
for 1 core.  The line for 4 cores fluctuates between better than the
line for 6 cores and worse than the line for 1 core.

I think what we are seeing here is a system trying to decide
whether to run everything on one CPU or on two CPUs, and doing a lot
of catch-up copying when it changes strategy.  When the number of
active cores is unambiguously higher than the core count of a single
CPU, as with 6 cores, it performs worse for 50 recipients, but better
for 100 recipients.

Overall, the results are encouraging.  For modern systems, even
desktop and laptop systems, the locking architecture scales well.
Naturally I would still like to see results for a single-CPU server
system with 32 or so cores. :)

For an encore, I will try to come up with a benchmark that measures
performance realistically.

Regards,
Felix Croes