[MUD-Dev] UDP Revisited

Bruce Mitchener bruce at puremagic.com
Thu Oct 25 00:05:03 CEST 2001


Bobby,

Thanks for the response. :)

Bobby Martin wrote:
>> From: Bruce Mitchener <bruce at puremagic.com>

>> I was curious as to what sorts of advanced features you had in
>> ARMI and the documentation didn't really go into much detail
>> beyond some usage notes.

>> Do you have any support for pipelining of requests?

> No, although TCP (which I currently use) buffers packets for up to
> 200 ms before it sends the buffer.

Mmmm, this isn't quite what I was thinking about when I wrote that.

If I fire off 3 message sends in a row, and TCP is the transport in
use, what happens?  Do all 3 get sent to the remote end right after
each other and then responses are returned, tagged with some request
identifier as they happen?  Is there an ordering imposed on the
return of requests? (Say that you send off messages 1 and 3 to
object A, and message 2 to object B.  Object B takes a long time in
responding.  Does that delay the return of the result for #3?)

Or, similar to that, what happens if one method call returns a very
large bunch of data?  Will that hold up following method sends, or
do you break the data into chunks, similar to what BEEP and HTTP/1.1
can do to multiplex a single channel?
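
To make the chunking question concrete, here's a minimal sketch of
the sort of framing I have in mind.  The header layout (request id,
chunk length, last-chunk flag) is entirely made up for illustration;
I have no idea what ARMI actually puts on the wire.

  import java.io.DataOutputStream;
  import java.io.IOException;

  // Illustration only: a large reply gets split into frames, each
  // tagged with its request id, so that frames belonging to other
  // requests can be interleaved on the same connection.
  public class FrameWriter {
      private static final int MAX_CHUNK = 4096;
      private final DataOutputStream out;

      public FrameWriter(DataOutputStream out) {
          this.out = out;
      }

      // Write one reply as a series of
      // [requestId, length, last?, bytes] frames.
      public synchronized void writeReply(int requestId, byte[] payload)
              throws IOException {
          int offset = 0;
          while (offset < payload.length) {
              int len = Math.min(MAX_CHUNK, payload.length - offset);
              out.writeInt(requestId);
              out.writeInt(len);
              out.writeBoolean(offset + len == payload.length);
              out.write(payload, offset, len);
              offset += len;
          }
          out.flush();
      }
  }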

Finally (for now :)), do you share the transport connection, or do
you establish new ones frequently (like per-object, per-method send,
etc)?

But, on the point that you brought up, do you have control over that
200ms buffering from Java?  (To turn off Nagle's algorithm or
whatever else you might want to change.)  Is the 200ms delay proving
troublesome at all?  That seems like a fairly high penalty to pay in
terms of latency for any distributed communication.
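
For what it's worth, plain java.net sockets do expose a switch for
Nagle's algorithm; the question is whether ARMI's transport layer
lets you reach it.  (Host and port below are placeholders.)

  import java.io.IOException;
  import java.net.Socket;

  public class NoDelayExample {
      public static void main(String[] args) throws IOException {
          Socket socket = new Socket("example.org", 4000);
          // Disable Nagle's algorithm so small writes go out
          // immediately instead of being coalesced (at the cost of
          // more, smaller packets).
          socket.setTcpNoDelay(true);
          System.out.println("TCP_NODELAY is now "
                             + socket.getTcpNoDelay());
          socket.close();
      }
  }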

>> How about for monitoring message traffic or requests for
>> debugging, checking load balancing, etc?

> Currently in the works, but not there right now.

What sort of stuff is in the works? :)

>> What sorts of security or authentication do you support?

> Currently, none :( It is fairly trivial to add an encrypting
> message filter, though.  You can filter messages that you send any
> way you like.

What sort of model are you thinking about though?  Is this mainly for 
communication between trusted peers?  Or might it be used in a more 
hostile environment with foreign (or user-written) code involved?
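
If it's mostly trusted peers, a symmetric filter along these lines is
probably all you'd need.  The MessageFilter interface here is my
guess at what a filter hook might look like (I haven't seen ARMI's),
and the rest is just a JCE sketch, not anything ARMI ships:

  import javax.crypto.Cipher;
  import javax.crypto.SecretKey;
  import javax.crypto.spec.SecretKeySpec;

  // Hypothetical filter hook: transform outgoing and incoming
  // message bytes.
  interface MessageFilter {
      byte[] outbound(byte[] message) throws Exception;
      byte[] inbound(byte[] message) throws Exception;
  }

  class EncryptingFilter implements MessageFilter {
      private final SecretKey key;

      EncryptingFilter(byte[] sharedSecret) {
          // Symmetric key shared out of band; this says nothing
          // about *who* is on the other end, which is the
          // trust-model question.
          this.key = new SecretKeySpec(sharedSecret, "Blowfish");
      }

      public byte[] outbound(byte[] message) throws Exception {
          // ECB mode is used here only to keep the sketch short.
          Cipher cipher = Cipher.getInstance("Blowfish/ECB/PKCS5Padding");
          cipher.init(Cipher.ENCRYPT_MODE, key);
          return cipher.doFinal(message);
      }

      public byte[] inbound(byte[] message) throws Exception {
          Cipher cipher = Cipher.getInstance("Blowfish/ECB/PKCS5Padding");
          cipher.init(Cipher.DECRYPT_MODE, key);
          return cipher.doFinal(message);
      }
  }

A filter like that buys you confidentiality between endpoints that
already share a key, but nothing about authentication, which is
where the hostile-environment case gets hard.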

>> Can you compare your packing overhead to that of RMI or other
>> systems? Do you compress your datastreams?

> We don't currently formally compress data streams.  I have done
> some sampling of RMI throughput versus ARMI throughput; ARMI was
> about 10 times smaller for the method calls I tested.  This
> definitely needs more formal testing.

and later in your post, but moved here for ease in replying ...

> I have implemented production compression systems before, and I
> can tell you that you will almost certainly need tailor made
> compression (i.e. just write clever serialization code) for any
> significant gains.  Lots of small information packets are hard to
> compress, unless you take large production runs and analyze them,
> then compress assuming the usage patterns don't change too much.
> Most compression mechanisms require that every block of data you
> compress has a table describing how to decompress it, and for
> small blocks the table takes up more space than the amount you
> gain by compressing.  If you analyze large production runs of
> data, the tables can be static and just sit on either end of the
> connection.

With respect to serialization strategy being more important than a
traditional compression algorithm, I definitely agree.  Brad Roberts
and I made some changes to Cold about 2-3 months ago that changed
how we serialized some bits of data and objects for storage into the
DB.  Those changes managed to take a DB that was roughly 2G and drop
it to about 1.6G in size, which was quite nice.  (It also had the
interesting impact of reducing our fragmentation of free space on
disk: more objects were now small enough to fit in the minimum
block size, so they consumed bits of free space floating about that
previously just contributed to wasted space.  Those gains aren't
accounted for in the numbers that I gave above.)
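
The same principle in Java terms (this isn't the Cold change itself,
just an illustration): implement Externalizable and write the fields
by hand in a compact, fixed order instead of letting the default
object stream describe everything.

  import java.io.Externalizable;
  import java.io.IOException;
  import java.io.ObjectInput;
  import java.io.ObjectOutput;

  // A toy game object.  Default serialization would emit a full
  // class descriptor plus field metadata; hand-written fields in a
  // fixed order shrink the stream considerably.
  public class Position implements Externalizable {
      private short x;
      private short y;
      private byte facing;   // 0..7, so a byte is plenty

      public Position() {}   // no-arg constructor required

      public Position(int x, int y, int facing) {
          this.x = (short) x;
          this.y = (short) y;
          this.facing = (byte) facing;
      }

      public void writeExternal(ObjectOutput out) throws IOException {
          out.writeShort(x);
          out.writeShort(y);
          out.writeByte(facing);
      }

      public void readExternal(ObjectInput in) throws IOException {
          x = in.readShort();
          y = in.readShort();
          facing = in.readByte();
      }
  }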

Regarding throughput, you say that messages were smaller.  Were the
round-trip times also less?  Or roughly similar?

>> Any more information to explain in more detail how/why ARMI is
>> great is welcome. :)

> Offhand I don't know what I can tell you about why ARMI is great
> that isn't listed on the web site.  Main attributes are:

>  1) Your method calls can be asynchronous (your app doesn't stop
>  and wait for the return value from a method call until you need
>  the return value.)

Will you provide some higher-level support to be used by a caller of
an asynchronous method to help deal with the failure cases that
might result, or any sort of syntactic sugar/glue to help deal with
the complexities that arise from a more asynchronous model?

Some cool work in this area has been done in the E programming
language.  (Disclaimer: that work probably comes from predecessors
to E, but E was my first exposure to it and I can't credibly cite
which predecessor it might have been.)

Taking from the E in a Walnut draft book, and the chapter on
Distributed Computing, located at
http://www.skyhunter.com/marcs/ewalnut.html#SEC18:

-- begin quote --
    All distributed computing in E starts with the eventually
    operator, "<-":

      # E syntax
      car <- moveTo(2,3)
      println("car will eventually move to 2,3. But not yet.")

    The first statement can be read as, "car, eventually,
    please moveTo(2,3)". As soon as this eventual send has
    been made to the car, the program immediately moves on
    to the next sequential statement. The program does not
    wait for the car to move. Indeed, as discussed further
    under Game Turns below, it is guaranteed that the
    following statement (println in this case) will execute
    before the car moves, even if the car is in fact running
    on the same computer as this code.
--- end quote ---

and ....

-- begin quote --
    When you make an eventual send to an object (referred to
    hereafter simply as a send, which contrasts with a call
    to a local object that waits for the action to complete),
    even though the action may not occur for a long time, you
    immediately get back a promise for the result of the action:

    # E syntax
    def carVow := carMaker <- new("Mercedes")
    carVow <- moveTo(2,3)

    In this example, we have sent the carMaker object the
    "new(name)" message. Eventually the carMaker will create
    the new car on the same computer where the carMaker
    resides; in the meantime, we get back a promise for
    the car. We can make eventual sends to the promise
    just as if it were indeed the car, but we cannot make
    immediate calls to the promise even if the carMaker
    (and therefore the car it creates) actually live on
    the same computer with the program. To make immediate
    calls on the promised object, you must set up an action
    to occur when the promise resolves, using a when-catch
    construct:

      # E syntax
      def temperatureVow := carVow <- getEngineTemperature()
      when (temperatureVow) -> done(temperature) {
        println("The temperature of the car engine is: " + temperature)
      } catch prob {
        println("Could not get engine temperature, error:" + prob)
      }
      println("execution of the when-catch waits for resolution
      of the promise,")
      println ("but the program moves on immediately to these printlns")

    We can read the when-catch statement as, "when the promise
    for a temperature becomes done, and therefore the temperature
    is locally available, perform the main action block... but
    if something goes wrong, catch the problem in variable
    prob and perform the problem block". In this example,
    we have requested the engine temperature from the
    carPromise. Eventually the carPromise resolves to a
    car (possibly remote) and receives the request for
    engine temperature; then eventually the temperaturePromise
    resolves into an actual temperature. The when-catch
    construct waits for the temperature to resolve into
    an actual (integer) temperature, but only the when-catch
    construct waits (i.e., the when catch will execute
    later, out of order, when the promise resolves). The
    program itself does not wait: rather, it proceeds on,
    with methodical determination, to the next statement
    following the when-catch.

    Inside the when-catch statement, we say that the promise
    has been resolved. A resolved promise is either fulfilled
    or broken. If the promise is fulfilled, the main body of
    the when-catch is activated. If the promise was broken,
    the catch clause is activated.

    Notice that after temperatureVow resolves to temperature
    the when clause treats temperature as a local object. This
    is always safe to do when using vows. A vow is a reference
    that always resolves to a local object. In the example,
    variable names suffixed with "vow" remind us and warn
    others that this reference is a vow. We can reliably
    make eventual sends on the vow and immediate calls
    on what the vow resolves to, but we cannot reliably
    make immediate calls to the vow itself. A more detailed
    discussion of these types of warnings can be found in
    the Naming Conventions section later.
--- end quote ---

E takes a very carefully thought-out approach to distributed
computing (especially of the variety where untrusted code may be
involved).  But were you looking at providing any similar sort of
mechanism for simplifying working with asynchronous method sends?
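
For contrast, here's roughly what the plumbing looks like when the
language doesn't help: a hand-rolled promise with a when/catch-style
callback pair.  All of the names are invented; this isn't ARMI's API
or E's implementation.

  import java.util.ArrayList;
  import java.util.List;

  // A crude analogue of E's promise plus when-catch, just to show
  // what syntactic sugar would be hiding.  Not thread-safe.
  class Promise {
      interface Resolver {
          void done(Object value);
          void problem(Exception error);
      }

      private boolean resolved = false;
      private Object value;
      private Exception error;
      private final List listeners = new ArrayList();

      // Register the equivalent of a when-catch block.
      public void whenResolved(Resolver resolver) {
          if (resolved) {
              fire(resolver);
          } else {
              listeners.add(resolver);
          }
      }

      // Called by the transport layer when the reply arrives.
      public void fulfill(Object v) {
          resolved = true;
          value = v;
          fireAll();
      }

      // Called by the transport layer when the request fails.
      public void smash(Exception e) {
          resolved = true;
          error = e;
          fireAll();
      }

      private void fireAll() {
          for (int i = 0; i < listeners.size(); i++) {
              fire((Resolver) listeners.get(i));
          }
          listeners.clear();
      }

      private void fire(Resolver r) {
          if (error != null) {
              r.problem(error);
          } else {
              r.done(value);
          }
      }
  }

The class itself is trivial; the pain is what a caller has to type
at every call site when there's no sugar over it.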

I've found that the addition of syntactic sugar for handling
asynchronous operations can even be useful when distributed
computations aren't being used.  In TEC, we've got a scripting
language that GMs use for some things that compiles to ColdC.  I've
taken advantage of that to provide some fairly high level approaches
to solving problems that then compile into the lower-level, messy
code.  (And we lack a lot of the complexity of the environment that
E is designed for, but some of the same principles are paying off
well for us.)

>  2) I use ids for the class and method instead of the huge
>  descriptors that RMI uses.

Does that change the security profile of your protocol at all?  (Is
there a good reason that RMI uses large descriptors?)
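
Just to put rough numbers on the difference (the header layout and
the names are invented, and this isn't what RMI literally puts on
the wire, just the flavor of it):

  import java.io.ByteArrayOutputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;

  public class HeaderSize {
      public static void main(String[] args) throws IOException {
          // Numeric ids, along the lines of what ARMI describes:
          ByteArrayOutputStream compact = new ByteArrayOutputStream();
          DataOutputStream c = new DataOutputStream(compact);
          c.writeShort(42);   // class id
          c.writeShort(7);    // method id
          System.out.println("id-based header: "
                             + compact.size() + " bytes");

          // Name/descriptor strings, RMI-flavored:
          ByteArrayOutputStream verbose = new ByteArrayOutputStream();
          DataOutputStream v = new DataOutputStream(verbose);
          v.writeUTF("com.example.mud.World");
          v.writeUTF("moveTo(II)V");
          System.out.println("name-based header: "
                             + verbose.size() + " bytes");
      }
  }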

>  4) <not implemented yet> You can choose whether a method call is
>  guaranteed or not.  Currently you either use UDP for transport
>  and all method calls are non-guaranteed but fast, or you use TCP
>  for transport and all method calls are guaranteed but slower (but
>  still much faster than RMI).

How will you do this?  Will it be something that is decided at a
method call site?  Or will it be per-method (and globally decided)?
If the latter, will you have some sort of IDL that provides that
info, a registration process, or something else?
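
To make the call-site option concrete, here's what it might look
like in plain Java with no language help at all.  Everything below
is invented; ARMI has no such API that I know of.

  // Sketch of choosing guaranteed vs. best-effort delivery at the
  // call site rather than globally.
  public class Delivery {
      public static final int GUARANTEED = 0;   // TCP-like: ordered, retried
      public static final int BEST_EFFORT = 1;  // UDP-like: may be dropped

      private Delivery() {}
  }

  interface RemoteCar {
      // Position updates can be lossy; losing one is harmless.
      void moveTo(int x, int y, int deliveryMode);

      // Purchases must not be dropped.
      void purchase(String item, int deliveryMode);
  }

  class Caller {
      void drive(RemoteCar car) {
          car.moveTo(2, 3, Delivery.BEST_EFFORT);
          car.purchase("new tires", Delivery.GUARANTEED);
      }
  }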

  That sort of thing is where I think that a language with
  extensible parsing/keywords is really useful.  Taking it back to
  persistence for a moment, you might invent your own keywords to
  specify persistence behaviors.  Or for distributed computing, you
  might just tag your source with directives on how that variable
  should be updated over the network.

  I'm not a big fan of requiring a whole new IDL or something for
  stuff like that, or a specialized program like OpenSkies uses for
  their object definitions.  But, then, I guess not many mainstream
  programming languages have an extensible grammar

- Bruce

_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev


