Execution
Jeff Kesselman
jeffk at tenetwork.com
Thu Apr 10 18:58:48 CEST 1997
You left out a true virtual machine with compiled code in its
own assembly language (typically a stack machine of some kind).
These VMs come in two varieties these days: traditional stack machines like
the old Pascal P-system, and object-oriented stack machines like the Java
machine (and my Kelvin VM).
JK
At 07:43 AM 4/10/97 MST, you wrote:
>:> Perhaps we can come up with a gradient (I'm sure this has been done
>:> before!)
>:> and we can all point to where we are on it:
>:>
>:> 1. native machine code
>:> 2. threaded code
>:> 3. bytecode
>:> 4. parse tree traversal
>:> 5. pre-tokenized interpretation
>:> 6. straight text interpretation
>:
>:Would anyone be able to give a short description of all of these?
>:(Especially threaded and bytecode) I have been trying to find out about
>:threads - I got the pthreads lib for Linux, but the docs for that are
>:impossible to understand without *some* kind of prior knowledge of what
>:threads are - I have heard that the Linux port of it isn't very
>:good/stable/efficient and Martin Keegan has gone so far as to advise not
>:to use them under any kind of Unix.
>
>[snip]
>
>:Has anyone ever done an analysis comparing the above 6 methods? It would
>:be interesting to look at.
>
>Weeeell, since I opened this can of worms... First, note that I completely
>made up the above list - it is not official in any way! Second, the
>'threading' referred to has nothing to do with threads of execution. Sorry.
>Take *all* of this stuff with lots of grains of salt. I've only ever done
>4 and 6, myself. Oh, and a bit of 2 many many years ago.
>
>- native machine code: that produced by compilers like C, C++, assemblers
> Stuff that runs directly on the CPU using native CPU instructions.
>
>- threaded code: the usual example here is the language Forth. There are
> variants of this, but here is one. Stuff in memory is typically a
> sequence of addresses, with a few non-addresses mixed in. The addresses
> are pointers into the code of the support framework for the language.
> The system will have a central core, consisting of half a dozen or
> fewer instructions, that just reads the next pointer in the sequence,
> and branches to it. That routine will do its thing, perhaps reading
> a word of data from the sequence, then branching back to the central
> core to allow the next operation. E.g.
>
> if a < 0 then
> Print("negative\n");
> else
> Print("positive\n");
> fi;
>
> could be represented as:
>
> @pushvar primitive: push value onto stack
> A value: address of variable whose value to push
> @pushconst primitive: push constant value onto stack
> 0 value: the constant 0
> @< primitive: pop 2 args, compare, push 1 if '<' else 0
> @if primitive: the 'if' handler
> @label1 value: address of 'else' part
> @pushconst
> 'negative' value: the address of the string constant
> @printstring primitive: the thing that prints a string
> @branch primitive: branch to the indicated place
> @label2 value: the address to branch to
>label1: @pushconst
> 'positive' value: the address of the string constant
> @printstring
>label2:
>
> So, you have to sort of 'compile' to this stuff, but it is a lot
> easier than compiling to true native code. You don't care about the
> details of instruction formats and stuff, you don't have to worry about
> linking, and you don't have to worry about object file formats.
>
> Other forms of threaded code have actual instruction sequences instead
> of the above type of sequence. Subroutine call instructions are used to
> get to the primitives, and they can use the return address to know
> where their operands are.
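>
>  To make that concrete, here's a rough C sketch of such a central core
>  plus a couple of primitives, running a simpler program than the example
>  above (all the names are invented, and the casts between function and
>  data pointers are not strictly portable C, though they work on common
>  platforms):
>
>    #include <stdio.h>
>    #include <stdint.h>
>
>    typedef void (*prim_t)(void);            /* type of a primitive routine */
>
>    static void **ip;                        /* pointer into the threaded sequence */
>    static intptr_t stack[64], *sp = stack;  /* tiny operand stack */
>
>    static void pushconst(void) { *sp++ = (intptr_t)*ip++; } /* next cell is a literal */
>    static void add(void)       { sp--; sp[-1] += sp[0]; }
>    static void print(void)     { printf("%ld\n", (long)*--sp); }
>    static void halt(void)      { ip = NULL; }
>
>    /* the sequence: mostly addresses of primitives, with data mixed in */
>    static void *program[] = {
>        (void *)pushconst, (void *)(intptr_t)2,
>        (void *)pushconst, (void *)(intptr_t)3,
>        (void *)add, (void *)print, (void *)halt
>    };
>
>    int main(void)
>    {
>        ip = program;
>        while (ip != NULL) {                 /* the central core: fetch, branch, repeat */
>            prim_t p = (prim_t)*ip++;
>            p();
>        }
>        return 0;
>    }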
>
>- bytecode: this is conceptually easier, but still has lots of variations
> possible. You compile to something like the above, but just use
> (typically) single-byte codes to represent which primitive you want
> to use. E.g. for our example:
>
> 00: <numeric code for PUSHVAR>
> 01: <some pointer to 'a'>
> 05: <numeric code for PUSHCONST>
> 06: 0 (4 bytes)
> 10: <numeric code for LESSTHAN>
> 11: <numeric code for IF>
> 12: 9 (2 bytes)
> 14: <numeric code for PUSHCONST>
> 15: <pointer to string>
> 19: <numeric code for PRINTSTRING>
> 20: <numeric code for BRANCH>
> 21: 6 (2 bytes)
> 23: <numeric code for PUSHCONST>
> 24: <pointer to string>
> 28: <numeric code for PRINTSTRING>
> 29:
>
> The 'interpreter' of these bytecodes can be written in assembler, or
> in some higher level language like C. Again, you sort of have to
> compile to this stuff, but it's fairly easy compilation.
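>
>  A rough C sketch of such an interpreter loop (the opcode numbers and
>  the little program are made up; a real bytecode set would have many
>  more operations and wider operands):
>
>    #include <stdio.h>
>
>    enum { PUSHCONST, ADD, PRINT, HALT };    /* the "numeric codes" */
>
>    static int run(const unsigned char *pc)
>    {
>        int stack[64], *sp = stack;
>
>        for (;;) {
>            switch (*pc++) {                 /* fetch and decode one byte */
>            case PUSHCONST: *sp++ = *pc++; break;   /* one-byte operand follows */
>            case ADD:       sp--; sp[-1] += sp[0]; break;
>            case PRINT:     printf("%d\n", *--sp); break;
>            case HALT:      return 0;
>            }
>        }
>    }
>
>    int main(void)
>    {
>        unsigned char prog[] = { PUSHCONST, 2, PUSHCONST, 3, ADD, PRINT, HALT };
>        return run(prog);
>    }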
>
>- parse tree traversal: (I use this). This is just a bunch of malloc'ed
> memory, containing records of a union type, that have a type code
> saying what they are (like the bytecode codes), plus pointers to
> further records, or some constant values. Rough ASCII art follows:
>
> ------------------------
> | IF | | | |
> ------------------------
> / | \
> / | \
> / | \
> --------- ----------- -----------
> |<| | | |PSTR| ptr| |PSTR| ptr|
> --------- ----------- -----------
> / |
> / |
> -------- ---------
> |VAR| a| |CONST|0|
> -------- ---------
>
> Just some kind of direct 'parse tree' of the program. The two parts
> of the 'if' would often be nodes representing a sequence of other
> nodes, terminated by one with a nil pointer (linked list of things
> to do). The interpreter just does a recursive preorder (do the left
> most node first) traversal of the tree, executing as it goes. For
> an 'if', it evaluates the condition, and decides which of the two
> subparts of the 'if' to execute. The overhead here is all of the
> extra recursive calls of the interpreting routine. This structure
> is fairly easy to work with, however, such as being 'pretty printed'
> back out to ASCII form.
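>
>  A bare-bones C version of that kind of node and its recursive
>  interpreter (the node layout and names are invented, and the
>  sequence/linked-list nodes mentioned above are left out):
>
>    #include <stdio.h>
>
>    typedef struct Node {
>        enum { N_CONST, N_VAR, N_LESS, N_IF, N_PRINT } type;
>        int value;                           /* for N_CONST */
>        int *var;                            /* for N_VAR */
>        const char *str;                     /* for N_PRINT */
>        struct Node *kid[3];                 /* children: operands, or cond/then/else */
>    } Node;
>
>    static int eval(Node *n)                 /* preorder walk, executing as it goes */
>    {
>        switch (n->type) {
>        case N_CONST: return n->value;
>        case N_VAR:   return *n->var;
>        case N_LESS:  return eval(n->kid[0]) < eval(n->kid[1]);
>        case N_IF:    return eval(eval(n->kid[0]) ? n->kid[1] : n->kid[2]);
>        case N_PRINT: printf("%s", n->str); return 0;
>        }
>        return 0;
>    }
>
>    int main(void)                           /* builds the tree drawn above by hand */
>    {
>        int a = -5;
>        Node var  = { N_VAR, 0, &a };
>        Node zero = { N_CONST, 0 };
>        Node cmp  = { N_LESS, 0, 0, 0, { &var, &zero } };
>        Node neg  = { N_PRINT, 0, 0, "negative\n" };
>        Node pos  = { N_PRINT, 0, 0, "positive\n" };
>        Node top  = { N_IF, 0, 0, 0, { &cmp, &neg, &pos } };
>        eval(&top);
>        return 0;
>    }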
>
>- pre-tokenized interpretation: the program has been turned into a
> sequence of codes and constants, like in bytecodes, but it hasn't
> necessarily been fully checked for consistency. So, the interpreter
> must continually check that the next codes make sense. Also, the
> names of variables, etc. probably haven't been looked up yet (they
> are just stored as ASCII strings), so the interpreter has to go
> find what they refer to (if they are in fact valid!)
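>
>  A small C illustration of that extra work (the token layout is
>  invented; the point is that name lookup and validity checks happen
>  while running, not when the code was tokenized):
>
>    #include <stdio.h>
>    #include <string.h>
>
>    enum { T_PUSHVAR, T_PRINT, T_HALT };
>
>    /* a pre-tokenized chunk: an operator code, plus a name still kept as text */
>    typedef struct { int op; const char *name; } Token;
>
>    static int a = -5;
>
>    static int *lookup(const char *name)     /* redone every time the token runs */
>    {
>        if (strcmp(name, "a") == 0) return &a;
>        return NULL;                         /* the name may turn out to be invalid */
>    }
>
>    int main(void)
>    {
>        Token prog[] = { { T_PUSHVAR, "a" }, { T_PRINT }, { T_HALT } };
>        int stack[16], *sp = stack;
>
>        for (Token *t = prog; ; t++) {
>            switch (t->op) {
>            case T_PUSHVAR: {
>                int *v = lookup(t->name);    /* check and resolve at run time */
>                if (v == NULL) return 1;
>                *sp++ = *v;
>                break;
>            }
>            case T_PRINT: printf("%d\n", *--sp); break;
>            case T_HALT:  return 0;
>            }
>        }
>    }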
>
>- straight text: as above, but no pre-tokenization into chunks has been
> done. All of the work of scanning the input is done over and over
> again as the code is re-used. Quite slow, but doesn't require the
> design of any extra data structures, magic codes, etc.
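>
>  A trivial C example of the cost: every execution re-scans the source
>  text from scratch (the "language" here is just "print <number>"):
>
>    #include <stdio.h>
>
>    static void execute(const char *line)    /* all scanning redone on every call */
>    {
>        int n;
>        if (sscanf(line, "print %d", &n) == 1)
>            printf("%d\n", n);
>    }
>
>    int main(void)
>    {
>        const char *src = "print 42";
>        for (int i = 0; i < 3; i++)          /* re-used code: text parsed 3 times */
>            execute(src);
>        return 0;
>    }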
>
>Whew! I'm now 15 minutes late for my bus - hopefully I'll catch the next one!
>
>--
>Chris Gray cg at ami-cg.GraySage.Edmonton.AB.CA
>
>