Tutorial: Let's build a Compiler! - Part XIII: Procedures

Jon A. Lambert jlsysinc at ix.netcom.com
Mon Mar 2 16:40:19 CET 1998


                     LET'S BUILD A COMPILER!

                                By

                     Jack W. Crenshaw, Ph.D.

                          27 August 1989


                      Part XIII: PROCEDURES


*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************


INTRODUCTION

At last we get to the good part!

At  this point we've studied almost all  the  basic  features  of
compilers  and  parsing.    We  have  learned  how  to  translate
arithmetic expressions, Boolean expressions, control  constructs,
data  declarations,  and  I/O  statements.    We  have defined  a
language, TINY 1.3, that embodies all these features, and we have
written  a  rudimentary  compiler that can translate  them.    By
adding some file I/O we could indeed have a working compiler that
could produce executable object files  from  programs  written in
TINY.  With such a compiler, we could write simple  programs that
could read integer data, perform calculations with it, and output
the results.

That's nice, but what we have is still only a  toy  language.  We
can't read or write even a single character of text, and we still
don't have procedures.

It's  the  features  to  be  discussed  in  the  next  couple  of
installments  that  separate  the men from the toys, so to speak.
"Real" languages have more than one data type,  and  they support
procedure calls.  More than any others, it's  these  two features
that give a language much of its character and personality.  Once
we  have  provided   for   them,  our  languages,  TINY  and  its
successors, will cease  to  become  toys  and  will  take  on the
character  of  real  languages,  suitable for serious programming
jobs.

For several installments now, I've been promising you sessions on
these  two  important  subjects.  Each time, other issues came up
that required me to  digress  and deal with them.  Finally, we've
been able to put all those issues to rest and can get on with the
mainstream  of  things.    In   this   installment,   I'll  cover
procedures.  Next time, we'll talk about the basic data types.


ONE LAST DIGRESSION

This has  been an extraordinarily difficult installment for me to
write.  The reason has nothing to do with the subject  itself ...
I've  known  what I wanted to say for some time, and  in  fact  I
presented  most  of  this at Software Development  '89,  back  in
February.  It has more to do with the approach.  Let me explain.

When I first  began  this  series,  I  told you that we would use
several "tricks" to  make  things  easy,  and to let us learn the
concepts without getting too bogged down in the  details.   Among
these tricks was the idea of looking at individual  pieces  of  a
compiler at  a time, i.e. performing experiments using the Cradle
as a base.  When we studied expressions, for  example,  we  dealt
with only that part of compiler theory.  When we  studied control
structures,  we wrote a different program,  still  based  on  the
Cradle, to do that part. We only incorporated these concepts into
a complete language fairly recently. These techniques have served
us very well indeed, and led us to the development of  a compiler
for TINY version 1.3.

When  I  first  began this session, I tried to build upon what we
had already done, and  just  add the new features to the existing
compiler.  That turned out to be a little awkward and  tricky ...
much too much to suit me.

I finally figured out why.  In this series of experiments,  I had
abandoned the very useful techniques that had allowed  us  to get
here, and  without  meaning  to  I  had  switched over into a new
method of  working, that involved incremental changes to the full
TINY compiler.

You  need  to  understand that what we are doing here is a little
unique.  There have been a number of articles, such as  the Small
C articles by Cain and Hendrix, that presented finished compilers
for one language or another.  This is different.  In  this series
of tutorials, you are  watching  me  design  and implement both a
language and a compiler, in real time.

In the experiments that I've been doing in  preparation  for this
article,  I  was  trying to inject  the  changes  into  the  TINY
compiler  in such a way that, at every step, we still had a real,
working  compiler.     In   other  words,  I  was  attempting  an
incremental enhancement of the language and  its  compiler, while
at the same time explaining to you what I was doing.

That's a tough act to pull off!  I finally  realized  that it was
dumb to try.    Having  gotten  this  far using the idea of small
experiments   based   on   single-character  tokens  and  simple,
special-purpose  programs,  I  had  abandoned  them  in  favor of
working with the full compiler.  It wasn't working.

So we're going to go back to our  roots,  so  to  speak.  In this
installment and the next, I'll be  using  single-character tokens
again as we study the concepts of procedures,  unfettered  by the
other baggage  that we have accumulated in the previous sessions.
As a  matter  of  fact,  I won't even attempt, at the end of this
session, to merge the constructs into the TINY  compiler.   We'll
save that for later.

After all this time, you don't need more buildup  than  that,  so
let's waste no more time and dive right in.


THE BASICS

All modern  CPU's provide direct support for procedure calls, and
the  68000  is no exception.  For the 68000, the call  is  a  BSR
(PC-relative version) or JSR, and the return is RTS.  All we have
to do is to arrange for  the  compiler to issue these commands at
the proper place.

Actually, there are really THREE things we have to address.   One
of  them  is  the  call/return  mechanism.    The second  is  the
mechanism  for  DEFINING  the procedure in the first place.  And,
finally, there is the issue of passing parameters  to  the called
procedure.  None of these things are really  very  difficult, and
we can of course borrow heavily on what people have done in other
languages ... there's no need to reinvent the wheel here.  Of the
three issues, that of parameter passing will occupy  most  of our
attention, simply because there are so many options available.


A BASIS FOR EXPERIMENTS

As always, we will need some software to  serve  as  a  basis for
what  we are doing.  We don't need the full TINY compiler, but we
do need enough of a program so that some of the  other constructs
are present.  Specifically, we need at least to be able to handle
statements of some sort, and data declarations.

The program shown below is that basis.  It's a vestigial  form of
TINY, with single-character tokens.   It  has  data declarations,
but only in their simplest form ... no lists or initializers.  It
has assignment statements, but only of the kind

     <ident> = <ident>

In  other  words,  the only legal expression is a single variable
name.    There  are no control  constructs  ...  the  only  legal
statement is the assignment.

Most of the program  is  just the standard Cradle routines.  I've
shown the whole thing here, just to make sure we're  all starting
from the same point:


{--------------------------------------------------------------}
program Calls;

{--------------------------------------------------------------}
{ Constant Declarations }

const TAB = ^I;
      CR  = ^M;
      LF  = ^J;

{--------------------------------------------------------------}
{ Variable Declarations }

var Look: char;              { Lookahead Character }

var ST: Array['A'..'Z'] of char;


{--------------------------------------------------------------}
{ Read New Character From Input Stream }

procedure GetChar;
begin
   Read(Look);
end;

{--------------------------------------------------------------}
{ Report an Error }

procedure Error(s: string);
begin
   WriteLn;
   WriteLn(^G, 'Error: ', s, '.');
end;


{--------------------------------------------------------------}
{ Report Error and Halt }

procedure Abort(s: string);
begin
   Error(s);
   Halt;
end;


{--------------------------------------------------------------}
{ Report What Was Expected }

procedure Expected(s: string);
begin
   Abort(s + ' Expected');
end;


{--------------------------------------------------------------}
{ Report an Undefined Identifier }

procedure Undefined(n: string);
begin
   Abort('Undefined Identifier ' + n);
end;


{--------------------------------------------------------------}
{ Report an Duplicate Identifier }

procedure Duplicate(n: string);
begin
     Abort('Duplicate Identifier ' + n);
end;


{--------------------------------------------------------------}
{ Get Type of Symbol }

function TypeOf(n: char): char;
begin
     TypeOf := ST[n];
end;


{--------------------------------------------------------------}
{ Look for Symbol in Table }

function InTable(n: char): Boolean;
begin
   InTable := ST[n] <> ' ';
end;


{--------------------------------------------------------------}
{ Add a New Symbol to Table }

procedure AddEntry(Name, T: char);
begin
     if Intable(Name) then Duplicate(Name);
     ST[Name] := T;
end;


{--------------------------------------------------------------}
{ Check an Entry to Make Sure It's a Variable }

procedure CheckVar(Name: char);
begin
     if not InTable(Name) then Undefined(Name);
     if  TypeOf(Name)  <>  'v'  then    Abort(Name  +  ' is not a
variable');
end;


{--------------------------------------------------------------}
{ Recognize an Alpha Character }

function IsAlpha(c: char): boolean;
begin
   IsAlpha := upcase(c) in ['A'..'Z'];
end;


{--------------------------------------------------------------}
{ Recognize a Decimal Digit }

function IsDigit(c: char): boolean;
begin
   IsDigit := c in ['0'..'9'];
end;


{--------------------------------------------------------------}
{ Recognize an AlphaNumeric Character }

function IsAlNum(c: char): boolean;
begin
   IsAlNum := IsAlpha(c) or IsDigit(c);
end;


{--------------------------------------------------------------}
{ Recognize an Addop }

function IsAddop(c: char): boolean;
begin
   IsAddop := c in ['+', '-'];
end;


{--------------------------------------------------------------}
{ Recognize a Mulop }

function IsMulop(c: char): boolean;
begin
   IsMulop := c in ['*', '/'];
end;


{--------------------------------------------------------------}
{ Recognize a Boolean Orop }

function IsOrop(c: char): boolean;
begin
   IsOrop := c in ['|', '~'];
end;


{--------------------------------------------------------------}
{ Recognize a Relop }

function IsRelop(c: char): boolean;
begin
   IsRelop := c in ['=', '#', '<', '>'];
end;


{--------------------------------------------------------------}
{ Recognize White Space }

function IsWhite(c: char): boolean;
begin
   IsWhite := c in [' ', TAB];
end;


{--------------------------------------------------------------}
{ Skip Over Leading White Space }

procedure SkipWhite;
begin
   while IsWhite(Look) do
      GetChar;
end;


{--------------------------------------------------------------}
{ Skip Over an End-of-Line }

procedure Fin;
begin
   if Look = CR then begin
      GetChar;
      if Look = LF then
         GetChar;
   end;
end;


{--------------------------------------------------------------}
{ Match a Specific Input Character }

procedure Match(x: char);
begin
   if Look = x then GetChar
     else Expected('''' + x + '''');
     SkipWhite;
end;


{--------------------------------------------------------------}
{ Get an Identifier }

function GetName: char;
begin
   if not IsAlpha(Look) then Expected('Name');
   GetName := UpCase(Look);
     GetChar;
     SkipWhite;
end;


{--------------------------------------------------------------}
{ Get a Number }

function GetNum: char;
begin
   if not IsDigit(Look) then Expected('Integer');
   GetNum := Look;
     GetChar;
     SkipWhite;
end;


{--------------------------------------------------------------}
{ Output a String with Tab }

procedure Emit(s: string);
begin
   Write(TAB, s);
end;


{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }

procedure EmitLn(s: string);
begin
   Emit(s);
   WriteLn;
end;


{--------------------------------------------------------------}
{ Post a Label To Output }

procedure PostLabel(L: string);
begin
   WriteLn(L, ':');
end;


{--------------------------------------------------------------}
{ Load a Variable to the Primary Register }

procedure LoadVar(Name: char);
begin
     CheckVar(Name);
     EmitLn('MOVE ' + Name + '(PC),D0');
end;


{--------------------------------------------------------------}
{ Store the Primary Register }

procedure StoreVar(Name: char);
begin
     CheckVar(Name);
     EmitLn('LEA ' + Name + '(PC),A0');
   EmitLn('MOVE D0,(A0)')
end;


{--------------------------------------------------------------}
{ Initialize }

procedure Init;
var i: char;
begin
     GetChar;
     SkipWhite;
     for i := 'A' to 'Z' do
          ST[i] := ' ';
end;


{--------------------------------------------------------------}
{ Parse and Translate an Expression }
{ Vestigial Version }

procedure Expression;
begin
     LoadVar(GetName);
end;


{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
var Name: char;
begin
     Name := GetName;
     Match('=');
     Expression;
     StoreVar(Name);
end;


{--------------------------------------------------------------}

                             






{ Parse and Translate a Block of Statements }

procedure DoBlock;
begin
     while not(Look in ['e']) do begin
          Assignment;
          Fin;
   end;
end;


{--------------------------------------------------------------}
{ Parse and Translate a Begin-Block }

procedure BeginBlock;
begin
     Match('b');
     Fin;
     DoBlock;
     Match('e');
     Fin;
end;


{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc(N: char);
begin
     if InTable(N) then Duplicate(N);
   ST[N] := 'v';
     WriteLn(N, ':', TAB, 'DC 0');
end;


{--------------------------------------------------------------}
{ Parse and Translate a Data Declaration }

procedure Decl;
var Name: char;
begin
   Match('v');
     Alloc(GetName);
end;


{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }

procedure TopDecls;
begin
     while Look <> 'b' do begin
      case Look of
        'v': Decl;
      else Abort('Unrecognized Keyword ' + Look);
          end;
          Fin;
     end;
end;


{--------------------------------------------------------------}
{ Main Program }

begin
     Init;
     TopDecls;
     BeginBlock;
end.
{--------------------------------------------------------------}


Note  that we DO have a symbol table, and there is logic to check
a variable name to make sure it's a legal one.    It's also worth
noting that I  have  included  the  code  you've  seen  before to
provide for white space  and  newlines.    Finally, note that the
main program is delimited, as usual, by BEGIN-END brackets.

Once you've copied  the  program  to  Turbo, the first step is to
compile it and make sure it  works.   Give it a few declarations,
and then a begin-block.  Try something like:


     va             (for VAR A)
     vb             (for VAR B)
     vc             (for VAR C)
     b              (for BEGIN)
     a=b
     b=c
     e.             (for END.)


As usual, you should also make some deliberate errors, and verify
that the program catches them correctly.


DECLARING A PROCEDURE

If you're satisfied that our little program works, then it's time
to  deal  with  the  procedures.  Since we haven't  talked  about
                             






parameters yet, we'll begin by considering  only  procedures that
have no parameter lists.

As a start, let's consider a simple program with a procedure, and
think about the code we'd like to see generated for it:


     PROGRAM FOO;
     .
     .
     PROCEDURE BAR;                     BAR:
     BEGIN                                   .
     .                                       .
     .                                       .
     END;                                    RTS

     BEGIN { MAIN PROGRAM }             MAIN:
     .                                       .
     .                                       .
     FOO;                                    BSR BAR
     .                                       .
     .                                       .
     END.                                    END MAIN


Here I've shown  the  high-order language constructs on the left,
and the desired assembler code on the right.  The first  thing to
notice  is that we certainly don't have  much  code  to  generate
here!  For  the  great  bulk  of  both the procedure and the main
program,  our existing constructs take care of  the  code  to  be
generated.

The key to dealing with the body of the procedure is to recognize
that  although a procedure may be quite  long,  declaring  it  is
really no different than  declaring  a  variable.   It's just one
more kind of declaration.  We can write the BNF:


     <declaration> ::= <data decl> | <procedure>


This means that it should be easy to modify TopDecl to  deal with
procedures.  What about the syntax of a procedure?   Well, here's
a suggested syntax, which is essentially that of Pascal:


     <procedure> ::= PROCEDURE <ident> <begin-block>


There is practically no code generation required, other than that
generated within the begin-block.    We need only emit a label at
the beginning of the procedure, and an RTS at the end.

Here's the required code:

{--------------------------------------------------------------}
{ Parse and Translate a Procedure Declaration }

procedure DoProc;
var N: char;
begin
     Match('p');
     N := GetName;
     Fin;
     if InTable(N) then Duplicate(N);
     ST[N] := 'p';
     PostLabel(N);
     BeginBlock;
     Return;
end;
{--------------------------------------------------------------}


Note that I've added a new code generation routine, Return, which
merely emits an RTS instruction.  The creation of that routine is
"left as an exercise for the student."

To  finish  this  version, add the following line within the Case
statement in DoBlock:


            'p': DoProc;


I should mention that  this  structure  for declarations, and the
BNF that drives it, differs from standard Pascal.  In  the Jensen
& Wirth  definition of Pascal, variable declarations, in fact ALL
kinds of declarations,  must  appear in a specific sequence, i.e.
labels,   constants,  types,  variables,  procedures,  and   main
program.  To  follow  such  a  scheme, we should separate the two
declarations, and have code in the main program something like


     DoVars;
     DoProcs;
     DoMain;


However,  most implementations of Pascal, including Turbo,  don't
require  that  order  and  let  you  freely  mix up  the  various
declarations,  as  long  as  you  still  don't  try to  refer  to
something  before  it's  declared.    Although  it  may  be  more
aesthetically pleasing to declare all the global variables at the
top of the  program,  it  certainly  doesn't do any HARM to allow
them to be sprinkled around.   In  fact,  it may do some GOOD, in
the  sense  that it gives you the  opportunity  to  do  a  little
rudimentary  information  hiding.     Variables  that  should  be
accessed only by the main program, for example,  can  be declared
just before it and will thus be inaccessible by the procedures.

OK, try this new version out.  Note that we  can  declare as many
procedures as we choose (as long  as  we don't run out of single-
character names!), and the  labels  and RTS's all come out in the
right places.

It's  worth  noting  here  that  I  do  _NOT_  allow  for  nested
procedures.   In TINY, all procedures must  be  declared  at  the
global level,  the  same  as  in  C.    There  has  been  quite a
discussion about this point in  the  Computer  Language  Forum of
CompuServe.  It turns out that there is a significant  penalty in
complexity that must be paid for the luxury of nested procedures.
What's  more,  this  penalty gets paid at RUN TIME, because extra
code must be added and executed every time a procedure is called.
I also happen to believe that nesting is not a good  idea, simply
on the grounds that I have seen too many abuses of the feature.
Before going on to the next step, it's also worth noting that the
"main program" as it stands  is incomplete, since it doesn't have
the label and END statement.  Let's fix that little oversight:


{--------------------------------------------------------------}
{ Parse and Translate a Main Program }

procedure DoMain;
begin
     Match('b');
     Fin;
     Prolog;
     DoBlock;
     Epilog;
end;
{--------------------------------------------------------------}



More information about the mud-dev-archive mailing list