Tutorial: Let's build a Compiler! - Part XI: Lexical Scan Revisited

Jon A. Lambert jlsysinc at ix.netcom.com
Mon Mar 2 16:39:06 CET 1998


                     LET'S BUILD A COMPILER!

                                By

                     Jack W. Crenshaw, Ph.D.

                           3 June 1989


                 Part XI: LEXICAL SCAN REVISITED


*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************


INTRODUCTION

I've got some  good news and some bad news.  The bad news is that
this installment is  not  the  one  I promised last time.  What's
more, the one after this one won't be, either.

The good news is the reason for this installment:  I've  found  a
way  to simplify and improve the lexical  scanning  part  of  the
compiler.  Let me explain.


BACKGROUND

If  you'll remember, we talked at length  about  the  subject  of
lexical  scanners in Part VII, and I left you with a design for a
distributed scanner that I felt was about as simple  as  I  could
make it ... more than most that I've  seen  elsewhere.    We used
that idea in Part X.  The compiler structure  that  resulted  was
simple, and it got the job done.

Recently, though, I've begun  to  have  problems, and they're the
kind that send a message that you might be doing something wrong.

The  whole thing came to a head when I tried to address the issue
of  semicolons.  Several people have asked  me  about  them,  and
whether or not KISS will have them separating the statements.  My
intention has been NOT to  use semicolons, simply because I don't
like them and, as you can see, they have not proved necessary.

But I know that many of you, like me, have  gotten  used to them,
and so  I  set  out  to write a short installment to show you how
they could easily be added, if you were so inclined.

Well, it  turned  out  that  they weren't easy to add at all.  In
fact it was darned difficult.

I guess I should have  realized that something was wrong, because
of the issue  of  newlines.    In the last couple of installments
we've addressed that issue,  and  I've shown you how to deal with
newlines with a  procedure called, appropriately enough, NewLine.
In  TINY  Version  1.0,  I  sprinkled calls to this procedure  in
strategic spots in the code.

It  seems  that  every time I've addressed the issue of newlines,
though,  I've found it to be tricky,  and  the  resulting  parser
turned out to be quite fragile ... one addition or  deletion here
or  there and things tended to go to pot.  Looking back on it,  I
realize that  there  was  a  message  in  this that I just wasn't
paying attention to.

When I tried to add semicolons  on  top of the newlines, that was
the last straw.   I ended up with much too complex a solution.  I
began to realize that something fundamental had to change.

So,  in  a  way this installment will cause us to backtrack a bit
and revisit the issue of scanning all over again.    Sorry  about
that.  That's the price you pay for watching me  do  this in real
time.  But the new version is definitely an improvement, and will
serve us well for what is to come.

As  I said, the scanner we used in Part X was about as simple  as
one can get.  But anything can be improved.   The  new scanner is
more like the classical  scanner,  and  not  as simple as before.
But the overall  compiler  structure is even simpler than before.
It's also more robust, and easier to add  to  and/or  modify.   I
think that's worth the time spent in this digression.  So in this
installment, I'll be showing  you  the  new  structure.  No doubt
you'll  be  happy  to  know  that, while the changes affect  many
procedures, they aren't very profound  and so we lose very little
of what's been done so far.

Ironically, the new scanner  is  much  more conventional than the
old one, and is very much like the more generic scanner  I showed
you  earlier  in  Part VII.  Then I started trying to get clever,
and I almost clevered myself clean out of business.   You'd think
one day I'd learn: K-I-S-S!


THE PROBLEM

The problem begins to show  itself in procedure Block, which I've
reproduced below:


{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }

procedure Block;
begin
   Scan;
   while not(Token in ['e', 'l']) do begin
      case Token of
       'i': DoIf;
       'w': DoWhile;
       'R': DoRead;
       'W': DoWrite;
      else Assignment;
      end;
      Scan;
   end;
end;
{--------------------------------------------------------------}


As  you   can  see,  Block  is  oriented  to  individual  program
statements.  At each pass through  the  loop, we know that we are
at  the beginning of a statement.  We exit the block when we have
scanned an END or an ELSE.

But suppose that we see a semicolon instead.   The  procedure  as
it's shown above  can't  handle that, because procedure Scan only
expects and can only accept tokens that begin with a letter.

I  tinkered  around for quite awhile to come up with a  fix.    I
found many possible approaches, but none were very satisfying.  I
finally figured out the reason.

Recall that when we started with our single-character parsers, we
adopted a convention that the lookahead character would always be
prefetched.    That   is,   we  would  have  the  character  that
corresponds to our  current  position in the input stream fetched
into the global character Look, so that we could  examine  it  as
many  times  as  needed.    The  rule  we  adopted was that EVERY
recognizer, if it found its target token, would  advance  Look to
the next character in the input stream.

That simple and fixed convention served us very well when  we had
single-character tokens, and it still does.  It would make  a lot
of sense to apply the same rule to multi-character tokens.

But when we got into lexical scanning, I began  to  violate  that
simple rule.  The scanner of Part X  did  indeed  advance  to the
next token if it found an identifier or keyword, but it DIDN'T do
that if it found a carriage return, a whitespace character, or an
operator.

Now, that sort of mixed-mode  operation gets us into deep trouble
in procedure Block, because whether or not the  input  stream has
been advanced depends upon the kind of token we  encounter.    If
it's  a keyword or the target of  an  assignment  statement,  the
"cursor," as defined by the contents of Look,  has  been advanced
to  the next token OR to the beginning of whitespace.  If, on the
other  hand,  the  token  is  a  semicolon,  or if we have hit  a
carriage return, the cursor has NOT advanced.

Needless to say, we can add enough logic  to  keep  us  on track.
But it's tricky, and makes the whole parser very fragile.

There's a much  better  way,  and  that's just to adopt that same
rule that's worked so well before, to apply to TOKENS as  well as
single characters.  In other words, we'll prefetch tokens just as
we've always done for  characters.   It seems so obvious once you
think about it that way.

Interestingly enough, if we do things this way  the  problem that
we've had with newline characters goes away.  We  can  just  lump
them in as  whitespace  characters, which means that the handling
of  newlines  becomes  very trivial, and MUCH less prone to error
than we've had to deal with in the past.


THE SOLUTION

Let's  begin  to  fix  the  problem  by  re-introducing  the  two
procedures:

{--------------------------------------------------------------}
{ Get an Identifier }

procedure GetName;
begin
   SkipWhite;
   if Not IsAlpha(Look) then Expected('Identifier');
   Token := 'x';
   Value := '';
   repeat
      Value := Value + UpCase(Look);
      GetChar;
   until not IsAlNum(Look);
end;


{--------------------------------------------------------------}
{ Get a Number }

procedure GetNum;
begin
   SkipWhite;
   if not IsDigit(Look) then Expected('Number');
   Token := '#';
   Value := '';
   repeat
      Value := Value + Look;
      GetChar;
   until not IsDigit(Look);
end;
{--------------------------------------------------------------}


These two procedures are  functionally  almost  identical  to the
ones  I  showed  you in Part VII.  They each  fetch  the  current
token, either an identifier or a number, into  the  global string
Value.    They  also  set  the  encoded  version, Token,  to  the
appropriate code.  The input  stream is left with Look containing
the first character NOT part of the token.

We  can do the same thing  for  operators,  even  multi-character
operators, with a procedure such as:


{--------------------------------------------------------------}
{ Get an Operator }

procedure GetOp;
begin
   Token := Look;
   Value := '';
   repeat
      Value := Value + Look;
      GetChar;
   until IsAlpha(Look) or IsDigit(Look) or IsWhite(Look);
end;
{--------------------------------------------------------------}

Note  that  GetOp  returns,  as  its  encoded  token,  the  FIRST
character of the operator.  This is important,  because  it means
that we can now use that single character to  drive  the  parser,
instead of the lookahead character.

We need to tie these  procedures together into a single procedure
that can handle all three  cases.  The  following  procedure will
read any one of the token types and always leave the input stream
advanced beyond it:


{--------------------------------------------------------------}
{ Get the Next Input Token }

procedure Next;
begin
   SkipWhite;
   if IsAlpha(Look) then GetName
   else if IsDigit(Look) then GetNum
   else GetOp;
end;
{--------------------------------------------------------------}


***NOTE  that  here  I have put SkipWhite BEFORE the calls rather
than after.  This means that, in general, the variable  Look will
NOT have a meaningful value in it, and therefore  we  should  NOT
use it as a test value for parsing, as we have been doing so far.
That's the big departure from our normal approach.

Now, remember that before I was careful not to treat the carriage
return (CR) and line  feed  (LF) characters as white space.  This
was  because,  with  SkipWhite  called  as the last thing in  the
scanner, the encounter with  LF  would  trigger a read statement.
If we were on the last line of the program,  we  couldn't get out
until we input another line with a non-white  character.   That's
why I needed the second procedure, NewLine, to handle the CRLF's.

But now, with the call  to SkipWhite coming first, that's exactly
the behavior we want.    The  compiler  must know there's another
token coming or it wouldn't be calling Next.  In other words,  it
hasn't found the terminating  END  yet.  So we're going to insist
on more data until we find something.

All this means that we can greatly simplify both the  program and
the concepts, by treating CR and LF as whitespace characters, and
eliminating NewLine.  You  can  do  that  simply by modifying the
function IsWhite:


{--------------------------------------------------------------}
{ Recognize White Space }

function IsWhite(c: char): boolean;
begin
   IsWhite := c in [' ', TAB, CR, LF];
end;
{--------------------------------------------------------------}


We've already tried similar routines in Part VII,  but  you might
as well try these new ones out.  Add them to a copy of the Cradle
and call Next with the following main program:


{--------------------------------------------------------------}
{ Main Program }

begin
   Init;
   repeat
      Next;
      WriteLn(Token, ' ', Value);
   until Token = '.';
end.
{--------------------------------------------------------------}


Compile  it and verify that you can separate  a  program  into  a
series of tokens, and that you get the right  encoding  for  each
token.

This ALMOST works,  but  not  quite.    There  are  two potential
problems:    First,  in KISS/TINY almost all of our operators are
single-character operators.  The only exceptions  are  the relops
>=, <=, and <>.  It seems  a  shame  to  treat  all  operators as
strings and do a  string  compare,  when  only a single character
compare  will  almost  always  suffice.   Second, and  much  more
important, the  thing  doesn't  WORK  when  two  operators appear
together, as in (a+b)*(c+d).  Here the string following 'b' would
be interpreted as a single operator ")*(."

It's possible to fix that problem.  For example,  we  could  just
give GetOp a  list  of  legal  characters, and we could treat the
parentheses as different operator types  than  the  others.   But
this begins to get messy.

Fortunately, there's a  better  way that solves all the problems.
Since almost  all the operators are single characters, let's just
treat  them  that  way, and let GetOp get only one character at a
time.  This not only simplifies GetOp, but also speeds  things up
quite a  bit.    We  still have the problem of the relops, but we
were treating them as special cases anyway.

So here's the final version of GetOp:


{--------------------------------------------------------------}
{ Get an Operator }

procedure GetOp;
begin
   SkipWhite;
   Token := Look;
   Value := Look;
   GetChar;
end;
{--------------------------------------------------------------}


Note that I still give the string Value a value.  If you're truly
concerned about efficiency, you could leave this out.  When we're
expecting an operator, we will only be testing  Token  anyhow, so
the  value of the string won't matter.  But to me it seems to  be
good practice to give the thing a value just in case.

Try  this  new  version with some realistic-looking  code.    You
should  be  able  to  separate  any program into  its  individual
tokens, with the  caveat  that the two-character relops will scan
into two separate tokens.  That's OK ... we'll  parse  them  that
way.

Now, in Part VII the function of Next was combined with procedure
Scan,  which  also  checked every identifier against  a  list  of
keywords and encoded each one that was found.  As I  mentioned at
the time, the last thing we would want  to  do  is  to use such a
procedure in places where keywords  should not appear, such as in
expressions.  If we  did  that, the keyword list would be scanned
for every identifier appearing in the code.  Not good.

The  right  way  to  deal  with  that  is  to simply separate the
functions  of  fetching  tokens and looking for  keywords.    The
version of Scan shown below  does NOTHING but check for keywords.
Notice that it operates on the current token and does NOT advance
the input stream.


{--------------------------------------------------------------}
{ Scan the Current Identifier for Keywords }

procedure Scan;
begin
   if Token = 'x' then
      Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;
{--------------------------------------------------------------}


There is one last detail.  In the compiler there are a few places
that we must  actually  check  the  string  value  of  the token.
Mainly, this  is done to distinguish between the different END's,
but there are a couple  of  other  places.    (I  should  note in
passing that we could always  eliminate the need for matching END
characters by encoding each one  to a different character.  Right
now we are definitely taking the lazy man's route.)

The  following  version  of MatchString takes the  place  of  the
character-oriented Match.  Note that, like Match, it DOES advance
the input stream.


{--------------------------------------------------------------}
{ Match a Specific Input String }

procedure MatchString(x: string);
begin
   if Value <> x then Expected('''' + x + '''');
   Next;
end;
{--------------------------------------------------------------}


FIXING UP THE COMPILER

Armed with these new scanner procedures, we can now begin  to fix
the compiler to  use  them  properly.   The changes are all quite
minor,  but  there  are quite a  few  places  where  changes  are
necessary.  Rather than  showing  you each place, I will give you
the general idea and then just give the finished product.


First of all, the code for procedure Block doesn't change, though
its function does:


{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }

procedure Block;
begin
   Scan;
   while not(Token in ['e', 'l']) do begin
      case Token of
       'i': DoIf;
       'w': DoWhile;
       'R': DoRead;
       'W': DoWrite;
      else Assignment;
      end;
      Scan;
   end;
end;
{--------------------------------------------------------------}


Remember that the new version of Scan doesn't  advance  the input
stream, it only  scans  for  keywords.   The input stream must be
advanced by each procedure that Block calls.

In general, we have to replace every test on Look with  a similar
test on Token.  For example:


{---------------------------------------------------------------}
{ Parse and Translate a Boolean Expression }

procedure BoolExpression;
begin
   BoolTerm;
   while IsOrOp(Token) do begin
      Push;
      case Token of
       '|': BoolOr;
       '~': BoolXor;
      end;
   end;
end;
{--------------------------------------------------------------}


In procedures like Add, we don't  have  to use Match anymore.  We
need only call Next to advance the input stream:


{--------------------------------------------------------------}
{ Recognize and Translate an Add }

procedure Add;
begin
   Next;
   Term;
   PopAdd;
end;
{-------------------------------------------------------------}


Control  structures  are  actually simpler.  We just call Next to
advance over the control keywords:


{---------------------------------------------------------------}
{ Recognize and Translate an IF Construct }

procedure Block; Forward;

procedure DoIf;
var L1, L2: string;
begin
   Next;
   BoolExpression;
   L1 := NewLabel;
   L2 := L1;
   BranchFalse(L1);
   Block;
   if Token = 'l' then begin
      Next;
      L2 := NewLabel;
      Branch(L2);
      PostLabel(L1);
      Block;
   end;
   PostLabel(L2);
   MatchString('ENDIF');
end;
{--------------------------------------------------------------}


That's about the extent of the REQUIRED changes.  In  the listing
of TINY  Version  1.1  below,  I've  also  made a number of other
"improvements" that  aren't really required.  Let me explain them
briefly:

 (1)  I've deleted the two procedures Prog and Main, and combined
      their functions into the main program.  They didn't seem to
      add  to program clarity ... in fact  they  seemed  to  just
      muddy things up a little.

 (2)  I've  deleted  the  keywords  PROGRAM  and  BEGIN  from the
      keyword list.  Each  one  only occurs in one place, so it's
      not necessary to search for it.

 (3)  Having been  bitten  by  an  overdose  of  cleverness, I've
      reminded myself that TINY  is  supposed  to be a minimalist
      program.  Therefore I've  replaced  the  fancy  handling of
      unary minus with the dumbest one I could think of.  A giant
      step backwards in code quality, but a  great simplification
      of the compiler.  KISS is the right place to use  the other
      version.

 (4)  I've added some  error-checking routines such as CheckTable
      and CheckDup, and  replaced  in-line code by calls to them.
      This cleans up a number of routines.

 (5)  I've  taken  the  error  checking  out  of  code generation
      routines  like Store, and put it in  the  parser  where  it
      belongs.  See Assignment, for example.

 (6)  There was an error in InTable and Locate  that  caused them
      to search all locations  instead  of  only those with valid
      data  in them.  They now search only  valid  cells.    This
      allows us to eliminate  the  initialization  of  the symbol
      table, which was done in Init.

 (7)  Procedure AddEntry now has two  arguments,  which  helps to
      make things a bit more modular.

 (8)  I've cleaned up the  code  for  the relational operators by
      the addition of the  new  procedures  CompareExpression and
      NextExpression.

 (9)  I fixed an error in the Read routine ... the  earlier value
      did not check for a valid variable name.


 CONCLUSION

The resulting compiler for  TINY  is given below.  Other than the
removal  of  the  keyword PROGRAM, it parses the same language as
before.    It's  just  a  bit cleaner, and more importantly  it's
considerably more robust.  I feel good about it.

The next installment will be another  digression:  the discussion
of  semicolons  and  such that got me into this mess in the first
place.  THEN we'll press on  into  procedures and types.  Hang in
there with me.  The addition of those features will go a long way
towards removing KISS from  the  "toy  language" category.  We're
getting very close to being able to write a serious compiler.


TINY VERSION 1.1


{--------------------------------------------------------------}
program Tiny11;

{--------------------------------------------------------------}
{ Constant Declarations }

const TAB = ^I;
      CR  = ^M;
      LF  = ^J;

      LCount: integer = 0;
      NEntry: integer = 0;


{--------------------------------------------------------------}
{ Type Declarations }

type Symbol = string[8];

     SymTab = array[1..1000] of Symbol;

     TabPtr = ^SymTab;


{--------------------------------------------------------------}
{ Variable Declarations }

var Look : char;             { Lookahead Character }
    Token: char;             { Encoded Token       }
    Value: string[16];       { Unencoded Token     }


const MaxEntry = 100;

var ST   : array[1..MaxEntry] of Symbol;
    SType: array[1..MaxEntry] of char;


{--------------------------------------------------------------}
{ Definition of Keywords and Token Types }

const NKW =   9;
      NKW1 = 10;

const KWlist: array[1..NKW] of Symbol =
              ('IF', 'ELSE', 'ENDIF', 'WHILE', 'ENDWHILE',
               'READ', 'WRITE', 'VAR', 'END');

const KWcode: string[NKW1] = 'xileweRWve';


{--------------------------------------------------------------}
{ Read New Character From Input Stream }

procedure GetChar;
begin
   Read(Look);
end;

{--------------------------------------------------------------}
{ Report an Error }

procedure Error(s: string);
begin
   WriteLn;
   WriteLn(^G, 'Error: ', s, '.');
end;


{--------------------------------------------------------------}
{ Report Error and Halt }

procedure Abort(s: string);
begin
   Error(s);
   Halt;
end;


{--------------------------------------------------------------}
{ Report What Was Expected }

procedure Expected(s: string);
begin
   Abort(s + ' Expected');
end;

{--------------------------------------------------------------}
{ Report an Undefined Identifier }

procedure Undefined(n: string);
begin
   Abort('Undefined Identifier ' + n);
end;


{--------------------------------------------------------------}
{ Report a Duplicate Identifier }

procedure Duplicate(n: string);
begin
   Abort('Duplicate Identifier ' + n);
end;


{--------------------------------------------------------------}
{ Check to Make Sure the Current Token is an Identifier }

procedure CheckIdent;
begin
   if Token <> 'x' then Expected('Identifier');
end;


{--------------------------------------------------------------}
{ Recognize an Alpha Character }

function IsAlpha(c: char): boolean;
begin
   IsAlpha := UpCase(c) in ['A'..'Z'];
end;


{--------------------------------------------------------------}
{ Recognize a Decimal Digit }

function IsDigit(c: char): boolean;
begin
   IsDigit := c in ['0'..'9'];
end;


{--------------------------------------------------------------}
{ Recognize an AlphaNumeric Character }

function IsAlNum(c: char): boolean;
begin
   IsAlNum := IsAlpha(c) or IsDigit(c);
end;


{--------------------------------------------------------------}
{ Recognize an Addop }

function IsAddop(c: char): boolean;
begin
   IsAddop := c in ['+', '-'];
end;


{--------------------------------------------------------------}
{ Recognize a Mulop }

function IsMulop(c: char): boolean;
begin
   IsMulop := c in ['*', '/'];
end;


{--------------------------------------------------------------}
{ Recognize a Boolean Orop }

function IsOrop(c: char): boolean;
begin
   IsOrop := c in ['|', '~'];
end;


{--------------------------------------------------------------}
{ Recognize a Relop }

function IsRelop(c: char): boolean;
begin
   IsRelop := c in ['=', '#', '<', '>'];
end;


{--------------------------------------------------------------}
{ Recognize White Space }

function IsWhite(c: char): boolean;
begin
   IsWhite := c in [' ', TAB, CR, LF];
end;


{--------------------------------------------------------------}
{ Skip Over Leading White Space }

procedure SkipWhite;
begin
   while IsWhite(Look) do
      GetChar;
end;


{--------------------------------------------------------------}
{ Table Lookup }

function Lookup(T: TabPtr; s: string; n: integer): integer;
var i: integer;
    found: Boolean;
begin
   found := false;
   i := n;
   while (i > 0) and not found do
      if s = T^[i] then
         found := true
      else
         dec(i);
   Lookup := i;
end;


{--------------------------------------------------------------}
{ Locate a Symbol in Table }
{ Returns the index of the entry.  Zero if not present. }

function Locate(N: Symbol): integer;
begin
   Locate := Lookup(@ST, n, NEntry);
end;


{--------------------------------------------------------------}
{ Look for Symbol in Table }

function InTable(n: Symbol): Boolean;
begin
   InTable := Lookup(@ST, n, NEntry) <> 0;
end;


{--------------------------------------------------------------}
{ Check to See if an Identifier is in the Symbol Table         }
{ Report an error if it's not. }


procedure CheckTable(N: Symbol);
begin
   if not InTable(N) then Undefined(N);
end;


{--------------------------------------------------------------}
{ Check the Symbol Table for a Duplicate Identifier }
{ Report an error if identifier is already in table. }


procedure CheckDup(N: Symbol);
begin
   if InTable(N) then Duplicate(N);
end;


{--------------------------------------------------------------}
{ Add a New Entry to Symbol Table }

procedure AddEntry(N: Symbol; T: char);
begin
   CheckDup(N);
   if NEntry = MaxEntry then Abort('Symbol Table Full');
   Inc(NEntry);
   ST[NEntry] := N;
   SType[NEntry] := T;
end;


{--------------------------------------------------------------}
{ Get an Identifier }

procedure GetName;
begin
   SkipWhite;
   if Not IsAlpha(Look) then Expected('Identifier');
   Token := 'x';
   Value := '';
   repeat
      Value := Value + UpCase(Look);
      GetChar;
   until not IsAlNum(Look);
end;


{--------------------------------------------------------------}
{ Get a Number }

procedure GetNum;
begin
   SkipWhite;
   if not IsDigit(Look) then Expected('Number');
   Token := '#';
   Value := '';
   repeat
      Value := Value + Look;
      GetChar;
   until not IsDigit(Look);
end;


{--------------------------------------------------------------}
{ Get an Operator }

procedure GetOp;
begin
   SkipWhite;
   Token := Look;
   Value := Look;
   GetChar;
end;


{--------------------------------------------------------------}
{ Get the Next Input Token }

procedure Next;
begin
   SkipWhite;
   if IsAlpha(Look) then GetName
   else if IsDigit(Look) then GetNum
   else GetOp;
end;


{--------------------------------------------------------------}
{ Scan the Current Identifier for Keywords }

procedure Scan;
begin
   if Token = 'x' then
      Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
end;


{--------------------------------------------------------------}
{ Match a Specific Input String }

procedure MatchString(x: string);
begin
   if Value <> x then Expected('''' + x + '''');
   Next;
end;


{--------------------------------------------------------------}
{ Output a String with Tab }

procedure Emit(s: string);
begin
   Write(TAB, s);
end;


{--------------------------------------------------------------}
{ Output a String with Tab and CRLF }

procedure EmitLn(s: string);
begin
   Emit(s);
   WriteLn;
end;


{--------------------------------------------------------------}
{ Generate a Unique Label }

function NewLabel: string;
var S: string;
begin
   Str(LCount, S);
   NewLabel := 'L' + S;
   Inc(LCount);
end;


{--------------------------------------------------------------}
{ Post a Label To Output }

procedure PostLabel(L: string);
begin
   WriteLn(L, ':');
end;


{---------------------------------------------------------------}
{ Clear the Primary Register }

procedure Clear;
begin
   EmitLn('CLR D0');
end;


{---------------------------------------------------------------}
{ Negate the Primary Register }

procedure Negate;
begin
   EmitLn('NEG D0');
end;


{---------------------------------------------------------------}
{ Complement the Primary Register }

procedure NotIt;
begin
   EmitLn('NOT D0');
end;


{---------------------------------------------------------------}
{ Load a Constant Value to Primary Register }

procedure LoadConst(n: string);
begin
   Emit('MOVE #');
   WriteLn(n, ',D0');
end;


{---------------------------------------------------------------}
{ Load a Variable to Primary Register }

procedure LoadVar(Name: string);
begin
   if not InTable(Name) then Undefined(Name);
   EmitLn('MOVE ' + Name + '(PC),D0');
end;


{---------------------------------------------------------------}
{ Push Primary onto Stack }

procedure Push;
begin
   EmitLn('MOVE D0,-(SP)');
end;


{---------------------------------------------------------------}
{ Add Top of Stack to Primary }

procedure PopAdd;
begin
   EmitLn('ADD (SP)+,D0');
end;


{---------------------------------------------------------------}
{ Subtract Primary from Top of Stack }

procedure PopSub;
begin
   EmitLn('SUB (SP)+,D0');
   EmitLn('NEG D0');
end;


{---------------------------------------------------------------}
{ Multiply Top of Stack by Primary }

procedure PopMul;
begin
   EmitLn('MULS (SP)+,D0');
end;


{---------------------------------------------------------------}
{ Divide Top of Stack by Primary }

procedure PopDiv;
begin
   EmitLn('MOVE (SP)+,D7');
   EmitLn('EXT.L D7');
   EmitLn('DIVS D0,D7');
   EmitLn('MOVE D7,D0');
end;


{---------------------------------------------------------------}
{ AND Top of Stack with Primary }

procedure PopAnd;
begin
   EmitLn('AND (SP)+,D0');
end;


{---------------------------------------------------------------}
{ OR Top of Stack with Primary }

procedure PopOr;
begin
   EmitLn('OR (SP)+,D0');
end;


{---------------------------------------------------------------}
{ XOR Top of Stack with Primary }

procedure PopXor;
begin
   EmitLn('EOR (SP)+,D0');
end;


{---------------------------------------------------------------}
{ Compare Top of Stack with Primary }

procedure PopCompare;
begin
   EmitLn('CMP (SP)+,D0');
end;


{---------------------------------------------------------------}
{ Set D0 If Compare was = }

procedure SetEqual;
begin
   EmitLn('SEQ D0');
   EmitLn('EXT D0');
end;


{---------------------------------------------------------------}
{ Set D0 If Compare was != }

procedure SetNEqual;
begin
   EmitLn('SNE D0');
   EmitLn('EXT D0');
end;


{---------------------------------------------------------------}
{ Set D0 If Compare was > }

procedure SetGreater;
begin
   EmitLn('SLT D0');
   EmitLn('EXT D0');
end;


{---------------------------------------------------------------}
{ Set D0 If Compare was < }

procedure SetLess;
begin
   EmitLn('SGT D0');
   EmitLn('EXT D0');
end;


{---------------------------------------------------------------}
{ Set D0 If Compare was <= }

procedure SetLessOrEqual;
begin
   EmitLn('SGE D0');
   EmitLn('EXT D0');
end;


{---------------------------------------------------------------}
{ Set D0 If Compare was >= }

procedure SetGreaterOrEqual;
begin
   EmitLn('SLE D0');
   EmitLn('EXT D0');
end;


{---------------------------------------------------------------}
{ Store Primary to Variable }

procedure Store(Name: string);
begin
   EmitLn('LEA ' + Name + '(PC),A0');
   EmitLn('MOVE D0,(A0)')
end;


{---------------------------------------------------------------}
{ Branch Unconditional  }

procedure Branch(L: string);
begin
   EmitLn('BRA ' + L);
end;


{---------------------------------------------------------------}
{ Branch False }

procedure BranchFalse(L: string);
begin
   EmitLn('TST D0');
   EmitLn('BEQ ' + L);
end;


{---------------------------------------------------------------}
{ Read Variable to Primary Register }

procedure ReadIt(Name: string);
begin
   EmitLn('BSR READ');
   Store(Name);
end;


{ Write from Primary Register }

procedure WriteIt;
begin
   EmitLn('BSR WRITE');
end;


{--------------------------------------------------------------}
{ Write Header Info }

procedure Header;
begin
   WriteLn('WARMST', TAB, 'EQU $A01E');
end;


{--------------------------------------------------------------}
{ Write the Prolog }

procedure Prolog;
begin
   PostLabel('MAIN');
end;


{--------------------------------------------------------------}
{ Write the Epilog }

procedure Epilog;
begin
   EmitLn('DC WARMST');
   EmitLn('END MAIN');
end;


{---------------------------------------------------------------}
{ Allocate Storage for a Static Variable }

procedure Allocate(Name, Val: string);
begin
   WriteLn(Name, ':', TAB, 'DC ', Val);
end;


{---------------------------------------------------------------}
{ Parse and Translate a Math Factor }

procedure BoolExpression; Forward;

procedure Factor;
begin
   if Token = '(' then begin
      Next;
      BoolExpression;
      MatchString(')');
      end
   else begin
      if Token = 'x' then
         LoadVar(Value)
      else if Token = '#' then
         LoadConst(Value)
      else Expected('Math Factor');
      Next;
   end;
end;


{--------------------------------------------------------------}
{ Recognize and Translate a Multiply }

procedure Multiply;
begin
   Next;
   Factor;
   PopMul;
end;


{-------------------------------------------------------------}
{ Recognize and Translate a Divide }

procedure Divide;
begin
   Next;
   Factor;
   PopDiv;
end;


{---------------------------------------------------------------}
{ Parse and Translate a Math Term }

procedure Term;
begin
   Factor;
   while IsMulop(Token) do begin
      Push;
      case Token of
       '*': Multiply;
       '/': Divide;
      end;
   end;
end;


{--------------------------------------------------------------}
{ Recognize and Translate an Add }

procedure Add;
begin
   Next;
   Term;
   PopAdd;
end;


{-------------------------------------------------------------}
{ Recognize and Translate a Subtract }

procedure Subtract;
begin
   Next;
   Term;
   PopSub;
end;


{---------------------------------------------------------------}
{ Parse and Translate an Expression }

procedure Expression;
begin
   if IsAddop(Token) then
      Clear
   else
      Term;
   while IsAddop(Token) do begin
      Push;
      case Token of
       '+': Add;
       '-': Subtract;
      end;
   end;
end;


{---------------------------------------------------------------}
{ Get Another Expression and Compare }

procedure CompareExpression;
begin
   Expression;
   PopCompare;
end;


{---------------------------------------------------------------}
{ Get The Next Expression and Compare }

procedure NextExpression;
begin
   Next;
   CompareExpression;
end;


{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Equals" }

procedure Equal;
begin
   NextExpression;
   SetEqual;
end;


{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than or Equal" }

procedure LessOrEqual;
begin
   NextExpression;
   SetLessOrEqual;
end;


{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Not Equals" }

procedure NotEqual;
begin
   NextExpression;
   SetNEqual;
end;


{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Less Than" }

procedure Less;
begin
   Next;
   case Token of
     '=': LessOrEqual;
     '>': NotEqual;
   else begin
           CompareExpression;
           SetLess;
        end;
   end;
end;


{---------------------------------------------------------------}
{ Recognize and Translate a Relational "Greater Than" }

procedure Greater;
begin
   Next;
   if Token = '=' then begin
      NextExpression;
      SetGreaterOrEqual;
      end
   else begin
      CompareExpression;
      SetGreater;
   end;
end;


{---------------------------------------------------------------}
{ Parse and Translate a Relation }


procedure Relation;
begin
   Expression;
   if IsRelop(Token) then begin
      Push;
      case Token of
       '=': Equal;
       '<': Less;
       '>': Greater;
      end;
   end;
end;


{---------------------------------------------------------------}
{ Parse and Translate a Boolean Factor with Leading NOT }

procedure NotFactor;
begin
   if Token = '!' then begin
      Next;
      Relation;
      NotIt;
      end
   else
      Relation;
end;


{---------------------------------------------------------------}
{ Parse and Translate a Boolean Term }

procedure BoolTerm;
begin
   NotFactor;
   while Token = '&' do begin
      Push;
      Next;
      NotFactor;
      PopAnd;
   end;
end;


{--------------------------------------------------------------}
{ Recognize and Translate a Boolean OR }

procedure BoolOr;
begin
   Next;
   BoolTerm;
   PopOr;
end;


{--------------------------------------------------------------}
{ Recognize and Translate an Exclusive Or }

procedure BoolXor;
begin
   Next;
   BoolTerm;
   PopXor;
end;


{---------------------------------------------------------------}
{ Parse and Translate a Boolean Expression }

procedure BoolExpression;
begin
   BoolTerm;
   while IsOrOp(Token) do begin
      Push;
      case Token of
       '|': BoolOr;
       '~': BoolXor;
      end;
   end;
end;


{--------------------------------------------------------------}
{ Parse and Translate an Assignment Statement }

procedure Assignment;
var Name: string;
begin
   CheckTable(Value);
   Name := Value;
   Next;
   MatchString('=');
   BoolExpression;
   Store(Name);
end;


{---------------------------------------------------------------}
{ Recognize and Translate an IF Construct }

procedure Block; Forward;

procedure DoIf;
var L1, L2: string;
begin
   Next;
   BoolExpression;
   L1 := NewLabel;
   L2 := L1;
   BranchFalse(L1);
   Block;
   if Token = 'l' then begin
      Next;
      L2 := NewLabel;
      Branch(L2);
      PostLabel(L1);
      Block;
   end;
   PostLabel(L2);
   MatchString('ENDIF');
end;


{--------------------------------------------------------------}
{ Parse and Translate a WHILE Statement }

procedure DoWhile;
var L1, L2: string;
begin
   Next;
   L1 := NewLabel;
   L2 := NewLabel;
   PostLabel(L1);
   BoolExpression;
   BranchFalse(L2);
   Block;
   MatchString('ENDWHILE');
   Branch(L1);
   PostLabel(L2);
end;


{--------------------------------------------------------------}
{ Read a Single Variable }

procedure ReadVar;
begin
   CheckIdent;
   CheckTable(Value);
   ReadIt(Value);
   Next;
end;


{--------------------------------------------------------------}
{ Process a Read Statement }

procedure DoRead;
begin
   Next;
   MatchString('(');
   ReadVar;
   while Token = ',' do begin
      Next;
      ReadVar;
   end;
   MatchString(')');
end;


{--------------------------------------------------------------}
{ Process a Write Statement }

procedure DoWrite;
begin
   Next;
   MatchString('(');
   Expression;
   WriteIt;
   while Token = ',' do begin
      Next;
      Expression;
      WriteIt;
   end;
   MatchString(')');
end;


{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }

procedure Block;
begin
   Scan;
   while not(Token in ['e', 'l']) do begin
      case Token of
       'i': DoIf;
       'w': DoWhile;
       'R': DoRead;
       'W': DoWrite;
      else Assignment;
      end;
      Scan;
   end;
end;


{--------------------------------------------------------------}
{ Allocate Storage for a Variable }

procedure Alloc;
begin
   Next;
   if Token <> 'x' then Expected('Variable Name');
   CheckDup(Value);
   AddEntry(Value, 'v');
   Allocate(Value, '0');
   Next;
end;


{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }

procedure TopDecls;
begin
   Scan;
   while Token = 'v' do
      Alloc;
      while Token = ',' do
         Alloc;
end;


{--------------------------------------------------------------}
{ Initialize }

procedure Init;
begin
   GetChar;
   Next;
end;


{--------------------------------------------------------------}
{ Main Program }

begin
   Init;
   MatchString('PROGRAM');
   Header;
   TopDecls;
   MatchString('BEGIN');
   Prolog;
   Block;
   MatchString('END');
   Epilog;
end.
{--------------------------------------------------------------}
*****************************************************************
*                                                               *
*                        COPYRIGHT NOTICE                       *
*                                                               *
*   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
*                                                               *
*****************************************************************

--
--/*\ Jon A. Lambert - TychoMUD     Internet:jlsysinc at ix.netcom.com /*\--
--/*\ Mud Server Developer's Page <http://www.netcom.com/~jlsysinc> /*\--
--/*\   "Everything that deceives may be said to enchant" - Plato   /*\--





More information about the mud-dev-archive mailing list