.TL
A Tour Through the Portable C Compiler
.AU
S. C. Johnson
.AI
.MH
.SH
Introduction
.PP
A C compiler has been implemented that has proved to be quite portable,
serving as the basis for C compilers on roughly a dozen machines, including
the Honeywell 6000, IBM 370, and Interdata 8/32.
The compiler is highly compatible with the C language standard.
.[
Kernighan Ritchie Prentice 1978
.]
.PP
Among the goals of this compiler are portability, high reliability,
and the use of state-of-the-art techniques and tools wherever practical.
Although the efficiency of the compiling process is not
a primary goal, the compiler is efficient enough, and produces
good enough code, to serve as a production compiler.
.PP
The language implemented is highly compatible with the current PDP-11 version of C.
Moreover, roughly 75% of the compiler, including
nearly all the syntactic and semantic routines, is machine independent.
The compiler also serves as the major portion of the program
.I lint ,
described elsewhere.
.[
Johnson Lint Program Checker
.]
.PP
A number of earlier attempts to make portable compilers are worth noting.
While on CO-OP assignment to Bell Labs in 1973, Alan Snyder wrote a portable C
compiler which was the basis of his Master's Thesis at M.I.T.
.[
Snyder Portable C Compiler
.]
This compiler was very slow and complicated, and contained a number of rather
serious implementation difficulties;
nevertheless, a number of Snyder's ideas appear in this work.
.PP
Most earlier portable compilers, including Snyder's, have proceeded by
defining an intermediate language, perhaps based
on three-address code or code for a stack machine,
and writing a machine independent program to
translate from the source code to this
intermediate code.
The intermediate code is then read by a second pass, and interpreted or compiled.
This approach is elegant, and has a number of advantages, especially if the
target machine is far removed from the host.
It suffers from some disadvantages as well.
Some constructions, like initialization and subroutine prologs,
are difficult or expensive to express in a
machine independent way that still allows them
to be easily adapted to the target assemblers.
Most of these approaches require a symbol table to be constructed
in the second (machine dependent) pass, and/or
require powerful target assemblers.
Also, many conversion operators may be generated
that have no effect on a given machine,
but may be needed on others (for example, pointer to pointer
conversions usually do nothing in C, but must be generated because
there are some machines where they are significant).
.PP
For these reasons, the first pass of the portable compiler
is not entirely machine independent.
It contains some machine dependent features, such as initialization, subroutine
prolog and epilog, certain storage allocation functions,
code for the
.I switch
statement, and code to throw out unneeded conversion operators.
.PP
As a crude measure of the degree of
portability actually achieved,
the Interdata 8/32 C compiler has
roughly 600 machine dependent lines of source out of 4600 in Pass 1, and 1000
out of 3400 in Pass 2.
In total, 1600 out of 8000, or 20%,
of the total source is machine dependent
(13% in Pass 1, 30% in Pass 2).
These percentages can be expected to rise slightly as the
compiler is tuned.
The percentage of machine-dependent code for the IBM is 22%, for
the Honeywell 25%.
If the assembler format and structure were the same for all
these machines, perhaps another 5-10% of the code
would become machine independent.
.PP
These figures are sufficiently misleading as to be almost
meaningless.
A large fraction of the machine dependent code can be converted
in a straightforward, almost mechanical way.
On the other hand, a certain amount of the code requires hard
intellectual effort to convert, since the algorithms
embodied in this part of the code are typically complicated
and machine dependent.
.PP
To summarize, however, if you need a C compiler written for a machine with
a reasonable architecture, the compiler is already three quarters finished!
.SH
Overview
.PP
This paper discusses the structure and organization of the portable compiler.
The intent is to give the big picture, rather than discussing the details of a particular machine
implementation.
After a brief overview and a discussion of the source file structure,
the paper describes the major data structures, and then delves more
closely into the two passes.
Some of the theoretical work on which the compiler is based, and
its application to the compiler, is discussed elsewhere.
.[
johnson portable theory practice
.]
One of the major design issues in any C compiler, the design of
the calling sequence and stack frame, is the subject of a separate
memorandum.
.[
johnson lesk ritchie calling sequence
.]
.PP
The compiler consists of two passes,
.I pass1
and
.I pass2 ,
that together turn C source code into assembler code for the target
machine.
The two passes are preceded by a preprocessor that
handles the
.B "#define"
and
.B "#include"
statements, and related features (e.g.,
.B #ifdef ,
etc.).
The preprocessor is a nearly machine independent program, and will
not be further discussed here.
.PP
The output of the preprocessor is a text file that is read as the standard
input of the first pass.
This pass
produces as standard output another text file that becomes the standard
input of the second pass.
The second pass produces, as standard output, the desired assembler language source code.
The preprocessor and the two passes
all write error messages on the standard error file.
Thus the compiler itself makes few demands on the I/O
library support, aiding in the bootstrapping process.
.PP
Although the compiler is divided into two passes,
this represents historical accident more than deep necessity.
In fact, the compiler can optionally be loaded
so that both passes operate in the same program.
This ``one pass'' operation eliminates the overhead of
reading and writing the intermediate file, so the compiler
operates about 30% faster in this mode.
It also occupies about 30% more space than the larger
of the two component passes.
.PP
Because the compiler is fundamentally structured as two
passes, even when loaded as one, this document primarily
describes the two pass version.
.PP
The first pass does the lexical analysis, parsing, and
symbol table maintenance.
It also constructs parse trees for expressions,
and keeps track of the types of the nodes in these trees.
Additional code is devoted to initialization.
Machine dependent portions of the first pass serve to
generate subroutine prologs and epilogs,
code for switches, and code for branches, label definitions,
alignment operations,
changes of location counter, etc.
.PP
The intermediate file is a text file organized into lines.
Lines beginning with a right parenthesis are copied by the second
pass directly to its output file, with the
parenthesis stripped off.
Thus, when the first pass produces assembly code, such as subroutine prologs,
etc., each line is prefaced with a right parenthesis;
the second pass passes these lines through to the assembler.
.PP
The major job done by the second pass is generation of code
for expressions.
The expression parse trees produced in the first pass are
written onto the
intermediate file in Polish Prefix form:
first, there is a line beginning with a period, followed by the source
file line number and name on which the expression appeared (for debugging purposes).
The successive lines represent the nodes of the parse tree,
one node per line.
Each line contains the node number, type, and
any values (e.g., values of constants)
that may appear in the node.
Lines representing nodes with descendants are immediately
followed by the left subtree of descendants, then the
right.
Since the number of descendants of any node is completely
determined by the node number, there is no need to mark the end
of the tree.
.PP
There are only two other line types in the intermediate file.
Lines beginning with a left square bracket (`[') represent the beginning of
blocks (delimited by { ... } in the C source);
lines beginning with right square brackets (`]') represent the end of blocks.
The remainder of these lines tell how much
stack space, and how many register variables,
are currently in use.
.PP
Thus, the second pass reads the intermediate file, copies the `)' lines,
makes note of the information in the `[' and `]' lines,
and devotes most of its effort to the
`.' lines and their associated expression trees, turning them
into assembly code to evaluate the expressions.
.PP
In the one pass version of the compiler, the expression
trees that are built by the first pass have
been declared to have room for the second pass
information as well.
Instead of writing the trees onto an intermediate file,
each tree is transformed in place into an acceptable
form for the code generator.
The code generator then writes the result of compiling
this tree onto the standard output.
Instead of `[' and `]' lines in the intermediate file, the information is passed
directly to the second pass routines.
Assembly code produced by the first pass is simply written out,
without the need for `)' at the head of each line.
.SH
The Source Files
.PP
The compiler source consists of 22 source files.
Two files,
.I manifest
and
.I macdefs ,
are header files included with all other files.
.I Manifest
has declarations for the node numbers, types, storage classes,
and other global data definitions.
.I Macdefs
has machine-dependent definitions, such as the size and alignment of the
various data representations.
Two machine independent header files,
.I mfile1
and
.I mfile2 ,
contain the data structure and manifest definitions
for the first and second passes, respectively.
In the second pass, a machine dependent header file,
.I mac2defs ,
contains declarations of register names, etc.
.PP
There is a file,
.I common ,
containing (machine independent) routines used in both passes.
These include routines for allocating and freeing trees, walking over trees,
printing debugging information, and printing error messages.
There are two dummy files,
.I comm1.c
and
.I comm2.c ,
that simply include
.I common
within the scope of the appropriate pass1 or pass2 header files.
When the compiler is loaded as a single pass,
.I common
only needs to be included once:
.I comm2.c
is not needed.
.PP
Entire sections of this document are devoted to the detailed structure of the
passes.
For the moment, we just give a brief description of the files.
The first pass
is obtained by compiling and loading
.I scan.c ,
.I cgram.c ,
.I xdefs.c ,
.I pftn.c ,
.I trees.c ,
.I optim.c ,
.I local.c ,
.I code.c ,
and
.I comm1.c .
.I Scan.c
is the lexical analyzer, which is used by
.I cgram.c ,
the result of applying
.I Yacc
.[
Johnson Yacc Compiler cstr
.]
to the input grammar
.I cgram.y .
.I Xdefs.c
is a short file of external definitions.
.I Pftn.c
maintains the symbol table, and does initialization.
.I Trees.c
builds the expression trees, and computes the node types.
.I Optim.c
does some machine independent optimizations on the expression trees.
.I Comm1.c
includes
.I common ,
which contains service routines common to the two passes of the compiler.
All the above files are machine independent.
The files
.I local.c
and
.I code.c
contain machine dependent code for generating subroutine
prologs, switch code, and the like.
.PP
The second pass
is produced by compiling and loading
.I reader.c ,
.I allo.c ,
.I match.c ,
.I comm2.c ,
.I order.c ,
.I local2.c ,
and
.I table.c .
.I Reader.c
reads the intermediate file, and controls the major logic of the
code generation.
.I Allo.c
keeps track of busy and free registers.
.I Match.c
controls the matching
of code templates to subtrees of the expression tree to be compiled.
.I Comm2.c
includes the file
.I common ,
as in the first pass.
The above files are machine independent.
.I Order.c
controls the machine dependent details of the code generation strategy.
.I Local2.c
has many small machine dependent routines,
and tables of opcodes, register types, etc.
.I Table.c
has the code template tables,
which are also clearly machine dependent.
.SH
Data Structure Considerations
.PP
This section discusses the node numbers, type words, and expression trees, used
throughout both passes of the compiler.
.PP
The file
.I manifest
defines those symbols used throughout both passes.
The intent is to use the same symbol name (e.g., MINUS)
for the given operator throughout the lexical analysis, parsing, tree building,
and code generation phases;
this requires some synchronization with the
.I Yacc
input file,
.I cgram.y ,
as well.
.PP
A token
like MINUS may be seen in the lexical analyzer before it is known
whether it is a unary or binary operator;
clearly, it is necessary to know this by the time the parse tree
is constructed.
Thus, an operator (really a macro) called
UNARY is provided, so that MINUS
and UNARY MINUS are both distinct node numbers.
Similarly, many binary operators exist in an assignment form (for example, \-=),
and the operator ASG may be applied to such node names to generate new ones, e.g. ASG MINUS.
.PP
It is frequently desirable to know if a node represents a leaf (no descendants), a unary operator (one
descendant) or a binary operator (two descendants).
The macro
.I optype(o)
returns one of the manifest constants LTYPE, UTYPE, or BITYPE, respectively, depending
on the node number
.I o .
Similarly,
.I asgop(o)
returns true if
.I o
is an assignment operator number (=, +=, etc.), and
.I logop(o)
returns true if
.I o
is a relational or logical (&&, \(or\(or, or !) operator.
.PP
C has a rich typing structure, with a potentially infinite number of types.
To begin with, there are the basic types: CHAR, SHORT, INT, LONG, the unsigned versions
known as
UCHAR, USHORT, UNSIGNED, ULONG, and FLOAT, DOUBLE,
and finally
STRTY (a structure), UNIONTY, and ENUMTY.
Then, there are three operators that can be applied to types to make others:
if
.I t
is a type, we may potentially have types
.I "pointer to t" ,
.I "function returning t" ,
and
.I "array of t's"
generated from
.I t .
Thus, an arbitrary type in C consists of a basic type, and zero or more of these operators.
.PP
In the compiler, a type is represented by an unsigned integer; the rightmost four bits hold the basic
type, and the remaining bits are divided into two-bit fields, containing
0 (no operator), or one of the three operators
described above.
The modifiers are read right to left in the word, starting with the two-bit
field adjacent to the basic type, until a field with 0 in it is reached.
The macros PTR, FTN, and ARY represent the
.I "pointer to" ,
.I "function returning" ,
and
.I "array of"
operators.
The macro values are shifted so that they align with the first two-bit
field; thus PTR+INT
represents the type for an integer pointer, and
.DS
ARY + (PTR<<2) + (FTN<<4) + DOUBLE
.DE
represents the type of an array of pointers to functions returning doubles.
.PP
The type words are ordinarily manipulated by macros.
If
.I t
is a type word,
.I BTYPE(t)
gives the basic type.
.I ISPTR(t) ,
.I ISARY(t) ,
and
.I ISFTN(t)
ask if an object of this type is a pointer, array, or a function,
respectively.
.I MODTYPE(t,b)
sets the basic type of
.I t
to
.I b .
.I DECREF(t)
gives the type resulting from removing the first operator from
.I t .
Thus, if
.I t
is a pointer to
.I t' ,
a function returning
.I t' ,
or an array of
.I t' ,
then
.I DECREF(t)
would equal
.I t' .
.I INCREF(t)
gives the type representing a pointer to
.I t .
Finally, there are operators for dealing with the unsigned types.
.I ISUNSIGNED(t)
returns true if
.I t
is one of the four basic unsigned types;
in this case,
.I DEUNSIGN(t)
gives the associated `signed' type.
Similarly,
.I UNSIGNABLE(t)
returns true if
.I t
is one of the four basic types that could become unsigned, and
.I ENUNSIGN(t)
returns the unsigned analogue of
.I t
in this case.
.PP
The other important global data structure is that of expression trees.
The actual shapes of the nodes are given in
.I mfile1
and
.I mfile2 .
They are not the same in the two passes; the first pass nodes contain
dimension and size information, while the second pass
nodes contain register allocation information.
Nevertheless, all nodes contain fields called
.I op ,
containing the node number,
and
.I type ,
containing the type word.
A function called
.I talloc()
returns a pointer to a new tree node.
To free a node, its
.I op
field need merely be set to FREE.
The other fields in the node will remain intact at least until the next allocation.
.PP
Nodes representing binary operators contain fields,
.I left
and
.I right ,
that contain pointers to the left and right descendants.
Unary operator nodes have the
.I left
field, and a value field called
.I rval .
Leaf nodes, with no descendants, have two value fields:
.I lval
and
.I rval .
.PP
At appropriate times, the function
.I tcheck()
can be called, to check that there are no busy nodes remaining.
This is used as a compiler consistency check.
The function
.I tcopy(p)
takes a pointer
.I p
that points to an expression tree, and returns a pointer
to a disjoint copy of the tree.
The function
.I walkf(p,f)
performs a postorder walk of the tree pointed to by
.I p ,
and applies the function
.I f
to each node.
The function
.I fwalk(p,f,d)
does a preorder walk of the tree pointed to by
.I p .
At each node, it calls a function
.I f ,
passing to it the node pointer, a value passed down
from its ancestor, and two pointers to values to be passed
down to the left and right descendants (if any).
The value
.I d
is the value passed down to the root.
.I Fwalk
is used for a number of tree labeling and debugging activities.
.PP
The other major data structure, the symbol table, exists only in pass one,
and will be discussed later.
.SH
Pass One
.PP
The first pass does lexical analysis, parsing, symbol table maintenance,
tree building, optimization, and a number of machine dependent things.
This pass is largely machine independent, and the machine independent
sections can be pretty successfully ignored.
Thus, they will be only sketched here.
.SH
Lexical Analysis
.PP
The lexical analyzer is a conceptually simple routine that reads the input
and returns the tokens of the C language as it encounters them:
names, constants, operators, and keywords.
The conceptual simplicity of this job is confounded a bit by several
other simple jobs that unfortunately must go on simultaneously.
These include
.IP \(bu
Keeping track of the current filename and line number, and occasionally
setting this information as the result of preprocessor control lines.
.IP \(bu
Skipping comments.
.IP \(bu
Properly dealing with octal, decimal, hex, floating
point, and character constants, as well as character strings.
.PP
To achieve speed, the program maintains several tables
that are indexed into by character value, to tell
the lexical analyzer what to do next.
To achieve portability, these tables must be initialized
each time the compiler is run, in order that the
table entries reflect the local character set values.
.SH
Parsing
.PP
As mentioned above, the parser is generated by Yacc from the grammar on file
.I cgram.y .
The grammar is relatively readable, but contains some unusual features
that are worth comment.
.PP
Perhaps the strangest feature of the grammar is the treatment of
declarations.
The problem is to keep track of the basic type
and the storage class while interpreting the various
stars, brackets, and parentheses that may surround a given name.
The entire declaration mechanism must be recursive,
since declarations may appear within declarations of
structures and unions, or even within a
.B sizeof
construction
inside a dimension in another declaration!
.PP
There are some difficulties in using a bottom-up parser,
such as produced by Yacc, to handle constructions where a lot
of left context information must be kept around.
The problem is that the original PDP-11 compiler is top-down in
implementation, and some of the semantics of C reflect this.
In a top-down parser, the input rules are restricted
somewhat, but one can naturally associate temporary
storage with a rule at a very early stage in the recognition
of that rule.
In a bottom-up parser, there is more freedom in the
specification of rules, but it is more difficult to know what
rule is being matched until the entire rule is seen.
The parser described by
.I cgram.c
makes effective use of
the bottom-up parsing mechanism in some places (notably the treatment
of expressions), but struggles against the
restrictions in others.
The usual result is that it is necessary to run a stack of values
``on the side'', independent of the Yacc value stack,
in order to be able to store and access information
deep within inner constructions, where the relationship of the
rules being recognized to the total picture is not yet clear.
.PP
In the case of declarations, the attribute information
(type, etc.) for a declaration is carefully
kept immediately to the left of the declarator
(that part of the declaration involving the name).
In this way, when it is time to declare the name, the
name and the type information can be quickly brought together.
The ``$0'' mechanism of Yacc is used to accomplish this.
The result is not pretty, but it works.
The storage class information changes more slowly,
so it is kept in an external variable, and stacked if
necessary.
Some of the grammar could be considerably cleaned up by using
some more recent features of Yacc, notably actions within
rules and the ability to return multiple values for actions.
.PP
A stack is also used to keep track of the current location
to be branched to when a
.B break
or
.B continue
statement is processed.
.PP
This use of external stacks dates from the time when Yacc did not permit
values to be structures.
Some, or most, of this use of external stacks
could be eliminated by redoing the grammar to use the mechanisms now provided.
There are some areas, however, particularly the processing of structure, union,
and enum declarations, function
prologs, and switch statement processing, where having
all the affected data together in an array speeds later
processing; in this case, use of external storage seems essential.
.PP
The
.I cgram.y
file also contains some small functions used as
utility functions in the parser.
These include routines for saving case values and labels in processing switches,
and stacking and popping
values on the external stack described above.
.SH
Storage Classes
.PP
C has a finite, but fairly extensive, number of storage classes
available.
One of the compiler design decisions was to
process the storage class information totally in the first pass;
by the second pass, this information must have
been totally dealt with.
This means that all of the storage allocation must take place in the first
pass, so that references to automatics and
parameters can be turned into references to cells lying a certain
number of bytes offset from certain machine registers.
Much of this transformation is machine dependent, and strongly
depends on the storage class.
.PP
The classes include EXTERN (for externally declared, but not defined variables),
EXTDEF (for external definitions), and similar distinctions for
USTATIC and STATIC, UFORTRAN and FORTRAN (for fortran functions) and
ULABEL and LABEL.
The storage classes REGISTER and AUTO are obvious, as
are STNAME, UNAME, and ENAME (for structure, union, and enumeration
tags), and the associated MOS, MOU, and MOE (for the members).
TYPEDEF is treated as a storage class as well.
There are two special storage classes: PARAM and SNULL.
SNULL is used to distinguish the case where no explicit
storage class has been given; before an entry is made
in the symbol table the true storage class is discovered.
Similarly, PARAM is used for the temporary entry in the symbol
table made before the declaration of function parameters is completed.
.PP
The most complexity in the storage class process comes from
bit fields.
A separate storage class is kept for each width of bit field;
a
.I k
bit bit field has storage class
.I k
plus FIELD.
This enables the size to be quickly recovered from the storage class.
.SH
Symbol Table Maintenance
.PP
The symbol table routines do far more than simply enter
names into the symbol table;
considerable semantic processing and checking is done as well.
For example, if a new declaration comes in, it must be checked
to see if there is a previous declaration of the same symbol.
If there is, there are many cases.
The declarations may agree and be compatible (for example,
an extern declaration can appear twice) in which case the
new declaration is ignored.
The new declaration may add information (such as an explicit array
dimension) to an already present declaration.
The new declaration may be different, but still correct (for example,
an extern declaration of something may be entered,
and then later the definition may be seen).
The new declaration may be incompatible, but appear in an inner block;
in this case, the old declaration is carefully hidden away, and
the new one comes into force until the block is left.
Finally, the declarations may be incompatible, and an error
message must be produced.
.PP
A number of other factors make for additional complexity.
The type declared by the user is not always the type
entered into the symbol table (for example, if a formal parameter to a function
is declared to be an array, C requires that this be changed into
a pointer before entry in the symbol table).
Moreover, there are various kinds of illegal types that
may be declared which are difficult to
check for syntactically (for example,
a function returning an array).
Finally, there is a strange feature in C that requires
structure tag names and member names for structures and unions
to be taken from a different logical symbol table than ordinary identifiers.
Keeping track of which kind of name is involved is a bit of a struggle
(consider typedef names used within structure declarations, for example).
.PP
The symbol table handling routines have been rewritten a number of times to
extend features, improve performance, and fix bugs.
They address the above problems with reasonable effectiveness
but a singular lack of grace.
.PP
When a name is read in the input, it is hashed, and the
routine
.I lookup
is called, together with a flag which tells which symbol table
should be searched (actually, both symbol tables are stored in one, and a flag
is used to distinguish individual entries).
If the name is found,
.I lookup
returns the index to the entry found; otherwise, it makes
a new entry, marks it UNDEF (undefined), and returns the
index of the new entry.
This index is stored in the
.I rval
field of a NAME node.
.PP
When a declaration is being parsed, this NAME node is
made part of a tree with UNARY MUL nodes for each *,
LB nodes for each array descriptor (the right descendant
has the dimension), and UNARY CALL nodes for each function
descriptor.
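For concreteness, a hypothetical declaration such as
.DS
int *x[5];
.DE
would be represented by a UNARY MUL node (for the *) whose
descendant is an LB node (for the brackets), whose left descendant
in turn is the NAME node for
.I x ,
and whose right descendant holds the dimension 5.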
This tree is passed to the routine
.I tymerge ,
along with the attribute type of the whole declaration;
this routine collapses the tree to a single node, by calling
.I tyreduce ,
and then modifies the type to reflect the overall
type of the declaration.
.PP
Dimension and size information is stored in a table called
.I dimtab .
To properly describe a type in C, one needs not just the
type information but also size information (for structures and
enums) and dimension information (for arrays).
Sizes and offsets are dealt with in the compiler by
giving the associated indices into
.I dimtab .
.I Tymerge
and
.I tyreduce
call
.I dstash
to put the discovered dimensions away into the
.I dimtab
array.
.I Tymerge
returns a pointer to a single node that contains
the symbol table index in its
.I rval
field, and the size and dimension indices in
fields
.I csiz
and
.I cdim ,
respectively.
This information is properly considered part of the type in the first pass,
and is carried around at all times.
.PP
To enter an element into the symbol table, the routine
.I defid
is called; it is handed a storage class, and a pointer
to the node produced by
.I tymerge .
.I Defid
calls
.I fixtype ,
which adjusts and checks the given type depending on the storage
class, and converts null types appropriately.
It then calls
.I fixclass ,
which does a similar job for the storage class;
it is here, for example, that
register declarations are either allowed or changed
to auto.
.PP
The new declaration is now compared against an older one,
if present, and several pages of validity checks are performed.
If the definitions are compatible, with possibly some added information,
the processing is straightforward.
If the definitions differ, the block levels of the
current and the old declaration are compared.
The current block level is kept in
.I blevel ,
an external variable; the old declaration level is kept in the symbol table.
Block level 0 is for external declarations, 1 is for arguments to functions,
and 2 and above are blocks within a function.
If the current block level is the same as that of the old declaration, an error
results.
If the current block level is higher, the new declaration overrides the old.
This is done by marking the old symbol table entry ``hidden'', and making
a new entry, marked ``hiding''.
.I Lookup
will skip over hidden entries.
When a block is left, the symbol table is searched,
and any entries defined in that block are destroyed; if
they hid other entries, the old entries are ``unhidden''.
.PP
This nice block structure is warped a bit because labels
do not follow the block structure rules (one can do a
.B goto
into
a block, for example);
default definitions of functions in inner blocks also persist clear out to the outermost scope.
This implies that cleaning up the symbol table after block exit is more
subtle than it might first seem.
.PP
For successful new definitions,
.I defid
also initializes a ``general purpose'' field,
.I offset ,
in the symbol table.
It contains the stack offset for automatics and parameters, the register number
for register variables, the bit offset into the structure for
structure members, and
the internal label number for static variables and labels.
The offset field is set by
.I falloc
for bit fields, and
.I dclstruct
for structures and unions.
.PP
The symbol table entry itself thus contains
the name, type word, size and dimension offsets,
offset value, and declaration block level.
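Collected into a C structure, such an entry might be sketched as
follows; the field names here are purely illustrative, and not
necessarily those actually used in
.I mfile1 :
.DS
struct symtab {
	char     *sname;    /* name */
	unsigned stype;     /* type word */
	char     sclass;    /* storage class */
	char     slevel;    /* declaration block level */
	int      offset;    /* general purpose offset field */
	short    dimoff;    /* dimension index into dimtab */
	short    sizoff;    /* size index into dimtab */
};
.DE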
It also has a field of flags, describing what symbol table the
name is in, and whether the entry is hidden, or hides another.
Finally, a field gives the line number of the last use,
or of the definition, of the name.
This is used mainly for diagnostics, but is useful to
.I lint
as well.
.PP
In some special cases, there is more than the above amount of information kept
for the use of the compiler.
This is especially true with structures; for use in initialization,
structure declarations must have access to a list of the members of the
structure.
This list is also kept in
.I dimtab .
Because a structure can be mentioned long before the
members are known, it is necessary to have another
level of indirection in the table.
The two words following the
.I csiz
entry in
.I dimtab
are used to hold the alignment of the structure, and
the index in dimtab of the list of members.
This list contains the symbol table indices for the structure members, terminated by a
\-1.
.SH
Tree Building
.PP
The portable compiler transforms expressions
into expression trees.
As the parser recognizes each rule making up an
expression,
it calls
.I buildtree
which is given an operator number, and pointers to the
left and right descendants.
.I Buildtree
first examines the left and right descendants,
and, if they are both constants, and the operator
is appropriate, simply does the constant
computation at compile time, and returns
the result as a constant (a sketch of this folding appears below).
Otherwise,
.I buildtree
allocates a node for the head of the tree,
attaches the descendants to it, and ensures that
conversion operators are generated if needed, and that
the type of the new node is consistent with the types
of the operands.
There is also a considerable amount of semantic complexity here;
many combinations of types are illegal, and the portable
compiler makes a strong effort to check
the legality of expression types completely.
This is done both for
.I lint
purposes, and to prevent such semantic errors
from being passed through to the code generator.
.PP
The heart of
.I buildtree
is a large table, accessed by the routine
.I opact .
This routine maps the types of the left and right
operands into a rather smaller set of descriptors, and then
accesses a table (actually encoded in a switch statement) which
for each operator and pair of types causes
an action to be returned.
The actions are logical or's of a number of
separate actions, which may be
carried out by
.I buildtree .
These component actions may include
checking the left side to ensure that it
is an lvalue (can be stored into),
applying a type conversion to the left or right operand,
setting the type of the new node to
the type of the left or right operand, calling various
routines to balance the types of the left and right operands,
and suppressing the ordinary conversion
of arrays and function operands to pointers.
An important operation is OTHER, which causes
some special code to be invoked
in
.I buildtree ,
to handle issues which are unique to a particular operator.
Examples of this are
structure and union reference
(actually handled by
the routine
.I stref ),
the building of NAME, ICON, STRING and FCON (floating point constant) nodes,
unary * and &, structure assignment, and calls.
In the case of unary * and &,
.I buildtree
will cancel a * applied to a tree, the top node of which
is &, and conversely.
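.PP
The constant folding mentioned above can be illustrated by a small,
self-contained sketch; the names and structure here are invented for
the illustration, and the real routine in
.I trees.c
must also deal with types, conversions, and the other operators:
.DS
#include <stdio.h>

#define ICON 1          /* integer constant leaf */
#define PLUS 2          /* binary addition operator */
#define FREE 0          /* marks a freed node */

typedef struct node {
	int op;             /* node number */
	int lval;           /* value field, for leaves */
	struct node *left, *right;
} NODE;

/* fold two constant operands at compile time */
NODE *
fold(int o, NODE *l, NODE *r)
{
	if (o == PLUS && l->op == ICON && r->op == ICON) {
		l->lval += r->lval;  /* compute now, not at run time */
		r->op = FREE;        /* free the right hand node */
		return l;            /* a single constant node results */
	}
	return NULL;  /* otherwise an operator node would be built */
}

int main(void)
{
	NODE a = { ICON, 2, NULL, NULL };
	NODE b = { ICON, 3, NULL, NULL };

	printf("%d\en", fold(PLUS, &a, &b)->lval);  /* prints 5 */
	return 0;
}
.DE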
.PP
Another special operation is
PUN; this
causes the compiler to check for type mismatches,
such as intermixing pointers and integers.
.PP
The treatment of conversion operators is still a rather
strange area of the compiler (and of C!).
The recent introduction of type casts has only confounded this
situation.
Most of the conversion operators are generated by
calls to
.I tymatch
and
.I ptmatch ,
both of which are given a tree, and asked to make the
operands agree in type.
.I Ptmatch
treats the case where one of the operands is a pointer;
.I tymatch
treats all other cases.
Where these routines have decided on the
proper type for an operand, they call
.I makety ,
which is handed a tree, and a type word, dimension offset, and size offset.
If necessary, it inserts a conversion operation to make the
types correct.
Conversion operations are never inserted on the left side of
assignment operators, however.
There are two conversion operators used:
PCONV, if the conversion is to a non-basic type (usually a pointer),
and
SCONV, if the conversion is to a basic type (scalar).
.PP
To allow for maximum flexibility, every node produced by
.I buildtree
is given to a machine dependent routine,
.I clocal ,
immediately after it is produced.
This is to allow more or less immediate rewriting of those
nodes which must be adapted for the local machine.
The conversion operations are given to
.I clocal
as well; on most machines, many of these
conversions do nothing, and should be thrown away
(being careful to retain the type).
If this operation is done too early,
however,
later calls to
.I buildtree
may get confused about the correct type of the
subtrees;
thus
.I clocal
is given the conversion ops only after the entire tree is built.
This topic will be dealt with in more detail later.
.SH
Initialization
.PP
Initialization is one of the messier areas in the portable compiler.
The only consolation is that most of the mess takes place
in the machine independent part, where it
may be safely ignored by the implementor of the
compiler for a particular machine.
.PP
The basic problem is that the semantics of initialization
really calls for a co-routine structure; one collection
of programs reading constants from the input stream, while another,
independent set of programs places these constants into the
appropriate spots in memory.
The dramatic differences in the local assemblers also
come to the fore here.
The parsing problems are dealt with by keeping a rather
extensive stack containing the current
state of the initialization; the assembler
problems are dealt with by having a fair number of machine dependent routines.
.PP
The stack contains the symbol table number,
type, dimension index, and size index for the current identifier
being initialized.
Another entry has the offset, in bits, of the beginning
of the current identifier.
Another entry keeps track of how many elements have been seen,
if the current identifier is an array.
Still another entry keeps track of the current
member of a structure being initialized.
Finally, there is an entry containing flags
which keep track of the current state of the initialization process
(e.g., tell if a } has been seen for the current identifier).
.PP
When an initialization begins, the routine
.I beginit
is called; it handles the alignment restrictions, if
any, and calls
.I instk
to create the stack entry.
This is done by first making an entry on the top of the stack for the item
being initialized.
If the top entry is an array, another entry is made on the stack
for the first element.
If the top entry is a structure, another entry is made on the
stack for the first member of the structure.
This continues until the top element of the stack is a scalar.
.I Instk
then returns, and the parser begins collecting initializers.
.PP
When a constant is obtained, the routine
.I doinit
is called; it examines the stack, and does whatever
is necessary to assign the current constant to the
scalar on the top of the stack.
.I Gotscal
is then called, which rearranges the stack so that the
next scalar to be initialized gets placed on top of the stack.
This process continues until the end of the initializers;
.I endinit
cleans up.
If a { or } is encountered in the
string of initializers, it is handled by calling
.I ilbrace
or
.I irbrace ,
respectively.
.PP
A central issue is the treatment of the ``holes'' that
arise as a result of alignment restrictions or explicit
requests for holes in bit fields.
There is a global variable,
.I inoff ,
which contains the current offset in the initialization
(all offsets in the first pass of the compiler are in bits).
.I Doinit
figures out from the top entry on the stack the expected
bit offset of the next identifier; it calls the
machine dependent
routine
.I inforce
which, in a machine dependent way, forces
the assembler to
set aside space if need be so that the
next scalar seen will go into the appropriate
bit offset position.
The scalar itself is passed to one of the machine dependent
routines
.I fincode
(for floating point initialization),
.I incode
(for fields, and other initializations less than an int in
size),
or
.I cinit
(for all other initializations).
The size is passed to all these routines, and it is up to
the machine dependent routines to ensure that the
initializer occupies exactly the right size.
.PP
Character strings represent a bit of an exception.
If a character string is seen as the initializer for
a pointer, the characters making up the string must
be put out under a different location counter.
When the lexical analyzer sees the quote at the head
of a character string, it returns the token STRING,
but does not do anything with the contents.
The parser calls
.I getstr ,
which sets up the appropriate location counters
and flags, and calls
.I lxstr
to read and process the contents of the string.
.PP
If the string is being used to initialize a character array,
.I lxstr
calls
.I putbyte ,
which in effect simulates
.I doinit
for each character read.
If the string is used to initialize a character pointer,
.I lxstr
calls a machine dependent routine,
.I bycode ,
which stashes away each character.
The pointer to this string is then returned,
and processed normally by
.I doinit .
.PP
The null at the end of the string is treated as if it
were read explicitly by
.I lxstr .
.SH
Statements
.PP
The first pass addresses four main areas: declarations, expressions, initialization, and statements.
The statement processing is relatively simple; most of it is carried out in the
parser directly.
Most of the logic is concerned with allocating
label numbers, defining the labels, and branching appropriately.
An external symbol,
.I reached ,
is 1 if a statement can be reached, 0 otherwise; this is
used to do a bit of simple flow analysis as the program is being parsed,
and also to avoid generating the subroutine return sequence if the subroutine
cannot ``fall through'' the last statement.
.PP
Conditional branches are handled by generating an expression
node, CBRANCH,
whose left descendant is the conditional expression and the
right descendant is an ICON node containing the internal label
number to be branched to.
For efficiency, the semantics are that
the label is gone to if the condition is
.I false .
.PP
The switch statement is
compiled by collecting the case entries, and an indication as to whether
there is a default case;
an internal label number is generated for each of these,
and remembered in a big array.
The expression comprising the value to be switched on is
compiled when the switch keyword is encountered,
but the expression tree is headed by
a special node, FORCE, which tells the code generator to
put the expression value into a special distinguished
register (this same mechanism is used for processing the
return statement).
When the end of the switch block is reached, the array
containing the case values is sorted, and checked for
duplicate entries (an error); if all is
correct, the machine dependent routine
.I genswitch
is called, with this array of labels and values in increasing order.
.I Genswitch
can assume that the value to be tested is already in the
register which is the usual integer return value register.
.SH
Optimization
.PP
There is a machine independent file,
.I optim.c ,
which contains a relatively short optimization routine,
.I optim .
Actually the word optimization is something of a misnomer;
the results are not optimum, only improved, and the
routine is in fact not optional; it must
be called for proper operation of the compiler.
.PP
.I Optim
is called after an expression tree is built, but
before the code generator is called.
The essential part of its job is to call
.I clocal
on the conversion operators.
On most machines, the treatment of
& is also essential:
by this time in the processing, the only node which
is a legal descendant of & is NAME.
(Possible descendants of * have been eliminated by
.I buildtree .)
The address of a static name is, almost by definition, a
constant, and can be represented by an ICON node on most machines
(provided that the loader has enough power).
Unfortunately, this is not universally true; on some machines, such as the IBM 370,
the issue of addressability rears its ugly head;
thus, before turning a NAME node into an ICON node,
the machine dependent function
.I andable
is called.
.PP
The optimization attempts of
.I optim
are currently quite limited.
It is primarily concerned with improving the behavior of
the compiler with operations one of whose arguments is a constant.
In the simplest case, the constant is placed on the right if the
operation is commutative.
The compiler also makes a limited search for expressions
such as
.DS
.I "( x + a ) + b"
.DE
where
.I a
and
.I b
are constants, and attempts to combine
.I a
and
.I b
at compile time.
A number of special cases are also examined;
additions of 0 and multiplications by 1 are removed,
although the correct processing of these cases to get
the type of the resulting tree correct is
decidedly nontrivial.
In some cases, the addition or multiplication must be replaced by
a conversion op to keep the types from becoming
fouled up.
In cases where a relational operation is being done,
and one operand is a constant, the operands are permuted, and the operator altered, if necessary,
to put the constant on the right.
Finally, multiplications by a power of 2 are changed to shifts.
.PP
There are dozens of similar optimizations that can be, and should be,
done.
It seems likely that this routine will be expanded in the relatively near future.
.SH
Machine Dependent Stuff
.PP
A number of the first pass machine dependent routines have been discussed above.
In general, the routines are short, and easy to adapt from
machine to machine.
The two exceptions to this general rule are
.I clocal
and
the function prolog and epilog generation routines,
.I bfcode
and
.I efcode .
.PP
.I Clocal
has the job of rewriting, if appropriate and desirable,
the nodes constructed by
.I buildtree .
There are two major areas where this
is important:
NAME nodes and conversion operations.
In the case of NAME nodes,
.I clocal
must rewrite the NAME node to reflect the
actual physical location of the name in the machine.
In effect, the NAME node must be examined, the symbol table
entry found (through the
.I rval
field of the node),
and, based on the storage class of the node,
the tree must be rewritten.
Automatic variables and parameters are typically
rewritten by treating the reference to the variable as
a structure reference, off the register which
holds the stack or argument pointer;
the
.I stref
routine is set up to be called in this way, and to
build the appropriate tree.
In the most general case, the tree consists
of a unary * node, whose descendant is
a + node, with the stack or argument register as left operand,
and a constant offset as right operand.
In the case of LABEL and internal static nodes, the
.I rval
field is rewritten to be the negative of the internal
label number; a negative
.I rval
field is taken to be an internal label number.
Finally, a name of class REGISTER must be converted into a REG node,
and the
.I rval
field replaced by the register number.
In fact, this part of the
.I clocal
routine is nearly machine independent; only for machines
with addressability problems (IBM 370 again!) does it
have to be noticeably different.
.PP
The conversion operator treatment is rather tricky.
It is necessary to handle the application of conversion operators
to constants in
.I clocal ,
in order that all constant expressions can have their values known
at compile time.
In extreme cases, this may mean that some simulation of the
arithmetic of the target machine might have to be done in a
cross-compiler.
In the most common case,
conversions from pointer to pointer do nothing.
For some machines, however, conversion from byte pointer to short or long
pointer might require a shift or rotate operation, which would
have to be generated here.
.PP
The extension of the portable compiler to machines where the size of a pointer
depends on its type would be straightforward, but has not yet been done.
.PP
The other major machine dependent issue involves the subroutine prolog and epilog
generation.
The hard part here is the design of the stack frame
and calling sequence; this design issue is discussed elsewhere.
.[
Johnson Lesk Ritchie calling sequence
.]
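.PP
For a flavor of what these routines must produce, the prolog of a
function on a hypothetical register-and-stack machine might appear in
the intermediate file as something like
.DS
)_fcn:	save	fp		/ save the old frame pointer
)	mov	sp,fp		/ establish the new stack frame
)	sub	$.F1,sp		/ .F1, the frame size, is an
)				/ assembler constant, defined later
.DE
where the leading right parentheses mark lines to be passed through
to the assembler; the opcodes, register names, and assembler syntax
here are invented, and correspond to no particular target machine.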
The routine
.I bfcode
is called with the number of arguments
the function is defined with, and
an array containing the symbol table indices of the
declared parameters.
.I Bfcode
must generate the code to establish the new stack frame,
save the return address and previous stack pointer
value on the stack, and save whatever
registers are to be used for register variables.
The stack size and the number of register variables are not
known when
.I bfcode
is called, so these numbers must be
referred to by assembler constants, which are
defined when they are known (usually in the second pass,
after all register variables, automatics, and temporaries have been seen).
The final job is to find those parameters which may have been declared
register, and generate the code to initialize
the register with the value passed on the stack.
Once again, for most machines, the general logic of
.I bfcode
remains the same, but the contents of the
.I printf
calls in it will change from machine to machine.
.I Efcode
is rather simpler, having just to generate the default
return at the end of a function.
This may be nontrivial in the case of a function returning a structure or union, however.
.PP
There seems to be no really good place to discuss structures and unions, but
this is as good a place as any.
The C language now supports structure assignment,
and the passing of structures as arguments to functions,
and the receiving of structures back from functions.
This was added rather late to C, and thus to the portable compiler.
Consequently, it fits in less well than the older features.
Moreover, most of the burden of making these features work is
placed on the machine dependent code.
.PP
There are both conceptual and practical problems.
Conceptually, the compiler is structured around
the idea that to compute something, you put it into
a register and work on it.
This notion causes a bit of trouble on some machines (e.g., machines with 3-address opcodes), but
matches many machines quite well.
Unfortunately, this notion breaks down with structures.
The closest that one can come is to keep the addresses of the
structures in registers.
The actual code sequences used to move structures vary from the
trivial (a multiple byte move) to the horrible (a
function call), and are very machine dependent.
.PP
The practical problem is more painful.
When a function returning a structure is called, this function
has to have some place to put the structure value.
If it places it on the stack, it has difficulty popping its stack frame.
If it places the value in a static temporary, the routine fails to be
reentrant.
The most logically consistent way of implementing this is for the
caller to pass in a pointer to a spot where the called function
should put the value before returning.
This is relatively straightforward, although a bit tedious, to implement,
but means that the caller must have properly declared
the function type, even if the value is never used.
On some machines, such as the Interdata 8/32, the return value
simply overlays the argument region (which on the 8/32 is part
of the caller's stack frame).
The caller takes care of leaving enough room if the returned value is larger
than the arguments.
This also assumes that the caller knows and declares the
function properly.
.PP
The PDP-11 and the VAX have stack hardware which is used in function calls and returns;
this makes it very inconvenient to
use either of the above mechanisms.
In these machines, a static area within the called function is allocated, and
the function return value is copied into it on return; the function
returns the address of that region.
This is simple to implement, but is non-reentrant.
However, the function can now be called as a subroutine
without being properly declared, without the disaster which would otherwise ensue.
No matter what choice is taken, the convention is that the function
actually returns the address of the return structure value.
.PP
In building expression trees, the portable compiler takes a bit for granted about
structures.
It assumes that functions returning structures
actually return a pointer to the structure, and it assumes that
a reference to a structure is actually a reference to its address.
The structure assignment operator is rebuilt so that the left
operand is the structure being assigned to, but the
right operand is the address of the structure being assigned;
this makes it easier to deal with
.DS
.I "a = b = c"
.DE
and similar constructions.
.PP
There are four special tree nodes associated with these
operations:
STASG (structure assignment), STARG (structure argument
to a function call), and STCALL and UNARY STCALL
(calls of a function with nonzero and zero arguments, respectively).
These four nodes are unique in that the size and alignment information, which can be determined by
the type for all other objects in C,
must be known to carry out these operations; special
fields are set aside in these nodes to contain
this information, and special
intermediate code is used to transmit this information.
.SH
First Pass Summary
.PP
There are many other issues which have been ignored here,
partly to justify the title ``tour'', and partly
because they have seemed to cause little trouble.
There are some debugging flags
which may be turned on by giving the compiler's first pass
the argument
.DS
\-X[flags]
.DE
Some of the more interesting flags are
\-Xd for the defining and freeing of symbols,
\-Xi for initialization comments, and
\-Xb for various comments about the building of trees.
In many cases, repeating the flag more than once gives more information;
thus,
\-Xddd gives more information than \-Xd.
In the two pass version of the compiler, the
flags should not be set when the output is sent to the second
pass, since the debugging output and the intermediate code both go onto the standard
output.
.PP
We turn now to consideration of the second pass.
.SH
Pass Two
.PP
Code generation is far less well understood than
parsing or lexical analysis, and for this reason
the second pass is far harder to discuss in a file by file manner.
A great deal of the difficulty is in understanding the issues
and the strategies employed to meet them.
Any particular function is likely to be reasonably straightforward.
.PP
Thus, this part of the paper will concentrate a good deal on the broader
aspects of strategy in the code generator,
and will not get too intimate with the details.
.SH
Overview
.PP
It is difficult to organize a code generator to be flexible enough to
generate code for a large number of machines,
and still be efficient for any one of them.
Flexibility is also important when it comes time to tune
the code generator to improve the output code quality.
On the other hand, too much flexibility can lead to semantically
incorrect code, and potentially a combinatorial explosion in the
number of cases to be considered in the compiler.
.PP
One goal of the code generator is to have a high degree of correctness.
It is very desirable to have the compiler detect its own inability to
generate correct code, rather than to produce incorrect code.
This goal is achieved by having a simple model of the job
to be done (e.g., an expression tree)
and a simple model of the machine state
(e.g., which registers are free).
The act of generating an instruction performs a transformation
on the tree and the machine state;
hopefully, the tree eventually gets
reduced to a single node.
If each of these instruction/transformation pairs is correct,
and if the machine state model really represents the actual machine,
and if the transformations reduce the input tree to the desired single node, then the
output code will be correct.
.PP
For most real machines, there is no definitive theory of code generation that
encompasses all the C operators.
Thus the selection of which instruction/transformations to generate, and in what
order, will have a heuristic flavor.
If, for some expression tree, no transformation applies, or, more
seriously, if the heuristics select a sequence of instruction/transformations
that do not in fact reduce the tree, the compiler
will report its inability to generate code, and abort.
.PP
A major part of the code generator is concerned with the model and the transformations,
\(em most of this is machine independent, or depends only on simple tables.
The flexibility comes from the heuristics that guide the transformations
of the trees, the selection of subgoals, and the ordering of the computation.
.SH
The Machine Model
.PP
The machine is assumed to have a number of registers, of at most two different
types:
.I A
and
.I B .
Within each register class, there may be scratch (temporary) registers and dedicated registers
(e.g., register variables, the stack pointer, etc.).
Requests to allocate and free registers involve only the temporary registers.
.PP
Each of the registers in the machine is given a name and a number
in the
.I mac2defs
file; the numbers are used as indices into various tables
that describe the registers, so they should be kept small.
One such table is the
.I rstatus
table on file
.I local2.c .
This table is indexed by register number, and
contains expressions made up from manifest constants describing the register types:
SAREG for dedicated AREG's, SAREG\(orSTAREG for scratch AREG's,
and SBREG and SBREG\(orSTBREG similarly for BREG's.
There are macros that access this information:
.I isbreg(r)
returns true if register number
.I r
is a BREG, and
.I istreg(r)
returns true if register number
.I r
is a temporary AREG or BREG.
Another table,
.I rnames ,
contains the register names; this is used when
putting out assembler code and diagnostics.
.PP
The usage of registers is kept track of by an array called
.I busy .
.I Busy[r]
is the number of uses of register
.I r
in the current tree being processed.
The allocation and freeing of registers will be discussed later as part of the code generation
algorithm.
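.PP
To make this concrete, on a hypothetical machine with four scratch A
registers and two dedicated register-variable A registers, the
declarations might be sketched as follows; the details are invented,
and correspond to no actual
.I mac2defs
or
.I local2.c :
.DS
/* mac2defs: register names and numbers, kept small */
#define R0 0
#define R1 1
#define R2 2
#define R3 3
#define R4 4	/* register variable */
#define R5 5	/* register variable */

/* local2.c: register status and names, indexed by register number */
int rstatus[] = {
	SAREG\(orSTAREG, SAREG\(orSTAREG,
	SAREG\(orSTAREG, SAREG\(orSTAREG,
	SAREG, SAREG,
};
char *rnames[] = {
	"r0", "r1", "r2", "r3", "r4", "r5",
};
.DE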
-.SH -General Organization -.PP -As mentioned above, the second pass reads lines from -the intermediate file, copying through to the output unchanged -any lines that begin with a `)', and making note of the -information about stack usage and register allocation contained on -lines beginning with `]' and `['. -The expression trees, whose beginning is indicated by a line beginning with `.', -are read and rebuilt into trees. -If the compiler is loaded as one pass, the expression trees are -immediately available to the code generator. -.PP -The actual code generation is done by a hierarchy of routines. -The routine -.I delay -is first given the tree; it attempts to delay some postfix -++ and \-\- computations that might reasonably be done after the -smoke clears. -It also attempts to handle comma (,) operators by computing the -left side expression first, and then rewriting the tree -to eliminate the operator. -.I Delay -calls -.I codgen -to control the actual code generation process. -.I Codgen -takes as arguments a pointer to the expression tree, -and a second argument that, for socio-historical reasons, is called a -.I cookie . -The cookie describes a set of goals that would be acceptable -for the code generation: these are assigned to individual bits, -so they may be logically or'ed together to form a large number of possible goals. -Among the possible goals are FOREFF (compute for side effects only; -don't worry about the value), INTEMP (compute and store value into a temporary location -in memory), INAREG (compute into an A register), INTAREG (compute into a scratch -A register), INBREG and INTBREG similarly, FORCC (compute for condition codes), -and FORARG (compute it as a function argument; e.g., stack it if appropriate). -.PP -.I Codgen -first canonicalizes the tree by calling -.I canon . -This routine looks for certain transformations that might now be -applicable to the tree. -One, which is very common and very powerful, is to -fold together an indirection operator (UNARY MUL) -and a register (REG); in most machines, this combination is -addressable directly, and so is similar to a NAME in its -behavior. -The UNARY MUL and REG are folded together to make -another node type called OREG. -In fact, in many machines it is possible to directly address not just the -cell pointed to by a register, but also cells differing by a constant -offset from the cell pointed to by the register. -.I Canon -also looks for such cases, calling the -machine dependent routine -.I notoff -to decide if the offset is acceptable (for example, in the IBM 370 the offset -must be between 0 and 4095 bytes). -Another optimization is to replace bit field operations -by shifts and masks if the operation involves extracting the field. -Finally, a machine dependent routine, -.I sucomp , -is called that computes the Sethi-Ullman numbers -for the tree (see below). -.PP -After the tree is canonicalized, -.I codgen -calls the routine -.I store -whose job is to select a subtree of the tree to be computed -and (usually) stored before beginning the -computation of the full tree. -.I Store -must return a tree that can be computed without need -for any temporary storage locations. -In effect, the only store operations generated while processing the subtree must be as a response -to explicit assignment operators in the tree. -This division of the job marks one of the more significant, and successful, -departures from most other compilers. 
-It means that the code generator can operate
-under the assumption that there are enough
-registers to do its job, without worrying about
-temporary storage.
-If a store into a temporary appears in the output, it is always
-as a direct result of logic in the
-.I store
-routine; this makes debugging easier.
-.PP
-One consequence of this organization is that
-code is not generated by a treewalk.
-There are theoretical results that support this decision.
-.[
-Aho Johnson Optimal Code Trees jacm
-.]
-It may be desirable to compute several subtrees and store
-them before tackling the whole tree;
-if a subtree is to be stored, this is known before the code
-generation for the subtree is begun, and the subtree is computed
-when all scratch registers are available.
-.PP
-The
-.I store
-routine decides what subtrees, if any, should be
-stored by making use of numbers,
-called
-.I "Sethi-Ullman numbers" ,
-that give, for each subtree of an expression tree,
-the minimum number of scratch registers required to
-compile the subtree, without any stores into temporaries.
-.[
-Sethi Ullman jacm 1970
-.]
-These numbers are computed by the machine-dependent
-routine
-.I sucomp ,
-called by
-.I canon .
-The basic notion is that, knowing the Sethi-Ullman numbers for
-the descendants of a node, and knowing the operator of the
-node and some information about the machine, the
-Sethi-Ullman number of the node itself can be computed.
-If the Sethi-Ullman number for a tree exceeds the number of scratch
-registers available, some subtree must be stored.
-Unfortunately, the theory behind the Sethi-Ullman numbers
-applies only to uselessly simple machines and operators.
-For the rich set of C operators, and for machines with
-asymmetric registers, register pairs, different kinds of registers,
-and exceptional forms of addressing,
-the theory cannot be applied directly.
-The basic idea of estimation is a good one, however,
-and well worth applying; the application, especially
-when the compiler comes to be tuned for high code
-quality, goes beyond the park of theory into the
-swamp of heuristics.
-This topic will be taken up again later, when more of the compiler
-structure has been described.
-.PP
-After examining the Sethi-Ullman numbers,
-.I store
-selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in
-the external variables
-.I stotree
-and
-.I stocook .
-If a subtree has been selected, or if
-the whole tree is ready to be processed, the
-routine
-.I order
-is called, with a tree and cookie.
-.I Order
-generates code for trees that
-do not require temporary locations.
-.I Order
-may make recursive calls on itself, and,
-in some cases, on
-.I codgen ;
-for example, when processing the operators &&, \(or\(or, and comma (`,'), that have
-a left to right evaluation, it is
-incorrect for
-.I store
-to examine the right operand for subtrees to be stored.
-In these cases,
-.I order
-will call
-.I codgen
-recursively when it is permissible to work on the right operand.
-A similar issue arises with the ? : operator.
-.PP
-The
-.I order
-routine works by matching the current tree with
-a set of code templates.
-If a template is discovered that will
-match the current tree and cookie, the associated assembly language
-statement or statements are generated.
-The tree is then rewritten,
-as specified by the template, to represent the effect of the output instruction(s).
-If no template match is found, first an attempt is made to find a match with a -different cookie; for example, in order to compute -an expression with cookie INTEMP (store into a temporary storage location), -it is usually necessary to compute the expression into a scratch register -first. -If all attempts to match the tree fail, the heuristic part of -the algorithm becomes dominant. -Control is typically given to one of a number of machine-dependent routines -that may in turn recursively call -.I order -to achieve a subgoal of the computation (for example, one of the -arguments may be computed into a temporary register). -After this subgoal has been achieved, the process begins again with the -modified tree. -If the machine-dependent heuristics are unable to reduce the tree further, -a number of default rewriting rules may be considered appropriate. -For example, if the left operand of a + is a scratch -register, the + can be replaced by a += operator; -the tree may then match a template. -.PP -To close this introduction, we will discuss the steps in compiling -code for the expression -.DS -\fIa\fR += \fIb\fR -.DE -where -.I a -and -.I b -are static variables. -.PP -To begin with, the whole expression tree is examined with cookie FOREFF, and -no match is found. Search with other cookies is equally fruitless, so an -attempt at rewriting is made. -Suppose we are dealing with the Interdata 8/32 for the moment. -It is recognized that the left hand and right hand sides of the += operator -are addressable, and in particular the left hand side has no -side effects, so it is permissible to rewrite this as -.DS -\fIa\fR = \fIa\fR + \fIb\fR -.DE -and this is done. -No match is found on this tree either, so a machine dependent rewrite is done; it is recognized -that the left hand side of the assignment is addressable, but the right hand side is not -in a register, so -.I order -is called recursively, being asked to put the right -hand side of the assignment into a register. -This invocation of -.I order -searches the tree for a match, and fails. -The machine dependent rule for + -notices that the right hand operand is addressable; -it decides to put the left operand into a scratch register. -Another recursive call to -.I order -is made, with the tree -consisting solely of the leaf -.I a , -and the cookie asking that the value be placed into a scratch register. -This now matches a template, and a load instruction is emitted. -The node consisting of -.I a -is rewritten in place to represent the register into which -.I a -is loaded, -and this third call to -.I order -returns. -The second call to -.I order -now finds that it has the tree -.DS -\fBreg\fR + \fIb\fR -.DE -to consider. -Once again, there is no match, but the default rewriting rule rewrites -the + as a += operator, since the left operand is a scratch register. -When this is done, there is a match: in fact, -.DS -\fBreg\fR += \fIb\fR -.DE -simply describes the effect of the add instruction -on a typical machine. -After the add is emitted, the tree is rewritten -to consist merely of the register node, since the result of the add -is now in the register. -This agrees with the cookie passed to the second invocation of -.I order , -so this invocation -terminates, returning to the first level. -The original tree has now -become -.DS -\fIa\fR = \fBreg\fR -.DE -which matches a template for the store instruction. -The store is output, and the tree rewritten to become -just a single register node. 
-At this point, since the top level call to
-.I order
-was
-interested only in side effects, the call to
-.I order
-returns, and the code generation is completed;
-we have generated a load, add, and store, as might have been expected.
-.PP
-The effect of machine architecture on this is considerable.
-For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage''
-instruction, so the strategy is quite different;
-.I b
-is loaded into
-a register, and then an add to storage instruction is generated
-to add this register into
-.I a .
-The transformations, involving as they do the semantics of C,
-are largely machine independent.
-The decisions as to when to use them, however, are
-almost totally machine dependent.
-.PP
-Having given a broad outline of
-the code generation process, we shall next consider the
-heart of it: the templates.
-This leads naturally into discussions of template matching and register allocation,
-and finally a discussion of the machine dependent interfaces and strategies.
-.SH
-The Templates
-.PP
-The templates describe the effect of the target machine instructions
-on the model of computation around which the compiler is organized.
-In effect, each template has five logical sections, and represents an assertion
-of the form:
-.IP
-.B If
-we have a subtree of a given shape (1), and we have a goal (cookie) or goals to
-achieve (2), and we have sufficient free resources (3),
-.B then
-we may emit an instruction or instructions (4), and
-rewrite the subtree in a particular manner (5),
-and the rewritten tree will achieve the desired goals.
-.PP
-These five sections will be discussed in more
-detail later. First, we give an example of a
-template:
-.DS
-.ta 1i 2i 3i 4i 5i
-ASG PLUS, INAREG,
-	SAREG,	TINT,
-	SNAME,	TINT,
-	0,	RLEFT,
-	"	add	AL,AR\en",
-.DE
-The top line specifies the operator (+=) and the cookie (compute the
-value of the subtree into an AREG).
-The second and third lines specify the left and right descendants,
-respectively,
-of the += operator.
-The left descendant must be a REG node, representing an
-A register, and have integer type, while the right side must be a NAME node,
-and also have integer type.
-The fourth line contains the resource requirements (no scratch registers
-or temporaries needed), and the rewriting rule (replace the subtree by the left descendant).
-Finally, the quoted string on the last line represents the output to the assembler:
-lower case letters, tabs, spaces, etc. are copied
-.I verbatim
-to the output; upper case letters trigger various macro-like expansions.
-Thus,
-.B AL
-would expand into the \fBA\fRddress form of the \fBL\fReft operand \(em
-presumably the register number.
-Similarly,
-.B AR
-would expand into the name of the right operand.
-The
-.I add
-instruction of the last section might well be
-emitted by this template.
-.PP
-In principle, it would be possible to make separate templates
-for all legal combinations of operators, cookies, types, and shapes.
-In practice, the number of combinations is very large.
-Thus, a considerable amount of mechanism is present to
-permit a large number of subtrees to be matched
-by a single template.
-Most of the shape and type specifiers are individual bits, and can
-be logically
-or'ed
-together.
-There are a number of special descriptors for matching classes of
-operators.
-The cookies can also be combined.
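-.PP
-In C source form, a template entry amounts to a structure whose fields
-correspond to the five logical sections; the following declaration is
-only a sketch, with the field names invented for the purpose, but it
-suggests how such an entry is stored:
-.DS
-struct optab {
-	int	op;		/* operator, e.g. ASG PLUS */
-	int	visit;		/* cookies for which the entry applies */
-	int	lshape, ltype;	/* shape and type of the left descendant */
-	int	rshape, rtype;	/* shape and type of the right descendant */
-	int	needs;		/* scratch registers and temporaries needed */
-	int	rewrite;	/* rewriting rule, e.g. RLEFT */
-	char	*cstring;	/* assembler output, with macro escapes */
-	};
-.DE
-.PP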
-As an example of the kind of template -that really arises in practice, the -actual template for the Interdata 8/32 -that subsumes the above example is: -.DS -.ta 1i 2i 3i 4i 5i -ASG OPSIMP, INAREG\(orFORCC, - SAREG, TINT\(orTUNSIGNED\(orTPOINT, - SAREG\(orSNAME\(orSOREG\(orSCON, TINT\(orTUNSIGNED\(orTPOINT, - 0, RLEFT\(orRESCC, - " OI AL,AR\en", -.DE -Here, OPSIMP represents the operators -+, \-, \(or, &, and ^. -The -.B OI -macro in the output string expands into the -appropriate \fBI\fRnteger \fBO\fRpcode for the operator. -The left and right sides can be integers, unsigned, or pointer types. -The right side can be, in addition to a name, a register, -a memory location whose address is given by a register and displacement (OREG), -or a constant. -Finally, these instructions set the condition codes, -and so can be used in condition contexts: -the cookie and rewriting rules reflect this. -.SH -The Template Matching Algorithm. -.PP -The heart of the second pass is the template matching -algorithm, in the routine -.I match . -.I Match -is called with a tree and a cookie; it attempts to match -the given tree against some template that will transform it -according to one of the goals given in the cookie. -If a match is successful, the transformation is -applied; -.I expand -is called to generate the assembly code, and then -.I reclaim -rewrites the tree, and reclaims the resources, such -as registers, that might have become free as a result -of the generated code. -.PP -This part of the compiler is among the most time critical. -There is a spectrum of implementation techniques available -for doing this matching. -The most naive algorithm simply looks at the templates one by one. -This can be considerably improved upon by restricting the search -for an acceptable template. -It would be possible to do better than this if the templates were given -to a separate program that ate them and generated a template -matching subroutine. -This would make maintenance of the compiler much more -complicated, however, so this has not been done. -.PP -The matching algorithm is actually carried out by restricting -the range in the table that must be searched for each opcode. -This introduces a number of complications, however, and needs a -bit of sympathetic help by the person constructing the -compiler in order to obtain best results. -The exact tuning of this algorithm continues; it -is best to consult the code and comments in -.I match -for the latest version. -.PP -In order to match a template to a tree, -it is necessary to match not only the cookie and the -op of the root, but also the types and shapes of the -left and right descendants (if any) of the tree. -A convention is established here that is carried out throughout -the second pass of the compiler. -If a node represents a unary operator, the single descendant -is always the ``left'' descendant. -If a node represents a unary operator or a leaf node (no descendants) -the ``right'' descendant is taken by convention to be the node itself. -This enables templates to easily match leaves and conversion operators, for example, -without any additional mechanism in the matching program. -.PP -The type matching is straightforward; it is possible to specify any combination -of basic types, general pointers, and pointers to one or more of -the basic types. -The shape matching is somewhat more complicated, but still pretty simple. -Templates have a collection of possible operand shapes -on which the opcode might match. 
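-Since most shapes, like the types, are single bits, the test against a
-candidate node is largely a matter of logical
-.I and s;
-in outline, and with the field and bit names invented for the sketch,
-the shape check for one descendant reduces to something like:
-.DS
-shapematch(p, shape) NODE *p; {
-	switch (p->op) {
-	case NAME:
-		return(shape&SNAME);
-	case ICON:
-		return(shape&SCON);
-	case OREG:
-		return(shape&SOREG);
-	case REG:
-		return(shape&rstatus[p->rval]);
-		}
-	return(0);	/* the special shapes need machine dependent help */
-	}
-.DE
-.PP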
-In the simplest case, an -.I add -operation might be able to add to either a register variable -or a scratch register, and might be able (with appropriate -help from the assembler) to add an integer constant (ICON), a static -memory cell (NAME), or a stack location (OREG). -.PP -It is usually attractive to specify a number of such shapes, -and distinguish between them when the assembler output is produced. -It is possible to describe the union of many elementary -shapes such as ICON, NAME, OREG, -AREG or BREG -(both scratch and register forms), etc. -To handle at least the simple forms of indirection, one can also -match some more complicated forms of trees; STARNM and STARREG -can match more complicated trees headed by an indirection operator, -and SFLD can match certain trees headed by a FLD operator: these -patterns call machine dependent routines that match the -patterns of interest on a given machine. -The shape SWADD may be used to recognize NAME or OREG -nodes that lie on word boundaries: this may be of some importance -on word\-addressed machines. -Finally, there are some special shapes: these may not -be used in conjunction with the other shapes, but may be -defined and extended in machine dependent ways. -The special shapes SZERO, SONE, and SMONE are predefined and match -constants 0, 1, and \-1, respectively; others are easy to add -and match by using the machine dependent routine -.I special . -.PP -When a template has been found that matches the root of the tree, -the cookie, and the shapes and types of the descendants, -there is still one bar to a total match: the template may -call for some resources (for example, a scratch register). -The routine -.I allo -is called, and it attempts to allocate the resources. -If it cannot, the match fails; no resources are -allocated. -If successful, the allocated resources are given numbers -1, 2, etc. for later reference when the assembly code is -generated. -The routines -.I expand -and -.I reclaim -are then called. -The -.I match -routine then returns a special value, MDONE. -If no match was found, the value MNOPE is returned; -this is a signal to the caller to try more cookie -values, or attempt a rewriting rule. -.I Match -is also used to select rewriting rules, although -the way of doing this is pretty straightforward. -A special cookie, FORREW, is used to ask -.I match -to search for a rewriting rule. -The rewriting rules are keyed to various opcodes; most -are carried out in -.I order . -Since the question of when to rewrite is one of the key issues in -code generation, it will be taken up again later. -.SH -Register Allocation. -.PP -The register allocation routines, and the allocation strategy, -play a central role in the correctness of the code generation algorithm. -If there are bugs in the Sethi-Ullman computation that cause the -number of needed registers to be underestimated, -the compiler may run out of scratch registers; -it is essential that the allocator keep track of those registers that -are free and busy, in order to detect such conditions. -.PP -Allocation of registers takes place as the result of a template -match; the routine -.I allo -is called with a word describing the number of A registers, -B registers, and temporary locations needed. -The allocation of temporary locations on the stack is relatively -straightforward, and will not be further covered; the -bookkeeping is a bit tricky, but conceptually trivial, and requests -for temporary space on the stack will never fail. 
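-.PP
-For a register request, the simplest possible strategy is a linear scan
-for a temporary register of the right class that is not busy; the
-following sketch (all details invented) conveys the flavor, ignoring
-for the moment the complications discussed below:
-.DS
-freereg(mask) {
-	register r;
-	for (r=0; r<NREGS; ++r)
-		if ((rstatus[r]&mask)==mask && busy[r]==0) {
-			++busy[r];	/* mark it allocated */
-			return(r);
-			}
-	return(\-1);	/* must try sharing, or fail the match */
-	}
-.DE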
-.PP
-Register allocation is less straightforward.
-The two major complications are
-.I pairing
-and
-.I sharing .
-In many machines, some operations (such as multiplication
-and division), and/or some types (such as longs or double precision)
-require even/odd pairs of registers.
-Operations of the first type are exceptionally difficult to
-deal with in the compiler; in fact, their theoretical
-properties are rather bad as well.
-.[
-Aho Johnson Ullman Multiregister
-.]
-The second issue is dealt with rather more successfully;
-a machine dependent function called
-.I szty(t)
-is called that returns 1 or 2, depending on the
-number of A registers required to hold an object of type
-.I t .
-If
-.I szty
-returns 2, an even/odd pair of A registers is allocated
-for each request.
-.PP
-The other issue, sharing, is more subtle, but
-important for good code quality.
-When registers are allocated, it
-is possible to reuse registers that hold address
-information, and use them to contain the values
-computed or accessed.
-For example, on the IBM 360, if register 2 has
-a pointer to an integer in it, we may load the
-integer into register 2 itself by saying:
-.DS
-L	2,0(2)
-.DE
-If register 2 had a byte pointer, however, the sequence for
-loading a character involves clearing the target
-register first, and then inserting the desired character:
-.DS
-SR	3,3
-IC	3,0(2)
-.DE
-In the first case, if register 3 were used as the target,
-it would lead to a larger number of registers
-used for the expression than were required; the compiler would
-generate inefficient code.
-On the other hand, if register 2 were used as the target in the second
-case, the code would simply be wrong.
-In the first case, register 2 can be
-.I shared
-while in the second, it cannot.
-.PP
-In the specification of the register needs in the templates,
-it is possible to indicate whether required scratch registers
-may be shared with possible registers on the left or the right of the input tree.
-In order that a register be shared, it must be scratch, and it must
-be used only once, on the appropriate side of the tree being compiled.
-.PP
-The
-.I allo
-routine thus has a bit more to do than meets the eye;
-it calls
-.I freereg
-to obtain a free register for each A and B register request.
-.I Freereg
-makes multiple calls on the routine
-.I usable
-to decide if a given register can be used to satisfy
-a given need.
-.I Usable
-calls
-.I shareit
-if the register is busy, but might be shared.
-Finally,
-.I shareit
-calls
-.I ushare
-to decide if the desired register is actually in the appropriate
-subtree, and can be shared.
-.PP
-Just to add additional complexity, on some machines (such as the IBM 370) it
-is possible to have ``double indexing'' forms of
-addressing; these are represented by OREG's
-with the base and index registers encoded into the register field.
-While the register allocation and deallocation
-.I "per se"
-is not made more difficult by this phenomenon, the code itself
-is somewhat more complex.
-.PP
-Having allocated the registers and expanded the assembly language,
-it is time to reclaim the resources; the routine
-.I reclaim
-does this.
-Many operations produce more than one result.
-For example, many arithmetic operations may produce
-a value in a register, and also set the condition
-codes.
-Assignment operations may leave results both in a register and in memory.
-.I Reclaim
-is passed three parameters: the tree and cookie
-that were matched, and the rewriting field of the template.
-The rewriting field allows the specification of possible results;
-the tree is rewritten to reflect the results of the operation.
-If the tree was computed for side effects only (FOREFF),
-the tree is freed, and all resources in it reclaimed.
-If the tree was computed for condition codes, the resources
-are also freed, and the tree replaced by a special
-node type, FORCC.
-Otherwise, the value may be found in the left
-argument of the root, the right argument of the root,
-or one of the temporary resources allocated.
-In these cases, first the resources of the tree,
-and the newly allocated resources,
-are freed; then the resources needed by the result
-are made busy again.
-The final result must always match the shape of the input cookie;
-otherwise, the compiler error
-``cannot reclaim''
-is generated.
-There are some machine dependent ways of
-preferring results in registers or memory when
-there are multiple results matching multiple goals in the cookie.
-.SH
-The Machine Dependent Interface
-.PP
-The files
-.I order.c ,
-.I local2.c ,
-and
-.I table.c ,
-as well as the header file
-.I mac2defs ,
-represent the machine dependent portion of the second pass.
-The machine dependent portion can be roughly divided into
-two: the easy portion and the hard portion.
-The easy portion
-tells the compiler the names of the registers, and arranges that
-the compiler generate the proper assembler formats, opcode names, location counters, etc.
-The hard portion involves the Sethi\-Ullman computation, the
-rewriting rules, and, to some extent, the templates.
-It is hard because there are no real algorithms that apply;
-most of this portion is based on heuristics.
-This section discusses the easy portion; the next several
-sections will discuss the hard portion.
-.PP
-If the compiler is adapted from a compiler for a machine
-of similar architecture, the easy part is indeed easy.
-In
-.I mac2defs ,
-the register numbers are defined, as well as various parameters for
-the stack frame, and various macros that describe the machine architecture.
-If double indexing is to be permitted, for example, the symbol
-R2REGS is defined.
-Also, a number of macros that are involved in function call processing,
-especially for unusual function call mechanisms, are defined here.
-.PP
-In
-.I local2.c ,
-a large number of simple functions are defined.
-These do things such as write out opcodes, register names,
-and address forms for the assembler.
-Part of the function call code is defined here; that is nontrivial
-to design, but typically rather straightforward to implement.
-Among the easy routines in
-.I order.c
-are routines for generating a created label,
-defining a label, and generating the arguments
-of a function call.
-.PP
-These routines tend to have a local effect, and depend in a fairly straightforward way
-on the target assembler and the design decisions already made about
-the compiler.
-Thus they will not be further treated here.
-.SH
-The Rewriting Rules
-.PP
-When a tree fails to match any template, it becomes
-a candidate for rewriting.
-Before the tree is rewritten,
-the machine dependent routine
-.I nextcook
-is called with the tree and the cookie; it suggests
-another cookie that might be a better candidate for the
-matching of the tree.
-If all else fails, the templates are searched with the cookie
-FORREW, to look for a rewriting rule.
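-Schematically, the process by which
-.I order
-disposes of one tree can thus be summarized as follows; this is a
-sketch of the control flow only, with the details invented:
-.DS
-for (;;) {
-	if (match(p, cookie) == MDONE)
-		break;		/* a template applied */
-	c = nextcook(p, cookie);
-	if (c != FORREW) {
-		cookie = c;	/* try the suggested easier goal */
-		continue;
-		}
-	match(p, FORREW);	/* apply a rewriting rule */
-	canon(p);		/* recanonicalize, and try again */
-	}
-.DE
-.PP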
-The rewriting rules are of two kinds;
-for most of the common operators, there are
-machine dependent rewriting rules that may be applied;
-these are handled by machine dependent functions
-that are called and given the tree to be computed.
-These routines may recursively call
-.I order
-or
-.I codgen
-to cause certain subgoals to be achieved;
-if they actually call for some alteration of the tree,
-they return 1, and the
-code generation algorithm recanonicalizes and tries again.
-If these routines choose not to deal with the tree, the
-default rewriting rules are applied.
-.PP
-The assignment ops, when rewritten, call the routine
-.I setasg .
-This is assumed to rewrite the tree at least to the point where there are
-no side effects in the left hand side.
-If there is still no template match,
-a default rewriting is done that causes
-an expression such as
-.DS
-.I "a += b"
-.DE
-to be rewritten as
-.DS
-.I "a = a + b"
-.DE
-This is a useful default for certain mixtures of strange types
-(for example, when
-.I a
-is a bit field and
-.I b
-a character) that
-otherwise might need separate table entries.
-.PP
-Simple assignment, structure assignment, and all forms of calls
-are handled completely by the machine dependent routines.
-For historical reasons, the routines generating the calls return
-1 on failure, 0 on success, unlike the other routines.
-.PP
-The machine dependent routine
-.I setbin
-handles binary operators; it too must do most of the job.
-In particular, when it returns 0, it must do so with
-the left hand side in a temporary register.
-The default rewriting rule in this case is to convert the
-binary operator into the associated assignment operator;
-since the left hand side is assumed to be a temporary register,
-this preserves the semantics and often allows a considerable
-saving in the template table.
-.PP
-The increment and decrement operators may be dealt with by
-the machine dependent routine
-.I setincr .
-If this routine chooses not to deal with the tree, the rewriting rule replaces
-.DS
-.I "x ++"
-.DE
-by
-.DS
-.I "( (x += 1) \- 1)"
-.DE
-which preserves the semantics.
-Once again, this is not too attractive for the most common
-cases, but can generate close to optimal code when the
-type of x is unusual.
-.PP
-Finally, the indirection (UNARY MUL) operator is also handled
-in a special way.
-The machine dependent routine
-.I offstar
-is extremely important for the efficient generation of code.
-.I Offstar
-is called with a tree that is the direct descendant of a UNARY MUL node;
-its job is to transform this tree so that the combination of
-UNARY MUL with the transformed tree becomes addressable.
-On most machines,
-.I offstar
-can simply compute the tree into an A or B register,
-depending on the architecture, and then
-.I canon
-will make the resulting tree into an OREG.
-On many machines,
-.I offstar
-can profitably choose to do less work than computing
-its entire argument into a register.
-For example, if the target machine supports OREG's
-with a constant offset from a register, and
-.I offstar
-is called
-with a tree of the form
-.DS
-.I "expr + const"
-.DE
-where
-.I const
-is a constant, then
-.I offstar
-need only compute
-.I expr
-into the appropriate form of register.
-On machines that support double indexing,
-.I offstar
-may have even more choice as to how to proceed.
-The proper tuning of
-.I offstar ,
-which is not typically too difficult, should be one of the
-first tries at optimization attempted by the
-compiler writer.
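-.PP
-A schematic
-.I offstar
-for a machine with register-plus-offset addressing might look as
-follows; the node field names, and the helper
-.I offok
-standing in for the machine's offset check (the real code consults
-.I notoff ),
-are invented for the sketch:
-.DS
-offstar(p) NODE *p; {
-	if (p->op == PLUS && p->right->op == ICON && offok(p->right)) {
-		/* expr + const: compute only expr into a register;
-		   canon will then fold the indirection into an OREG */
-		order(p->left, INTAREG);
-		return;
-		}
-	order(p, INTAREG);	/* default: whole address into a register */
-	}
-.DE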
-.SH
-The Sethi-Ullman Computation
-.PP
-The heart of the heuristics is the computation of the Sethi-Ullman numbers.
-This computation is closely linked with the rewriting rules and the
-templates.
-As mentioned before, the Sethi-Ullman numbers are expected to
-estimate the number of scratch registers needed to compute
-the subtrees without using any stores.
-However, the original theory does not apply to real machines.
-For one thing, the theory assumes that all registers
-are interchangeable.
-Real machines have general purpose, floating point, and index registers,
-register pairs, etc.
-The theory also does not account for side effects;
-this rules out various forms of pathology that arise
-from assignment and assignment ops.
-Condition codes are also undreamed of.
-Finally, the influence of types, conversions, and the
-various addressability restrictions and extensions of real
-machines is also ignored.
-.PP
-Nevertheless, for a ``useless'' theory,
-the basic insight of Sethi and Ullman is amazingly
-useful in a real compiler.
-The notion that one should attempt to estimate the
-resource needs of trees before starting the
-code generation provides a natural means of splitting the
-code generation problem, and provides a bit of redundancy
-and self checking in the compiler.
-Moreover, if writing the
-Sethi-Ullman routines is hard, describing, writing, and debugging the
-alternative (routines that attempt to free up registers by stores into
-temporaries ``on the fly'') is even worse.
-Nevertheless, it should be clearly understood that these routines exist in a realm
-where there is no ``right'' way to write them;
-it is an art, the realm of heuristics, and, consequently, a major
-source of bugs in the compiler.
-Often, the early, crude versions of these routines give little trouble;
-only after
-the compiler is actually working
-and the
-code quality is being improved do serious problems have to be faced.
-Having a simple, regular machine architecture is worth quite
-a lot at this time.
-.PP
-The major problems arise from asymmetries in the registers: register pairs,
-having different kinds of registers, and the related problem of
-needing more than one register (frequently a pair) to store certain
-data
-types (such as longs or doubles).
-There appears to be no general way of treating this problem;
-solutions have to be fudged for each machine where the problem arises.
-On the Honeywell 66, for example, there are only two general purpose registers,
-so a need for a pair is the same as the need for two registers.
-On the IBM 370, the register pair (0,1) is used to do multiplications and divisions;
-registers 0 and 1 are not generally considered part of the scratch registers, and so
-do not require allocation explicitly.
-On the Interdata 8/32, after much consideration, the
-decision was made not to try to deal with the register pair issue;
-operations such as multiplication and division that required pairs
-were simply assumed to take all of the scratch registers.
-Several weeks of effort had failed to produce
-an algorithm that seemed to have much chance of running successfully
-without inordinate debugging effort.
-The difficulty of this issue should not be minimized; it represents one of the
-main intellectual efforts in porting the compiler.
-Nevertheless, this problem has been fudged with a degree of
-success on nearly a dozen machines, so the compiler writer should
-not abandon hope.
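-.PP
-For contrast, on the idealized machine of the original theory the
-computation is trivial; a sketch follows, assuming that each node
-carries an
-.I su
-field, that the numbers for its descendants have already been set,
-and that a classification macro
-.I optype
-sorts operators into leaves (LTYPE), unary operators (UTYPE),
-and binary operators (BITYPE):
-.DS
-sucomp(p) NODE *p; {
-	register l, r;
-	switch (optype(p->op)) {
-	case LTYPE:	/* a leaf needs one register */
-		p->su = 1;
-		break;
-	case UTYPE:	/* a unary op needs what its operand needs */
-		p->su = p->left->su;
-		break;
-	case BITYPE:	/* the classic binary formula */
-		l = p->left->su;
-		r = p->right->su;
-		p->su = (l==r) ? l+1 : (l>r ? l : r);
-		break;
-		}
-	}
-.DE
-Every real machine breaks at least one of the assumptions behind this
-formula, and the machine dependent
-.I sucomp
-grows, case by case, from some such skeleton.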
-.PP -The Sethi-Ullman computations interact with the -rest of the compiler in a number of rather subtle ways. -As already discussed, the -.I store -routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult -to compute in registers, and must be stored. -There are also subtle interactions between the -rewriting routines and the Sethi-Ullman numbers. -Suppose we have a tree such as -.DS -.I "A \- B" -.DE -where -.I A -and -.I B -are expressions; suppose further that -.I B -takes two registers, and -.I A -one. -It is possible to compute the full expression in two registers by -first computing -.I B , -and then, using the scratch register -used by -.I B , -but not containing the answer, compute -.I A . -The subtraction can then be done, computing the expression. -(Note that this assumes a number of things, not the least of which -are register-to-register subtraction operators and symmetric -registers.) -If the machine dependent routine -.I setbin , -however, is not prepared to recognize this case -and compute the more difficult side of the expression -first, the -Sethi-Ullman number must be set to three. -Thus, the -Sethi-Ullman number for a tree should represent the code that -the machine dependent routines are actually willing to generate. -.PP -The interaction can go the other way. -If we take an expression such as -.DS -.I "* ( p + i )" -.DE -where -.I p -is a pointer and -.I i -an integer, -this can probably be done in one register on most machines. -Thus, its Sethi-Ullman number would probably be set to one. -If double indexing is possible in the machine, a possible way -of computing the expression is to load both -.I p -and -.I i -into registers, and then use double indexing. -This would use two scratch registers; in such a case, -it is possible that the scratch registers might be unobtainable, -or might make some other part of the computation run out of -registers. -The usual solution is to cause -.I offstar -to ignore opportunities for double indexing that would tie up more scratch -registers than the Sethi-Ullman number had reserved. -.PP -In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application -of the portable compiler. -It is also a frequent source of bugs. -Algorithms are available that will produce nearly optimal code -for specialized machines, but unfortunately most existing machines -are far removed from these ideals. -The best way of proceeding in practice is to start with a compiler -for a similar machine to the target, and proceed very -carefully. -.SH -Register Allocation -.PP -After the Sethi-Ullman numbers are computed, -.I order -calls a routine, -.I rallo , -that does register allocation, if appropriate. -This routine does relatively little, in general; -this is especially true if the target machine -is fairly regular. -There are a few cases where it is assumed that -the result of a computation takes place in a particular register; -switch and function return are the two major places. -The expression tree has a field, -.I rall , -that may be filled with a register number; this is taken -to be a preferred register, and the first temporary -register allocated by a template match will be this preferred one, if -it is free. -If not, no particular action is taken; this is just a heuristic. -If no register preference is present, the field contains NOPREF. -In some cases, the result must be placed in -a given register, no matter what. 
-The register number is placed in
-.I rall ,
-and the mask MUSTDO is logically or'ed in with it.
-In this case, if the subtree is requested in a register, and comes
-back in a register other than the demanded one, it is moved
-by calling the routine
-.I rmove .
-If the target register for this move is busy, it is a compiler
-error.
-.PP
-Note that this mechanism is the only one that will ever cause a register-to-register
-move between scratch registers (unless such a move is buried in the depths of
-some template).
-This simplifies debugging.
-In some cases, there is a rather strange interaction between
-the register allocation and the Sethi-Ullman number;
-if there is an operator or situation requiring a
-particular register, the allocator and the Sethi-Ullman
-computation must conspire to ensure that the target
-register is not being used by some intermediate result of some far-removed computation.
-This is most easily done by making the special operation take
-all of the free registers, preventing any other partially-computed
-results from cluttering up the works.
-.SH
-Compiler Bugs
-.PP
-The portable compiler has an excellent record of generating correct code.
-The requirement for reasonable cooperation between the register allocation,
-Sethi-Ullman computation, rewriting rules, and templates builds quite a bit
-of redundancy into the compiling process.
-The effect of this is that, in a surprisingly short time, the compiler will
-start generating correct code for those
-programs that it can compile.
-The hard part of the job then becomes finding and
-eliminating those situations where the compiler refuses to
-compile a program because it knows it cannot do it right.
-For example, a template may simply be missing; this may either
-give a compiler error of the form ``no match for op ...'', or cause
-the compiler to go into an infinite loop applying various rewriting rules.
-The compiler has a variable,
-.I nrecur ,
-that is set to 0 at the beginning of an expression, and
-incremented at key spots in the
-compilation process; if this parameter gets too large, the
-compiler decides that it is in a loop, and aborts.
-Loops are also characteristic of botches in the machine-dependent rewriting rules.
-Bad Sethi-Ullman computations usually cause the scratch registers
-to run out; this often means
-that the Sethi-Ullman number was underestimated, so
-.I store
-did not store something it should have; alternatively,
-it can mean that the rewriting rules were not smart enough to
-find the sequence that
-.I sucomp
-assumed would be used.
-.PP
-The best approach when a compiler error is detected involves several stages.
-First, try to get a small example program that steps on the bug.
-Second, turn on various debugging flags in the code generator, and follow the
-tree through the process of being matched and rewritten.
-Some flags of interest are
-\-e, which prints the expression tree,
-\-r, which gives information about the allocation of registers,
-\-a, which gives information about the performance of
-.I rallo ,
-and \-o, which gives information about the behavior of
-.I order .
-This technique should allow most bugs to be found relatively quickly.
-.PP
-Unfortunately, finding the bug is usually not enough; it must also
-be fixed!
-The difficulty arises because a fix to the particular bug of interest tends
-to break other code that already works.
-Regression tests, tests that compare the performance of
-a new compiler against the performance of an older one, are very
-valuable in preventing major catastrophes.
-.SH
-Summary and Conclusion
-.PP
-The portable compiler has been a useful tool for providing C
-capability on a large number of diverse machines,
-and for testing a number of theoretical
-constructs in a practical setting.
-It has many blemishes, both in style and functionality.
-It has been applied to many more machines than first
-anticipated, of a much wider range than originally dreamed of.
-Its use has also spread much faster than expected, leaving parts of
-the compiler still somewhat raw in shape.
-.PP
-On the theoretical side, there is some hope that the
-skeleton of the
-.I sucomp
-routine could be generated for many machines directly from the
-templates; this would give a considerable boost
-to the portability and correctness of the compiler,
-but might affect tunability and code quality.
-There is also room for more optimization, both within
-.I optim
-and in the form of a portable ``peephole'' optimizer.
-.PP
-On the practical, development side,
-the compiler could probably be sped up and made smaller
-without doing too much violence to its basic structure.
-Parts of the compiler deserve to be rewritten;
-the initialization code, register allocation, and
-parser are prime candidates.
-It might be that doing some or all of the parsing
-with a recursive descent parser would
-save enough space and time to be worthwhile;
-it would certainly ease the problem of moving the
-compiler to an environment where
-.I Yacc
-is not already present.
-.PP
-Finally, I would like to thank the many people who have
-sympathetically, and even enthusiastically, helped me grapple
-with what has been a frustrating program to write, test, and install.
-D. M. Ritchie and E. N. Pinson provided needed early
-encouragement and philosophical guidance;
-M. E. Lesk,
-R. Muha, T. G. Peterson,
-G. Riddle, L. Rosler,
-R. W. Mitze,
-B. R. Rowland,
-S. I. Feldman,
-and
-T. B. London
-have all contributed ideas, gripes, and all, at one time or another,
-climbed ``into the pits'' with me to help debug.
-Without their help this effort would not have been possible;
-with it, it was often kind of fun.
-.sp 100
-.LP
-.[
-$LIST$
-.]
-.LP
-.sp 100
-.LP
//GO.SYSIN DD porttour2