.TL
A Tour Through the Portable C Compiler
.AU
S. C. Johnson
.AI
.MH
.SH
Introduction
.PP
A C compiler has been implemented that has proved to be quite portable,
serving as the basis for C compilers on roughly a dozen machines, including
the Honeywell 6000, IBM 370, and Interdata 8/32.
The compiler is highly compatible with the C language standard.
.[
Kernighan Ritchie Prentice 1978
.]
.PP
Among the goals of this compiler are portability, high reliability,
and the use of state-of-the-art techniques and tools wherever practical.
Although the efficiency of the compiling process is not
a primary goal, the compiler is efficient enough, and produces
good enough code, to serve as a production compiler.
.PP
The language implemented is highly compatible with the current PDP-11 version of C.
Moreover, roughly 75% of the compiler, including
nearly all the syntactic and semantic routines, is machine independent.
The compiler also serves as the major portion of the program
.I lint ,
described elsewhere.
.[
Johnson Lint Program Checker
.]
.PP
A number of earlier attempts to make portable compilers are worth noting.
While on CO-OP assignment to Bell Labs in 1973, Alan Snyder wrote a portable C
compiler which was the basis of his Master's Thesis at M.I.T.
.[
Snyder Portable C Compiler
.]
This compiler was very slow and complicated, and contained a number of rather
serious implementation difficulties;
nevertheless, a number of Snyder's ideas appear in this work.
.PP
Most earlier portable compilers, including Snyder's, have proceeded by
defining an intermediate language, perhaps based
on three-address code or code for a stack machine,
and writing a machine independent program to
translate from the source code to this
intermediate code.
The intermediate code is then read by a second pass, and interpreted or compiled.
This approach is elegant, and has a number of advantages, especially if the
target machine is far removed from the host.
It suffers from some disadvantages as well.
Some constructions, like initialization and subroutine prologs,
are difficult or expensive to express in a
machine independent way that still allows them
to be easily adapted to the target assemblers.
Most of these approaches require a symbol table to be constructed
in the second (machine dependent) pass, and/or
require powerful target assemblers.
Also, many conversion operators may be generated
that have no effect on a given machine,
but may be needed on others (for example, pointer to pointer
conversions usually do nothing in C, but must be generated because
there are some machines where they are significant).
.PP
For these reasons, the first pass of the portable compiler
is not entirely machine independent.
It contains some machine dependent features, such as initialization, subroutine
prolog and epilog, certain storage allocation functions,
code for the
.I switch
statement, and code to throw out unneeded conversion operators.
.PP
As a crude measure of the degree of
portability actually achieved,
the Interdata 8/32 C compiler has
roughly 600 machine dependent lines of source out of 4600 in Pass 1, and 1000
out of 3400 in Pass 2.
In total, 1600 out of 8000, or 20%,
of the total source is machine dependent
(13% in Pass 1, 30% in Pass 2).
These percentages can be expected to rise slightly as the
compiler is tuned.
The percentage of machine-dependent code for the IBM is 22%, for
the Honeywell 25%.
If the assembler format and structure were the same for all
these machines, perhaps another 5-10% of the code
would become machine independent.
.PP
These figures are sufficiently misleading as to be almost
meaningless.
A large fraction of the machine dependent code can be converted
in a straightforward, almost mechanical way.
On the other hand, a certain amount of the code requires hard
intellectual effort to convert, since the algorithms
embodied in this part of the code are typically complicated
and machine dependent.
.PP
To summarize, however, if you need a C compiler written for a machine with
a reasonable architecture, the compiler is already three quarters finished!
.SH
Overview
.PP
This paper discusses the structure and organization of the portable compiler.
The intent is to give the big picture, rather than discussing the details of a particular machine
implementation.
After a brief overview and a discussion of the source file structure,
the paper describes the major data structures, and then delves more
closely into the two passes.
Some of the theoretical work on which the compiler is based, and
its application to the compiler, is discussed elsewhere.
.[
johnson portable theory practice
.]
One of the major design issues in any C compiler, the design of
the calling sequence and stack frame, is the subject of a separate
memorandum.
.[
johnson lesk ritchie calling sequence
.]
.PP
The compiler consists of two passes,
.I pass1
and
.I pass2 ,
that together turn C source code into assembler code for the target
machine.
The two passes are preceded by a preprocessor that
handles the
.B "#define"
and
.B "#include"
statements, and related features (e.g.,
.B #ifdef ,
etc.).
The preprocessor is a nearly machine independent program, and will
not be further discussed here.
.PP
The output of the preprocessor is a text file that is read as the standard
input of the first pass.
This pass
produces as standard output another text file that becomes the standard
input of the second pass.
The second pass produces, as standard output, the desired assembler language source code.
The preprocessor and the two passes
all write error messages on the standard error file.
Thus the compiler itself makes few demands on the I/O
library support, aiding in the bootstrapping process.
.PP
Although the compiler is divided into two passes,
this represents historical accident more than deep necessity.
In fact, the compiler can optionally be loaded
so that both passes operate in the same program.
This ``one pass'' operation eliminates the overhead of
reading and writing the intermediate file, so the compiler
operates about 30% faster in this mode.
It also occupies about 30% more space than the larger
of the two component passes.
.PP
Because the compiler is fundamentally structured as two
passes, even when loaded as one, this document primarily
describes the two pass version.
.PP
The first pass does the lexical analysis, parsing, and
symbol table maintenance.
It also constructs parse trees for expressions,
and keeps track of the types of the nodes in these trees.
Additional code is devoted to initialization.
Machine dependent portions of the first pass serve to
generate subroutine prologs and epilogs,
code for switches, and code for branches, label definitions,
alignment operations,
changes of location counter, etc.
.PP
The intermediate file is a text file organized into lines.
Lines beginning with a right parenthesis are copied by the second
pass directly to its output file, with the
parenthesis stripped off.
Thus, when the first pass produces assembly code, such as subroutine prologs,
etc., each line is prefaced with a right parenthesis;
the second pass passes these lines through to the assembler.
.PP
The major job done by the second pass is generation of code
for expressions.
The expression parse trees produced in the first pass are
written onto the
intermediate file in Polish Prefix form:
first, there is a line beginning with a period, followed by the source
file line number and name on which the expression appeared (for debugging purposes).
The successive lines represent the nodes of the parse tree,
one node per line.
Each line contains the node number, type, and
any values (e.g., values of constants)
that may appear in the node.
Lines representing nodes with descendants are immediately
followed by the left subtree of descendants, then the
right.
Since the number of descendants of any node is completely
determined by the node number, there is no need to mark the end
of the tree.
.PP
There are only two other line types in the intermediate file.
Lines beginning with a left square bracket (`[') represent the beginning of
blocks (delimited by { ... } in the C source);
lines beginning with right square brackets (`]') represent the end of blocks.
The remainder of these lines tell how much
stack space, and how many register variables,
are currently in use.
.PP
Thus, the second pass reads the intermediate file, copies the `)' lines,
makes note of the information in the `[' and `]' lines,
and devotes most of its effort to the
`.' lines and their associated expression trees, turning them
into assembly code to evaluate the expressions.
.PP
In the one pass version of the compiler, the expression
trees that are built by the first pass have
been declared to have room for the second pass
information as well.
Instead of writing the trees onto an intermediate file,
each tree is transformed in place into an acceptable
form for the code generator.
The code generator then writes the result of compiling
this tree onto the standard output.
Instead of `[' and `]' lines in the intermediate file, the information is passed
directly to the second pass routines.
Assembly code produced by the first pass is simply written out,
without the need for `)' at the head of each line.
.SH
The Source Files
.PP
The compiler source consists of 22 source files.
Two files,
.I manifest
and
.I macdefs ,
are header files included with all other files.
.I Manifest
has declarations for the node numbers, types, storage classes,
and other global data definitions.
.I Macdefs
has machine-dependent definitions, such as the size and alignment of the
various data representations.
Two machine independent header files,
.I mfile1
and
.I mfile2 ,
contain the data structure and manifest definitions
for the first and second passes, respectively.
In the second pass, a machine dependent header file,
.I mac2defs ,
contains declarations of register names, etc.
.PP
There is a file,
.I common ,
containing (machine independent) routines used in both passes.
These include routines for allocating and freeing trees, walking over trees,
printing debugging information, and printing error messages.
There are two dummy files,
.I comm1.c
and
.I comm2.c ,
that simply include
.I common
within the scope of the appropriate pass1 or pass2 header files.
When the compiler is loaded as a single pass,
.I common
only needs to be included once:
.I comm2.c
is not needed.
.PP
Entire sections of this document are devoted to the detailed structure of the
passes.
For the moment, we just give a brief description of the files.
The first pass
is obtained by compiling and loading
.I scan.c ,
.I cgram.c ,
.I xdefs.c ,
.I pftn.c ,
.I trees.c ,
.I optim.c ,
.I local.c ,
.I code.c ,
and
.I comm1.c .
.I Scan.c
is the lexical analyzer, which is used by
.I cgram.c ,
the result of applying
.I Yacc
.[
Johnson Yacc Compiler cstr
.]
to the input grammar
.I cgram.y .
.I Xdefs.c
is a short file of external definitions.
.I Pftn.c
maintains the symbol table, and does initialization.
.I Trees.c
builds the expression trees, and computes the node types.
.I Optim.c
does some machine independent optimizations on the expression trees.
.I Comm1.c
includes
.I common ,
which contains service routines common to the two passes of the compiler.
All the above files are machine independent.
The files
.I local.c
and
.I code.c
contain machine dependent code for generating subroutine
prologs, switch code, and the like.
.PP
The second pass
is produced by compiling and loading
.I reader.c ,
.I allo.c ,
.I match.c ,
.I comm2.c ,
.I order.c ,
.I local2.c ,
and
.I table.c .
.I Reader.c
reads the intermediate file, and controls the major logic of the
code generation.
.I Allo.c
keeps track of busy and free registers.
.I Match.c
controls the matching
of code templates to subtrees of the expression tree to be compiled.
.I Comm2.c
includes the file
.I common ,
as in the first pass.
The above files are machine independent.
.I Order.c
controls the machine dependent details of the code generation strategy.
.I Local2.c
has many small machine dependent routines,
and tables of opcodes, register types, etc.
.I Table.c
has the code template tables,
which are also clearly machine dependent.
.SH
Data Structure Considerations
.PP
This section discusses the node numbers, type words, and expression trees, used
throughout both passes of the compiler.
.PP
The file
.I manifest
defines those symbols used throughout both passes.
The intent is to use the same symbol name (e.g., MINUS)
for the given operator throughout the lexical analysis, parsing, tree building,
and code generation phases;
this requires some synchronization with the
.I Yacc
input file,
.I cgram.y ,
as well.
.PP
A token
like MINUS may be seen in the lexical analyzer before it is known
whether it is a unary or binary operator;
clearly, it is necessary to know this by the time the parse tree
is constructed.
Thus, an operator (really a macro) called
UNARY is provided, so that MINUS
and UNARY MINUS are both distinct node numbers.
Similarly, many binary operators exist in an assignment form (for example, \-=),
and the operator ASG may be applied to such node names to generate new ones, e.g. ASG MINUS.
.PP
It is frequently desirable to know if a node represents a leaf (no descendants), a unary operator (one
descendant) or a binary operator (two descendants).
The macro
.I optype(o)
returns one of the manifest constants LTYPE, UTYPE, or BITYPE, respectively, depending
on the node number
.I o .
Similarly,
.I asgop(o)
returns true if
.I o
is an assignment operator number (=, +=, etc.), and
.I logop(o)
returns true if
.I o
is a relational or logical (&&, \(or\(or, or !) operator.
.PP
C has a rich typing structure, with a potentially infinite number of types.
To begin with, there are the basic types: CHAR, SHORT, INT, LONG, the unsigned versions
known as
UCHAR, USHORT, UNSIGNED, ULONG, and FLOAT, DOUBLE,
and finally
STRTY (a structure), UNIONTY, and ENUMTY.
Then, there are three operators that can be applied to types to make others:
if
.I t
is a type, we may potentially have types
.I "pointer to t" ,
.I "function returning t" ,
and
.I "array of t's"
generated from
.I t .
Thus, an arbitrary type in C consists of a basic type, and zero or more of these operators.
.PP
In the compiler, a type is represented by an unsigned integer; the rightmost four bits hold the basic
type, and the remaining bits are divided into two-bit fields, containing
0 (no operator), or one of the three operators
described above.
The modifiers are read right to left in the word, starting with the two-bit
field adjacent to the basic type, until a field with 0 in it is reached.
The macros PTR, FTN, and ARY represent the
.I "pointer to" ,
.I "function returning" ,
and
.I "array of"
operators.
The macro values are shifted so that they align with the first two-bit
field; thus PTR+INT
represents the type for an integer pointer, and
.DS
ARY + (PTR<<2) + (FTN<<4) + DOUBLE
.DE
represents the type of an array of pointers to functions returning doubles.
.PP
The type words are ordinarily manipulated by macros.
If
.I t
is a type word,
.I BTYPE(t)
gives the basic type.
.I ISPTR(t) ,
.I ISARY(t) ,
and
.I ISFTN(t)
ask if an object of this type is a pointer, array, or a function,
respectively.
.I MODTYPE(t,b)
sets the basic type of
.I t
to
.I b .
.I DECREF(t)
gives the type resulting from removing the first operator from
.I t .
Thus, if
.I t
is a pointer to
.I t' ,
a function returning
.I t' ,
or an array of
.I t' ,
then
.I DECREF(t)
would equal
.I t' .
.I INCREF(t)
gives the type representing a pointer to
.I t .
Finally, there are operators for dealing with the unsigned types.
.I ISUNSIGNED(t)
returns true if
.I t
is one of the four basic unsigned types;
in this case,
.I DEUNSIGN(t)
gives the associated `signed' type.
Similarly,
.I UNSIGNABLE(t)
returns true if
.I t
is one of the four basic types that could become unsigned, and
.I ENUNSIGN(t)
returns the unsigned analogue of
.I t
in this case.
.PP
The other important global data structure is that of expression trees.
The actual shapes of the nodes are given in
.I mfile1
and
.I mfile2 .
They are not the same in the two passes; the first pass nodes contain
dimension and size information, while the second pass
nodes contain register allocation information.
Nevertheless, all nodes contain fields called
.I op ,
containing the node number,
and
.I type ,
containing the type word.
A function called
.I talloc()
returns a pointer to a new tree node.
To free a node, its
.I op
field need merely be set to FREE.
The other fields in the node will remain intact at least until the next allocation.
.PP
Nodes representing binary operators contain fields,
.I left
and
.I right ,
that contain pointers to the left and right descendants.
Unary operator nodes have the
.I left
field, and a value field called
.I rval .
Leaf nodes, with no descendants, have two value fields:
.I lval
and
.I rval .
.PP
At appropriate times, the function
.I tcheck()
can be called, to check that there are no busy nodes remaining.
This is used as a compiler consistency check.
The function
.I tcopy(p)
takes a pointer
.I p
that points to an expression tree, and returns a pointer
to a disjoint copy of the tree.
The function
.I walkf(p,f)
performs a postorder walk of the tree pointed to by
.I p ,
and applies the function
.I f
to each node.
The function
.I fwalk(p,f,d)
does a preorder walk of the tree pointed to by
.I p .
At each node, it calls a function
.I f ,
passing to it the node pointer, a value passed down
from its ancestor, and two pointers to values to be passed
down to the left and right descendants (if any).
The value
.I d
is the value passed down to the root.
.I Fwalk
is used for a number of tree labeling and debugging activities.
.PP
The other major data structure, the symbol table, exists only in pass one,
and will be discussed later.
.SH
Pass One
.PP
The first pass does lexical analysis, parsing, symbol table maintenance,
tree building, optimization, and a number of machine dependent things.
This pass is largely machine independent, and the machine independent
sections can be pretty successfully ignored.
Thus, they will be only sketched here.
.SH
Lexical Analysis
.PP
The lexical analyzer is a conceptually simple routine that reads the input
and returns the tokens of the C language as it encounters them:
names, constants, operators, and keywords.
The conceptual simplicity of this job is confounded a bit by several
other simple jobs that unfortunately must go on simultaneously.
These include
.IP \(bu
Keeping track of the current filename and line number, and occasionally
setting this information as the result of preprocessor control lines.
.IP \(bu
Skipping comments.
.IP \(bu
Properly dealing with octal, decimal, hex, floating
point, and character constants, as well as character strings.
.PP
To achieve speed, the program maintains several tables
that are indexed into by character value, to tell
the lexical analyzer what to do next.
To achieve portability, these tables must be initialized
each time the compiler is run, in order that the
table entries reflect the local character set values.
.SH
Parsing
.PP
As mentioned above, the parser is generated by Yacc from the grammar on file
.I cgram.y .
The grammar is relatively readable, but contains some unusual features
that are worth comment.
.PP
Perhaps the strangest feature of the grammar is the treatment of
declarations.
The problem is to keep track of the basic type
and the storage class while interpreting the various
stars, brackets, and parentheses that may surround a given name.
The entire declaration mechanism must be recursive,
since declarations may appear within declarations of
structures and unions, or even within a
.B sizeof
construction
inside a dimension in another declaration!
.PP
There are some difficulties in using a bottom-up parser,
such as produced by Yacc, to handle constructions where a lot
of left context information must be kept around.
The problem is that the original PDP-11 compiler is top-down in
implementation, and some of the semantics of C reflect this.
In a top-down parser, the input rules are restricted
somewhat, but one can naturally associate temporary
storage with a rule at a very early stage in the recognition
of that rule.
In a bottom-up parser, there is more freedom in the
specification of rules, but it is more difficult to know what
rule is being matched until the entire rule is seen.
The parser described by
.I cgram.c
makes effective use of
the bottom-up parsing mechanism in some places (notably the treatment
of expressions), but struggles against the
restrictions in others.
The usual result is that it is necessary to run a stack of values
``on the side'', independent of the Yacc value stack,
in order to be able to store and access information
deep within inner constructions, where the relationship of the
rules being recognized to the total picture is not yet clear.
.PP
In the case of declarations, the attribute information
(type, etc.) for a declaration is carefully
kept immediately to the left of the declarator
(that part of the declaration involving the name).
In this way, when it is time to declare the name, the
name and the type information can be quickly brought together.
The ``$0'' mechanism of Yacc is used to accomplish this.
The result is not pretty, but it works.
The storage class information changes more slowly,
so it is kept in an external variable, and stacked if
necessary.
Some of the grammar could be considerably cleaned up by using
some more recent features of Yacc, notably actions within
rules and the ability to return multiple values for actions.
.PP
A stack is also used to keep track of the current location
to be branched to when a
.B break
or
.B continue
statement is processed.
.PP
This use of external stacks dates from the time when Yacc did not permit
values to be structures.
Some, or most, of this use of external stacks
could be eliminated by redoing the grammar to use the mechanisms now provided.
There are some areas, however, particularly the processing of structure, union,
and enum declarations, function
prologs, and switch statement processing, where having
all the affected data together in an array speeds later
processing; in this case, use of external storage seems essential.
.PP
The
.I cgram.y
file also contains some small functions used as
utility functions in the parser.
These include routines for saving case values and labels in processing switches,
and stacking and popping
values on the external stack described above.
.SH
Storage Classes
.PP
C has a finite, but fairly extensive, number of storage classes
available.
One of the compiler design decisions was to
process the storage class information totally in the first pass;
by the second pass, this information must have
been totally dealt with.
This means that all of the storage allocation must take place in the first
pass, so that references to automatics and
parameters can be turned into references to cells lying a certain
number of bytes offset from certain machine registers.
Much of this transformation is machine dependent, and strongly
depends on the storage class.
.PP
The classes include EXTERN (for externally declared, but not defined variables),
EXTDEF (for external definitions), and similar distinctions for
USTATIC and STATIC, UFORTRAN and FORTRAN (for fortran functions) and
ULABEL and LABEL.
The storage classes REGISTER and AUTO are obvious, as
are STNAME, UNAME, and ENAME (for structure, union, and enumeration
tags), and the associated MOS, MOU, and MOE (for the members).
TYPEDEF is treated as a storage class as well.
There are two special storage classes: PARAM and SNULL.
SNULL is used to distinguish the case where no explicit
storage class has been given; before an entry is made
in the symbol table the true storage class is discovered.
Similarly, PARAM is used for the temporary entry in the symbol
table made before the declaration of function parameters is completed.
.PP
The most complexity in the storage class process comes from
bit fields.
A separate storage class is kept for each width of bit field;
a
.I k
bit bit field has storage class
.I k
plus FIELD.
This enables the size to be quickly recovered from the storage class.
.SH
Symbol Table Maintenance
.PP
The symbol table routines do far more than simply enter
names into the symbol table;
considerable semantic processing and checking is done as well.
For example, if a new declaration comes in, it must be checked
to see if there is a previous declaration of the same symbol.
If there is, there are many cases.
The declarations may agree and be compatible (for example,
an extern declaration can appear twice) in which case the
new declaration is ignored.
The new declaration may add information (such as an explicit array
dimension) to an already present declaration.
The new declaration may be different, but still correct (for example,
an extern declaration of something may be entered,
and then later the definition may be seen).
The new declaration may be incompatible, but appear in an inner block;
in this case, the old declaration is carefully hidden away, and
the new one comes into force until the block is left.
Finally, the declarations may be incompatible, and an error
message must be produced.
.PP
A number of other factors make for additional complexity.
The type declared by the user is not always the type
entered into the symbol table (for example, if a formal parameter to a function
is declared to be an array, C requires that this be changed into
a pointer before entry in the symbol table).
Moreover, there are various kinds of illegal types that
may be declared which are difficult to
check for syntactically (for example,
a function returning an array).
Finally, there is a strange feature in C that requires
structure tag names and member names for structures and unions
to be taken from a different logical symbol table than ordinary identifiers.
Keeping track of which kind of name is involved is a bit of a struggle
(consider typedef names used within structure declarations, for example).
.PP
The symbol table handling routines have been rewritten a number of times to
extend features, improve performance, and fix bugs.
They address the above problems with reasonable effectiveness
but a singular lack of grace.
.PP
When a name is read in the input, it is hashed, and the
routine
.I lookup
is called, together with a flag which tells which symbol table
should be searched (actually, both symbol tables are stored in one, and a flag
is used to distinguish individual entries).
If the name is found,
.I lookup
returns the index to the entry found; otherwise, it makes
a new entry, marks it UNDEF (undefined), and returns the
index of the new entry.
This index is stored in the
.I rval
field of a NAME node.
.PP
When a declaration is being parsed, this NAME node is
made part of a tree with UNARY MUL nodes for each *,
LB nodes for each array descriptor (the right descendant
has the dimension), and UNARY CALL nodes for each function
descriptor.
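For concreteness, a hypothetical declaration such as
.DS
int *x[5];
.DE
would be represented by a UNARY MUL node (for the *) whose
descendant is an LB node (for the brackets), whose left descendant
in turn is the NAME node for
.I x ,
and whose right descendant holds the dimension 5.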
This tree is passed to the routine
.I tymerge ,
along with the attribute type of the whole declaration;
this routine collapses the tree to a single node, by calling
.I tyreduce ,
and then modifies the type to reflect the overall
type of the declaration.
.PP
Dimension and size information is stored in a table called
.I dimtab .
To properly describe a type in C, one needs not just the
type information but also size information (for structures and
enums) and dimension information (for arrays).
Sizes and offsets are dealt with in the compiler by
giving the associated indices into
.I dimtab .
.I Tymerge
and
.I tyreduce
call
.I dstash
to put the discovered dimensions away into the
.I dimtab
array.
.I Tymerge
returns a pointer to a single node that contains
the symbol table index in its
.I rval
field, and the size and dimension indices in
fields
.I csiz
and
.I cdim ,
respectively.
This information is properly considered part of the type in the first pass,
and is carried around at all times.
.PP
To enter an element into the symbol table, the routine
.I defid
is called; it is handed a storage class, and a pointer
to the node produced by
.I tymerge .
.I Defid
calls
.I fixtype ,
which adjusts and checks the given type depending on the storage
class, and converts null types appropriately.
It then calls
.I fixclass ,
which does a similar job for the storage class;
it is here, for example, that
register declarations are either allowed or changed
to auto.
.PP
The new declaration is now compared against an older one,
if present, and several pages of validity checks are performed.
If the definitions are compatible, with possibly some added information,
the processing is straightforward.
If the definitions differ, the block levels of the
current and the old declaration are compared.
The current block level is kept in
.I blevel ,
an external variable; the old declaration level is kept in the symbol table.
Block level 0 is for external declarations, 1 is for arguments to functions,
and 2 and above are blocks within a function.
If the current block level is the same as that of the old declaration, an error
results.
If the current block level is higher, the new declaration overrides the old.
This is done by marking the old symbol table entry ``hidden'', and making
a new entry, marked ``hiding''.
.I Lookup
will skip over hidden entries.
When a block is left, the symbol table is searched,
and any entries defined in that block are destroyed; if
they hid other entries, the old entries are ``unhidden''.
.PP
This nice block structure is warped a bit because labels
do not follow the block structure rules (one can do a
.B goto
into
a block, for example);
default definitions of functions in inner blocks also persist clear out to the outermost scope.
This implies that cleaning up the symbol table after block exit is more
subtle than it might first seem.
.PP
For successful new definitions,
.I defid
also initializes a ``general purpose'' field,
.I offset ,
in the symbol table.
It contains the stack offset for automatics and parameters, the register number
for register variables, the bit offset into the structure for
structure members, and
the internal label number for static variables and labels.
The offset field is set by
.I falloc
for bit fields, and
.I dclstruct
for structures and unions.
.PP
The symbol table entry itself thus contains
the name, type word, size and dimension offsets,
offset value, and declaration block level.
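Collected into a C structure, such an entry might be sketched as
follows; the field names here are purely illustrative, and not
necessarily those actually used in
.I mfile1 :
.DS
struct symtab {
	char     *sname;    /* name */
	unsigned stype;     /* type word */
	char     sclass;    /* storage class */
	char     slevel;    /* declaration block level */
	int      offset;    /* general purpose offset field */
	short    dimoff;    /* dimension index into dimtab */
	short    sizoff;    /* size index into dimtab */
};
.DE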
It also has a field of flags, describing what symbol table the
name is in, and whether the entry is hidden, or hides another.
Finally, a field gives the line number of the last use,
or of the definition, of the name.
This is used mainly for diagnostics, but is useful to
.I lint
as well.
.PP
In some special cases, there is more than the above amount of information kept
for the use of the compiler.
This is especially true with structures; for use in initialization,
structure declarations must have access to a list of the members of the
structure.
This list is also kept in
.I dimtab .
Because a structure can be mentioned long before the
members are known, it is necessary to have another
level of indirection in the table.
The two words following the
.I csiz
entry in
.I dimtab
are used to hold the alignment of the structure, and
the index in dimtab of the list of members.
This list contains the symbol table indices for the structure members, terminated by a
\-1.
.SH
Tree Building
.PP
The portable compiler transforms expressions
into expression trees.
As the parser recognizes each rule making up an
expression,
it calls
.I buildtree
which is given an operator number, and pointers to the
left and right descendants.
.I Buildtree
first examines the left and right descendants,
and, if they are both constants, and the operator
is appropriate, simply does the constant
computation at compile time, and returns
the result as a constant (a sketch of this folding appears below).
Otherwise,
.I buildtree
allocates a node for the head of the tree,
attaches the descendants to it, and ensures that
conversion operators are generated if needed, and that
the type of the new node is consistent with the types
of the operands.
There is also a considerable amount of semantic complexity here;
many combinations of types are illegal, and the portable
compiler makes a strong effort to check
the legality of expression types completely.
This is done both for
.I lint
purposes, and to prevent such semantic errors
from being passed through to the code generator.
.PP
The heart of
.I buildtree
is a large table, accessed by the routine
.I opact .
This routine maps the types of the left and right
operands into a rather smaller set of descriptors, and then
accesses a table (actually encoded in a switch statement) which
for each operator and pair of types causes
an action to be returned.
The actions are logical or's of a number of
separate actions, which may be
carried out by
.I buildtree .
These component actions may include
checking the left side to ensure that it
is an lvalue (can be stored into),
applying a type conversion to the left or right operand,
setting the type of the new node to
the type of the left or right operand, calling various
routines to balance the types of the left and right operands,
and suppressing the ordinary conversion
of arrays and function operands to pointers.
An important operation is OTHER, which causes
some special code to be invoked
in
.I buildtree ,
to handle issues which are unique to a particular operator.
Examples of this are
structure and union reference
(actually handled by
the routine
.I stref ),
the building of NAME, ICON, STRING and FCON (floating point constant) nodes,
unary * and &, structure assignment, and calls.
In the case of unary * and &,
.I buildtree
will cancel a * applied to a tree, the top node of which
is &, and conversely.
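.PP
The constant folding mentioned above can be illustrated by a small,
self-contained sketch; the names and structure here are invented for
the illustration, and the real routine in
.I trees.c
must also deal with types, conversions, and the other operators:
.DS
#include <stdio.h>

#define ICON 1          /* integer constant leaf */
#define PLUS 2          /* binary addition operator */
#define FREE 0          /* marks a freed node */

typedef struct node {
	int op;             /* node number */
	int lval;           /* value field, for leaves */
	struct node *left, *right;
} NODE;

/* fold two constant operands at compile time */
NODE *
fold(int o, NODE *l, NODE *r)
{
	if (o == PLUS && l->op == ICON && r->op == ICON) {
		l->lval += r->lval;  /* compute now, not at run time */
		r->op = FREE;        /* free the right hand node */
		return l;            /* a single constant node results */
	}
	return NULL;  /* otherwise an operator node would be built */
}

int main(void)
{
	NODE a = { ICON, 2, NULL, NULL };
	NODE b = { ICON, 3, NULL, NULL };

	printf("%d\en", fold(PLUS, &a, &b)->lval);  /* prints 5 */
	return 0;
}
.DE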
.PP
Another special operation is
PUN; this
causes the compiler to check for type mismatches,
such as intermixing pointers and integers.
.PP
The treatment of conversion operators is still a rather
strange area of the compiler (and of C!).
The recent introduction of type casts has only confounded this
situation.
Most of the conversion operators are generated by
calls to
.I tymatch
and
.I ptmatch ,
both of which are given a tree, and asked to make the
operands agree in type.
.I Ptmatch
treats the case where one of the operands is a pointer;
.I tymatch
treats all other cases.
Where these routines have decided on the
proper type for an operand, they call
.I makety ,
which is handed a tree, and a type word, dimension offset, and size offset.
If necessary, it inserts a conversion operation to make the
types correct.
Conversion operations are never inserted on the left side of
assignment operators, however.
There are two conversion operators used:
PCONV, if the conversion is to a non-basic type (usually a pointer),
and
SCONV, if the conversion is to a basic type (scalar).
.PP
To allow for maximum flexibility, every node produced by
.I buildtree
is given to a machine dependent routine,
.I clocal ,
immediately after it is produced.
This is to allow more or less immediate rewriting of those
nodes which must be adapted for the local machine.
The conversion operations are given to
.I clocal
as well; on most machines, many of these
conversions do nothing, and should be thrown away
(being careful to retain the type).
If this operation is done too early,
however,
later calls to
.I buildtree
may get confused about the correct type of the
subtrees;
thus
.I clocal
is given the conversion ops only after the entire tree is built.
This topic will be dealt with in more detail later.
.SH
Initialization
.PP
Initialization is one of the messier areas in the portable compiler.
The only consolation is that most of the mess takes place
in the machine independent part, where it
may be safely ignored by the implementor of the
compiler for a particular machine.
.PP
The basic problem is that the semantics of initialization
really calls for a co-routine structure; one collection
of programs reading constants from the input stream, while another,
independent set of programs places these constants into the
appropriate spots in memory.
The dramatic differences in the local assemblers also
come to the fore here.
The parsing problems are dealt with by keeping a rather
extensive stack containing the current
state of the initialization; the assembler
problems are dealt with by having a fair number of machine dependent routines.
.PP
The stack contains the symbol table number,
type, dimension index, and size index for the current identifier
being initialized.
Another entry has the offset, in bits, of the beginning
of the current identifier.
Another entry keeps track of how many elements have been seen,
if the current identifier is an array.
Still another entry keeps track of the current
member of a structure being initialized.
Finally, there is an entry containing flags
which keep track of the current state of the initialization process
(e.g., tell if a } has been seen for the current identifier).
.PP
When an initialization begins, the routine
.I beginit
is called; it handles the alignment restrictions, if
any, and calls
.I instk
to create the stack entry.
This is done by first making an entry on the top of the stack for the item
being initialized.
If the top entry is an array, another entry is made on the stack
for the first element.
If the top entry is a structure, another entry is made on the
stack for the first member of the structure.
This continues until the top element of the stack is a scalar.
.I Instk
then returns, and the parser begins collecting initializers.
.PP
When a constant is obtained, the routine
.I doinit
is called; it examines the stack, and does whatever
is necessary to assign the current constant to the
scalar on the top of the stack.
.I Gotscal
is then called, which rearranges the stack so that the
next scalar to be initialized gets placed on top of the stack.
This process continues until the end of the initializers;
.I endinit
cleans up.
If a { or } is encountered in the
string of initializers, it is handled by calling
.I ilbrace
or
.I irbrace ,
respectively.
.PP
A central issue is the treatment of the ``holes'' that
arise as a result of alignment restrictions or explicit
requests for holes in bit fields.
There is a global variable,
.I inoff ,
which contains the current offset in the initialization
(all offsets in the first pass of the compiler are in bits).
.I Doinit
figures out from the top entry on the stack the expected
bit offset of the next identifier; it calls the
machine dependent
routine
.I inforce
which, in a machine dependent way, forces
the assembler to
set aside space if need be so that the
next scalar seen will go into the appropriate
bit offset position.
The scalar itself is passed to one of the machine dependent
routines
.I fincode
(for floating point initialization),
.I incode
(for fields, and other initializations less than an int in
size),
or
.I cinit
(for all other initializations).
The size is passed to all these routines, and it is up to
the machine dependent routines to ensure that the
initializer occupies exactly the right size.
.PP
Character strings represent a bit of an exception.
If a character string is seen as the initializer for
a pointer, the characters making up the string must
be put out under a different location counter.
When the lexical analyzer sees the quote at the head
of a character string, it returns the token STRING,
but does not do anything with the contents.
The parser calls
.I getstr ,
which sets up the appropriate location counters
and flags, and calls
.I lxstr
to read and process the contents of the string.
.PP
If the string is being used to initialize a character array,
.I lxstr
calls
.I putbyte ,
which in effect simulates
.I doinit
for each character read.
If the string is used to initialize a character pointer,
.I lxstr
calls a machine dependent routine,
.I bycode ,
which stashes away each character.
The pointer to this string is then returned,
and processed normally by
.I doinit .
.PP
The null at the end of the string is treated as if it
were read explicitly by
.I lxstr .
.SH
Statements
.PP
The first pass addresses four main areas: declarations, expressions, initialization, and statements.
The statement processing is relatively simple; most of it is carried out in the
parser directly.
Most of the logic is concerned with allocating
label numbers, defining the labels, and branching appropriately.
An external symbol,
.I reached ,
is 1 if a statement can be reached, 0 otherwise; this is
used to do a bit of simple flow analysis as the program is being parsed,
and also to avoid generating the subroutine return sequence if the subroutine
cannot ``fall through'' the last statement.
.PP
Conditional branches are handled by generating an expression
node, CBRANCH,
whose left descendant is the conditional expression and the
right descendant is an ICON node containing the internal label
number to be branched to.
For efficiency, the semantics are that
the label is gone to if the condition is
.I false .
.PP
The switch statement is
compiled by collecting the case entries, and an indication as to whether
there is a default case;
an internal label number is generated for each of these,
and remembered in a big array.
The expression comprising the value to be switched on is
compiled when the switch keyword is encountered,
but the expression tree is headed by
a special node, FORCE, which tells the code generator to
put the expression value into a special distinguished
register (this same mechanism is used for processing the
return statement).
When the end of the switch block is reached, the array
containing the case values is sorted, and checked for
duplicate entries (an error); if all is
correct, the machine dependent routine
.I genswitch
is called, with this array of labels and values in increasing order.
.I Genswitch
can assume that the value to be tested is already in the
register which is the usual integer return value register.
.SH
Optimization
.PP
There is a machine independent file,
.I optim.c ,
which contains a relatively short optimization routine,
.I optim .
Actually the word optimization is something of a misnomer;
the results are not optimum, only improved, and the
routine is in fact not optional; it must
be called for proper operation of the compiler.
.PP
.I Optim
is called after an expression tree is built, but
before the code generator is called.
The essential part of its job is to call
.I clocal
on the conversion operators.
On most machines, the treatment of
& is also essential:
by this time in the processing, the only node which
is a legal descendant of & is NAME.
(Possible descendants of * have been eliminated by
.I buildtree .)
The address of a static name is, almost by definition, a
constant, and can be represented by an ICON node on most machines
(provided that the loader has enough power).
Unfortunately, this is not universally true; on some machines, such as the IBM 370,
the issue of addressability rears its ugly head;
thus, before turning a NAME node into an ICON node,
the machine dependent function
.I andable
is called.
.PP
The optimization attempts of
.I optim
are currently quite limited.
It is primarily concerned with improving the behavior of
the compiler with operations one of whose arguments is a constant.
In the simplest case, the constant is placed on the right if the
operation is commutative.
The compiler also makes a limited search for expressions
such as
.DS
.I "( x + a ) + b"
.DE
where
.I a
and
.I b
are constants, and attempts to combine
.I a
and
.I b
at compile time.
A number of special cases are also examined;
additions of 0 and multiplications by 1 are removed,
although the correct processing of these cases to get
the type of the resulting tree correct is
decidedly nontrivial.
In some cases, the addition or multiplication must be replaced by
a conversion op to keep the types from becoming
fouled up.
In cases where a relational operation is being done,
and one operand is a constant, the operands are permuted, and the operator altered, if necessary,
to put the constant on the right.
Finally, multiplications by a power of 2 are changed to shifts.
.PP
There are dozens of similar optimizations that can be, and should be,
done.
It seems likely that this routine will be expanded in the relatively near future.
.SH
Machine Dependent Stuff
.PP
A number of the first pass machine dependent routines have been discussed above.
In general, the routines are short, and easy to adapt from
machine to machine.
The two exceptions to this general rule are
.I clocal
and
the function prolog and epilog generation routines,
.I bfcode
and
.I efcode .
.PP
.I Clocal
has the job of rewriting, if appropriate and desirable,
the nodes constructed by
.I buildtree .
There are two major areas where this
is important:
NAME nodes and conversion operations.
In the case of NAME nodes,
.I clocal
must rewrite the NAME node to reflect the
actual physical location of the name in the machine.
In effect, the NAME node must be examined, the symbol table
entry found (through the
.I rval
field of the node),
and, based on the storage class of the node,
the tree must be rewritten.
Automatic variables and parameters are typically
rewritten by treating the reference to the variable as
a structure reference, off the register which
holds the stack or argument pointer;
the
.I stref
routine is set up to be called in this way, and to
build the appropriate tree.
In the most general case, the tree consists
of a unary * node, whose descendant is
a + node, with the stack or argument register as left operand,
and a constant offset as right operand.
In the case of LABEL and internal static nodes, the
.I rval
field is rewritten to be the negative of the internal
label number; a negative
.I rval
field is taken to be an internal label number.
Finally, a name of class REGISTER must be converted into a REG node,
and the
.I rval
field replaced by the register number.
In fact, this part of the
.I clocal
routine is nearly machine independent; only for machines
with addressability problems (IBM 370 again!) does it
have to be noticeably different.
.PP
The conversion operator treatment is rather tricky.
It is necessary to handle the application of conversion operators
to constants in
.I clocal ,
in order that all constant expressions can have their values known
at compile time.
In extreme cases, this may mean that some simulation of the
arithmetic of the target machine might have to be done in a
cross-compiler.
In the most common case,
conversions from pointer to pointer do nothing.
For some machines, however, conversion from byte pointer to short or long
pointer might require a shift or rotate operation, which would
have to be generated here.
.PP
The extension of the portable compiler to machines where the size of a pointer
depends on its type would be straightforward, but has not yet been done.
.PP
The other major machine dependent issue involves the subroutine prolog and epilog
generation.
The hard part here is the design of the stack frame
and calling sequence; this design issue is discussed elsewhere.
.[
Johnson Lesk Ritchie calling sequence
.]
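.PP
For a flavor of what these routines must produce, the prolog of a
function on a hypothetical register-and-stack machine might appear in
the intermediate file as something like
.DS
)_fcn:	save	fp		/ save the old frame pointer
)	mov	sp,fp		/ establish the new stack frame
)	sub	$.F1,sp		/ .F1, the frame size, is an
)				/ assembler constant, defined later
.DE
where the leading right parentheses mark lines to be passed through
to the assembler; the opcodes, register names, and assembler syntax
here are invented, and correspond to no particular target machine.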
The routine
.I bfcode
is called with the number of arguments
the function is defined with, and
an array containing the symbol table indices of the
declared parameters.
.I Bfcode
must generate the code to establish the new stack frame,
save the return address and previous stack pointer
value on the stack, and save whatever
registers are to be used for register variables.
The stack size and the number of register variables are not
known when
.I bfcode
is called, so these numbers must be
referred to by assembler constants, which are
defined when they are known (usually in the second pass,
after all register variables, automatics, and temporaries have been seen).
The final job is to find those parameters which may have been declared
register, and generate the code to initialize
the register with the value passed on the stack.
Once again, for most machines, the general logic of
.I bfcode
remains the same, but the contents of the
.I printf
calls in it will change from machine to machine.
.I Efcode
is rather simpler, having just to generate the default
return at the end of a function.
This may be nontrivial in the case of a function returning a structure or union, however.
.PP
There seems to be no really good place to discuss structures and unions, but
this is as good a place as any.
The C language now supports structure assignment,
and the passing of structures as arguments to functions,
and the receiving of structures back from functions.
This was added rather late to C, and thus to the portable compiler.
Consequently, it fits in less well than the older features.
Moreover, most of the burden of making these features work is
placed on the machine dependent code.
.PP
There are both conceptual and practical problems.
Conceptually, the compiler is structured around
the idea that to compute something, you put it into
a register and work on it.
This notion causes a bit of trouble on some machines (e.g., machines with 3-address opcodes), but
matches many machines quite well.
Unfortunately, this notion breaks down with structures.
The closest that one can come is to keep the addresses of the
structures in registers.
The actual code sequences used to move structures vary from the
trivial (a multiple byte move) to the horrible (a
function call), and are very machine dependent.
.PP
The practical problem is more painful.
When a function returning a structure is called, this function
has to have some place to put the structure value.
If it places it on the stack, it has difficulty popping its stack frame.
If it places the value in a static temporary, the routine fails to be
reentrant.
The most logically consistent way of implementing this is for the
caller to pass in a pointer to a spot where the called function
should put the value before returning.
This is relatively straightforward, although a bit tedious, to implement,
but means that the caller must have properly declared
the function type, even if the value is never used.
On some machines, such as the Interdata 8/32, the return value
simply overlays the argument region (which on the 8/32 is part
of the caller's stack frame).
The caller takes care of leaving enough room if the returned value is larger
than the arguments.
This also assumes that the caller knows and declares the
function properly.
.PP
The PDP-11 and the VAX have stack hardware which is used in function calls and returns;
this makes it very inconvenient to
use either of the above mechanisms.
In these machines, a static area within the called function is allocated, and
the function return value is copied into it on return; the function
returns the address of that region.
This is simple to implement, but is non-reentrant.
However, the function can now be called as a subroutine
without being properly declared, without the disaster which would otherwise ensue.
No matter what choice is taken, the convention is that the function
actually returns the address of the return structure value.
.PP
In building expression trees, the portable compiler takes a bit for granted about
structures.
It assumes that functions returning structures
actually return a pointer to the structure, and it assumes that
a reference to a structure is actually a reference to its address.
The structure assignment operator is rebuilt so that the left
operand is the structure being assigned to, but the
right operand is the address of the structure being assigned;
this makes it easier to deal with
.DS
.I "a = b = c"
.DE
and similar constructions.
.PP
There are four special tree nodes associated with these
operations:
STASG (structure assignment), STARG (structure argument
to a function call), and STCALL and UNARY STCALL
(calls of a function with nonzero and zero arguments, respectively).
These four nodes are unique in that the size and alignment information, which can be determined by
the type for all other objects in C,
must be known to carry out these operations; special
fields are set aside in these nodes to contain
this information, and special
intermediate code is used to transmit this information.
.SH
First Pass Summary
.PP
There are many other issues which have been ignored here,
partly to justify the title ``tour'', and partly
because they have seemed to cause little trouble.
There are some debugging flags
which may be turned on by giving the compiler's first pass
the argument
.DS
\-X[flags]
.DE
Some of the more interesting flags are
\-Xd for the defining and freeing of symbols,
\-Xi for initialization comments, and
\-Xb for various comments about the building of trees.
In many cases, repeating the flag more than once gives more information;
thus,
\-Xddd gives more information than \-Xd.
In the two pass version of the compiler, the
flags should not be set when the output is sent to the second
pass, since the debugging output and the intermediate code both go onto the standard
output.
.PP
We turn now to consideration of the second pass.
.SH
Pass Two
.PP
Code generation is far less well understood than
parsing or lexical analysis, and for this reason
the second pass is far harder to discuss in a file by file manner.
A great deal of the difficulty is in understanding the issues
and the strategies employed to meet them.
Any particular function is likely to be reasonably straightforward.
.PP
Thus, this part of the paper will concentrate a good deal on the broader
aspects of strategy in the code generator,
and will not get too intimate with the details.
.SH
Overview
.PP
It is difficult to organize a code generator to be flexible enough to
generate code for a large number of machines,
and still be efficient for any one of them.
Flexibility is also important when it comes time to tune
the code generator to improve the output code quality.
On the other hand, too much flexibility can lead to semantically
incorrect code, and potentially a combinatorial explosion in the
number of cases to be considered in the compiler.
.PP
One goal of the code generator is to have a high degree of correctness.
It is very desirable to have the compiler detect its own inability to
generate correct code, rather than to produce incorrect code.
This goal is achieved by having a simple model of the job
to be done (e.g., an expression tree)
and a simple model of the machine state
(e.g., which registers are free).
The act of generating an instruction performs a transformation
on the tree and the machine state;
hopefully, the tree eventually gets
reduced to a single node.
If each of these instruction/transformation pairs is correct,
and if the machine state model really represents the actual machine,
and if the transformations reduce the input tree to the desired single node, then the
output code will be correct.
.PP
For most real machines, there is no definitive theory of code generation that
encompasses all the C operators.
Thus the selection of which instruction/transformations to generate, and in what
order, will have a heuristic flavor.
If, for some expression tree, no transformation applies, or, more
seriously, if the heuristics select a sequence of instruction/transformations
that do not in fact reduce the tree, the compiler
will report its inability to generate code, and abort.
.PP
A major part of the code generator is concerned with the model and the transformations,
\(em most of this is machine independent, or depends only on simple tables.
The flexibility comes from the heuristics that guide the transformations
of the trees, the selection of subgoals, and the ordering of the computation.
.SH
The Machine Model
.PP
The machine is assumed to have a number of registers, of at most two different
types:
.I A
and
.I B .
Within each register class, there may be scratch (temporary) registers and dedicated registers
(e.g., register variables, the stack pointer, etc.).
Requests to allocate and free registers involve only the temporary registers.
.PP
Each of the registers in the machine is given a name and a number
in the
.I mac2defs
file; the numbers are used as indices into various tables
that describe the registers, so they should be kept small.
One such table is the
.I rstatus
table on file
.I local2.c .
This table is indexed by register number, and
contains expressions made up from manifest constants describing the register types:
SAREG for dedicated AREG's, SAREG\(orSTAREG for scratch AREG's,
and SBREG and SBREG\(orSTBREG similarly for BREG's.
There are macros that access this information:
.I isbreg(r)
returns true if register number
.I r
is a BREG, and
.I istreg(r)
returns true if register number
.I r
is a temporary AREG or BREG.
Another table,
.I rnames ,
contains the register names; this is used when
putting out assembler code and diagnostics.
.PP
The usage of registers is kept track of by an array called
.I busy .
.I Busy[r]
is the number of uses of register
.I r
in the current tree being processed.
The allocation and freeing of registers will be discussed later as part of the code generation
algorithm.
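.PP
To make this concrete, on a hypothetical machine with four scratch A
registers and two dedicated register-variable A registers, the
declarations might be sketched as follows; the details are invented,
and correspond to no actual
.I mac2defs
or
.I local2.c :
.DS
/* mac2defs: register names and numbers, kept small */
#define R0 0
#define R1 1
#define R2 2
#define R3 3
#define R4 4	/* register variable */
#define R5 5	/* register variable */

/* local2.c: register status and names, indexed by register number */
int rstatus[] = {
	SAREG\(orSTAREG, SAREG\(orSTAREG,
	SAREG\(orSTAREG, SAREG\(orSTAREG,
	SAREG, SAREG,
};
char *rnames[] = {
	"r0", "r1", "r2", "r3", "r4", "r5",
};
.DE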
-.SH -General Organization -.PP -As mentioned above, the second pass reads lines from -the intermediate file, copying through to the output unchanged -any lines that begin with a `)', and making note of the -information about stack usage and register allocation contained on -lines beginning with `]' and `['. -The expression trees, whose beginning is indicated by a line beginning with `.', -are read and rebuilt into trees. -If the compiler is loaded as one pass, the expression trees are -immediately available to the code generator. -.PP -The actual code generation is done by a hierarchy of routines. -The routine -.I delay -is first given the tree; it attempts to delay some postfix -++ and \-\- computations that might reasonably be done after the -smoke clears. -It also attempts to handle comma (,) operators by computing the -left side expression first, and then rewriting the tree -to eliminate the operator. -.I Delay -calls -.I codgen -to control the actual code generation process. -.I Codgen -takes as arguments a pointer to the expression tree, -and a second argument that, for socio-historical reasons, is called a -.I cookie . -The cookie describes a set of goals that would be acceptable -for the code generation: these are assigned to individual bits, -so they may be logically or'ed together to form a large number of possible goals. -Among the possible goals are FOREFF (compute for side effects only; -don't worry about the value), INTEMP (compute and store value into a temporary location -in memory), INAREG (compute into an A register), INTAREG (compute into a scratch -A register), INBREG and INTBREG similarly, FORCC (compute for condition codes), -and FORARG (compute it as a function argument; e.g., stack it if appropriate). -.PP -.I Codgen -first canonicalizes the tree by calling -.I canon . -This routine looks for certain transformations that might now be -applicable to the tree. -One, which is very common and very powerful, is to -fold together an indirection operator (UNARY MUL) -and a register (REG); in most machines, this combination is -addressable directly, and so is similar to a NAME in its -behavior. -The UNARY MUL and REG are folded together to make -another node type called OREG. -In fact, in many machines it is possible to directly address not just the -cell pointed to by a register, but also cells differing by a constant -offset from the cell pointed to by the register. -.I Canon -also looks for such cases, calling the -machine dependent routine -.I notoff -to decide if the offset is acceptable (for example, in the IBM 370 the offset -must be between 0 and 4095 bytes). -Another optimization is to replace bit field operations -by shifts and masks if the operation involves extracting the field. -Finally, a machine dependent routine, -.I sucomp , -is called that computes the Sethi-Ullman numbers -for the tree (see below). -.PP -After the tree is canonicalized, -.I codgen -calls the routine -.I store -whose job is to select a subtree of the tree to be computed -and (usually) stored before beginning the -computation of the full tree. -.I Store -must return a tree that can be computed without need -for any temporary storage locations. -In effect, the only store operations generated while processing the subtree must be as a response -to explicit assignment operators in the tree. -This division of the job marks one of the more significant, and successful, -departures from most other compilers. 
-It means that the code generator can operate
-under the assumption that there are enough
-registers to do its job, without worrying about
-temporary storage.
-If a store into a temporary appears in the output, it is always
-as a direct result of logic in the
-.I store
-routine; this makes debugging easier.
-.PP
-One consequence of this organization is that
-code is not generated by a treewalk.
-There are theoretical results that support this decision.
-.[
-Aho Johnson Optimal Code Trees jacm
-.]
-It may be desirable to compute several subtrees and store
-them before tackling the whole tree;
-if a subtree is to be stored, this is known before the code
-generation for the subtree is begun, and the subtree is computed
-when all scratch registers are available.
-.PP
-The
-.I store
-routine decides what subtrees, if any, should be
-stored by making use of numbers,
-called
-.I "Sethi-Ullman numbers" ,
-that give, for each subtree of an expression tree,
-the minimum number of scratch registers required to
-compile the subtree, without any stores into temporaries.
-.[
-Sethi Ullman jacm 1970
-.]
-These numbers are computed by the machine-dependent
-routine
-.I sucomp ,
-called by
-.I canon .
-The basic notion is that, knowing the Sethi-Ullman numbers for
-the descendants of a node, and knowing the operator of the
-node and some information about the machine, the
-Sethi-Ullman number of the node itself can be computed.
-If the Sethi-Ullman number for a tree exceeds the number of scratch
-registers available, some subtree must be stored.
-Unfortunately, the theory behind the Sethi-Ullman numbers
-applies only to uselessly simple machines and operators.
-For the rich set of C operators, and for machines with
-asymmetric registers, register pairs, different kinds of registers,
-and exceptional forms of addressing,
-the theory cannot be applied directly.
-The basic idea of estimation is a good one, however,
-and well worth applying; the application, especially
-when the compiler comes to be tuned for high code
-quality, goes beyond the park of theory into the
-swamp of heuristics.
-This topic will be taken up again later, when more of the compiler
-structure has been described.
-.PP
-After examining the Sethi-Ullman numbers,
-.I store
-selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in
-the external variables
-.I stotree
-and
-.I stocook .
-If a subtree has been selected, or if
-the whole tree is ready to be processed, the
-routine
-.I order
-is called, with a tree and cookie.
-.I Order
-generates code for trees that
-do not require temporary locations.
-.I Order
-may make recursive calls on itself, and,
-in some cases, on
-.I codgen ;
-for example, when processing the operators &&, \(or\(or, and comma (`,'), that have
-a left to right evaluation, it is
-incorrect for
-.I store
-to examine the right operand for subtrees to be stored.
-In these cases,
-.I order
-will call
-.I codgen
-recursively when it is permissible to work on the right operand.
-A similar issue arises with the ? : operator.
-.PP
-The
-.I order
-routine works by matching the current tree with
-a set of code templates.
-If a template is discovered that will
-match the current tree and cookie, the associated assembly language
-statement or statements are generated.
-The tree is then rewritten,
-as specified by the template, to represent the effect of the output instruction(s).
-If no template match is found, first an attempt is made to find a match with a -different cookie; for example, in order to compute -an expression with cookie INTEMP (store into a temporary storage location), -it is usually necessary to compute the expression into a scratch register -first. -If all attempts to match the tree fail, the heuristic part of -the algorithm becomes dominant. -Control is typically given to one of a number of machine-dependent routines -that may in turn recursively call -.I order -to achieve a subgoal of the computation (for example, one of the -arguments may be computed into a temporary register). -After this subgoal has been achieved, the process begins again with the -modified tree. -If the machine-dependent heuristics are unable to reduce the tree further, -a number of default rewriting rules may be considered appropriate. -For example, if the left operand of a + is a scratch -register, the + can be replaced by a += operator; -the tree may then match a template. -.PP -To close this introduction, we will discuss the steps in compiling -code for the expression -.DS -\fIa\fR += \fIb\fR -.DE -where -.I a -and -.I b -are static variables. -.PP -To begin with, the whole expression tree is examined with cookie FOREFF, and -no match is found. Search with other cookies is equally fruitless, so an -attempt at rewriting is made. -Suppose we are dealing with the Interdata 8/32 for the moment. -It is recognized that the left hand and right hand sides of the += operator -are addressable, and in particular the left hand side has no -side effects, so it is permissible to rewrite this as -.DS -\fIa\fR = \fIa\fR + \fIb\fR -.DE -and this is done. -No match is found on this tree either, so a machine dependent rewrite is done; it is recognized -that the left hand side of the assignment is addressable, but the right hand side is not -in a register, so -.I order -is called recursively, being asked to put the right -hand side of the assignment into a register. -This invocation of -.I order -searches the tree for a match, and fails. -The machine dependent rule for + -notices that the right hand operand is addressable; -it decides to put the left operand into a scratch register. -Another recursive call to -.I order -is made, with the tree -consisting solely of the leaf -.I a , -and the cookie asking that the value be placed into a scratch register. -This now matches a template, and a load instruction is emitted. -The node consisting of -.I a -is rewritten in place to represent the register into which -.I a -is loaded, -and this third call to -.I order -returns. -The second call to -.I order -now finds that it has the tree -.DS -\fBreg\fR + \fIb\fR -.DE -to consider. -Once again, there is no match, but the default rewriting rule rewrites -the + as a += operator, since the left operand is a scratch register. -When this is done, there is a match: in fact, -.DS -\fBreg\fR += \fIb\fR -.DE -simply describes the effect of the add instruction -on a typical machine. -After the add is emitted, the tree is rewritten -to consist merely of the register node, since the result of the add -is now in the register. -This agrees with the cookie passed to the second invocation of -.I order , -so this invocation -terminates, returning to the first level. -The original tree has now -become -.DS -\fIa\fR = \fBreg\fR -.DE -which matches a template for the store instruction. -The store is output, and the tree rewritten to become -just a single register node. 
-At this point, since the top level call to
-.I order
-was
-interested only in side effects, the call to
-.I order
-returns, and the code generation is completed;
-we have generated a load, add, and store, as might have been expected.
-.PP
-The effect of machine architecture on this is considerable.
-For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage''
-instruction, so the strategy is quite different;
-.I b
-is loaded into
-a register, and then an add to storage instruction is generated
-to add this register into
-.I a .
-The transformations, involving as they do the semantics of C,
-are largely machine independent.
-The decisions as to when to use them, however, are
-almost totally machine dependent.
-.PP
-Having given a broad outline of
-the code generation process, we shall next consider the
-heart of it: the templates.
-This leads naturally into discussions of template matching and register allocation,
-and finally a discussion of the machine dependent interfaces and strategies.
-.SH
-The Templates
-.PP
-The templates describe the effect of the target machine instructions
-on the model of computation around which the compiler is organized.
-In effect, each template has five logical sections, and represents an assertion
-of the form:
-.IP
-.B If
-we have a subtree of a given shape (1), and we have a goal (cookie) or goals to
-achieve (2), and we have sufficient free resources (3),
-.B then
-we may emit an instruction or instructions (4), and
-rewrite the subtree in a particular manner (5),
-and the rewritten tree will achieve the desired goals.
-.PP
-These five sections will be discussed in more
-detail later. First, we give an example of a
-template:
-.DS
-.ta 1i 2i 3i 4i 5i
-ASG PLUS, INAREG,
-	SAREG,	TINT,
-	SNAME,	TINT,
-	0,	RLEFT,
-	"	add	AL,AR\en",
-.DE
-The top line specifies the operator (+=) and the cookie (compute the
-value of the subtree into an AREG).
-The second and third lines specify the left and right descendants,
-respectively,
-of the += operator.
-The left descendant must be a REG node, representing an
-A register, and have integer type, while the right side must be a NAME node,
-and also have integer type.
-The fourth line contains the resource requirements (no scratch registers
-or temporaries needed), and the rewriting rule (replace the subtree by the left descendant).
-Finally, the quoted string on the last line represents the output to the assembler:
-lower case letters, tabs, spaces, etc. are copied
-.I verbatim
-to the output; upper case letters trigger various macro-like expansions.
-Thus,
-.B AL
-would expand into the \fBA\fRddress form of the \fBL\fReft operand \(em
-presumably the register number.
-Similarly,
-.B AR
-would expand into the name of the right operand.
-The
-.I add
-instruction of the last section might well be
-emitted by this template.
-.PP
-In principle, it would be possible to make separate templates
-for all legal combinations of operators, cookies, types, and shapes.
-In practice, the number of combinations is very large.
-Thus, a considerable amount of mechanism is present to
-permit a large number of subtrees to be matched
-by a single template.
-Most of the shape and type specifiers are individual bits, and can
-be logically
-or'ed
-together.
-There are a number of special descriptors for matching classes of
-operators.
-The cookies can also be combined.
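-.PP
-In C source form, a template entry amounts to a structure whose fields
-correspond to the five logical sections; the following declaration is
-only a sketch, with the field names invented for the purpose, but it
-suggests how such an entry is stored:
-.DS
-struct optab {
-	int	op;		/* operator, e.g. ASG PLUS */
-	int	visit;		/* cookies for which the entry applies */
-	int	lshape, ltype;	/* shape and type of the left descendant */
-	int	rshape, rtype;	/* shape and type of the right descendant */
-	int	needs;		/* scratch registers and temporaries needed */
-	int	rewrite;	/* rewriting rule, e.g. RLEFT */
-	char	*cstring;	/* assembler output, with macro escapes */
-	};
-.DE
-.PP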
-As an example of the kind of template -that really arises in practice, the -actual template for the Interdata 8/32 -that subsumes the above example is: -.DS -.ta 1i 2i 3i 4i 5i -ASG OPSIMP, INAREG\(orFORCC, - SAREG, TINT\(orTUNSIGNED\(orTPOINT, - SAREG\(orSNAME\(orSOREG\(orSCON, TINT\(orTUNSIGNED\(orTPOINT, - 0, RLEFT\(orRESCC, - " OI AL,AR\en", -.DE -Here, OPSIMP represents the operators -+, \-, \(or, &, and ^. -The -.B OI -macro in the output string expands into the -appropriate \fBI\fRnteger \fBO\fRpcode for the operator. -The left and right sides can be integers, unsigned, or pointer types. -The right side can be, in addition to a name, a register, -a memory location whose address is given by a register and displacement (OREG), -or a constant. -Finally, these instructions set the condition codes, -and so can be used in condition contexts: -the cookie and rewriting rules reflect this. -.SH -The Template Matching Algorithm. -.PP -The heart of the second pass is the template matching -algorithm, in the routine -.I match . -.I Match -is called with a tree and a cookie; it attempts to match -the given tree against some template that will transform it -according to one of the goals given in the cookie. -If a match is successful, the transformation is -applied; -.I expand -is called to generate the assembly code, and then -.I reclaim -rewrites the tree, and reclaims the resources, such -as registers, that might have become free as a result -of the generated code. -.PP -This part of the compiler is among the most time critical. -There is a spectrum of implementation techniques available -for doing this matching. -The most naive algorithm simply looks at the templates one by one. -This can be considerably improved upon by restricting the search -for an acceptable template. -It would be possible to do better than this if the templates were given -to a separate program that ate them and generated a template -matching subroutine. -This would make maintenance of the compiler much more -complicated, however, so this has not been done. -.PP -The matching algorithm is actually carried out by restricting -the range in the table that must be searched for each opcode. -This introduces a number of complications, however, and needs a -bit of sympathetic help by the person constructing the -compiler in order to obtain best results. -The exact tuning of this algorithm continues; it -is best to consult the code and comments in -.I match -for the latest version. -.PP -In order to match a template to a tree, -it is necessary to match not only the cookie and the -op of the root, but also the types and shapes of the -left and right descendants (if any) of the tree. -A convention is established here that is carried out throughout -the second pass of the compiler. -If a node represents a unary operator, the single descendant -is always the ``left'' descendant. -If a node represents a unary operator or a leaf node (no descendants) -the ``right'' descendant is taken by convention to be the node itself. -This enables templates to easily match leaves and conversion operators, for example, -without any additional mechanism in the matching program. -.PP -The type matching is straightforward; it is possible to specify any combination -of basic types, general pointers, and pointers to one or more of -the basic types. -The shape matching is somewhat more complicated, but still pretty simple. -Templates have a collection of possible operand shapes -on which the opcode might match. 
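-Since most shapes, like the types, are single bits, the test against a
-candidate node is largely a matter of logical
-.I and s;
-in outline, and with the field and bit names invented for the sketch,
-the shape check for one descendant reduces to something like:
-.DS
-shapematch(p, shape) NODE *p; {
-	switch (p->op) {
-	case NAME:
-		return(shape&SNAME);
-	case ICON:
-		return(shape&SCON);
-	case OREG:
-		return(shape&SOREG);
-	case REG:
-		return(shape&rstatus[p->rval]);
-		}
-	return(0);	/* the special shapes need machine dependent help */
-	}
-.DE
-.PP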
-In the simplest case, an -.I add -operation might be able to add to either a register variable -or a scratch register, and might be able (with appropriate -help from the assembler) to add an integer constant (ICON), a static -memory cell (NAME), or a stack location (OREG). -.PP -It is usually attractive to specify a number of such shapes, -and distinguish between them when the assembler output is produced. -It is possible to describe the union of many elementary -shapes such as ICON, NAME, OREG, -AREG or BREG -(both scratch and register forms), etc. -To handle at least the simple forms of indirection, one can also -match some more complicated forms of trees; STARNM and STARREG -can match more complicated trees headed by an indirection operator, -and SFLD can match certain trees headed by a FLD operator: these -patterns call machine dependent routines that match the -patterns of interest on a given machine. -The shape SWADD may be used to recognize NAME or OREG -nodes that lie on word boundaries: this may be of some importance -on word\-addressed machines. -Finally, there are some special shapes: these may not -be used in conjunction with the other shapes, but may be -defined and extended in machine dependent ways. -The special shapes SZERO, SONE, and SMONE are predefined and match -constants 0, 1, and \-1, respectively; others are easy to add -and match by using the machine dependent routine -.I special . -.PP -When a template has been found that matches the root of the tree, -the cookie, and the shapes and types of the descendants, -there is still one bar to a total match: the template may -call for some resources (for example, a scratch register). -The routine -.I allo -is called, and it attempts to allocate the resources. -If it cannot, the match fails; no resources are -allocated. -If successful, the allocated resources are given numbers -1, 2, etc. for later reference when the assembly code is -generated. -The routines -.I expand -and -.I reclaim -are then called. -The -.I match -routine then returns a special value, MDONE. -If no match was found, the value MNOPE is returned; -this is a signal to the caller to try more cookie -values, or attempt a rewriting rule. -.I Match -is also used to select rewriting rules, although -the way of doing this is pretty straightforward. -A special cookie, FORREW, is used to ask -.I match -to search for a rewriting rule. -The rewriting rules are keyed to various opcodes; most -are carried out in -.I order . -Since the question of when to rewrite is one of the key issues in -code generation, it will be taken up again later. -.SH -Register Allocation. -.PP -The register allocation routines, and the allocation strategy, -play a central role in the correctness of the code generation algorithm. -If there are bugs in the Sethi-Ullman computation that cause the -number of needed registers to be underestimated, -the compiler may run out of scratch registers; -it is essential that the allocator keep track of those registers that -are free and busy, in order to detect such conditions. -.PP -Allocation of registers takes place as the result of a template -match; the routine -.I allo -is called with a word describing the number of A registers, -B registers, and temporary locations needed. -The allocation of temporary locations on the stack is relatively -straightforward, and will not be further covered; the -bookkeeping is a bit tricky, but conceptually trivial, and requests -for temporary space on the stack will never fail. 
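-.PP
-For a register request, the simplest possible strategy is a linear scan
-for a temporary register of the right class that is not busy; the
-following sketch (all details invented) conveys the flavor, ignoring
-for the moment the complications discussed below:
-.DS
-freereg(mask) {
-	register r;
-	for (r=0; r<NREGS; ++r)
-		if ((rstatus[r]&mask)==mask && busy[r]==0) {
-			++busy[r];	/* mark it allocated */
-			return(r);
-			}
-	return(\-1);	/* must try sharing, or fail the match */
-	}
-.DE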
-.PP
-Register allocation is less straightforward.
-The two major complications are
-.I pairing
-and
-.I sharing .
-In many machines, some operations (such as multiplication
-and division), and/or some types (such as longs or double precision)
-require even/odd pairs of registers.
-Operations of the first type are exceptionally difficult to
-deal with in the compiler; in fact, their theoretical
-properties are rather bad as well.
-.[
-Aho Johnson Ullman Multiregister
-.]
-The second issue is dealt with rather more successfully;
-a machine dependent function called
-.I szty(t)
-is called that returns 1 or 2, depending on the
-number of A registers required to hold an object of type
-.I t .
-If
-.I szty
-returns 2, an even/odd pair of A registers is allocated
-for each request.
-.PP
-The other issue, sharing, is more subtle, but
-important for good code quality.
-When registers are allocated, it
-is possible to reuse registers that hold address
-information, and use them to contain the values
-computed or accessed.
-For example, on the IBM 360, if register 2 has
-a pointer to an integer in it, we may load the
-integer into register 2 itself by saying:
-.DS
-L	2,0(2)
-.DE
-If register 2 had a byte pointer, however, the sequence for
-loading a character involves clearing the target
-register first, and then inserting the desired character:
-.DS
-SR	3,3
-IC	3,0(2)
-.DE
-In the first case, if register 3 were used as the target,
-it would lead to a larger number of registers
-used for the expression than were required; the compiler would
-generate inefficient code.
-On the other hand, if register 2 were used as the target in the second
-case, the code would simply be wrong.
-In the first case, register 2 can be
-.I shared
-while in the second, it cannot.
-.PP
-In the specification of the register needs in the templates,
-it is possible to indicate whether required scratch registers
-may be shared with possible registers on the left or the right of the input tree.
-In order that a register be shared, it must be scratch, and it must
-be used only once, on the appropriate side of the tree being compiled.
-.PP
-The
-.I allo
-routine thus has a bit more to do than meets the eye;
-it calls
-.I freereg
-to obtain a free register for each A and B register request.
-.I Freereg
-makes multiple calls on the routine
-.I usable
-to decide if a given register can be used to satisfy
-a given need.
-.I Usable
-calls
-.I shareit
-if the register is busy, but might be shared.
-Finally,
-.I shareit
-calls
-.I ushare
-to decide if the desired register is actually in the appropriate
-subtree, and can be shared.
-.PP
-Just to add additional complexity, on some machines (such as the IBM 370) it
-is possible to have ``double indexing'' forms of
-addressing; these are represented by OREG's
-with the base and index registers encoded into the register field.
-While the register allocation and deallocation
-.I "per se"
-is not made more difficult by this phenomenon, the code itself
-is somewhat more complex.
-.PP
-Having allocated the registers and expanded the assembly language,
-it is time to reclaim the resources; the routine
-.I reclaim
-does this.
-Many operations produce more than one result.
-For example, many arithmetic operations may produce
-a value in a register, and also set the condition
-codes.
-Assignment operations may leave results both in a register and in memory.
-.I Reclaim
-is passed three parameters: the tree and cookie
-that were matched, and the rewriting field of the template.
-The rewriting field allows the specification of possible results;
-the tree is rewritten to reflect the results of the operation.
-If the tree was computed for side effects only (FOREFF),
-the tree is freed, and all resources in it reclaimed.
-If the tree was computed for condition codes, the resources
-are also freed, and the tree replaced by a special
-node type, FORCC.
-Otherwise, the value may be found in the left
-argument of the root, the right argument of the root,
-or one of the temporary resources allocated.
-In these cases, first the resources of the tree,
-and the newly allocated resources,
-are freed; then the resources needed by the result
-are made busy again.
-The final result must always match the shape of the input cookie;
-otherwise, the compiler error
-``cannot reclaim''
-is generated.
-There are some machine dependent ways of
-preferring results in registers or memory when
-there are multiple results matching multiple goals in the cookie.
-.SH
-The Machine Dependent Interface
-.PP
-The files
-.I order.c ,
-.I local2.c ,
-and
-.I table.c ,
-as well as the header file
-.I mac2defs ,
-represent the machine dependent portion of the second pass.
-The machine dependent portion can be roughly divided into
-two: the easy portion and the hard portion.
-The easy portion
-tells the compiler the names of the registers, and arranges that
-the compiler generate the proper assembler formats, opcode names, location counters, etc.
-The hard portion involves the Sethi\-Ullman computation, the
-rewriting rules, and, to some extent, the templates.
-It is hard because there are no real algorithms that apply;
-most of this portion is based on heuristics.
-This section discusses the easy portion; the next several
-sections will discuss the hard portion.
-.PP
-If the compiler is adapted from a compiler for a machine
-of similar architecture, the easy part is indeed easy.
-In
-.I mac2defs ,
-the register numbers are defined, as well as various parameters for
-the stack frame, and various macros that describe the machine architecture.
-If double indexing is to be permitted, for example, the symbol
-R2REGS is defined.
-Also, a number of macros that are involved in function call processing,
-especially for unusual function call mechanisms, are defined here.
-.PP
-In
-.I local2.c ,
-a large number of simple functions are defined.
-These do things such as write out opcodes, register names,
-and address forms for the assembler.
-Part of the function call code is defined here; that is nontrivial
-to design, but typically rather straightforward to implement.
-Among the easy routines in
-.I order.c
-are routines for generating a created label,
-defining a label, and generating the arguments
-of a function call.
-.PP
-These routines tend to have a local effect, and depend in a fairly straightforward way
-on the target assembler and the design decisions already made about
-the compiler.
-Thus they will not be further treated here.
-.SH
-The Rewriting Rules
-.PP
-When a tree fails to match any template, it becomes
-a candidate for rewriting.
-Before the tree is rewritten,
-the machine dependent routine
-.I nextcook
-is called with the tree and the cookie; it suggests
-another cookie that might be a better candidate for the
-matching of the tree.
-If all else fails, the templates are searched with the cookie
-FORREW, to look for a rewriting rule.
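-Schematically, the process by which
-.I order
-disposes of one tree can thus be summarized as follows; this is a
-sketch of the control flow only, with the details invented:
-.DS
-for (;;) {
-	if (match(p, cookie) == MDONE)
-		break;		/* a template applied */
-	c = nextcook(p, cookie);
-	if (c != FORREW) {
-		cookie = c;	/* try the suggested easier goal */
-		continue;
-		}
-	match(p, FORREW);	/* apply a rewriting rule */
-	canon(p);		/* recanonicalize, and try again */
-	}
-.DE
-.PP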
-The rewriting rules are of two kinds;
-for most of the common operators, there are
-machine dependent rewriting rules that may be applied;
-these are handled by machine dependent functions
-that are called and given the tree to be computed.
-These routines may recursively call
-.I order
-or
-.I codgen
-to cause certain subgoals to be achieved;
-if they actually call for some alteration of the tree,
-they return 1, and the
-code generation algorithm recanonicalizes and tries again.
-If these routines choose not to deal with the tree, the
-default rewriting rules are applied.
-.PP
-The assignment ops, when rewritten, call the routine
-.I setasg .
-This is assumed to rewrite the tree at least to the point where there are
-no side effects in the left hand side.
-If there is still no template match,
-a default rewriting is done that causes
-an expression such as
-.DS
-.I "a += b"
-.DE
-to be rewritten as
-.DS
-.I "a = a + b"
-.DE
-This is a useful default for certain mixtures of strange types
-(for example, when
-.I a
-is a bit field and
-.I b
-a character) that
-otherwise might need separate table entries.
-.PP
-Simple assignment, structure assignment, and all forms of calls
-are handled completely by the machine dependent routines.
-For historical reasons, the routines generating the calls return
-1 on failure, 0 on success, unlike the other routines.
-.PP
-The machine dependent routine
-.I setbin
-handles binary operators; it too must do most of the job.
-In particular, when it returns 0, it must do so with
-the left hand side in a temporary register.
-The default rewriting rule in this case is to convert the
-binary operator into the associated assignment operator;
-since the left hand side is assumed to be a temporary register,
-this preserves the semantics and often allows a considerable
-saving in the template table.
-.PP
-The increment and decrement operators may be dealt with by
-the machine dependent routine
-.I setincr .
-If this routine chooses not to deal with the tree, the rewriting rule replaces
-.DS
-.I "x ++"
-.DE
-by
-.DS
-.I "( (x += 1) \- 1)"
-.DE
-which preserves the semantics.
-Once again, this is not too attractive for the most common
-cases, but can generate close to optimal code when the
-type of x is unusual.
-.PP
-Finally, the indirection (UNARY MUL) operator is also handled
-in a special way.
-The machine dependent routine
-.I offstar
-is extremely important for the efficient generation of code.
-.I Offstar
-is called with a tree that is the direct descendant of a UNARY MUL node;
-its job is to transform this tree so that the combination of
-UNARY MUL with the transformed tree becomes addressable.
-On most machines,
-.I offstar
-can simply compute the tree into an A or B register,
-depending on the architecture, and then
-.I canon
-will make the resulting tree into an OREG.
-On many machines,
-.I offstar
-can profitably choose to do less work than computing
-its entire argument into a register.
-For example, if the target machine supports OREG's
-with a constant offset from a register, and
-.I offstar
-is called
-with a tree of the form
-.DS
-.I "expr + const"
-.DE
-where
-.I const
-is a constant, then
-.I offstar
-need only compute
-.I expr
-into the appropriate form of register.
-On machines that support double indexing,
-.I offstar
-may have even more choice as to how to proceed.
-The proper tuning of
-.I offstar ,
-which is not typically too difficult, should be one of the
-first tries at optimization attempted by the
-compiler writer.
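-.PP
-A schematic
-.I offstar
-for a machine with register-plus-offset addressing might look as
-follows; the node field names, and the helper
-.I offok
-standing in for the machine's offset check (the real code consults
-.I notoff ),
-are invented for the sketch:
-.DS
-offstar(p) NODE *p; {
-	if (p->op == PLUS && p->right->op == ICON && offok(p->right)) {
-		/* expr + const: compute only expr into a register;
-		   canon will then fold the indirection into an OREG */
-		order(p->left, INTAREG);
-		return;
-		}
-	order(p, INTAREG);	/* default: whole address into a register */
-	}
-.DE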
-.SH
-The Sethi-Ullman Computation
-.PP
-The heart of the heuristics is the computation of the Sethi-Ullman numbers.
-This computation is closely linked with the rewriting rules and the
-templates.
-As mentioned before, the Sethi-Ullman numbers are expected to
-estimate the number of scratch registers needed to compute
-the subtrees without using any stores.
-However, the original theory does not apply to real machines.
-For one thing, the theory assumes that all registers
-are interchangeable.
-Real machines have general purpose, floating point, and index registers,
-register pairs, etc.
-The theory also does not account for side effects;
-this rules out various forms of pathology that arise
-from assignment and assignment ops.
-Condition codes are also undreamed of.
-Finally, the influence of types, conversions, and the
-various addressability restrictions and extensions of real
-machines is also ignored.
-.PP
-Nevertheless, for a ``useless'' theory,
-the basic insight of Sethi and Ullman is amazingly
-useful in a real compiler.
-The notion that one should attempt to estimate the
-resource needs of trees before starting the
-code generation provides a natural means of splitting the
-code generation problem, and provides a bit of redundancy
-and self checking in the compiler.
-Moreover, if writing the
-Sethi-Ullman routines is hard, describing, writing, and debugging the
-alternative (routines that attempt to free up registers by stores into
-temporaries ``on the fly'') is even worse.
-Nevertheless, it should be clearly understood that these routines exist in a realm
-where there is no ``right'' way to write them;
-it is an art, the realm of heuristics, and, consequently, a major
-source of bugs in the compiler.
-Often, the early, crude versions of these routines give little trouble;
-only after
-the compiler is actually working
-and the
-code quality is being improved do serious problems have to be faced.
-Having a simple, regular machine architecture is worth quite
-a lot at this time.
-.PP
-The major problems arise from asymmetries in the registers: register pairs,
-having different kinds of registers, and the related problem of
-needing more than one register (frequently a pair) to store certain
-data
-types (such as longs or doubles).
-There appears to be no general way of treating this problem;
-solutions have to be fudged for each machine where the problem arises.
-On the Honeywell 66, for example, there are only two general purpose registers,
-so a need for a pair is the same as the need for two registers.
-On the IBM 370, the register pair (0,1) is used to do multiplications and divisions;
-registers 0 and 1 are not generally considered part of the scratch registers, and so
-do not require allocation explicitly.
-On the Interdata 8/32, after much consideration, the
-decision was made not to try to deal with the register pair issue;
-operations such as multiplication and division that required pairs
-were simply assumed to take all of the scratch registers.
-Several weeks of effort had failed to produce
-an algorithm that seemed to have much chance of running successfully
-without inordinate debugging effort.
-The difficulty of this issue should not be minimized; it represents one of the
-main intellectual efforts in porting the compiler.
-Nevertheless, this problem has been fudged with a degree of
-success on nearly a dozen machines, so the compiler writer should
-not abandon hope.
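-.PP
-For contrast, on the idealized machine of the original theory the
-computation is trivial; a sketch follows, assuming that each node
-carries an
-.I su
-field, that the numbers for its descendants have already been set,
-and that a classification macro
-.I optype
-sorts operators into leaves (LTYPE), unary operators (UTYPE),
-and binary operators (BITYPE):
-.DS
-sucomp(p) NODE *p; {
-	register l, r;
-	switch (optype(p->op)) {
-	case LTYPE:	/* a leaf needs one register */
-		p->su = 1;
-		break;
-	case UTYPE:	/* a unary op needs what its operand needs */
-		p->su = p->left->su;
-		break;
-	case BITYPE:	/* the classic binary formula */
-		l = p->left->su;
-		r = p->right->su;
-		p->su = (l==r) ? l+1 : (l>r ? l : r);
-		break;
-		}
-	}
-.DE
-Every real machine breaks at least one of the assumptions behind this
-formula, and the machine dependent
-.I sucomp
-grows, case by case, from some such skeleton.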
-.PP -The Sethi-Ullman computations interact with the -rest of the compiler in a number of rather subtle ways. -As already discussed, the -.I store -routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult -to compute in registers, and must be stored. -There are also subtle interactions between the -rewriting routines and the Sethi-Ullman numbers. -Suppose we have a tree such as -.DS -.I "A \- B" -.DE -where -.I A -and -.I B -are expressions; suppose further that -.I B -takes two registers, and -.I A -one. -It is possible to compute the full expression in two registers by -first computing -.I B , -and then, using the scratch register -used by -.I B , -but not containing the answer, compute -.I A . -The subtraction can then be done, computing the expression. -(Note that this assumes a number of things, not the least of which -are register-to-register subtraction operators and symmetric -registers.) -If the machine dependent routine -.I setbin , -however, is not prepared to recognize this case -and compute the more difficult side of the expression -first, the -Sethi-Ullman number must be set to three. -Thus, the -Sethi-Ullman number for a tree should represent the code that -the machine dependent routines are actually willing to generate. -.PP -The interaction can go the other way. -If we take an expression such as -.DS -.I "* ( p + i )" -.DE -where -.I p -is a pointer and -.I i -an integer, -this can probably be done in one register on most machines. -Thus, its Sethi-Ullman number would probably be set to one. -If double indexing is possible in the machine, a possible way -of computing the expression is to load both -.I p -and -.I i -into registers, and then use double indexing. -This would use two scratch registers; in such a case, -it is possible that the scratch registers might be unobtainable, -or might make some other part of the computation run out of -registers. -The usual solution is to cause -.I offstar -to ignore opportunities for double indexing that would tie up more scratch -registers than the Sethi-Ullman number had reserved. -.PP -In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application -of the portable compiler. -It is also a frequent source of bugs. -Algorithms are available that will produce nearly optimal code -for specialized machines, but unfortunately most existing machines -are far removed from these ideals. -The best way of proceeding in practice is to start with a compiler -for a similar machine to the target, and proceed very -carefully. -.SH -Register Allocation -.PP -After the Sethi-Ullman numbers are computed, -.I order -calls a routine, -.I rallo , -that does register allocation, if appropriate. -This routine does relatively little, in general; -this is especially true if the target machine -is fairly regular. -There are a few cases where it is assumed that -the result of a computation takes place in a particular register; -switch and function return are the two major places. -The expression tree has a field, -.I rall , -that may be filled with a register number; this is taken -to be a preferred register, and the first temporary -register allocated by a template match will be this preferred one, if -it is free. -If not, no particular action is taken; this is just a heuristic. -If no register preference is present, the field contains NOPREF. -In some cases, the result must be placed in -a given register, no matter what. 
-The register number is placed in
-.I rall ,
-and the mask MUSTDO is logically or'ed in with it.
-In this case, if the subtree is requested in a register, and comes
-back in a register other than the demanded one, it is moved
-by calling the routine
-.I rmove .
-If the target register for this move is busy, it is a compiler
-error.
-.PP
-Note that this mechanism is the only one that will ever cause a register-to-register
-move between scratch registers (unless such a move is buried in the depths of
-some template).
-This simplifies debugging.
-In some cases, there is a rather strange interaction between
-the register allocation and the Sethi-Ullman number;
-if there is an operator or situation requiring a
-particular register, the allocator and the Sethi-Ullman
-computation must conspire to ensure that the target
-register is not being used by some intermediate result of some far-removed computation.
-This is most easily done by making the special operation take
-all of the free registers, preventing any other partially-computed
-results from cluttering up the works.
-.SH
-Compiler Bugs
-.PP
-The portable compiler has an excellent record of generating correct code.
-The requirement for reasonable cooperation between the register allocation,
-Sethi-Ullman computation, rewriting rules, and templates builds quite a bit
-of redundancy into the compiling process.
-The effect of this is that, in a surprisingly short time, the compiler will
-start generating correct code for those
-programs that it can compile.
-The hard part of the job then becomes finding and
-eliminating those situations where the compiler refuses to
-compile a program because it knows it cannot do it right.
-For example, a template may simply be missing; this may either
-give a compiler error of the form ``no match for op ...'', or cause
-the compiler to go into an infinite loop applying various rewriting rules.
-The compiler has a variable,
-.I nrecur ,
-that is set to 0 at the beginning of an expression, and
-incremented at key spots in the
-compilation process; if this parameter gets too large, the
-compiler decides that it is in a loop, and aborts.
-Loops are also characteristic of botches in the machine-dependent rewriting rules.
-Bad Sethi-Ullman computations usually cause the scratch registers
-to run out; this often means
-that the Sethi-Ullman number was underestimated, so
-.I store
-did not store something it should have; alternatively,
-it can mean that the rewriting rules were not smart enough to
-find the sequence that
-.I sucomp
-assumed would be used.
-.PP
-The best approach when a compiler error is detected involves several stages.
-First, try to get a small example program that steps on the bug.
-Second, turn on various debugging flags in the code generator, and follow the
-tree through the process of being matched and rewritten.
-Some flags of interest are
-\-e, which prints the expression tree,
-\-r, which gives information about the allocation of registers,
-\-a, which gives information about the performance of
-.I rallo ,
-and \-o, which gives information about the behavior of
-.I order .
-This technique should allow most bugs to be found relatively quickly.
-.PP
-Unfortunately, finding the bug is usually not enough; it must also
-be fixed!
-The difficulty arises because a fix to the particular bug of interest tends
-to break other code that already works.
-Regression tests, tests that compare the performance of
-a new compiler against the performance of an older one, are very
-valuable in preventing major catastrophes.
-.SH
-Summary and Conclusion
-.PP
-The portable compiler has been a useful tool for providing C
-capability on a large number of diverse machines,
-and for testing a number of theoretical
-constructs in a practical setting.
-It has many blemishes, both in style and functionality.
-It has been applied to many more machines than first
-anticipated, of a much wider range than originally dreamed of.
-Its use has also spread much faster than expected, leaving parts of
-the compiler still somewhat raw in shape.
-.PP
-On the theoretical side, there is some hope that the
-skeleton of the
-.I sucomp
-routine could be generated for many machines directly from the
-templates; this would give a considerable boost
-to the portability and correctness of the compiler,
-but might affect tunability and code quality.
-There is also room for more optimization, both within
-.I optim
-and in the form of a portable ``peephole'' optimizer.
-.PP
-On the practical, development side,
-the compiler could probably be sped up and made smaller
-without doing too much violence to its basic structure.
-Parts of the compiler deserve to be rewritten;
-the initialization code, register allocation, and
-parser are prime candidates.
-It might be that doing some or all of the parsing
-with a recursive descent parser would
-save enough space and time to be worthwhile;
-it would certainly ease the problem of moving the
-compiler to an environment where
-.I Yacc
-is not already present.
-.PP
-Finally, I would like to thank the many people who have
-sympathetically, and even enthusiastically, helped me grapple
-with what has been a frustrating program to write, test, and install.
-D. M. Ritchie and E. N. Pinson provided needed early
-encouragement and philosophical guidance;
-M. E. Lesk,
-R. Muha, T. G. Peterson,
-G. Riddle, L. Rosler,
-R. W. Mitze,
-B. R. Rowland,
-S. I. Feldman,
-and
-T. B. London
-have all contributed ideas, gripes, and all, at one time or another,
-climbed ``into the pits'' with me to help debug.
-Without their help this effort would not have been possible;
-with it, it was often kind of fun.
-.sp 100
-.LP
-.[
-$LIST$
-.]
-.LP
-.sp 100
-.LP
//GO.SYSIN DD porttour2