# To unbundle, run this file echo cdoc0 sed 's/.//' >cdoc0 <<'//GO.SYSIN DD cdoc0' -.TL -A Tour through the UNIX\(dg C Compiler -.AU "MH 2C517 3770 -D. M. Ritchie -.AI -.MH -.OK -Languages -Computing -..AB -..AE -.CS a b c d e f -.de II -.I -\\$1 -.R -.. -.de Op -.SH -\\$1 \fI\\$2 -.IP -.. -.PP //GO.SYSIN DD cdoc0 echo cdoc1 sed 's/.//' >cdoc1 <<'//GO.SYSIN DD cdoc1' -.SH -The Intermediate Language -.PP -.FS -\(dgUNIX is a Trademark of Bell Laboratories. -.FE -Communication between the two phases of the compiler proper -is carried out by means of a pair of intermediate files. -These files are treated as having identical structure, -although the second file contains only the code generated for strings. -It is convenient to write strings out separately to reduce the -need for multiple location counters in a later assembly -phase. -.PP -The intermediate language is not machine-independent; -its structure in a number of ways reflects -the fact that C was originally a one-pass compiler -chopped in two to reduce the maximum memory -requirement. -In fact, only the latest version -of the compiler has a complete -intermediate language at all. -Until recently, the first phase of the compiler generated -assembly code for those constructions it could deal with, -and passed expression parse trees, in absolute binary -form, -to the second phase for code generation. -Now, at least, all inter-phase information -is passed in a describable form, and there are -no absolute pointers involved, so the coupling between -the phases is not so strong. -.PP -The areas in which the machine -(and system) dependencies are most noticeable are -.IP 1. -Storage allocation for automatic variables and arguments -has already been performed, -and nodes for such variables refer to them by offset -from a display pointer. -Type conversion -(for example, from integer to pointer) -has already occurred using the assumption of -byte addressing and 2-byte words. -.IP 2. -Data representations suitable to the PDP-11 are assumed; -in particular, floating point constants are passed as -four words in the machine representation. -.PP -As it happens, each intermediate file is represented as a sequence -of binary numbers without any explicit demarcations. -It consists of a sequence of -conceptual lines, each headed by an operator, and possibly containing -various operands. -The operators are small numbers; -to assist in recognizing failure in synchronization, -the high-order byte of each operator word is always the -octal number 376. -Operands are -either 16-bit binary numbers or strings of characters representing names. -Each name is terminated by a null character. -There is no alignment requirement for numerical -operands and so there is no padding -after a name string. -.PP -The binary representation was chosen to avoid the necessity -of converting to and from character form -and to minimize the size of the files. -It would be very easy to make -each operator-operand `line' in the file be -a genuine, printable line, with the numbers in octal or decimal; -this in fact was the representation originally used. -.PP -The operators fall naturally into two classes: -those which represent part of an expression, and all others. -Expressions are transmitted in a reverse-Polish notation; -as they are being read, a tree is built which is isomorphic -to the tree constructed in the first phase. -Expressions are passed as a whole, with no non-expression operators -intervening. -The reader maintains a stack; each leaf of the expression tree (name, constant) -is pushed on the stack; -each unary operator replaces the top of the stack by a node whose -operand is the old top-of-stack; -each binary operator replaces the top pair on the stack with -a single entry. -When the expression is complete there is exactly one item on the -stack. -Following each expression -is a special operator which passes the unique previous expression -to the `optimizer' described below and then to the code -generator. -.PP -Here is the list of operators not themselves part of expressions. -.LP -.Op EOF -marks the end of an input file. -.Op BDATA "flag data ..." -specifies a sequence of bytes to be assembled -as static data. -It is followed by pairs of words; the first member -of the pair is non-zero to indicate that the data continue; -a zero flag is not followed by data and terminates -the operator. -The data bytes occupy the low-order part of a word. -.Op WDATA "flag data ..." -specifies a sequence of words to be assembled as -static data; it is identical to the BDATA operator -except that entire words, not just bytes, are passed. -.Op PROG -means that subsequent information is to be compiled as program text. -.Op DATA -means that subsequent information is to be compiled as static data. -.Op BSS -means that subsequent information is to be compiled as unitialized -static data. -.Op SYMDEF name -means that -the symbol -.I -name -.R -is an external name defined in the current program. -It is produced for each external data or function definition. -.Op CSPACE "name size" -indicates that the name refers to a data area whose size is the -specified number of bytes. -It is produced for external data definitions without explicit initialization. -.Op SSPACE size -indicates that -.I -size -.R -bytes should be set aside for data storage. -It is used to pad out short initializations of external data -and to reserve space for static (internal) data. -It will be preceded by an appropriate label. -.Op EVEN -is produced after each -external data definition whose size is not -an integral number of words. -It is not produced after strings except when they initialize -a character array. -.Op NLABEL name -is produced just before a BDATA or WDATA initializing -external data, and serves as a label for the data. -.Op RLABEL name -is produced just before each function definition, -and labels its entry point. -.Op SNAME "name number" -is produced at the start of each function for each static variable -or label -declared therein. -Subsequent uses of the variable will be in terms of the given number. -The code generator uses this only to produce a debugging symbol table. -.Op ANAME "name number" -Likewise, each automatic variable's name and stack offset -is specified by this operator. -Arguments count as automatics. -.Op RNAME "name number" -Each register variable is similarly named, with its register number. -.Op SAVE number -produces a register-save sequence at the start of each function, -just after its label (RLABEL). -.Op SETREG number -is used to indicate the number of registers used -for register variables. -It actually gives the register number of the lowest -free register; it is redundant because the RNAME operators could be -counted instead. -.Op PROFIL -is produced before the save sequence for functions -when the profile option is turned on. -It produces code to count the number -of times the function is called. -.Op SWIT "deflab line label value ..." -is produced for switches. -When control flows into it, -the value being switched on is in the register -forced by RFORCE (below). -The switch statement occurred on the indicated line -of the source, and the label number of the default location -is -.I -deflab. -.R -Then the operator is followed by a sequence of label-number and value pairs; -the list is terminated by a 0 label. -.Op LABEL number -generates an internal label. -It is referred to elsewhere using the given number. -.Op BRANCH number -indicates an unconditional transfer to the internal label number -given. -.Op RETRN -produces the return sequence for a function. -It occurs only once, at the end of each function. -.Op EXPR line -causes the expression just preceding to be compiled. -The argument is the line number in the source where the -expression occurred. -.Op NAME "class type name" -.Op NAME "class type number" -indicates a name occurring in an expression. -The first form is used when the name is external; -the second when the name is automatic, static, or a register. -Then the number indicates the stack offset, the label number, -or the register number as appropriate. -Class and type encoding is described elsewhere. -.Op CON "type value" -transmits an integer constant. -This and the next two operators occur as part of expressions. -.Op FCON "type 4-word-value" -transmits a floating constant as -four words in PDP-11 notation. -.Op SFCON "type value" -transmits a floating-point constant -whose value is correctly represented by its high-order word -in PDP-11 notation. -.Op NULL -indicates a null argument list of a function call in an expression; -call is a binary operator whose second operand is the argument list. -.Op CBRANCH "label cond" -produces a conditional branch. -It is an expression operator, and will be followed -by an EXPR. -The branch to the label number takes place if the expression's -truth value is the same as that of -.I -cond. -.R -That is, if -.I -cond=1 -.R -and the expression evaluates to true, the branch is taken. -.Op binary-operator type -There are binary operators corresponding -to each such source-language operator; -the type of the result of each is passed as well. -Some perhaps-unexpected ones are: -COMMA, which is a right-associative operator designed -to simplify right-to-left evaluation -of function arguments; -prefix and postfix ++ and \-\-, whose second operand -is the increment amount, as a CON; -QUEST and COLON, to express the conditional -expression as `a?(b:c)'; -and a sequence of special operators for expressing -relations between pointers, in case pointer comparison -is different from integer comparison -(e.g. unsigned). -.Op unary-operator type -There are also numerous unary operators. -These include ITOF, FTOI, FTOL, LTOF, ITOL, LTOI -which convert among floating, long, and integer; -JUMP which branches indirectly through a label expression; -INIT, which compiles the value of a constant expression -used as an initializer; -RFORCE, which is used before a return sequence or -a switch to place a value in an agreed-upon register. //GO.SYSIN DD cdoc1 echo cdoc2 sed 's/.//' >cdoc2 <<'//GO.SYSIN DD cdoc2' -.SH -Expression Optimization -.PP -Each expression tree, as it is read in, is subjected to -a fairly comprehensive -analysis. -This is performed -by the -.II optim -routine and a number of subroutines; -the major things done are -.IP 1. -Modifications and simplifications -of the tree so its value may be computed more efficiently -and conveniently by the code generator. -.RT -.IP 2. -Marking each interior node with an estimate of the number of -registers required to evaluate it. -This register count is needed to guide the code generation algorithm. -.PP -One thing that is definitely not done is -discovery or exploitation of common subexpressions, nor is this done anywhere in the -compiler. -.PP -The basic organization is simple: a depth-first scan of the tree. -.II Optim -does nothing for leaf nodes (except for automatics; see below), -and calls -.II unoptim -to handle unary operators. -For binary operators, -it calls itself to process the operands, -then treats each operator separately. -One important case is -commutative and associative operators, which are handled -by -.II acommute. -.PP -Here is a brief catalog of the transformations carried out by -by -.II optim -itself. -It is not intended to be complete. -Some of the transformations are machine-dependent, -although they may well be useful on machines other than the -PDP-11. -.IP 1. -As indicated in the discussion of -.II unoptim -below, the optimizer can create a node type corresponding -to the location addressed by a register plus a constant offset. -Since this is precisely the implementation of automatic variables -and arguments, where the register is fixed by convention, -such variables are changed to the new form to simplify -later processing. -.RT -.IP 2. -Associative and commutative operators are processed by the -special routine -.II acommute. -.RT -.IP 3. -After processing by -.II acommute, -the bitwise & operator is turned into a new -.II andn -operator; `a & b' becomes -`a -.II andn -~b'. -This is done because the PDP-11 provides -no -.II and -operator, but only -.II andn. -A similar transformation takes place for -`=&'. -.RT -.IP 4. -Relationals are turned around so the -more complicated expression is on the left. -(So that `2 > f(x)' becomes `f(x) < 2'). -This improves code generation since -the algorithm prefers to have the right operand -require fewer registers than the left. -.RT -.IP 5. -An expression minus a constant is turned into -the expression plus the negative constant, -and the -.II acommute -routine is called -to take advantage of the properties of addition. -.RT -.IP 6. -Operators with constant operands are evaluated. -.RT -.IP 7. -Right shifts (unless by 1) -are turned into left shifts with a negated right operand, -since the PDP-11 lacks a general right-shift operator. -.RT -.IP 8. -A number of special cases are simplified, such as division or -multiplication by 1, -and shifts by 0. -.LP -The -.II unoptim -routine performs the same sort of processing for unary operators. -.IP 1. -`*&x' and `&*x' are simplified to `x'. -.RT -.IP 2. -If -.II r -is a register and -.II c -is a constant or the address of a static or external -variable, -the expressions `*(r+c)' -and `*r' are turned into a special kind of name node -which expresses -the name itself and the offset. -This simplifies subsequent processing -because such constructions can appear as -the the address of a PDP-11 instruction. -.RT -.IP 3. -When the unary `&' operator is applied to -a name node of the special kind just discussed, -it is reworked to make the addition -explicit again; -this is done because the PDP-11 has no `load address' instruction. -.RT -.IP 4. -Constructions -like -`*r++' and -`*\-\-r' -where -.II r -is a register are discovered and marked -as being implementable using the PDP-11 -auto-increment and -decrement modes. -.RT -.IP 5. -If `!' is applied to a relational, -the `!' is discarded -and the sense of the relational is reversed. -.RT -.IP 6. -Special cases involving reflexive -use of negation and complementation are discovered. -.RT -.IP 7. -Operations applying to constants are evaluated. -.PP -The -.II acommute -routine, called for associative and commutative operators, -discovers clusters of the same operator at the top levels -of the current tree, and arranges them in a list: -for `a+((b+c)+(d+f))' -the list would be`a,b,c,d,e,f'. -After each subtree is optimized, the list is sorted in -decreasing difficulty of computation; -as mentioned above, -the code generation algorithm works best when left operands -are the difficult ones. -The `degree of difficulty' -computed is actually finer than -the mere number of registers required; -a constant is considered simpler -than the address of a static or external, which is simpler -than reference to a variable. -This makes it easy to fold all the constants -together, -and also to merge together the sum of a constant and the address of -a static -or external (since in such nodes there is space for -an `offset' value). -There are also special cases, like multiplication by 1 and addition of 0. -.II -A special routine is invoked to handle sums of products. -.II Distrib -is based on the fact that it is better -to compute `c1*c2*x + c1*y' as `c1*(c2*x + y)' -and makes the divisibility tests required to assure the -correctness of the transformation. -This transformation is rarely -possible with code directly written by the user, -but it invariably occurs as a result of the -implementation of multi-dimensional arrays. -.PP -Finally, -.II acommute -reconstructs a tree from the list -of expressions which result. //GO.SYSIN DD cdoc2 echo cdoc3 sed 's/.//' >cdoc3 <<'//GO.SYSIN DD cdoc3' -.SH -Code Generation -.PP -The grand plan for code-generation is -independent of any particular machine; -it depends largely on a set of tables. -But this fact does not necessarily make it very easy -to modify the compiler to produce code for other machines, -both because there is a good deal of machine-dependent structure -in the tables, and because in any event such tables are non-trivial to -prepare. -.PP -The arguments to the basic code generation routine -.II rcexpr -are a pointer to a tree representing an expression, -the name of a code-generation table, -and the number of a register in which the value of the -expression should be placed. -.II Rcexpr -returns the number of the register in which the value actually -ended up; -its caller -may need to produce a -.II mov -instruction if the value really needs to be in the given register. -There are four code generation tables. -.PP -.II Regtab -is the basic one, which actually does the job described -above: namely, -compile code which places the value represented by the expression -tree in a register. -.PP -.II Cctab -is used when the value of the expression is not actually needed, -but instead the value of the condition codes resulting from -evaluation of the expression. -This table is used, for example, to evaluate the expression after -.II if. -It is clearly silly to -calculate the value (0 or 1) of the expression -`a==b' in the context `if (a==b) ... ' -.PP -The -.II sptab -table is used when the value of an expression is to be pushed on the stack, -for example when it is an actual argument. -For example in the function call `f(a)' it is a bad idea to -load -.II a -into a register which is then pushed on the stack, -when there is a single instruction which does the job. -.PP -The -.II efftab -table is used when an expression is to be evaluated for its side effects, -not its value. -This occurs mostly for expressions which are statements, which have no -value. -Thus the code for the statement -`a = b' -need produce only the approoriate -.II mov -instruction, and need not leave the value of -.II b -in a register, -while in the expression `a + (b = c)' -the value of `b = c' will appear in a register. -.PP -All of the tables besides -.II regtab -are rather small, and handle only a relatively few special cases. -If one of these subsidiary tables does not contain -an entry applicable to the given expression tree, -.II rcexpr -uses -.II regtab -to put the value of the expression into a register -and then fixes things up; -nothing need be done when the table -was -.II efftab, -but a -.II tst -instruction is produced when the table called for was -.II cctab, -and a -.II mov -instruction, -pushing the register on the stack, -when the table was -.II sptab. -.PP -The -.II rcexpr -routine itself picks off some special -cases, then calls -.II cexpr -to do the real work. -.II Cexpr -tries to find an entry applicable -to the given tree in the given table, and returns \-1 if -no such entry is found, letting -.II rcexpr -try again with a different table. -A successful match yields a string -containing both literal characters -which are written out and pseudo-operations, or macros, which are expanded. -Before studying the contents -of these strings we will consider how table entries are matched -against trees. -.PP -Recall that most non-leaf nodes in an expression tree -contain the name of the operator, -the type of the value represented, and pointers to the subtrees (operands). -They also contain an estimate of the number of registers required to evaluate -the expression, placed there by the expression-optimizer routines. -The register counts are used to guide the code generation process, -which is based on the Sethi-Ullman algorithm. -.PP -The main code generation -tables consist of entries -each containing an operator number and a pointer -to a subtable for the corresponding operator. -A subtable consists of a sequence -of entries, each with a key describing certain properties of the -operands of the operator involved; associated with the key is a code string. -Once the subtable corresponding to the operator is found, the subtable -is searched linearly until a key is found such that the properties demanded -by the key are compatible with the operands of the tree node. -A successful match returns the code string; -an unsuccessful search, either for the operator in the main table -or a compatble key in the subtable, -returns a failure indication. -.PP -The tables are all contained in a file -which must be processed to obtain an assembly language program. -Thus they are written in a special-purpose language. -To provided definiteness to the following discussion, here is an -example of a subtable entry. -.DS -%n,aw - F - add A2,R -.DE -The `%' indicates the key; -the information following (up to a blank line) specifies the code string. -Very briefly, this entry is in the subtable -for `+' of -.II regtab; -the key specifies that the left operand is any integer, character, or pointer -expression, -and the right operand is any word quantity which is directly addressible -(e.g. a variable or constant). -The code string calls for the generation of the code -to compile the left (first) operand into the -current register (`F') -and then to produce an `add' instruction which adds the -second operand (`A2') to the register (`R'). -All of the notation will be explained below. -.PP -Only three features of the operands are used in deciding -whether a match has occurred. -They are: -.IP 1. -Is the type of the operand compatible with that demanded? -.RT -.IP 2. -Is the `degree of difficulty' (in a sense described below) compatible? -.RT -.IP 3. -The table may demand that the operand have a `*' -(indirection operator) as its highest operator. -.PP -As suggested above, the key for a subtable entry -is indicated by a `%,' and a comma-separated pair -of specifications for the operands. -(The second specification is ignored for unary operators). -A specification indicates -a type requirement by including one of the following letters. -If no type letter is present, any integer, character, -or pointer operand will satisfy the requirement (not float, double, or long). -.IP b -A byte (character) operand is required. -.RT -.IP w -A word (integer or pointer) operand is required. -.RT -.IP f -A float or double operand is required. -.RT -.IP d -A double operand is required. -.RT -.IP l -A long (32-bit integer) operand is required. -.PP -Before discussing the `degree of difficulty' specification, -the algorithm has to be explained more completely. -.II Rcexpr -(and -.II cexpr) -are called with a register number in which to place their result. -Registers 0, 1, ... are used during evaluation of expressions; -the maximum register which can be used in this way depends on the -number of register variables, but in any event only registers -0 through 4 are available since r5 is used as a stack frame -header and r6 (sp) and r7 (pc) have special -hardware properties. -The code generation routines assume that when called with register -.II n -as argument, they may use -.II n+1, -\&... -(up to the first register variable) -as temporaries. -Consider the expression `X+Y', where both -X and Y are expressions. -As a first approximation, there are three ways of compiling -code to put this expression in register -.II n. -.IP 1. -If Y is an addressible cell, -(recursively) put X into register -.II n -and add Y to it. -.RT -.IP 2. -If Y is an expression that can be calculated in -.II k -registers, where -.II k -smaller than the number of registers available, -compile X into register -.II n, -Y into register -.II n+1, -and add register -.II n+1 -to -.II n. -.RT -.IP 3. -Otherwise, compile Y into register -.II n, -save the result in a temporary (actually, on the stack) -compile X into register -.II n, -then add in the temporary. -.PP -The distinction between cases 2 and 3 therefore depends -on whether the right operand can be compiled in fewer than -.II k -registers, where -.II k -is the number of free registers left after registers 0 through -.II n -are taken: -0 through -.II n\-1 -are presumed to contain already computed temporary results; -.II n -will, in case 2, -contain the value of the left operand while the right -is being evaluated. -.PP -These considerations should make clear -the specification codes for the degree of difficulty, -bearing in mind that a number of special cases are also present: -.IP z -is satisfied when the operand is zero, so that special code -can be produced for expressions like `x = 0'. -.RT -.IP 1 -is satisfied when the operand is the constant 1, to optimize -cases like left and right shift by 1, which can be done -efficiently on the PDP-11. -.RT -.IP c -is satisfied when the operand is a positive (16-bit) -constant; this takes care of some special cases in long arithmetic. -.RT -.IP a -is satisfied when the operand is addressible; -this occurs not only for variables and constants, but also for -some more complicated constructions, such as indirection through -a simple variable, `*p++' where -.II p -is a register variable (because of the PDP-11's auto-increment address -mode), and `*(p+c)' where -.II p -is a register and -.II c -is a constant. -Precisely, the requirement is that the operand refers to a cell -whose address can be written as a source or destination of a PDP-11 -instruction. -.RT -.IP e -is satisfied by an operand whose value can be generated in a register -using no more than -.II k -registers, where -.II k -is the number of registers left (not counting the current register). -The `e' stands for `easy.' -.RT -.IP n -is satisfied by any operand. -The `n' stands for `anything.' -.PP -These degrees of difficulty are considered to lie in a linear ordering -and any operand which satisfies an earlier-mentioned requirement -will satisfy a later one. -Since the subtables are searched linearly, -if a `1' specification is included, almost certainly -a `z' must be written first to prevent -expressions containing the constant 0 to be compiled -as if the 0 were 1. -.PP -Finally, -a key specification may contain a `*' which -requires the operand to have an indirection as its leading operator. -Examples below should clarify the utility of this specification. -.PP -Now let us consider the contents of the code string -associated with each subtable entry. -Conventionally, lower-case letters in this string -represent literal information which is copied directly -to the output. -Upper-case letters generally introduce specific -macro-operations, some of which may be followed -by modifying information. -The code strings in the tables are written with tabs and -new-lines used freely to suggest instructions which will be generated; -the table-compiling program compresses tabs (using the 0200 bit of the -next character) and throws away some of the new-lines. -For example the macro `F' is ordinarily written on a line by itself; -but since its expansion will end with a new-line, the new-line -after `F' itself is dispensable. -This is all to reduce the size of the stored tables. -.PP -The first set of macro-operations is concerned with -compiling subtrees. -Recall that this is done by the -.II cexpr -routine. -In the following discussion the `current register' -is generally the argument register to -.II cexpr; -that is, the place where the result is desired. -The `next register' is numbered one -higher -than the current register. -(This explanation isn't fully true -because of complications, described below, involving -operations which require even-odd register pairs.) -.IP F -causes a recursive call to -the -.II rcexpr -routine to compile code which places the value of the first (left) -operand of the operator in the current register. -.RT -.IP F1 -generates code which places the value of the first operand in the -next register. -It is incorrectly used if there might be no next register; -that is, if the degree of difficulty of the first operand is not `easy;' -if not, another register might not be available. -.RT -.IP FS -generates code which pushes the value of the first operand on the stack, -by calling -.II rcexpr -specifying -.II sptab -as the table. -.LP -Analogously, -.IP "S, S1, SS" -compile the second (right) operand -into the current register, the next register, or onto the stack. -.LP -To deal with registers, there are -.IP R -which expands into the name of the current register. -.RT -.IP R1 -which expands into the name of the next register. -.RT -.IP R+ -which expands into the the name of the current register plus 1. -It was suggested above that this is the same as the next register, -except for complications; here is one of them. -Long integer variables have -32 bits and require 2 registers; in such cases the next register -is the current register plus 2. -The code would like to talk about both halves of the -long quantity, so R refers to the register with the high-order part -and R+ to the low-order part. -.RT -.IP R\- -This is another complication, involving division and mod. -These operations involve a pair of registers of which the odd-numbered -contains the left operand. -.II Cexpr -arranges that the current register is odd; -the R\- notation allows the code to refer to the next lower, -even-numbered register. -.LP -To refer to addressible quantities, there are the notations: -.IP A1 -causes generation of the address specified by the first operand. -For this to be legal, the operand must be addressible; its -key must contain an `a' -or a more restrictive specification. -.RT -.IP A2 -correspondingly generates the address of the second operand -providing it has one. -.PP -We now have enough mechanism to show a complete, if suboptimal, -table for the + operator on word or byte operands. -.DS -%n,z - F -.sp 1 -%n,1 - F - inc R -.sp 1 -%n,aw - F - add A2,R -.sp 1 -%n,e - F - S1 - add R1,R -.sp 1 -%n,n - SS - F - add (sp)+,R -.DE -The first two sequences handle some special cases. -Actually it turns out that handling a right operand of 0 -is unnecessary since the expression-optimizer -throws out adds of 0. -Adding 1 by using the `increment' instruction is done next, -and then the case where the right operand is addressible. -It must be a word quantity, since the PDP-11 lacks an `add byte' instruction. -Finally the cases where the right operand either can, or cannot, -be done in the available registers are treated. -.PP -The next macro-instructions are conveniently -introduced by noticing that the above table is suitable -for subtraction as well as addition, since no use is made of the -commutativity of addition. -All that is needed is substitution of `sub' for `add' -and `dec' for 'inc.' -Considerable saving of space is achieved by factoring out -several similar operations. -.IP I -is replaced by a string from another table indexed by the operator -in the node being expanded. -This secondary table actually contains two strings per operator. -.RT -.IP I\(fm -is replaced by the second string in the side table -entry for the current operator. -.PP -Thus, given that the entries for `+' and `\-' in the side table -(which is called -.II instab) -are `add' and `inc,' `sub' and `dec' -respectively, -the middle of of the above addition table can be written -.DS -%n,1 - F - I' R - -%n,aw - F - I A2,R -.DE -and it will be suitable for subtraction, -and several other operators, as well. -.PP -Next, there is the question of character and floating-point operations. -.IP B1 -generates the letter `b' if the first operand is a character, -`f' if it is float or double, and nothing otherwise. -It is used in a context like `movB1' -which generates a `mov', `movb', or `movf' -instruction according to the type of the operand. -.RT -.IP B2 -is just like B1 but applies to the second operand. -.RT -.IP BE -generates `b' if either operand is a character -and null otherwise. -.RT -.IP BF -generates `f' if the type of the operator node itself is float or double, -otherwise null. -.PP -For example, there is an entry in -.II efftab -for the `=' operator -.DS -%a,aw -%ab,a - IBE A2,A1 -.DE -Note first that two key specifications -can be applied to the same code string. -Next, observe that when a word is assigned to a byte or to a word, -or a word is assigned to a byte, -a single instruction, -a -.II mov -or -.II movb -as appropriate, does the job. -However, when a byte is assigned to a word, -it must pass through a register to implement the sign-extension rules: -.DS -%a,n - S - IB1 R,A1 -.DE -.PP -Next, there is the question of handling indirection properly. -Consider the expression `X + *Y', where X and Y are expressions, -Assuming that Y is more complicated than just a variable, -but on the other hand qualifies as `easy' in the context, -the expression would be compiled by placing the value of X in a register, -that of *Y in the next register, and adding the registers. -It is easy to see that a better job can be done -by compiling X, then Y (into the next register), -and producing the -instruction symbolized by `add (R1),R'. -This scheme avoids generating -the instruction `mov (R1),R1' -required actually to place the value of *Y in a register. -A related situation occurs -with the expression `X + *(p+6)', which -exemplifies a construction -frequent in structure and array references. -The addition table shown above would produce -.DS -[put X in register R] -mov p,R1 -add $6,R1 -mov (R1),R1 -add R1,R -.DE -when the best code is -.DS -[put X in R] -mov p,R1 -add 6(R1),R -.DE -As we said above, a key specification for a code table entry -may require an operand to have an indirection as its highest operator. -To make use of the requirement, -the following macros are provided. -.IP F* -the first operand must have the form *X. -If in particular it has the form *(Y + c), for some constant -.II c, -then code is produced which places the value of Y in -the current register. -Otherwise, code is produced which loads X into the current register. -.RT -.IP F1* -resembles F* except that the next register is loaded. -.RT -.IP S* -resembles F* except that the second operand is loaded. -.RT -.IP S1* -resembles S* except that the next register is loaded. -.RT -.IP FS* -The first operand must have the form `*X'. -Push the value of X on the stack. -.RT -.IP SS* -resembles FS* except that it applies to the second operand. -.LP -To capture the constant that may have been skipped over -in the above macros, there are -.IP #1 -The first operand must have the form *X; -if in particular it has the form *(Y + c) for -.II c -a constant, then the constant is written out, -otherwise a null string. -.RT -.IP #2 -is the same as #1 except that the second operand is used. -.LP -Now we can improve the addition table above. -Just before the `%n,e' entry, put -.DS -%n,ew* - F - S1* - add #2(R1),R -.DE -and just before the `%n,n' put -.DS -%n,nw* - SS* - F - add *(sp)+,R -.DE -When using the stacking macros there is no place to use -the constant -as an index word, so that particular special case doesn't occur. -.PP -The constant mentioned above can actually be more -general than a number. -Any quantity acceptable to the assembler as an expression will do, -in particular the address of a static cell, perhaps with a numeric offset. -If -.II x -is an external character array, -the expression `x[i+5] = 0' will generate -the code -.DS -mov i,r0 -clrb x+5(r0) -.DE -via the table entry (in the `=' part of -.II efftab) -.DS -%e*,z - F - I'B1 #1(R) -.DE -Some machine operations place restrictions on the registers -used. -The divide instruction, used to implement the divide and mod -operations, requires the dividend to be placed in the odd member -of an even-odd pair; -other peculiarities -of multiplication make it simplest to put the multiplicand -in an odd-numbered register. -There is no theory which optimally accounts for -this kind of requirement. -.II Cexpr -handles it by checking for a multiply, divide, or mod operation; -in these cases, its argument register number is incremented by -one or two so that it is odd, and if the operation was divide or mod, -so that it is a member of a free even-odd pair. -The routine which determines the number of registers required -estimates, conservatively, that -at least two registers are required for a multiplication -and three for the other peculiar operators. -After the expression is compiled, -the register where the result actually ended up is returned. -(Divide and mod are actually the same operation except for the -location of the result). -.PP -These operations are the ones which cause results to end up in -unexpected places, -and this possibility adds a further level of complexity. -The simplest way of handling the problem is always to move the -result to the place where the caller expected it, -but this will produce unnecessary register moves in many -simple cases; `a = b*c' would generate -.DS -mov b,r1 -mul c,r1 -mov r1,r0 -mov r0,a -.DE -The next thought is used the passed-back -information as to where the result landed to change the notion of the current -register. -While compiling the `=' operation above, which comes from a -table -entry -like -.DS -%a,e - S - mov R,A1 -.DE -it is sufficient to redefine the meaning of `R' -after processing the `S' which does the multiply. -This technique is in fact used; the tables are written in such a way -that correct code is produced. -The trouble is that the technique cannot be used in general, -because it invalidates the count of the number of registers -required for an expression. -Consider just `a*b + X' where X is some expression. -The algorithm assumes that the value of a*b, -once computed, requires just one register. -If there are three registers available, and X requires two registers to -compute, then this expression will match a key specifying -`%n,e'. -If a*b is computed and left in register 1, then there are, contrary -to expectations, no longer two registers available to compute X, -but only one, and bad code will be produced. -To guard against this possibility, -.II cexpr -checks the result returned by recursive calls which implement -F, S and their relatives. -If the result is not in the expected register, then the number of -registers required by the other operand is checked; -if it can be done using those registers which remain even -after making unavailable the unexpectedly-occupied -register, then -the notions of the `next register' and possibly the `current -register' are redefined. -Otherwise a register-copy instruction is produced. -A register-copy is also always produced -when the current operator is one of those which have odd-even requirements. -.PP -Finally, there are a few loose-end macro operations -and facts about the tables. -The operators: -.IP V -is used for long operations. -It is written with an address like a machine instruction; -it expands into `adc' (add carry) if the operation -is an additive operator, -`sbc' (subtract carry) if the operation is a subtractive -operator, and disappears, along with the rest of the line, otherwise. -Its purpose is to allow common treatment of logical -operations, which have no carries, and additive and subtractive -operations, which generate carries. -.RT -.IP T -generates a `tst' instruction if the first operand -of the tree does not set the condition codes correctly. -It is used with divide and mod operations, -which require a sign-extended 32-bit operand. -The code table for the operations contains an `sxt' -(sign-extend) instruction to generate the high-order part of the -dividend. -.RT -.IP H -is analogous to the `F' and `S' macros, -except that it calls for the generation of code for -the current tree -(not one of its operands) -using -.II regtab. -It is used in -.II cctab -for all the operators which, when executed normally, -set the condition codes properly according to the result. -It prevents a `tst' instruction from being generated for -constructions like `if (a+b) ...' -since after calculation of the value of -`a+b' a conditional branch can be written immediately. -.PP -All of the discussion above is in terms of operators with operands. -Leaves of the expression tree (variables and constants), however, -are peculiar in that they have no operands. -In order to regularize the matching process, -.II cexpr -examines its operand to determine if it is a leaf; -if so, it creates a special `load' operator whose operand -is the leaf, and substitutes it for the argument tree; -this allows the table entry for the created operator -to use the `A1' notation to load the leaf into a register. -.PP -Purely to save space in the tables, -pieces of subtables can be labelled and referred to later. -It turns out, for example, -that rather large portions of the -the -.II efftab -table for the `=' and `=+' operators are identical. -Thus `=' has an entry -.DS -%[move3:] -%a,aw -%ab,a - IBE A2,A1 -.DE -while part of the `=+' table is -.DS -%aw,aw -% [move3] -.DE -Labels are written as `%[ ... : ]', -before the key specifications; -references -are written -with `% [ ... ]' -after the key. -Peculiarities in the implementation -make it necessary that labels appear before references to them. -.PP -The example illustrates the utility -of allowing separate keys -to point to the same code string. -The assignment code -works properly if either the right operand is a word, or the left operand -is a byte; -but since there is no `add byte' instruction the addition code -has to be restricted to word operands. //GO.SYSIN DD cdoc3 echo cdoc4 sed 's/.//' >cdoc4 <<'//GO.SYSIN DD cdoc4' -.SH -Delaying and reordering -.PP -Intertwined with the code generation routines are two other, -interrelated processes. -The first, implemented by a routine called -.II delay, -is based on the observation that -naive code generation for the expression -`a = b++' would produce -.DS -mov b,r0 -inc b -mov r0,a -.DE -The point is that the table for postfix ++ has to preserve -the value of -.II b -before incrementing it; -the general way to do this is to preserve its value in a register. -A cleverer scheme would generate -.DS -mov b,a -inc b -.DE -.II Delay -is called for each expression input to -.II rcexpr, -and it searches for postfix ++ and \-\- -operators. -If one is found applied to a variable, -the tree is patched to bypass the operator -and compiled as it stands; -then the increment or decrement itself is done. -The effect is as if `a = b; b++' had been written. -In this example, of course, the user himself could have done the same job, -but more complicated examples are easily constructed, for example -`switch (x++)'. -An essential restriction is that the condition codes not -be required. -It would be incorrect to compile -`if (a++) ...' -as -.DS -tst a -inc a -beq ... -.DE -because the `inc' destroys the required setting of the condition codes. -.PP -Reordering is a similar sort of optimization. -Many cases which it detects are useful -mainly with register variables. -If -.II r -is a register variable, -the expression `r = x+y' is best compiled -as -.DS -mov x,r -add y,r -.DE -but the codes tables would produce -.DS -mov x,r0 -add y,r0 -mov r0,r -.DE -which is in fact preferred if -.II r -is not a register. -(If -.II r -is not a register, -the -two sequences are the same size, but the -second is slightly faster.) -The scheme is to compile the expression as if it had been written -`r = x; r =+ y'. -The -.II reorder -routine -is called with a pointer to each tree that -.II rcexpr -is about to compile; -if it has the right characteristics, -the `r = x' tree is constructed and passed recursively -to -.II rcexpr; -then the original tree is modified to read `r =+ y' -and the calling instance of -.II rcexpr -compiles that instead. -Of course the whole business is itself recursive -so that more extended forms of the same phenomenon are -handled, like `r = x + y | z'. -.PP -Care does have to be taken -to avoid `optimizing' an expression like `r = x + r' -into `r = x; r =+ r'. -It is required that the right operand of the expression on the right -of the `=' be a ', distinct from the register variable. -.PP -The second case that -.II reorder -handles is expressions of the form `r = X' used as a subexpression. -Again, the code out of the tables for -`x = r = y' -would be -.DS -mov y,r0 -mov r0,r -mov r0,x -.DE -whereas if -.II r -were a register it would be better to produce -.DS -mov y,r -mov r,x -.DE -When -.II reorder -discovers that -a register variable is being assigned to -in a subexpression, -it calls -.II rcexpr -recursively to -compile the subexpression, then fiddles the tree passed -to it so that the register variable itself appears -as the operand instead of the whole subexpression. -Here care has to be taken to avoid an infinite regress, -with -.II rcexpr -and -.II reorder -calling each other forever to handle assignments to registers. -.PP -A third set of cases treated by -.II reorder -comes up when any name, not necessarily a register, -occurs as a left operand of an assignment operator other than `=' -or as an operand of prefix `++' or `\-\-'. -Unless condition-code tests are involved, -when a subexpression like `(a =+ b)' is seen, -the assignment is performed and the argument tree -modified so that -.II a -is its operand; -effectively -`x + (y =+ z)' is compiled as `y =+ z; x + y'. -Similarly, prefix increment and decrement are pulled out -and performed first, then the remainder of the expression. -.PP -Throughout code generation, -the expression optimizer is called whenever -.II delay -or -.II reorder -change the expression tree. -This allows some special cases to be found that otherwise -would not be seen. //GO.SYSIN DD cdoc4