In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. The goal is to keep as many operands as possible in registers to maximise the execution speed of software programs. Register allocation can happen over a basic block (local register allocation), over a whole function/procedure (global register allocation), or in-between functions as a calling convention (interprocedural register allocation). Most computer programs need to process large numbers of different data items. However, most CPUs can only perform operations on a small fixed number of "slots" called registers. Even on machines that support memory operands, register access is considerably faster than memory access. Variables not allocated to registers must be loaded in and out of RAM whenever they are used. Register spilling occurs where there are more live variables than the machine has registers. When a compiler is generating machine code and there are more live variables than the machine has registers, it has to transfer or "spill" some variables from registers to memory. This incurs a certain cost, as access from memory is typically slower than access from a register. In compilers, register pressure occurs when there are more variables to allocate than there are registers available. This typically results in register spilling.
[edit] ChallengesRegister allocation is an NP-complete problem.[1][2] The number of variables in a typical program is much larger than the number of available registers in a processor, so the contents of some variables have to be spilled (saved) into memory locations. The cost of such spilling is minimised by spilling the least frequently used variables first, but it is not easy to know which variables will be used the least. In addition to this the hardware and operating system may impose restrictions on the usage of some registers. [edit] Global register allocationLike most other compiler optimizations, register allocation is based on the result of some compiler analysis, typically one of live variable analysis, a form of data flow analysis, or the construction of static single assignment form, which encodes similar information into the name space of the code. Global register allocators, which consider an entire procedure, have been built using several paradigms, including bin packing (e.g., the DEC VAX compilers), priority-based graph coloring (e.g., Chow-style allocators), and bottom-up graph coloring (e.g., Chaitin-style allocators). Of these approaches, the Chaitin-style allocators appear to be most common. Chaitin's algorithm can be divided into two phases:
In phase two, an interference graph is constructed where nodes are symbolic registers (created in the previous phase) and an arc connects two nodes if they are alive at the same time. More precisely, if one variable is alive at the time the other is defined then they are said to interfere. If the graph can be colored with R colors then the variables can be stored in R registers. This insight was pointed out by John Cocke, "father of the RISC architecture". The problem is that coloring a graph is an NP-hard problem. The key insight to Chaitin’s algorithm is called the degree < R rule which is as follows. Given a graph G which contains a node N with degree less than R, G is R-colorable iff the graph G’, where G’ is G with node N removed, is R-colorable. The proof is obvious in one direction: if a graph G can be colored with R colors then the graph G’ can be created without changing the coloring. In the other direction, suppose we have an R-coloring of G’. Since N has a degree of less than R there must be at least one color that is not in use for a node adjacent to N. We can color N with this color.
While G cannot be R-colored
While graph G has a node N with degree less than R
Remove N and its associated edges from G and push N on a stack S
End While
If the entire graph has been removed then the graph is R-colorable
While stack S contains a node N
Add N to graph G and assign it a color from the R colors
End While
Else graph G cannot be colored with R colors
Simplify the graph G by choosing an object to spill and remove its node N from G
(spill nodes are chosen based on object’s number of definitions and references)
End While
This algorithm is O(n^2). This algorithm can be improved through subsumption which is the act of coalescing nodes which are the source and target of copy operations into a single node before running the algorithm. This reduces the number of nodes to color but can increase the degree of any coalesced node. This can only be done when the nodes do not interfere with each other, however, and aggressive coalescing can lead to uncolorable graphs. (Preston Briggs’ thesis work introduces safer methods to determine which nodes to coalesce and spill. Based on his improvements this algorithm is often called the Chaitin-Briggs algorithm.) The subsumption step is slow and is not done in fast register allocators. [edit] Recent developmentsGraph coloring allocators produce efficient code, but their allocation time is high. In cases of static compilation, allocation time is not a significant concern. In cases of dynamic compilation, such as just-in-time (JIT) compilers, fast register allocation is important. An efficient technique proposed by Poletto and Sarkar is linear scan allocation. This technique requires only a single pass over the list of variable live ranges. Ranges with short lifetimes are assigned to registers, whereas those with long lifetimes tend to be spilled, or reside in memory. The results are on average only 12% less efficient than graph coloring allocators. The linear scan algorithm follows:
Cooper and Dasgupta recently developed a "lossy" Chaitin-Briggs graph coloring algorithm suitable for use in a JIT [3]. The "lossy" moniker refers to the imprecision the algorithm introduces into the interference graph. This optimization reduces the costly graph building step of Chaitin-Briggs making it suitable for runtime compilation. Experiments indicate that this lossy register allocator outperforms linear scan on the majority of tests used. "Optimal" register allocation algorithms based on Integer Programming have been developed by Goodwin and Wilken for regular architectures. These algorithms have been extended to irregular architectures by Kong and Wilken. While the worst case execution time is exponential, the experimental results show that the actual time is typically of order O(n2.5) of the number of constraints[4]. The possibility of doing register allocation on SSA-form programs has become focus of recent research[5]. The interference graphs of SSA-form programs are chordal, and as such, they can be colored in polynomial time. [edit] References
Directorio de Enlaces Directorio dmoz Directorio espejo dmoz Pedro Bernardo |