The Alpha Back End


Trap Shadows, Floating Exceptions, and Denormalized Numbers on the DEC Alpha

By Andrew W. Appel and Lal George, Nov 28, 1995

See section 4.7.5.1 of the Alpha Architecture Reference Manual.

The Alpha has imprecise exceptions, meaning that if a floating point instruction raises an IEEE exception, the exception may not interrupt the processor until several successive instructions have completed. ML, on the other hand, may want a "precise" model of floating point exceptions.

Furthermore, the Alpha hardware does not support denormalized numbers (for "gradual underflow"). Instead, underflow always rounds to zero. However, each floating operation (add, mult, etc.) has a trapping variant that will raise an exception (imprecisely, of course) on underflow; in that case, the instruction will produce a zero result AND an exception will occur. In fact, there are several variants of each instruction; three variants of MULT are:

MULT s1,s2,d
    truncate denormalized result to zero; no exception
MULT/U s1,s2,d
    truncate denormalized result to zero; raise UNDERFLOW
MULT/SU s1,s2,d
    software completion, producing denormalized result

The hardware treats the MULT/U and MULT/SU instructions identically, truncating a denormalized result to zero and raising the UNDERFLOW exception. But the operating system, on an UNDERFLOW exception, examines the faulting instruction to see if it's an /SU form, and if so, recalculates s1*s2, puts the right answer in d, and continues, all without invoking the user's signal handler.

Because most machines compute with denormalized numbers in hardware, to maximize portability of SML programs, we use the MULT/SU form (and ADD/SU, SUB/SU, etc.). But to use this form successfully, certain rules have to be followed. Basically, d cannot be the same register as s1 or s2, because the operating system needs to be able to recalculate the operation using the original contents of s1 and s2, and the MULT/SU instruction will overwrite d even if it traps.

More generally, we may want to have a sequence of floating-point instructions. The rules for such a sequence are:

1. The sequence should end with a TRAPB (trap barrier) instruction. (This could be relaxed somewhat, but certainly a TRAPB would be a good idea sometime before the next branch instruction or update of an ML reference variable, or any other ML side effect.)
2. No instruction in the sequence should destroy any operand of itself or of any previous instruction in the sequence.
3. No two instructions in the sequence should write the same destination register.
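The three rules above can be checked mechanically. The following is a minimal sketch in Python, not part of MLRISC: the `FInstr` record, the `check_trap_shadow` function, and the register names are all hypothetical and exist only to illustrate the rules.

```python
from typing import List, NamedTuple, Optional, Tuple


class FInstr(NamedTuple):
    """Hypothetical record for one instruction in a floating-point sequence."""
    op: str                  # e.g. "MULT/SU" or "TRAPB"
    srcs: Tuple[str, ...]    # source registers
    dst: Optional[str]       # destination register (None for TRAPB)


def check_trap_shadow(seq: List[FInstr]) -> List[str]:
    """Return the list of rule violations (empty if the sequence is valid)."""
    errors = []
    # Rule 1: the sequence ends with a trap barrier.
    if not seq or seq[-1].op != "TRAPB":
        errors.append("rule 1: sequence does not end with TRAPB")
    operands_seen = set()   # operands of this and all previous instructions
    dests_seen = set()      # destinations of all previous instructions
    for i, ins in enumerate(seq):
        if ins.op == "TRAPB":
            continue
        operands_seen.update(ins.srcs)
        # Rule 2: do not destroy an operand of this or an earlier instruction.
        if ins.dst in operands_seen:
            errors.append(f"rule 2: instruction {i} overwrites operand {ins.dst}")
        # Rule 3: no two instructions write the same destination.
        if ins.dst in dests_seen:
            errors.append(f"rule 3: instruction {i} rewrites destination {ins.dst}")
        dests_seen.add(ins.dst)
    return errors


good = [
    FInstr("MULT/SU", ("f1", "f2"), "f3"),
    FInstr("ADD/SU", ("f3", "f1"), "f4"),   # reading an earlier result is fine
    FInstr("TRAPB", (), None),
]
bad = [FInstr("MULT/SU", ("f1", "f2"), "f1")]  # d == s1 and no TRAPB
```

Here `check_trap_shadow(good)` reports no violations, while `bad` breaks rules 1 and 2.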

We can achieve these conditions by the following trick in the Alpha code generator. Each instruction in the sequence will write to a different temporary; this is guaranteed by the translation from ML-RISC. At the beginning of the sequence, we will put a special pseudo-instruction (we call it DEFFREG) that "defines" the destination register of an arithmetic instruction. If there are K arithmetic instructions in the sequence, then we'll insert K DEFFREG instructions, one per destination, all at the beginning of the sequence. Then, each arithop will not only "define" its destination temporary but will "use" it as well. When all these instructions are fed to the liveness analyzer, the resulting interference graph will then have interference edges satisfying conditions 2 and 3 above.
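To see why the trick produces the required edges, here is a small Python sketch of a textbook backward liveness pass over a straight-line sequence. It is not the MLRISC liveness analyzer; the function names (`interference`, `deffreg`, `arith`) and the temporaries `t1`, `t2`, `a`, `b`, `c` are assumptions made for illustration.

```python
def interference(instrs, live_out):
    """Build interference edges for straight-line code.

    instrs is a list of (defs, uses) pairs of register-name sets;
    live_out is the set of registers live after the last instruction.
    """
    live = set(live_out)
    edges = set()
    for defs, uses in reversed(instrs):
        # A defined register interferes with everything live after the def.
        for d in defs:
            for r in live - {d}:
                edges.add(frozenset((d, r)))
        live = (live - defs) | uses
    return edges


def deffreg(dst):
    """Pseudo-instruction: defines dst, uses nothing, emits no code."""
    return ({dst}, set())


def arith(dst, *srcs, trick=True):
    """Arithmetic op; with the trick it also 'uses' its own destination."""
    uses = set(srcs) | ({dst} if trick else set())
    return ({dst}, uses)


# t1 = a * b ; t2 = t1 + c ; t2 is live afterwards.
with_trick = [deffreg("t1"), deffreg("t2"),
              arith("t1", "a", "b"), arith("t2", "t1", "c")]
plain = [arith("t1", "a", "b", trick=False),
         arith("t2", "t1", "c", trick=False)]

edges_trick = interference(with_trick, {"t2"})
edges_plain = interference(plain, {"t2"})
```

With the DEFFREGs in place, `t1` interferes with its own operands `a` and `b` and with `t2`, and `t2` interferes with the operands of the earlier instruction. In the plain version `t1` does not interfere with `a`, so a register allocator would be free to assign them the same register, violating condition 2.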

Of course, DEFFREG doesn't actually generate any code. In our model of the Alpha, every instruction generates exactly 4 bytes of code except the "span-dependent" ones. Therefore, we'll specify DEFFREG as a span-dependent instruction whose minimum and maximum sizes are zero.
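The effect on code layout can be sketched as follows. This is a simplified Python model, not the MLRISC span-dependency resolver; `size_range` and `layout` are hypothetical names, and the only span-dependent instruction modelled is DEFFREG.

```python
def size_range(op):
    """Return (min_bytes, max_bytes) for an instruction in this toy model."""
    if op == "DEFFREG":
        return (0, 0)    # span-dependent, but both bounds are zero
    return (4, 4)        # every ordinary Alpha instruction is 4 bytes


def layout(ops):
    """Assign a byte offset to each instruction and return (offsets, total)."""
    offsets = []
    pc = 0
    for op in ops:
        lo, hi = size_range(op)
        # Only fixed-size instructions are handled in this sketch.
        assert lo == hi, "real span-dependent sizes need iterative resolution"
        offsets.append(pc)
        pc += lo
    return offsets, pc


offsets, total = layout(["DEFFREG", "MULT/SU", "TRAPB"])
```

Because its size is zero, DEFFREG shares its offset with the instruction that follows it and contributes nothing to the total code size.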

At the moment, we do not group arithmetic operations into sequences; that is, each arithop will be preceded by a single DEFFREG and followed by a TRAPB. To avoid the cost of all those TRAPBs, we should improve this when we have time. Warning: Don't put more than 31 instructions in the sequence, because they're all required to write to different destination registers!
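The difference between the current scheme and the grouped one can be sketched in Python. Both functions below are hypothetical illustrations, not MLRISC code. `wrap_grouped` shows only the shape of the suggested improvement: it assumes every arithop writes a distinct temporary and that no branch or ML side effect occurs inside a group, neither of which it checks.

```python
MAX_GROUP = 31  # limit from the warning above: all destinations must differ


def wrap_each(arithops):
    """Current scheme: DEFFREG, arithop, TRAPB for every arithop."""
    out = []
    for op, srcs, dst in arithops:
        out.append(("DEFFREG", (), dst))
        out.append((op, srcs, dst))
        out.append(("TRAPB", (), None))
    return out


def wrap_grouped(arithops, max_group=MAX_GROUP):
    """Hypothetical improvement: one TRAPB per group of at most max_group."""
    out = []
    for start in range(0, len(arithops), max_group):
        group = arithops[start:start + max_group]
        # All DEFFREGs go at the beginning of the sequence.
        out.extend(("DEFFREG", (), dst) for _, _, dst in group)
        out.extend(group)
        out.append(("TRAPB", (), None))
    return out


# 40 independent multiplies, each writing its own temporary.
ops = [("MULT/SU", (f"s{i}", f"u{i}"), f"t{i}") for i in range(40)]
each = wrap_each(ops)
grouped = wrap_grouped(ops)
```

For these 40 arithops the current scheme emits 40 TRAPBs, while the grouped sketch emits 2, one for a group of 31 and one for the remaining 9.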

What about multiple traps? For example, suppose a sequence of instructions produces both an Overflow and a Divide-by-Zero exception. ML would like to know only about the earliest trap, but the hardware will report BOTH traps to the operating system. However, as long as the rules above are followed (and the software-completion versions of the arithmetic instructions are used), the operating system will have enough information to know which instruction produced the trap. It is very probable that the operating system will report ONLY the earlier trap to the user process, but I'm not sure.

For a hint about what the operating system is doing in its own trap-handler (with software completion), see section 6.3.2 of "OpenVMS Alpha Software" (Part II of the Alpha Architecture Manual). This stuff should apply to Unix (OSF1) as well as VMS.


Lal George
Allen Leung
Last modified: Thu Jan 9 19:38:15 EST 2003 by leunga@slinky