result go directly to the integer register file rather than the
Lo register. The portion of the multiply that would have normally
gone into the Hi register is discarded. For applications where it
is known that the high half of the multiply result is not required,
using the MUL instruction eliminates the necessity of executing an
explicit MFLO instruction.
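
The difference between the two multiply forms can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: operands are treated as 32-bit signed values, and the helper names `to_signed32`, `mult`, and `mul` are illustrative, not part of any RM5271 tool or library.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mult(rs, rt):
    """Model of a multiply into Hi/Lo: returns the (hi, lo) halves
    of the signed 64-bit product as unsigned 32-bit values."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


def mul(rs, rt):
    """Model of MUL: the low half goes straight to the destination
    register and the high half is discarded."""
    _hi, lo = mult(rs, rt)
    return lo
```

In this model, `mul(6, 7)` returns 42 directly, whereas the Hi/Lo form would need a further move of the low half to obtain the same value. For `0x10000 * 0x10000`, `mult` returns `(1, 0)` and `mul` returns 0, which shows what is lost when the high half is actually needed.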

The multiply-add instruction (MAD) multiplies two operands and adds
the resulting product to the current contents of the Hi and Lo
registers. The multiply-accumulate operation is the core primitive
of almost all signal processing algorithms, allowing the RM5271 to
eliminate the need for a separate DSP engine in many embedded
applications.

Floating-Point Co-Processor
The RM5271 incorporates a high-performance, fully pipelined
floating-point coprocessor, which includes a floating-point register
file and autonomous execution units for multiply/add/convert and
divide/square root. The floating-point coprocessor is a tightly
coupled execution unit, decoding and executing instructions in
parallel with, and in the case of floating-point loads and stores,
in cooperation with the integer unit. The superscalar capabilities
of the RM5271 allow floating-point computation instructions to be
issued concurrently with integer instructions.

Table 2 gives the latencies of the floating-point instructions in
internal processor cycles.

Table 2: Floating-Point Instruction Cycles
Operation    Latency    Repeat Rate
fadd         4          1
fsub         4          1
fmult        4/5        1/2
fmadd        4/5        1/2
fmsub        4/5        1/2
fdiv         21/36      19/34
fsqrt        21/36      19/34
frecip       21/36      19/34
frsqrt       38/68      36/66
fcvt.s.d     4          1
fcvt.s.w     6          3
fcvt.s.l     6          3
fcvt.d.s     4          1
fcvt.d.w     4          1
fcvt.d.l     4          1
fcvt.w.s     4          1
fcvt.w.d     4          1
fcvt.l.s     4          1
fcvt.l.d     4          1
fcmp         1          1
fmov         1          1
fmovc        1          1
fabs         1          1
fneg         1          1

Floating-Point Unit
The RM5271 floating-point execution unit supports single and double
precision arithmetic, as specified in the IEEE Standard 754. The
execution unit is broken into a separate divide/square root unit and
a pipelined multiply/add unit. Overlap of the divide/square root and
multiply/add instructions is supported.
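
As a rough illustration of what the latency and repeat-rate figures in Table 2 imply for a pipelined unit, the sketch below estimates the cycles needed for a run of independent operations of one type. It assumes that the repeat rate is the number of cycles between successive issues of the same operation, that the operations have no data dependencies on each other, and that nothing else stalls the pipeline. The function name and the model itself are illustrative, not taken from the datasheet.

```python
# (latency, repeat rate) pairs copied from Table 2.
FADD = (4, 1)
FCVT_S_W = (6, 3)


def cycles_for_independent_ops(count, latency, repeat_rate):
    """Estimate cycles until the last of `count` independent,
    back-to-back operations of one type has produced its result.

    Assumed model: the first result appears after `latency` cycles
    and each later operation issues `repeat_rate` cycles after the
    previous one.
    """
    if count <= 0:
        return 0
    return latency + (count - 1) * repeat_rate


eight_adds = cycles_for_independent_ops(8, *FADD)          # 4 + 7*1 = 11
four_converts = cycles_for_independent_ops(4, *FCVT_S_W)   # 6 + 3*3 = 15
```

Under these assumptions, eight independent fadd operations complete in 11 cycles rather than the 32 cycles that eight fully serialized 4-cycle operations would take, which is the practical benefit of a repeat rate of 1.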

The RM5271 maintains fully precise floating-point exceptions while
allowing both overlapped and pipelined operations. Precise
exceptions are extremely important in object-oriented programming
environments and highly desirable for debugging in any environment.

Floating-point operations include:
• add
• subtract
• multiply
• divide
• square root
• reciprocal
• reciprocal square root
• conditional moves
• conversion between fixed-point and floating-point format
• conversion between floating-point formats
• floating-point compare

Floating-Point General Register File
The floating-point general register file (FGR) is made up of
thirty-two 64-bit registers. With the floating-point load double
(LDC1) and store double (SDC1) instructions, the floating-point unit
can take advantage of the 64-bit wide data cache and issue a
floating-point co-processor load or store doubleword instruction in
every cycle.

The floating-point control register space contains two registers:
one for determining configuration and revision information for the
coprocessor and one for control and status information. These are
primarily used for diagnostic software, exception handling, state
saving and restoring, and control of rounding modes. To support
superscalar operation, the FGR has four read ports and two write
ports, and is fully bypassed to minimize operation latency in the
pipeline. Three of the read ports and one write port are used to
support the combined multiply-add instruction, while the fourth read
and second write port allow a concurrent floating-point load or
store.
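
The port allocation described for the FGR (four read ports, two write ports) can be checked with simple arithmetic. The sketch below only counts register-file ports. It assumes a multiply-add reads three operands and writes one result, a floating-point load writes one register, and a store reads one register. It does not model the processor's actual issue rules, and all names are illustrative.

```python
FGR_READ_PORTS = 4
FGR_WRITE_PORTS = 2

# Assumed (reads, writes) per operation, following the port
# allocation described in the text.
PORT_USE = {
    "madd": (3, 1),   # three source operands, one result
    "load": (0, 1),   # writes the loaded value into the FGR
    "store": (1, 0),  # reads the value to be stored
}


def fits_in_one_cycle(ops):
    """Return True if the listed operations together need no more
    read and write ports than the FGR provides."""
    reads = sum(PORT_USE[op][0] for op in ops)
    writes = sum(PORT_USE[op][1] for op in ops)
    return reads <= FGR_READ_PORTS and writes <= FGR_WRITE_PORTS
```

With these counts, `fits_in_one_cycle(["madd", "load"])` and `fits_in_one_cycle(["madd", "store"])` both return True, while two multiply-adds together would need six read ports and return False.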
RM5271 Microprocessor, Document Rev. 1.3
Quantum Effect Devices
www.qedinc.com