310 Instruction Latencies Appendix C
25112 Rev. 3.06 September 2005
Software Optimization Guide for AMD64 Processors
Table 15. x87 Floating-Point Instructions (Continued)

                    Encoding
Syntax              First  Second  ModRM       Decode      FPU               Latency  Note
                    byte   byte    byte        type        pipe(s)
----------------------------------------------------------------------------------------
FISTP [mem64int]    DFh            mm-111-xxx  DirectPath  FSTORE            4
FISTTP [mem16int]   DFh            mm-001-xxx  DirectPath  FSTORE            4
FISTTP [mem32int]   DBh            mm-001-xxx  DirectPath  FSTORE            4
FISTTP [mem64int]   DDh            mm-001-xxx  DirectPath  FSTORE            4
FISUB [mem32int]    DAh            mm-100-xxx  Double      -                 11
FISUB [mem16int]    DEh            mm-100-xxx  Double      -                 11
FISUBR [mem32int]   DAh            mm-101-xxx  Double      -                 11
FISUBR [mem16int]   DEh            mm-101-xxx  Double      -                 11
FLD ST(i)           D9h            11-000-xxx  DirectPath  FADD/FMUL         2        1
FLD [mem32real]     D9h            mm-000-xxx  DirectPath  FADD/FMUL/FSTORE  4
FLD [mem64real]     DDh            mm-000-xxx  DirectPath  FADD/FMUL/FSTORE  4
FLD [mem80real]     DBh            mm-101-xxx  VectorPath  -                 13
FLD1                D9h            11-101-000  DirectPath  FSTORE            4
FLDCW [mem16]       D9h            mm-101-xxx  VectorPath  -                 11
FLDENV [mem14byte]  D9h            mm-100-xxx  VectorPath  -                 129
FLDENV [mem28byte]  D9h            mm-100-xxx  VectorPath  -                 129
FLDL2E              D9h            11-101-010  DirectPath  FSTORE            4
FLDL2T              D9h            11-101-001  DirectPath  FSTORE            4
FLDLG2              D9h            11-101-100  DirectPath  FSTORE            4
FLDLN2              D9h            11-101-101  DirectPath  FSTORE            4
FLDPI               D9h            11-101-011  DirectPath  FSTORE            4
FLDZ                D9h            11-101-110  DirectPath  FSTORE            4
FMUL ST, ST(i)      D8h            11-001-xxx  DirectPath  FMUL              4        1
FMUL ST(i), ST      DCh            11-001-xxx  DirectPath  FMUL              4        1
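The encoding columns give the first opcode byte and the ModRM byte split into its mod-reg-r/m fields. "mm" stands for a memory-operand mod value (00b, 01b, or 10b), "11" marks a register operand, and for register forms the final "xxx" bits hold the stack index i (Note 1). A minimal Python sketch of packing and unpacking these fields, assuming the standard x86 ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0); the helper names modrm, x87_reg_form, and decode_modrm are hypothetical, chosen only for illustration.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM fields: mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("ModRM field out of range")
    return (mod << 6) | (reg << 3) | rm


def x87_reg_form(first_byte: int, reg: int, i: int) -> bytes:
    """Encode a register form such as 'D8h 11-001-xxx' for stack entry ST(i).

    mod is 11b for register forms; the last three bits (r/m) carry i (Note 1).
    """
    return bytes([first_byte, modrm(0b11, reg, i)])


def decode_modrm(byte: int) -> tuple:
    """Split a ModRM byte back into (mod, reg, rm)."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


# FMUL ST, ST(3): D8h 11-001-xxx with xxx = 011
print(x87_reg_form(0xD8, 0b001, 3).hex(" ").upper())  # D8 CB
# FLDZ: D9h 11-101-110 (the ModRM byte is fixed; there is no ST(i) operand)
print(f"D9 {modrm(0b11, 0b101, 0b110):02X}")           # D9 EE
```

The same helper reproduces the other fixed-operand rows, for example FLD1 (11-101-000) packs to E8h.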
Notes:
1. The last three bits of the ModRM byte select the stack entry ST(i).
2. These instructions have the effective latency shown in the table. However, each of them also generates an
internal NOP with a latency of two cycles but no related dependencies. These internal NOPs can execute at a rate
of three per cycle and can use any of the three execution resources.
3. This is a VectorPath decoded operation that uses one execution pipe (one ROP).
4. There is additional latency associated with this instruction. "e" represents the difference between the exponents
of the divisor and the dividend. If "s" is the number of normalization shifts performed on the result, then
n = (s + 1)/2, where 0 <= n <= 32.
5. The latency provided for this operation is the best-case latency.
6. The three latency numbers represent the latency values for precision control settings of single precision, double
precision, and extended precision, respectively.
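Note 6 ties the choice among the three latency values to the precision-control (PC) field of the x87 control word, bits 9-8: 00b selects single precision, 10b double precision, 11b extended precision, and 01b is reserved. FINIT leaves the control word at 037Fh, which selects extended precision. A minimal Python sketch of that lookup; the function names and the latency triple are hypothetical placeholders, not values from this table.

```python
# Precision-control (PC) field values, bits 9-8 of the x87 control word.
PC_SINGLE = 0b00
PC_RESERVED = 0b01
PC_DOUBLE = 0b10
PC_EXTENDED = 0b11


def precision_control(control_word: int) -> int:
    """Extract the PC field (bits 9-8) from a 16-bit x87 control word."""
    return (control_word >> 8) & 0b11


def latency_for(control_word: int, latencies: tuple) -> int:
    """Select one entry of a (single, double, extended) latency triple.

    The tuple order matches the order Note 6 gives for the three values.
    """
    pc = precision_control(control_word)
    if pc == PC_RESERVED:
        raise ValueError("PC = 01b is reserved")
    index = {PC_SINGLE: 0, PC_DOUBLE: 1, PC_EXTENDED: 2}[pc]
    return latencies[index]


# Placeholder latencies for illustration only; not taken from the table.
example = (10, 20, 30)
print(latency_for(0x037F, example))  # FINIT default, PC = 11b: prints 30
print(latency_for(0x027F, example))  # PC = 10b (double): prints 20
```

Raising on the reserved encoding, rather than guessing, keeps the lookup from returning a value the table does not define.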
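The normalization term in Note 4 can be worked through numerically. The note does not say how an odd s + 1 is rounded, so this sketch assumes integer (floor) division and simply rejects any result outside the stated range 0 <= n <= 32; the function name normalization_term is hypothetical.

```python
def normalization_term(s: int) -> int:
    """Compute n = (s + 1) / 2 from Note 4.

    Assumption: integer (floor) division; Note 4 does not specify rounding.
    """
    if s < 0:
        raise ValueError("the shift count s cannot be negative")
    n = (s + 1) // 2
    if n > 32:
        raise ValueError("Note 4 states that 0 <= n <= 32")
    return n


# Under the floor-division assumption, s = 0 gives n = 0 and s = 63 gives n = 32.
for s in (0, 1, 2, 7, 63):
    print(s, normalization_term(s))
```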