Yep, bus access does not interfere with the internal operation of the CPU or the FPU. Be careful about which memory you're touching when feeding the FPU with data, though.
------
All instructions are initially parsed by the CPU. If the FPU operation only works on FPU registers, the CPU waits until the previous FPU operation has completed, dispatches the current instruction to the FPU, and then continues processing instructions.
FPU instructions on FP registers are classified as pOEP-but-allows-sOEP.
The FPU itself is not pipelined internally.
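As a rough illustration of that dispatch rule (a sketch based only on the description above, not a measured result; the register choices are arbitrary), two FPU instructions placed back to back should serialise on the FPU:
fmul.x fp0,fp1 ; pOEP ; FPU starts working on the multiply
fadd.x fp2,fp3 ; pOEP ; has to wait until the fmul has completed
Integer instructions placed between the two would use that waiting time instead of wasting it.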
Check chapter 10 of the 68060UM for timings (it is available as a PDF from Freescale).
So, for instance, fadd/fsub/fmul.x fp0,fp1 takes 3 cycles according to the manual. Therefore a code sequence like this runs at 2 instructions per cycle:
fadd fp0,fp1 ; pOEP ; FPU op cycle 1
move.l d2,d3 ; sOEP
move.l d0,d1 ; pOEP ; FPU op cycle 2
move.l d2,d3 ; sOEP
move.l d0,d1 ; pOEP ; FPU op cycle 3
move.l d2,d3 ; sOEP
fadd fp2,fp3 ; pOEP ; FPU op cycle 1
... etc
However, FPU operations that have memory or register floating-point operands (like fadd.s (a0)+,fp0) may go a bit slower. Float<->int conversions (like fmove.w d0,fp0) take approx. 3 extra cycles (this is specified in 68060UM chapter 10), and I think the CPU stalls for those 3 cycles, though I'm not sure.
Also:
The FPU and the CPU seem not to share any resources at all. So you can do FDIV and DIVS in parallel, as well as FMUL at the same time as MULU/MULS.
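A minimal sketch of what that overlap could look like, assuming the units really are independent as described above (untested; the registers are arbitrary, d0 must be non-zero for the divide, and the actual cycle counts should be taken from the manual or measured):
fdiv.x fp0,fp1 ; pOEP ; FPU starts the divide
divs.w d0,d1 ; integer divide executes while the FPU is still busy
fmul.x fp2,fp3 ; dispatched once the fdiv has completed
mulu.w d2,d3 ; integer multiply executes while the FPU multiplies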
That's all I know off the top of my head. I suggest that you set up a timing harness and conduct some experiments of your own...