A.D.A. Amiga Demoscene Archive


Forum / Coding / trapf

 

dalton
Member
#1 - Posted: 22 Mar 2011 21:22
On the topic of micro-optimizations, here's a new (?) one.

I was reading the ColdFire manual today, and found that it suggests using trapf instead of an unconditional bra.b in if-else type clauses with a very short jump distance (+2 or +4).


cmp d0,d1
beq.b hej
moveq #1,d2
bra.b hopp
hej:
moveq #2,d2
hopp:


Here the "bra.b hopp" can be replaced by encoding the following moveq as the operand word of a trapf.w instruction. That moveq is then executed only when the beq.b is taken.
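A minimal sketch of that encoding, assuming a 68020+ (or ColdFire) target and an assembler with the usual Motorola dc.w directive. The opcode word $51FA is trapf.w; since it is emitted by hand, the moveq that follows lands in its operand word:

```asm
	cmp	d0,d1
	beq.b	hej
	moveq	#1,d2
	dc.w	$51fa		; trapf.w opcode: never traps, swallows the next word
hej:				; beq.b lands in the middle of the trapf.w
	moveq	#2,d2		; $7402: trapf operand on fall-through, real moveq if branched to
hopp:
```

On the fall-through path the CPU sees moveq #1,d2 followed by one trapf.w #$7402 and continues at hopp. For the +4 case, dc.w $51fb (trapf.l) would skip the next two words instead.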

This should be possible on 68k as well, although it's not mentioned in the PRM. The CF manual doesn't say anything about the benefits, though. Can anyone say whether this is really faster than bra.b?
Blueberry
Member
#2 - Posted: 23 Mar 2011 09:10
According to Chapter 10 of the M68060 UM:

A branch instruction which is predicted as taken (which an unconditional branch typically will be after the first execution) takes 0 cycles, as long as it follows one or two standard instructions (to be executed in the primary and secondary execution pipelines simultaneously with the branch).

A TRAPF instruction takes 1 cycle and can execute in either execution pipeline.

In your example, the situation depends on which pipeline the CMP executes in.

If the CMP executes in the pOEP, it can pair with the MOVEQ #1, assuming the BEQ is predicted as not taken (page 10-7 bottom). The BRA is not able to execute simultaneously with the BEQ, so it takes 1 cycle. Total: 2 cycles.

In the TRAPF case, the TRAPF takes 1 cycle (for a total of 2 cycles), but it can pair with the instruction following the hopp label (assuming this instruction can execute in the sOEP). Thus, TRAPF is potentially better in this case.

If the CMP executes in the sOEP, the MOVEQ #1 pairs with the BRA, for a total of 2 cycles. In the TRAPF case, the MOVEQ #1 pairs with the TRAPF, again totalling 2 cycles. Thus the two versions are equally fast in this case.

In general, the situation depends on which instructions are nearby (especially branches). Sometimes TRAPF is faster, sometimes BRA is faster, sometimes they are the same.

This all assumes that instruction fetch and branch prediction do not get upset about a branch into the middle of an instruction. Some architectures do not like that. I don't know how the 68060 behaves here.
ZEROblue
Member
#3 - Posted: 26 Mar 2011 12:50 - Edited
I always went with the shorter:

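moveq #1,d2		; default: d0 != d1
cmp d0,d1
bne.b hej		; keep the default when they differ
moveq #2,d2		; d0 == d1, same result as the first example
hej: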


And if you can alter your code to work with $00 and $FF in the low byte, and can guarantee or ignore the contents of the upper bytes, then for example:

cmp d0,d1
seq d2
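If the 1/2 values of the first example are still needed, the $00/$FF byte can be folded back without a branch. This is only a sketch: it defines the low byte of d2 alone, and whether it beats the branching versions on a given CPU is not claimed here.

```asm
	cmp	d0,d1
	seq	d2		; d2.b = $FF if d0 == d1, else $00
	neg.b	d2		; d2.b = 1 if equal, else 0
	addq.b	#1,d2		; d2.b = 2 if equal, else 1
```

The upper 24 bits of d2 keep whatever they held before.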


Btw, for the real M68K old-schoolers: the TRAPcc instructions are only available on the 68020 and up.
dalton
Member
#4 - Posted: 6 Apr 2011 08:04
Apparently, Freescale later changed their minds and published an erratum about the branch cache being corrupted in some cases when using this branch method. So with no obvious advantages, I guess it's best to just leave it alone...

 
