
doom
Member 
Does not compute.

z5_
Member 
So I can use divs without feeling guilty then? :)

noname
Member 
The target hardware is the 060 nowadays, so by all means use it if you need it!
As a personal rule of thumb, I only started feeling guilty when the overall effect was running slower than 25 fps.

doom
Member 
Oh. You asked how to "avoid a division by a number which is not a power of 2". I don't really get what you mean by that.

doom
Member 
But yeah, division isn't too slow. On the 060 it doesn't pay to use division tables or shifting magic, except for constant divisors that are powers of two.
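Here is a small C model (not 68k code) of the power-of-two exception doom mentions. It assumes the divide you are replacing rounds toward zero, as divs and C's `/` both do. An arithmetic shift (asr) rounds toward minus infinity instead, so a negative dividend needs a bias of 2^k - 1 before the shift. The helper names `asr32` and `div_pow2` are made up for this sketch.

```c
#include <stdint.h>

/* Flooring arithmetic right shift, like 68k asr.l. Written without relying
   on C's implementation-defined >> of negative numbers. k in 0..31. */
static int32_t asr32(int32_t v, unsigned k)
{
    if (v >= 0)
        return v >> k;
    /* -(v + 1) is >= 0 and cannot overflow, even for INT32_MIN */
    return -((-(v + 1)) >> k) - 1;
}

/* Signed division by 2^k rounding toward zero (like divs): bias negative
   dividends by 2^k - 1, then do the flooring shift. k in 0..30. */
static int32_t div_pow2(int32_t x, unsigned k)
{
    if (x < 0)
        x += (int32_t)((1u << k) - 1);
    return asr32(x, k);
}
```

For unsigned or known non-negative values the bias is unnecessary and a plain shift is enough.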

Kalms
Member 
Approximate division by 20 can be done by multiplying by the reciprocal:
If you don't need an exact result, you can multiply by 65536/20 and just use the top 16 bits of the result.
For higher precision, adjust the multiplier by premultiplying it by 2^n until it lands in the range $8000 .. $ffff (assuming unsigned values), and compensate by shifting down those extra n bits after the multiplication.
But I usually just go with the ordinary division op; divisions are usually not common enough to warrant the hassle of replacing them with reciprocal multiplication.
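Kalms's two variants can be sketched in C for 16-bit unsigned x. Rounding the constants up (3277 and 52429) is my assumption, not something stated in the post. For 20, premultiplying 65536/20 = 3276.8 by 2^4 gives 52428.8. That rounds to 52429 ($CCCD), which lies in $8000 .. $ffff, so the final shift is 16 + 4 = 20 bits.

```c
#include <stdint.h>

/* Approximate x / 20: multiply by 65536/20 and keep the top 16 bits.
   65536/20 = 3276.8; this sketch rounds the constant up to 3277, so the
   result is either exact or one too large for 16-bit x. */
static uint32_t div20_approx(uint16_t x)
{
    return ((uint32_t)x * 3277u) >> 16;
}

/* Higher precision: 65536/20 * 2^4 = 52428.8, rounded up to 52429 ($CCCD),
   inside $8000..$FFFF. Shift down 16 + 4 = 20 bits afterwards.
   The 32-bit product cannot overflow: 65535 * 52429 < 2^32. */
static uint32_t div20_scaled(uint16_t x)
{
    return ((uint32_t)x * 52429u) >> 20;
}
```

For this particular divisor, the scaled version happens to be exact for every 16-bit unsigned x. The approximate one is off by one for some inputs, for example 16399 gives 820 instead of 819.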

winden
Member 
hmmm... maybe you recall from maths that multiplications and divisions can be reordered (dividing by 20 is just multiplying by 1/20, and multiplication is commutative and associative), so the result will be the same. We can use that to our advantage...
(x / 20) =
(x * (1)) / 20 =
(x * (65536 / 65536)) / 20 =
(x * (65536 / 20)) / 65536 =
(x * (65536 / 20)) >> 16
where ">>" is of course an arithmetic right shift, and (65536 / 20) can be precalculated or even typed as-is in the code (works in Devpac for sure; it is an integer expression, so it evaluates to 3276):
	muls.l	#(65536/20),d0	; d0 = x * 3276
	swap	d0		; low word of d0 = (x * 3276) >> 16
you could also use 256 as the base: (x * (256 / 20)) >> 8
the only problem with fixed point is running out of bits... in the original example your x value should be < 32767, otherwise the result will be broken...
btw, google a bit for "fixed point arithmetic" and try playing with it by single-stepping through various simple formulas and watching the values change... this is one of the parts of low-power-device assembler programming that is really enjoyable to learn and master :)
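Winden's formula can be modelled in C to watch the values change without single-stepping. The product is computed in 64 bits, and a flooring shift stands in for asr. On the real CPU, muls.l keeps only the low 32 bits of the product, which is the "running out of bits" problem. Two effects show up. Because 65536/20 truncates to 3276, exact multiples of 20 come out one too small. Because asr floors, negative inputs round toward minus infinity, where divs would round toward zero. The function names are hypothetical.

```c
#include <stdint.h>

/* Flooring right shift (what asr does), independent of how C's >> treats
   negative numbers. */
static int64_t asr64(int64_t v, unsigned k)
{
    return v >= 0 ? v >> k : -((-(v + 1)) >> k) - 1;
}

/* (x * (65536 / 20)) >> 16; integer division makes the constant 3276. */
static int32_t div20_base65536(int32_t x)
{
    return (int32_t)asr64((int64_t)x * (65536 / 20), 16);
}

/* Same idea with 256 as the base: (x * (256 / 20)) >> 8, constant 12. */
static int32_t div20_base256(int32_t x)
{
    return (int32_t)asr64((int64_t)x * (256 / 20), 8);
}
```

With these, 1000 gives 49 in base 65536 and 100 gives 4 in base 256, which shows the precision loss. Rounding the constant up to 3277 removes the one-too-small results for small non-negative x, at the cost of sometimes being one too large for bigger ones.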

Blueberry
Member 
There are also exact ways of transforming division by an integer constant into multiplication. This paper (Granlund and Montgomery, "Division by Invariant Integers using Multiplication", PLDI 1994) describes some general methods:
http://swox.com/~tege/divcnstpldi94.pdf
Not exactly simple, though...
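Here is a simplified sketch of the exact idea for 32-bit unsigned dividends. It is not the paper's full algorithm. It picks m = ceil(2^s / d) for the smallest shift s where the rounding error e = m*d - 2^s is at most 2^(s-32). Then x*m / 2^s = x/d + x*e/(d*2^s). For x < 2^32 the extra term stays below 1/d, so it can never push the result past the next integer. Divisors whose multiplier needs 33 bits, such as 7, are simply rejected here; the paper handles those with an extra correction step. The names `find_magic` and `div_magic` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Find m, s with (x * m) >> s == x / d for every 32-bit unsigned x.
   Requires d >= 2. Returns false if m would not fit in 32 bits. */
static bool find_magic(uint32_t d, uint32_t *m_out, unsigned *s_out)
{
    for (unsigned s = 32; s < 64; s++) {
        uint64_t p = 1ull << s;
        uint64_t m = (p + d - 1) / d;      /* ceil(2^s / d) */
        if (m > UINT32_MAX)
            return false;                  /* m only grows with s */
        uint64_t e = m * d - p;            /* rounding error, >= 0 */
        if (e <= (1ull << (s - 32))) {
            *m_out = (uint32_t)m;
            *s_out = s;
            return true;
        }
    }
    return false;
}

/* The division itself: one 32x32->64 multiply and a shift. */
static uint32_t div_magic(uint32_t x, uint32_t m, unsigned s)
{
    return (uint32_t)(((uint64_t)x * m) >> s);
}
```

For d = 20 this finds m = $CCCCCCCD and s = 36. On the 68k you would need the 64-bit product form of mulu.l. As far as I know, the 68060 leaves that form to software emulation, so there this may not beat a plain divu.l.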

z5_
Member 
Is anything to be gained from using the immediate- and address-specific instructions cmpi, addi, subi, adda, ... instead of just cmp, add, ...?

doom
Member 
No. The assembler translates automatically for you: add <ea>,An always assembles to adda <ea>,An, etc.
It's worth noting the differences in how the instructions behave, though. E.g. add.w d0,a0 (really adda.w) sign-extends d0 to 32 bits, changes all of a0 and leaves the condition codes alone, so it doesn't behave like add.w d0,d1. And there's no adda.b instruction. And so on.
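The difference between add.w d0,a0 and add.w d0,d1 can be modelled in C on 32-bit register values. Condition codes are left out; the function names are made up for this sketch.

```c
#include <stdint.h>

/* adda.w d0,a0: the source word is sign-extended to 32 bits and added to
   all of a0. (adda does not touch the condition codes; not modelled.) */
static uint32_t adda_w(uint32_t a0, uint32_t d0)
{
    uint32_t src = d0 & 0xFFFFu;
    if (src & 0x8000u)
        src |= 0xFFFF0000u;            /* sign-extend word to long */
    return a0 + src;                   /* wraps modulo 2^32 like the CPU */
}

/* add.w d0,d1: only the low word of d1 changes; a carry out of bit 15
   goes to the condition codes, not into the upper word. */
static uint32_t add_w(uint32_t d1, uint32_t d0)
{
    uint32_t low = (d1 + d0) & 0xFFFFu;
    return (d1 & 0xFFFF0000u) | low;
}
```

With d0 = $0000ffff and a register holding $00010000, adda.w yields $0000ffff because $ffff is treated as -1. add.w yields $0001ffff because only the low word changes.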

z5_
Member 
I was looking at some code from Winden/Network (the amycoders 3D starfield) and noticed the following:
Is there any advantage in "eor.l d5,d5" instead of "clr.l d5" or "moveq #0,d5"?
Is it generally faster to do stuff between data registers (with the constant loaded into one of the data registers) instead of constant,data register? Maybe useful in inner loops if you've got data registers free. I know this is the case with variables, but does the same apply to constants?
for example:
	move.l	#$0000ffff,d5
loop
	and.l	d5,d6
	do_loop
instead of
loop
	and.l	#$0000ffff,d6
	do_loop
I have the impression that instructions like or, and, not, ... are faster than any other instruction?
for example:
	move.l	#$0000ffff,d5
versus
	eor.l	d5,d5
	not.w	d5
Any difference in speed?

Kalms
Member 
> Is there any advantage in "eor.l d5,d5" instead of "clr.l d5" or "moveq #0,d5"?
68000: moveq takes 4 cycles, clr.l takes 6 cycles, eor.l takes 8 cycles.
68020/68030: they're all roughly equally fast.
68060: moveq and clr.l can avoid some pipeline dependencies, which eor.l cannot.
All three instructions are 2 bytes in size.
I suggest that you use moveq for clearing data registers, as that is what is traditionally used.
> Is it generally faster to do stuff between data registers (with the constant loaded into one of the data registers) instead of constant,data register? Maybe useful in inner loops if you've got data registers free. I know this is the case with variables, but does the same apply to constants?
If you preload the value into a data register, the instruction gets smaller (and.l d5,d6 is 2 bytes, while and.l #$0000ffff,d6 is 6 bytes). This helps on some of the 68ks.
On the other hand, it will cost you an extra register... which you might have needed for something else... so sometimes it is not worth it.
> I have the impression that instructions like or, and, not, ... are faster than any other instruction?
> for example:
> move.l #$0000ffff,d5
> versus
> eor.l d5,d5
> not.w d5
> Any difference in speed?
The move.l takes 6 bytes; the eor/not combination takes 4 bytes. This might have been done in order to keep a routine small enough. Or perhaps winden enjoyed obfuscating his code the night he wrote that piece. ;)
On 68020+, there is a smallest unit of execution time:
68020/030: 2 cycles
68040: 1 cycle
68060: 1 cycle, pairable
Most "simple" operations (move clr add sub and or eor not cmp, and more) between a pair of registers take just this minimum amount of time.
If you write your code such that you only use these minimum-time-unit instructions inside your loops, optimizing the calculations becomes easy: fewer instructions means faster code.
(Optimizing the memory accesses is the other big thing.)
But, to get back to your example: on the 68060, move.l #imm,dn also takes one minimum time unit, so if the code is small enough to fit into the instruction cache, the move.l #imm,dn would be 1 minimum time unit quicker than the two-instruction eor/not version.
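Here is a C model (not 68k code) showing that the two sequences from the question leave d5 in the same state, whatever it held before. not.w only inverts the low word, so the eor.l has to clear the upper word first. The byte sizes in the comments are the ones Kalms gives; timings are not modelled, and the function names are made up.

```c
#include <stdint.h>

/* move.l #$0000ffff,d5 (6 bytes): the whole register is loaded. */
static uint32_t load_imm(uint32_t d5)
{
    (void)d5;                          /* old contents are irrelevant */
    return 0x0000FFFFu;
}

/* eor.l d5,d5 then not.w d5 (2 + 2 bytes): clear all 32 bits, then invert
   only the low word, giving the same $0000ffff. */
static uint32_t eor_not(uint32_t d5)
{
    d5 ^= d5;                                       /* eor.l d5,d5 */
    d5 = (d5 & 0xFFFF0000u) | (~d5 & 0x0000FFFFu);  /* not.w d5 */
    return d5;
}
```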

z5_
Member 
> The move.l takes 6 bytes; the eor/not combination takes 4 bytes. This might have been done in order to keep a routine small enough. Or perhaps winden enjoyed obfuscating his code the night he wrote that piece. ;)
That's the downside of the amycoders examples, especially the compos. They all use so many tricks to keep the code size small (especially in the shortest-entry compo). For newbies, they're not always easy to follow :o)


