A.D.A. Amiga Demoscene Archive

Optimizing + good coding habits

doom Member	#1 - Posted: 11 May 2007 15:52
Does not compute.
z5_ Member	#2 - Posted: 11 May 2007 16:22
So I can use divs without feeling guilty, then? :)
noname Member	#3 - Posted: 11 May 2007 16:49
Target hardware is the 060 nowadays, so by all means use it if you need it! As a personal rule of thumb, I only started feeling guilty when the overall effect was running slower than 25 fps.
doom Member	#4 - Posted: 11 May 2007 16:54
Oh. You asked how to "avoid a division by a number which is not a power of 2". I don't really get what you mean by that.
doom Member	#5 - Posted: 11 May 2007 17:00
But yeah, division isn't too slow. On the 060 it doesn't pay to use division tables or shifting magic, except for constant divisors that are powers of two.
Kalms Member	#6 - Posted: 11 May 2007 19:14
Approximate division by 20 can be done by multiplying by the reciprocal. If you don't need an exact result, you can multiply by 65536/20 and just use the top 16 bits of the result. For higher precision, pre-multiply the multiplier by 2^n until it lands in the range $8000 .. $ffff (assuming an unsigned value), and compensate by shifting the result down by those extra n bits after the multiplication. But I usually just go with the ordinary division op; divisions are usually not common enough to warrant the hassle of replacing them with reciprocal multiplication.
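A minimal C sketch of the two variants Kalms describes, assuming an unsigned 16-bit input and a 32-bit product. The function names are hypothetical, and the constants 3276 (65536/20 truncated) and 52429 = $CCCD (65536/20 * 2^4 rounded up, so n = 4) are worked out here rather than taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Plain reciprocal: multiply by 65536/20 (truncated to 3276) and keep the
 * top 16 bits of the 32-bit product. The result can come out 1 too low. */
uint32_t div20_approx(uint32_t x)
{
    assert(x <= 0xFFFFu);   /* assumption: 16-bit unsigned input */
    return (x * 3276u) >> 16;
}

/* Scaled reciprocal: 65536/20 * 2^4 = 52428.8, rounded up to 52429 ($CCCD),
 * which lands in $8000..$ffff. Shift down by 16 + 4 bits to compensate. */
uint32_t div20_scaled(uint32_t x)
{
    assert(x <= 0xFFFFu);   /* 65535 * 52429 still fits in 32 bits */
    return (x * 52429u) >> 20;
}
```

With these particular constants, div20_scaled(x) matches x / 20 for every x from 0 to 65535, while div20_approx(20) returns 0 instead of 1.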
winden Member	#7 - Posted: 12 May 2007 09:53
Hmm... maybe you recall from maths that multiplying and dividing are "commutative": you can reorder the operations and the result will be the same. So we use that to our advantage:

    x / 20 = (x * 1) / 20
           = (x * (65536 / 65536)) / 20
           = (x * (65536 / 20)) / 65536
           = (x * (65536 / 20)) >> 16

where ">>" is of course an arithmetic right shift, and (65536 / 20) can be precalculated or even typed as-is in the code (this works in Devpac for sure):

    muls.l #(65536/20),d0

You could also use 256 as the base:

    (x * (256 / 20)) >> 8

The only problem with fixed point is running out of bits: in the original example your x value should be < 32767, or else the result will be broken.

By the way, google a bit for "fixed point arithmetic" and play with it by single-stepping various simple formulas and watching the values change. This is one of the parts of low-power-device assembler programming that is really enjoyable to learn and master :)
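To make the choice of base concrete, here is a small C sketch (not from the thread; the function names are made up). It assumes non-negative inputs and a signed 32-bit product, which is what muls.l leaves in d0. A larger base keeps more fraction bits in the multiplier, at the price of less headroom before the product overflows.

```c
#include <stdint.h>

/* x/20 with base 65536: the multiplier 65536/20 truncates to 3276. */
int32_t div20_q16(int32_t x)
{
    return (x * (65536 / 20)) >> 16;
}

/* x/20 with base 256: the multiplier 256/20 truncates to 12. */
int32_t div20_q8(int32_t x)
{
    return (x * (256 / 20)) >> 8;
}

/* Largest non-negative x whose product with the multiplier still fits
 * in a signed 32-bit result (assumption: 32x32->32 multiply). */
int32_t max_safe_input(int32_t multiplier)
{
    return INT32_MAX / multiplier;
}
```

For x = 32000 the exact answer is 1600; div20_q16 returns 1599 and div20_q8 returns 1500. Under the 32-bit-product assumption, max_safe_input(65536 / 20) is 655520 and max_safe_input(256 / 20) is 178956970.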
Blueberry Member	#8 - Posted: 13 May 2007 16:09
There are also exact ways of transforming division by an integer constant into multiplication. This paper describes some general methods: http://swox.com/~tege/divcnst-pldi94.pdf
Not exactly simple, though...
z5_ Member	#9 - Posted: 14 May 2007 20:20
Is there something to be gained from using the immediate- and address-specific instructions (cmpi, addi, subi, adda, ...) instead of just cmp, add, ...?
doom Member	#10 - Posted: 14 May 2007 20:39
No. The assembler translates automatically for you: add <ea>,An will always be assembled as adda <ea>,An, and so on. It's worth noting the differences in how the instructions behave, though; e.g. add.w d0,a0 doesn't behave like add.w d0,d1. And there's no adda.b instruction. And so on.
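The difference doom mentions can be modelled in C. This is only a sketch with hypothetical function names, covering the register result and not the flags: a word-sized add to a data register changes just the low word, whereas a word-sized add to an address register (adda.w) sign-extends the source to 32 bits and updates the whole register. adda also leaves the condition codes untouched.

```c
#include <stdint.h>

/* Model of add.w d0,d1: only the low word of the destination changes,
 * and any carry out of bit 15 is not propagated into the high word. */
uint32_t add_w_data(uint32_t dst, uint32_t src)
{
    uint32_t low = (dst + src) & 0xFFFFu;
    return (dst & 0xFFFF0000u) | low;
}

/* Model of add.w d0,a0 (assembled as adda.w): the source word is
 * sign-extended to 32 bits and added to the full address register. */
uint32_t add_w_addr(uint32_t dst, uint32_t src)
{
    uint32_t ext = (src & 0x8000u) ? (src | 0xFFFF0000u) : (src & 0xFFFFu);
    return dst + ext;   /* wraps modulo 2^32, like the hardware */
}
```

With a destination of $00010000 and a source word of $ffff, add_w_data gives $0001ffff, while add_w_addr treats the source as -1 and gives $0000ffff.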
z5_ Member	#11 - Posted: 29 May 2007 21:20
I was looking at some code from Winden/Network (the amycoders 3d starfield) and noticed the following:

- Is there any advantage in "eor.l d5,d5" instead of "clr.l d5" or "moveq #0,d5"?
- Is it generally faster to operate between data registers (with the constant loaded into one of the data registers) than to use constant,dataregister? This might be useful in inner loops if you've got data registers free. I know this is the case with variables, but does the same apply to constants? For example:

        move.l #$0000ffff,d5
    loop
        and.l d5,d6
        do_loop

  instead of

    loop
        and.l #$0000ffff,d6
        do_loop

- I have the impression that instructions like or, and, not, ... are faster than any other instruction. For example, "move.l #$0000ffff,d5" versus "eor.l d5,d5 not.w d5": any difference in speed?
Kalms Member	#12 - Posted: 30 May 2007 00:37 - Edited
> - is there any advantage in "eor.l d5,d5" instead of "clr.l d5" or "moveq #0,d5"?

- 68000: moveq takes 4 cycles, clr.l takes 6 cycles, eor.l takes 8 cycles.
- 68020-68030: they're all roughly equally fast.
- 68060: moveq and clr.l can avoid some pipeline dependencies, which eor.l does not.

All three instructions are 2 bytes in size. I suggest that you use moveq for clearing data registers, as that is what is traditionally used.

> - is it generally faster to do stuff between dataregisters (with the constant being loaded into one of the dataregisters) instead of constant,dataregister? Maybe useful in innerloops if you've got dataregisters free. I know this is the case with variables, but does the same apply with constants?

If you preload the value into a data register, the instruction gets smaller. This helps on some of the 68k CPUs. On the other hand, it will cost you an extra register, which you might have needed for something else, so sometimes it is not worth it.

> - i have the impression that commands like or, and, not,... are faster than any other command? for example: "move.l #$0000ffff,d5" "eor.l d5,d5 not.w d5" Any difference in speed?

The move.l takes 6 bytes; the eor/not combination takes 4 bytes. This might have been done in order to keep a routine small enough. Or perhaps winden enjoyed obfuscating his code the night he wrote that piece. ;)

On 68020+, there is a smallest time unit:

- 68020-030: 2 cycles
- 68040: 1 cycle
- 68060: 1 cycle, pairable

Most "simple" operations (move, clr, add, sub, and, or, eor, not, cmp, and more) between a pair of registers take just this minimum amount of time. If you write your code so that you only use these minimum-time-unit instructions inside your loops, optimizing the calculations becomes easy: fewer instructions -> faster code. (Optimizing the memory accesses is the other big thing.)

But, to get back to your example: on 68060, the move.l #imm,dn also takes one minimum time unit, so if the code is small enough to fit into the instruction cache, the move.l #imm,dn would be 1 minimum time unit quicker.
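For readers puzzled by why the eor/not pair yields $0000ffff at all, here is a C model of the two instructions. It is a sketch with hypothetical function names; it says nothing about timing, only about the value left in the register.

```c
#include <stdint.h>

/* eor.l d5,d5: XOR a register with itself, clearing all 32 bits. */
uint32_t eor_l_self(uint32_t d5)
{
    return d5 ^ d5;
}

/* not.w d5: invert the low word only; the high word is kept. */
uint32_t not_w(uint32_t d5)
{
    return (d5 & 0xFFFF0000u) | (~d5 & 0x0000FFFFu);
}

/* The two-instruction sequence from the question. */
uint32_t eor_then_not(uint32_t d5)
{
    return not_w(eor_l_self(d5));
}
```

Whatever d5 held beforehand, eor_then_not returns $0000ffff, the same value that move.l #$0000ffff,d5 loads directly.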
z5_ Member	#13 - Posted: 30 May 2007 00:48
> The move.l takes 6 bytes; the eor/not combination takes 4 bytes. This might have been done in order to keep a routine small enough. Or perhaps winden enjoyed obfuscating his code the night he wrote that piece. ;)

That's the downside of the amycoders examples, especially the compos. They all use so many tricks to keep the code size small (especially in the shortest-entry compo). For newbies, it's not always easy to follow :o)
