A.D.A. Amiga Demoscene Archive

Amiga Demoscene Archive Forum / Coding / Optimizing + good coding habits


z5_ Member	#1 - Posted: 25 Apr 2007 19:01 - Edited
ef3:
        tst.b   wipe2_timer
        beq     .wipe2_do
        sub.b   #1,wipe2_timer
        rts
.wipe2_do
        move.l  wipe2_part,a0
        moveq   #0,d1
        move.b  (a0),d1
        blt     .wipe2_done
        lea     screen,a0
        cmp.l   #4,d1
        blo.s   .upper_half
        add.l   #100*320,a0
        sub.b   #4,d1
.upper_half
        mulu.l  #80,d1
        add.l   d1,a0
        moveq   #0,d0
        move.w  #100-1,d6
.wipe2_vert
        move.w  #2-1,d7
.wipe2_horiz
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        dbra    d7,.wipe2_horiz
        add.l   #240,a0
        dbra    d6,.wipe2_vert
        move.b  #20,wipe2_timer
        add.l   #1,wipe2_part
.wipe2_done
        rts

wipe2_timer     dc.b    0
                even
wipe2_part      dc.l    wipe2_script
wipe2_script    dc.b    0,5,2,7,4,1,6,3,-1
                even
z5_ Member	#2 - Posted: 25 Apr 2007 19:09 - Edited
ok, what you see above is my attempt at a little "wipe effect" i saw in Supergroove. Basically, it wipes a 320x200 chunky screen to black in 8 rectangles (just clearing 8 rectangles of 80x100 one after the other). The order in which the rectangles are cleared is defined in the script. It's a miracle but it actually works. However, experienced coders will probably have a heart attack from seeing my code. So my question: any good tips/tricks/optimizing on how to do it properly + good coding habits... In short, how would you do such an effect? This is just a bit of fun because of a genuine interest in coding (so i will never "code" something for real... i suck too much at it).
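For readers who do not read 68k assembly, the effect described in this post can be modelled roughly as follows. This is only a sketch in C, not the routine from the thread: the names `rect_offset`, `clear_rect` and `wipe_step` are made up for illustration, the 20-frame delay handled by `wipe2_timer` is left out, and the screen is assumed to be a plain 320x200 buffer with one byte per pixel.

```c
#include <stdint.h>
#include <string.h>

enum { SCREEN_W = 320, SCREEN_H = 200, RECT_W = 80, RECT_H = 100 };

/* Clear order copied from the thread's wipe2_script, including the -1 terminator. */
static const int8_t wipe_script[] = { 0, 5, 2, 7, 4, 1, 6, 3, -1 };

/* Byte offset of rectangle `index` (0..7): four rectangles side by side, two rows of them. */
static size_t rect_offset(int index)
{
    size_t offset = 0;
    if (index >= 4) {                       /* lower half of the screen */
        offset += (size_t)RECT_H * SCREEN_W;
        index -= 4;
    }
    return offset + (size_t)index * RECT_W;
}

/* Clear one 80x100 rectangle to colour 0 in the chunky buffer. */
static void clear_rect(uint8_t *screen, int index)
{
    uint8_t *row = screen + rect_offset(index);
    for (int y = 0; y < RECT_H; y++) {
        memset(row, 0, RECT_W);
        row += SCREEN_W;                    /* next line of the same rectangle */
    }
}

/* Perform one step of the script; returns 0 once the terminator is reached. */
static int wipe_step(uint8_t *screen, int *position)
{
    int8_t index = wipe_script[*position];
    if (index < 0)
        return 0;
    clear_rect(screen, index);
    (*position)++;
    return 1;
}
```

Calling `wipe_step` once every 20 frames, with `position` starting at 0, would reproduce the order of the wipe; after eight successful steps the whole screen is black.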
noname Member	#3 - Posted: 25 Apr 2007 20:36
come on z5, give us an exe! :) your code looks ok, but:
- think about replacing the rept of move.l d0,(a0)+ with a movem.l construct
- if you want, you can optimize the mulu.l #80,d1
z5_ Member	#4 - Posted: 25 Apr 2007 21:59 - Edited
ef3:
        tst.b   wipe2_timer
        beq     .wipe2_do
        sub.b   #1,wipe2_timer
        rts
.wipe2_do
        move.l  wipe2_part,a0
        moveq   #0,d1
        move.b  (a0),d1
        blt     .wipe2_done
        lea     screen,a0
        cmp.l   #4,d1
        blo.s   .upper_half
        add.l   #100*320,a0
        sub.b   #4,d1
.upper_half
        mulu.l  #80,d1
        add.l   d1,a0
        moveq   #0,d0
        move.w  #100-1,d6
.wipe2_vert
        move.w  #2-1,d7
.wipe2_horiz
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        move.l  d0,(a0)+
        dbra    d7,.wipe2_horiz
        add.l   #240,a0
        dbra    d6,.wipe2_vert
        move.b  #20,wipe2_timer
        add.l   #1,wipe2_part
.wipe2_done
        rts

wipe2_timer     dc.b    0
                even
wipe2_part      dc.l    wipe2_script
wipe2_script    dc.b    0,5,2,7,4,1,6,3,-1
                even
z5_ Member	#5 - Posted: 25 Apr 2007 22:17
That "pre-formatted text" thingie i just added works well on my localhost and on the minibb forums, but for some strange reason, it messes up here... adding random spaces where there shouldn't be any. I'll have a look at it asap.
@noname: thanks for the tips. I'll try them. About the exe, i have no effects except for 2 wipes. Wouldn't make an interesting exe :o) (btw. wos is great!)
doom Member	#6 - Posted: 25 Apr 2007 22:58
Ok. Just browsing it quickly:

        moveq   #0,d1
        move.b  (a0),d1
        blt     .wipe2_done
        lea     screen,a0
        cmp.l   #4,d1
- d1 is never larger than 255 here. cmp.b will do.

        add.l   #100*320,a0
- This compiles to the adda instruction, which always extends your source operand to a longword (for free). IOW, use add.w (= adda.w) if your operand fits in a signed word, as it does here.

        mulu.l  #80,d1
- You're sure d1 fits in an unsigned word, so use mulu.w here. 80 = 64+16, so the multiplication can be done with two shifts and an addition. It's not faster than a mulu.w, though. Except maybe on the 030.

        move.w  #2-1,d7
- moveq please :)

        move.l d0,(a0 )+ move.l d0,(a0)+ move.l d0,(a0)+ move.l d0,(a 0)+ move.l d0,(a0)+ move.l d0,(a0)+ move.l d0,( a0)+ move.l d0,(a0)+ move.l d0,(a0)+ move.l d0, (a0)+
- This should get you the "no operand space allowed" error. ;) Unrolling is fine, though I think movem could help a little. movem to memory doesn't allow post-increment mode though, so you'd have to write backwards.

        dbra    d7,.wipe2_horiz
- I'd normally do subq.w + bcc.b in this case. Same size but faster.

        add.l   #240,a0
- When you have lots of free registers, avoid immediates. Do move.w #240,d2 before the loop and then add.w d2,a0 inside the loop. Less instruction fetching. That won't speed up your code noticeably, but it's generally good advice.

I didn't consider what your code actually does. :)
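The "80 = 64+16" remark above can be checked with a few lines of C. This is just an illustration of the arithmetic (the function name `mul80` is made up here), not a claim about which form is faster on any particular 68k CPU.

```c
#include <stdint.h>

/* 80 * x computed as 64*x + 16*x, i.e. two shifts and one addition. */
static uint32_t mul80(uint32_t x)
{
    return (x << 6) + (x << 4);
}
```

In the wipe routine the value being multiplied is only ever 0..3, so overflow is not a concern there.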
winden Member	#7 - Posted: 26 Apr 2007 07:38
my notes...
on 060 a mul is 2 cycles while most others are 1 cycle each if you keep working on the same registers in following operations.
keep on doing move -> moveq, having constants in registers and shortening the data lengths whenever possible... it will help towards "automating" your optimisation technique over time.
free trick: when adding a value up to 32767 to an address register, use lea x(An),An. it's fast not only on 060 but also helps on lower machines
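Both the add.w advice and the lea trick depend on the constant fitting in a signed 16-bit word, because a word-sized operand or displacement is sign-extended to 32 bits before it is added to an address register. A small C sketch of that check, with made-up helper names, assuming nothing beyond that rule:

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of `value`, the way a word-sized operand
   or displacement is widened before being added to an address register. */
static int32_t sign_extend_word(uint32_t value)
{
    int32_t low = (int32_t)(value & 0xFFFFu);
    return (low & 0x8000) ? low - 0x10000 : low;
}

/* Returns 1 when `offset` survives the 16-bit round trip unchanged. */
static int fits_signed_word(int32_t offset)
{
    return sign_extend_word((uint32_t)offset) == offset;
}
```

With these helpers, 100*320 = 32000 fits, while something like 40000 would not: truncated to a word and sign-extended it becomes a negative offset.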
z5_ Member	#8 - Posted: 26 Apr 2007 12:46 - Edited
What is faster: moveq #0,d0 or clr.l d0?

What is the best way to clear d0->d7:
        moveq   #0,d0
        moveq   #0,d1
        ...
or
        moveq   #0,d0
        move.l  d0,d1
        ...

About replacing multiple move.l d0,(a0)+ with movem.l, is this correct:
        moveq   #0,d0
        move.l  d0,d1
        move.l  d0,d2
        move.l  d0,d3
        ...
        move.l  d0,d5
loop
        movem.l d0-d5,-(a0)
loop_end

That way, i would clear 24 pixels (instead of 40 with the move.l method) and thus i need more loops. So what is faster?
doom Member	#9 - Posted: 26 Apr 2007 13:21
moveq #0 is faster than clr I think. Something to do with the pipelines ;). To clear several registers I'd use moveq as well. Doesn't matter much since you'd rarely do that in an innerloop.

Your use of movem is correct. To clear 40 bytes in your loop you'd do this:
        movem.l d0-d4,-(a0)
        movem.l d0-d4,-(a0)
Or:
        move.l  d0,a2
        move.l  d0,a3
        move.l  d0,a4
        move.l  d0,a5
        .
        .
        movem.l d0-d5/a2-a5,-(a0)
z5_ Member	#10 - Posted: 26 Apr 2007 20:44
ef3:
        tst.b   wipe2_timer
        beq     .wipe2_do
        sub.b   #1,wipe2_timer
        rts
.wipe2_do
        move.l  wipe2_part,a0
        moveq   #0,d1
        move.b  (a0),d1
        blt     .wipe2_done
        lea     screen+80,a0
        cmp.b   #4,d1
        blo.s   .upper_half
        lea     100*320(a0),a0
        sub.b   #4,d1
.upper_half
        mulu.w  #80,d1
        add.w   d1,a0
        moveq   #0,d0
        move.l  d0,d1
        move.l  d0,d2
        move.l  d0,d3
        move.l  d0,d4
        move.w  #100-1,d7
.wipe2_vert
        movem.l d0-d4,-(a0)     ;clear 20 bytes
        movem.l d0-d4,-(a0)
        movem.l d0-d4,-(a0)
        movem.l d0-d4,-(a0)
        lea     400(a0),a0
        dbra    d7,.wipe2_vert
        move.b  #20,wipe2_timer
        add.l   #1,wipe2_part
.wipe2_done
        rts

wipe2_part      dc.l    wipe2_script
wipe2_script    dc.b    0,5,2,7,4,1,6,3,-1
wipe2_timer     dc.b    0
                even
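The pointer handling in this version (start at the end of a row, write backwards, then jump to the end of the next row) can be sketched in C as below. This is an illustrative model with a hypothetical function name, using single-byte stores rather than the 20-byte movem writes of the routine above.

```c
#include <stdint.h>

enum { SCREEN_W = 320, RECT_W = 80, RECT_H = 100 };

/* Clear an 80x100 rectangle by writing each row from its end towards its start. */
static void clear_rect_backwards(uint8_t *top_left)
{
    for (int y = 0; y < RECT_H; y++) {
        /* One past the last byte of this row, like the +80 in `lea screen+80,a0`. */
        uint8_t *p = top_left + (long)y * SCREEN_W + RECT_W;
        for (int x = 0; x < RECT_W; x++)
            *--p = 0;           /* predecrement, then store */
        /* p is now back at the start of the row. The assembly keeps a single
           running pointer instead, so it adds 320 + 80 = 400 to reach the end
           of the next row. */
    }
}
```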
z5_ Member	#11 - Posted: 26 Apr 2007 20:46 - Edited
Taking tips into account, above is the new version of the same routine. Is this a somewhat presentable routine that could actually deserve to be in a demo? (don't mind the random spaces, it's the preformatted text thingie not working properly yet)
z5_ Member	#12 - Posted: 26 Apr 2007 20:50
Pity that there isn't a movem with postincrement... complicates things a bit and it's not really intuitive to start at the end and go back.
One question: why can't i do add.b #1,wipe2_part, seeing that the number i'm adding is smaller than a byte?
doom Member	#13 - Posted: 26 Apr 2007 23:20
A little more quick browsing:

        beq     .wipe2_do
- The target address is in 8-bit range, so use beq.b. If you're not sure about the offset, use .b anyway and Asm-Pro will change it to a .w if necessary.

        sub.b   #1,wipe2_timer
- subq would work here as well.

        move.l  wipe2_part,a0
- Usually a good idea to keep local variables like this close to your routine (like right before the routine itself). Then address it with wipe2_part(pc)

        lea     100*320(a0),a0
- I'm pretty sure add.w #100*320,a0 is better.

        move.w  #100-1,d7
- The range of moveq is -128..127

        dbra    d7,.wipe2_vert
- Try subq.w #1,d7; bcc.b .wipe2_vert

The reason movem to memory has a predecrement mode but no postincrement mode, and movem from memory has a postincrement mode but no predecrement mode, is that it's (mostly?) intended for pushing registers on the stack and popping them again.

The reason you can't do add.b #1,wipe2_part is that wipe2_part is a longword pointer. The .b applies to both source and destination, so only the first (most significant) byte of wipe2_part is affected, which has the effect of adding 1<<24 to the pointer. You could do add.b #1,wipe2_part+3 to access the least significant byte, but then your addition still affects only that byte (overflow isn't carried into the next byte). addq.l #1,wipe2_part is what you want.
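The add.b explanation can be made concrete with a small model of a big-endian longword in memory. This is a sketch in C with invented helper names (`be_store`, `be_load`, `add_byte`); the only thing it assumes is the 68k's big-endian byte order.

```c
#include <stdint.h>

/* A 32-bit value stored big-endian, as on the 68k:
   bytes[0] is the most significant byte. */
typedef struct { uint8_t bytes[4]; } be_long;

static be_long be_store(uint32_t value)
{
    be_long m;
    m.bytes[0] = (uint8_t)(value >> 24);
    m.bytes[1] = (uint8_t)(value >> 16);
    m.bytes[2] = (uint8_t)(value >> 8);
    m.bytes[3] = (uint8_t)value;
    return m;
}

static uint32_t be_load(be_long m)
{
    return ((uint32_t)m.bytes[0] << 24) | ((uint32_t)m.bytes[1] << 16) |
           ((uint32_t)m.bytes[2] << 8)  |  (uint32_t)m.bytes[3];
}

/* Model of a byte-sized add at `address + byte_index`:
   only that one byte changes, and no carry leaves it. */
static be_long add_byte(be_long m, int byte_index, uint8_t amount)
{
    m.bytes[byte_index] = (uint8_t)(m.bytes[byte_index] + amount);
    return m;
}
```

For a pointer value of 0x00012345, `add_byte(..., 0, 1)` gives 0x01012345 (the add.b #1,wipe2_part case), and for 0x000123FF, `add_byte(..., 3, 1)` gives 0x00012300 instead of 0x00012400 (the wipe2_part+3 case with a lost carry).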
z5_ Member	#14 - Posted: 27 Apr 2007 12:41
Is there a difference between:
        moveq.w #4,d0
        moveq.l #4,d0
z5_ Member	#15 - Posted: 27 Apr 2007 12:59
"- The target address is in 8-bit range, so use beq.b. If you're not sure about the offset, use .b anyway and Asm-Pro will change it to a .w if necessary."
Is that the .s in Asm-One? Is it faster to use a .s for a branch? And is a .s meant for a branch where the destination code is located near the branch instruction?

"Usually a good idea to keep local variables like this close to your routine (like right before the routine itself). Then address it with wipe2_part(pc)"
In the example, my variables are just beneath my code (see source). Is it better to use var(pc) for every variable every time in that case? What is the benefit?
doom Member	#16 - Posted: 27 Apr 2007 13:51
.s and .b are the same. The "s" is supposedly for "short", which is a little vague, so I think .b is nicer. And yes, .b/.s is faster than the default (.w). Not 100% sure about the precise timing of the instruction fetch vs. the cache etc. though, but bxx.b will compile to a word, whereas bxx/bxx.w becomes a longword. That means the CPU reads less data from memory in order to execute your instruction.

Your variables are close enough to the code for the PC-relative references. The range is a signed word. Your code itself will almost certainly fit completely within 32 kB, so as long as your variables are in the same section, just assume you can use (pc). The assembler will tell you if you're wrong, in which case you might consider reserving an address register to point to a structure holding your variables.

The reason is the same:
        move.l  variable,d0     ; 6 bytes + a relocation at loadtime
        move.l  variable(pc),d0 ; 4 bytes, no relocation

Most instructions allow a (pc) addressing mode for the source operand, but not all allow it for the destination. Check your 68k reference if in doubt, or just try compiling.

Just optimize where it matters, though. Pure ASM projects can get impossible to manage and/or finish in time if you overdo it.
noname Member	#17 - Posted: 27 Apr 2007 15:05 - Edited
"Just optimize where it matters, though. Pure ASM projects can get impossible to manage and/or finish in time if you overdo it."
I second that. It is mandatory to know about optimizing, but in the end you can use an optimizing assembler to get the slavery-work (bra.s/w/jmp, pc-relative or not, etc.) done in the places where it doesn't really matter. Spend time thinking about your cpu-intensive parts (loops), though.
z5_ Member	#18 - Posted: 27 Apr 2007 21:40 - Edited
The hardest part for me so far dealing with assembler is the distinction between .b,.w,.l on the instruction, combined with the various format options of both source and / or destination. I must say: the 68k reference doesn't seem helpful at all in such cases.
noname Member	#19 - Posted: 28 Apr 2007 09:30
You could try buying a book on this topic. Obviously you wouldn't get a new one, but maybe the 68000, 68010, 68020 Primer could help.
StingRay Member	#20 - Posted: 28 Apr 2007 16:01
"Is there a difference between: moveq.w #4,d0 moveq.l #4,d0"
moveq doesn't have a size extension, it always operates on a longword, so moveq.w is nonsense.
StingRay Member	#21 - Posted: 28 Apr 2007 16:03
"lea 100*320(a0),a0 - I'm pretty sure add.w #100*320,a0 is better."
And I am pretty sure there is no difference. :) And lea offs(ax),ax looks cooler anyway. :D
Blueberry Member	#22 - Posted: 1 May 2007 10:05
They are equally fast if the contents of A0 are not needed until some time afterwards. However, with the lea, the assignment to A0 occurs earlier in the pipeline, so the result will be available earlier for subsequent instructions. You can use A0 for addressing right after the lea (on the following cycle), whereas if you try that with the add you will get a 1-cycle pipeline stall.
The effect is visible in the other direction as well. If A0 is assigned just before the instruction, the lea will stall, whereas the add will not.
doom Member	#23 - Posted: 1 May 2007 12:06
I wish I still had my 68060 reference. The 68k ref. is great but it doesn't have anything on instruction timing.
rload Member	#24 - Posted: 1 May 2007 13:04
Is this what you are looking for?
http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MC68060&fsrch=1
There are user manuals and addendums some way down the page.
http://www.freescale.com/files/32bit/doc/ref_manual/MC68060UM.pdf
http://www.freescale.com/files/32bit/doc/ref_manual/MC68060UMAD.pdf
http://www.freescale.com/files/32bit/doc/ref_manual/MC68060UMAD2.pdf
doom Member	#25 - Posted: 1 May 2007 15:48
Oh neat. I made a hint of an effort to find those recently on the Motorola website, but :(. I'll try to look at those links when I come home. Wish they still gave out the free paperbacks though. That was so neat. And I had the 060 book once. I don't understand what happened to it.
xeron Member	#26 - Posted: 1 May 2007 17:03
@doom: The free paperbacks were great. I have a 68000UM, 68020UM, 68030UM, 68040UM and 68060UM as well as a bunch of PPC docs.
winden Member	#27 - Posted: 1 May 2007 20:52
Yeah, I could not really believe it when, hanging out at #amycoders, someone said Motorola would send the books for free.
rload Member	#28 - Posted: 10 May 2007 00:11
the Norwegian customs didn't believe it either.
dalton Member	#29 - Posted: 10 May 2007 18:59
they would only send to companies when I had mine ordered... didn't seem to matter what business though, since I had them sent to a real estate firm =)
z5_ Member	#30 - Posted: 11 May 2007 12:02
How can i avoid a division by a number which is not a power of 2? For example, a division by 20?
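One common answer to this kind of question, offered here only as a sketch and not as a reply from the thread, is to multiply by a scaled reciprocal and shift. The C function below (the name `div20` is made up) assumes the input is an unsigned 16-bit value; the constant 52429 is 2^20 / 20 rounded up, and the 32-bit product never overflows for inputs up to 65535.

```c
#include <stdint.h>

/* x / 20 for unsigned 16-bit x, using one multiply and one shift
   instead of a divide. 52429 = ceil(2^20 / 20). */
static uint16_t div20(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 52429u) >> 20);
}
```

On a 68k this would presumably translate to a mulu.w by 52429 followed by a right shift of 20 bits. For inputs wider than 16 bits a different constant and shift would be needed, so the sketch should not be reused blindly outside that range.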

