A.D.A. Amiga Demoscene Archive

Optimizing + good coding habits
z5_
Member
#1 - Posted: 25 Apr 2007 19:01 - Edited
ef3:
tst.b wipe2_timer
beq .wipe2_do
sub.b #1,wipe2_timer
rts

.wipe2_do
move.l wipe2_part,a0
moveq #0,d1
move.b (a0),d1
blt .wipe2_done
lea screen,a0
cmp.l #4,d1
blo.s .upper_half
add.l #100*320,a0
sub.b #4,d1
.upper_half
mulu.l #80,d1
add.l d1,a0
moveq #0,d0
move.w #100-1,d6
.wipe2_vert
move.w #2-1,d7
.wipe2_horiz
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a0)+
dbra d7,.wipe2_horiz
add.l #240,a0
dbra d6,.wipe2_vert
move.b #20,wipe2_timer
add.l #1,wipe2_part
.wipe2_done
rts

wipe2_timer
dc.b 0
even
wipe2_part
dc.l wipe2_script
wipe2_script
dc.b 0,5,2,7,4,1,6,3,-1
even
z5_
Member
#2 - Posted: 25 Apr 2007 19:09 - Edited
ok, what you see above is my attempt at a little "wipe effect" i saw in Supergroove. Basically, it wipes a 320*200 chunky screen to black in 8 rectangles (just clearing 8 rectangles of 80*100 one after the other). The order in which the rectangles are cleared is defined in the script.

It's a miracle but it actually works. However, experienced coders will probably have a heart attack from seeing my code. So my question: any good tips/tricks/optimizing on how to do it properly + good coding habits... In short, how would you do such an effect?

This is just a bit of fun because of a genuine interest in coding (so i will never "code" something for real... i suck too much at it).
noname
Member
#3 - Posted: 25 Apr 2007 20:36
come on z5, give us an exe! :)

your code looks ok, but:
- think about replacing the rept of move.l d0,(a0)+ with a movem.l construct
- if you want, you can optimize the mulu.l #80,d1
z5_
Member
#4 - Posted: 25 Apr 2007 21:59 - Edited
ef3:
	tst.b	wipe2_timer
	beq	.wipe2_do
	sub.b	#1,wipe2_timer
	rts
.wipe2_do
	move.l	wipe2_part,a0
	moveq	#0,d1
	move.b	(a0),d1
	blt	.wipe2_done
	lea	screen,a0
	cmp.l	#4,d1
	blo.s	.upper_half
	add.l	#100*320,a0
	sub.b	#4,d1
.upper_half
	mulu.l	#80,d1
	add.l	d1,a0
	moveq	#0,d0
	move.w	#100-1,d6
.wipe2_vert
	move.w	#2-1,d7
.wipe2_horiz
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	move.l	d0,(a0)+
	dbra	d7,.wipe2_horiz
	add.l	#240,a0
	dbra	d6,.wipe2_vert
	move.b	#20,wipe2_timer
	add.l	#1,wipe2_part
.wipe2_done
	rts

wipe2_timer
	dc.b	0
	even
wipe2_part
	dc.l	wipe2_script
wipe2_script
	dc.b	0,5,2,7,4,1,6,3,-1
	even
z5_
Member
#5 - Posted: 25 Apr 2007 22:17
That "pre-formatted text" thingie i just added works well on my localhost and on the minibb forums, but for some strange reason, it messes up here... adding random spaces where there shouldn't be any. I'll have a look at it asap.

@noname: thanks for the tips. I'll try them. About the exe, i have no effects except for 2 wipes. Wouldn't make an interesting exe :o) (btw. wos is great!)
doom
Member
#6 - Posted: 25 Apr 2007 22:58
Ok. Just browsing it quickly:

moveq #0,d1
move.b (a0),d1
blt .wipe2_done
lea screen,a0
cmp.l #4,d1

- d1 is never larger than 255 here. cmp.b will do.

add.l #100*320,a0

- This compiles to the adda instruction which always extends your source operand to a longword (for free). IOW, use add.w (= adda.w) if your operand fits in a signed word, as it does here.

mulu.l #80,d1

- You're sure d1 fits in an unsigned word, so use mulu.w here. 80 = 64+16 so the multiplication can be done with two shifts and an addition. It's not faster than a mulu.w, though. Except maybe on the 030.

move.w #2-1,d7

- moveq please :)

move.l d0,(a0 )+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,(a 0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0,( a0)+
move.l d0,(a0)+
move.l d0,(a0)+
move.l d0, (a0)+

- This should get you the "no operand space allowed" error. ;) Unrolling is fine, though I think movem could help a little. movem to memory doesn't allow post-increment mode though so you'd have to write backwards.

dbra d7,.wipe2_horiz

- I'd normally do subq.w + bcc.b in this case. Same size but faster.

add.l #240,a0

- When you have lots of free registers, avoid immediates. Do move.w #240,d2 before the loop and then add.w d2,a0 inside the loop. Less instruction fetching.

That won't speed up your code noticeably, but it's generally good advice. I didn't consider what your code actually does. :)
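
The "80 = 64+16" remark above can be sketched as follows. This is only an illustration: it assumes d1.w holds the column index (0..3) with the upper bits clear, and that d2 is free to use as a scratch register. As the post says, this is not expected to beat mulu.w except possibly on the 030.

```asm
; Multiply d1 by 80 using shifts and one addition.
; Assumption: d2 is a free scratch register (not stated in the thread).
	lsl.w	#4,d1		; d1 = index * 16
	move.w	d1,d2		; d2 = index * 16
	lsl.w	#2,d2		; d2 = index * 64
	add.w	d2,d1		; d1 = index * 16 + index * 64 = index * 80
```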
winden
Member
#7 - Posted: 26 Apr 2007 07:38
my notes...

on 060 a mul is 2 cycles while most others are 1 cycle each if you keep working on the same registers in following operations.

keep on doing move -> moveq, having constants in registers and shortening the data lengths whenever possible... it will help towards "automating" your optimisation technique over time.

free trick: when adding a value <32767 to an address register, use lea x(An),An. it's fast not only on 060 but also helps on lower machines
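
Applied to the row skip in the routine above, the lea trick would look roughly like this. It is a sketch only; the displacement in this addressing mode is a signed 16-bit value, so it works for the 240 used here.

```asm
; Advance a0 to the same column on the next screen line.
; Does the same job as add.l #240,a0 in the original routine.
	lea	240(a0),a0	; a0 = a0 + 240
```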
z5_
Member
#8 - Posted: 26 Apr 2007 12:46 - Edited
What is faster:
moveq #0,d0
or
clr.l d0?

What is the best way to clear d0->d7:
moveq #0,d0
moveq #0,d1
...

or
moveq #0,d0
move.l d0,d1
...

About replacing multiple move.l d0,(a0)+ to movem.l, is this correct:
moveq #0,d0
move.l d0,d1
move.l d0,d2
move.l d0,d3
...
move.l d0,d5

loop
movem.l d0-d5,-(a0)
loop_end

That way, i would clear 24 pixels (instead of 40 with the move.l method) and thus i need more loops. So what is faster?
doom
Member
#9 - Posted: 26 Apr 2007 13:21
moveq #0 is faster than clr I think. Something to do with the pipelines ;). To clear several registers I'd use moveq as well. Doesn't matter much since you'd rarely do that in an innerloop.

Your use of movem is correct. To clear 40 bytes in your loop you'd do this:

movem.l d0-d4,-(a0)
movem.l d0-d4,-(a0)

Or:

move.l d0,a2
move.l d0,a3
move.l d0,a4
move.l d0,a5

.
.

movem.l d0-d5/a2-a5,-(a0)
z5_
Member
#10 - Posted: 26 Apr 2007 20:44
ef3:
	tst.b	wipe2_timer
	beq	.wipe2_do
	sub.b	#1,wipe2_timer
	rts
.wipe2_do
	move.l	wipe2_part,a0
	moveq	#0,d1
	move.b	(a0),d1
	blt	.wipe2_done
	lea	screen+80,a0
	cmp.b	#4,d1
	blo.s	.upper_half
	lea	100*320(a0),a0
	sub.b	#4,d1
.upper_half
	mulu.w	#80,d1
	add.w	d1,a0
	moveq	#0,d0
	move.l	d0,d1
	move.l	d0,d2
	move.l	d0,d3
	move.l	d0,d4
	move.w	#100-1,d7
.wipe2_vert
	movem.l	d0-d4,-(a0)	;clear 20 bytes
	movem.l	d0-d4,-(a0)
	movem.l	d0-d4,-(a0)
	movem.l	d0-d4,-(a0)
	lea	400(a0),a0
	dbra	d7,.wipe2_vert
	move.b	#20,wipe2_timer
	add.l	#1,wipe2_part
.wipe2_done
	rts

wipe2_part
	dc.l	wipe2_script
wipe2_script
	dc.b	0,5,2,7,4,1,6,3,-1
wipe2_timer
	dc.b	0
	even
z5_
Member
#11 - Posted: 26 Apr 2007 20:46 - Edited
Taking tips into account, above is the new version of the same routine. Is this a somewhat presentable routine that could actually deserve to be in a demo? (don't mind the random spaces, it's the preformatted text thingie not working properly yet)
z5_
Member
#12 - Posted: 26 Apr 2007 20:50
Pity that there isn't a movem with postincrement... it complicates things a bit and it's not really intuitive to start at the end and go back.

One question: why can't i do add.b #1,wipe2_part, seeing that the number i'm adding is smaller than a byte?
doom
Member
#13 - Posted: 26 Apr 2007 23:20
A little more quick browsing:

beq .wipe2_do

- The target address is in 8-bit range, so use beq.b. If you're not sure about the offset, use .b anyway and Asm-Pro will change it to a .w if necessary.

sub.b #1,wipe2_timer

- subq would work here as well.

move.l wipe2_part,a0

- Usually a good idea to keep local variables like this close to your routine (like right before the routine itself). Then address it with wipe2_part(pc)

lea 100*320(a0),a0

- I'm pretty sure add.w #100*320,a0 is better.

move.w #100-1,d7

- The range of moveq is -128..127

dbra d7,.wipe2_vert

- Try subq.w #1,d7; bcc.b .wipe2_vert

The reason movem to memory only has predecrement mode and movem from memory only has the postincrement mode is that it's (mostly?) intended to be used for pushing registers on the stack.

The reason you can't do add.b #1,wipe2_part is that wipe2_part is a longword pointer. The .b applies to both source and destination so only the first (most significant) byte of wipe2_part is affected, which has the effect of adding 1<<24 to the pointer.

You could do add.b #1,wipe2_part+3 to access the least significant byte, but then your addition still affects only that byte (overflow isn't carried into the next byte). addq.l #1,wipe2_part is what you want.
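
The three ways of incrementing a longword pointer described above can be compared side by side. This is an illustrative sketch: ptr_a, ptr_b and ptr_c are hypothetical variables that all start at the made-up value $00012FFF, so each instruction works on its own fresh copy.

```asm
; Assumption: ptr_a/ptr_b/ptr_c and the value $00012FFF are invented for this example.
	add.b	#1,ptr_a	; only the most significant byte changes: ptr_a = $01012FFF
	add.b	#1,ptr_b+3	; low byte wraps $FF -> $00, no carry: ptr_b = $00012F00
	addq.l	#1,ptr_c	; full 32-bit add with carry: ptr_c = $00013000
	rts

ptr_a
	dc.l	$00012FFF
ptr_b
	dc.l	$00012FFF
ptr_c
	dc.l	$00012FFF
```

Only the addq.l version moves the pointer forward by exactly one byte in every case.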
z5_
Member
#14 - Posted: 27 Apr 2007 12:41
Is there a difference between:
moveq.w #4,d0
moveq.l #4,d0
z5_
Member
#15 - Posted: 27 Apr 2007 12:59
- The target address is in 8-bit range, so use beq.b. If you're not sure about the offset, use .b anyway and Asm-Pro will change it to a .w if necessary.

Is that the .s in Asm-One? Is it faster to use a .s for a branch? And is a .s meant for a branch where the destination code is located near the branch instruction?

Usually a good idea to keep local variables like this close to your routine (like right before the routine itself). Then address it with wipe2_part(pc)

In the example, my variables are just beneath my code (see source). Is it better to use var(pc) for every variable every time in that case? What is the benefit?
doom
Member
#16 - Posted: 27 Apr 2007 13:51
.s and .b are the same. The "s" is supposedly for "short", which is a little vague, so I think .b is nicer. And yes, .b/.s is faster than the default (.w). Not 100% sure about the precise timing of the instruction fetch vs. the cache etc. though, but bxx.b will compile to a word, whereas bxx/bxx.w becomes a longword. That means the CPU reads less data from memory in order to execute your instruction.

Your variables are close enough to the code for the PC-relative references. The range is a signed word. Your code itself will almost certainly fit completely within 32 kB, so as long as your variables are in the same section, just assume you can use (pc). The assembler will tell you if you're wrong, in which case you might consider reserving an address register to point to a structure holding your variables. The reason is the same:

move.l variable,d0 ; 6 bytes + a relocation at loadtime
move.l variable(pc),d0 ; 4 bytes, no relocation

Most instructions allow a (pc) addressing mode for the source operand, but not all allow it for the destination. Check your 68k reference if in doubt, or just try compiling.
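
A small sketch of how that plays out with the variables from the routine in this thread. It assumes a1 is free as a scratch register; the instructions are fragments, not a complete routine.

```asm
; Reading a variable: (pc) works directly as the source operand.
	move.l	wipe2_part(pc),a0	; 4 bytes, no relocation

; Modifying a variable in memory: take its address PC-relative first,
; then write through the address register (a1 is an assumed free register).
	lea	wipe2_timer(pc),a1	; a1 -> wipe2_timer
	subq.b	#1,(a1)			; decrement the timer via a1
```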

Just optimize where it matters, though. Pure ASM projects can get impossible to manage and/or finish in time if you overdo it.
noname
Member
#17 - Posted: 27 Apr 2007 15:05 - Edited
Just optimize where it matters, though. Pure ASM projects can get impossible to manage and/or finish in time if you overdo it.

I second that. It is mandatory to know about optimizing, but in the end you can use an optimizing assembler to get the slavery-work (bra.s/w/jmp, pc-relative or not, etc.) done in the places where it doesn't really matter. Spend time thinking about your cpu-intensive parts (loops), though.
z5_
Member
#18 - Posted: 27 Apr 2007 21:40 - Edited
The hardest part for me so far dealing with assembler is the distinction between .b,.w,.l on the instruction, combined with the various format options of both source and / or destination. I must say: the 68k reference doesn't seem helpful at all in such cases.
noname
Member
#19 - Posted: 28 Apr 2007 09:30
You could try buying a book on this topic. Obviously you wouldn't get a new one, but maybe the 68000, 68010, 68020 Primer could help.
StingRay
Member
#20 - Posted: 28 Apr 2007 16:01
Is there a difference between:
moveq.w #4,d0
moveq.l #4,d0


moveq doesn't have size extension, it always operates on a longword so moveq.w is nonsense.
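
To make the "always a longword" point concrete, here is a two-line sketch. The 8-bit immediate is sign-extended to 32 bits, so the whole register is overwritten either way.

```asm
	moveq	#4,d0		; d0 = $00000004 (all 32 bits written)
	moveq	#-1,d0		; d0 = $FFFFFFFF (8-bit immediate sign-extended)
```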
StingRay
Member
#21 - Posted: 28 Apr 2007 16:03
lea 100*320(a0),a0

- I'm pretty sure add.w #100*320,a0 is better.


And I am pretty sure there is no difference. :) And lea offs(ax),ax looks cooler anyway. :D
Blueberry
Member
#22 - Posted: 1 May 2007 10:05
They are equally fast if the contents of A0 are not needed until some time afterwards. However, with the lea, the assignment to A0 occurs earlier in the pipeline, so the result will be available earlier for subsequent instructions. You can use A0 for addressing right after the lea (on the following cycle), whereas if you try that with the add you will get a 1-cycle pipeline stall.

The effect is visible in the other direction as well. If A0 is assigned just before the instruction, the lea will stall, whereas the add will not.
doom
Member
#23 - Posted: 1 May 2007 12:06
I wish I still had my 68060 reference. The 68k ref. is great but it doesn't have anything on instruction timing.
rload
Member
#24 - Posted: 1 May 2007 13:04
doom
Member
#25 - Posted: 1 May 2007 15:48
Oh neat. I made a hint of an effort to find those recently on the Motorola website, but :(. I'll try to look at those links when I come home. Wish they still gave out the free paperbacks though. That was so neat.

And I had the 060 book once. I don't understand what happened to it.
xeron
Member
#26 - Posted: 1 May 2007 17:03
@doom:

The free paperbacks were great. I have a 68000UM, 68020UM, 68030UM, 68040UM and 68060UM as well as a bunch of PPC docs.
winden
Member
#27 - Posted: 1 May 2007 20:52
Yeah I could not really believe when hanging out at #amycoders someone said Motorola would send the books for free.
rload
Member
#28 - Posted: 10 May 2007 00:11
the norwegian customs didn't believe it either.
dalton
Member
#29 - Posted: 10 May 2007 18:59
they would only send to companies when I had mine ordered...

didn't seem to matter what business though since I had them sent to a real estate firm =)
z5_
Member
#30 - Posted: 11 May 2007 12:02
How can i avoid a division by a number which is not a power of 2, for example a division by 20?
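
One common approach, not taken from this thread, is to multiply by a scaled reciprocal and keep the high word of the product. The sketch below assumes the value to divide is an unsigned word in d0 in the range 0..16383; the constant 3277 is 65536/20 rounded up, and the rounding error makes the result unreliable for larger inputs.

```asm
; d0.w = x (assumed 0..16383), result: d0.l = x / 20 (rounded down)
	mulu.w	#3277,d0	; d0.l = x * 3277, with 3277 = 65536/20 rounded up
	clr.w	d0		; drop the low 16 bits of the product
	swap	d0		; d0.l = (x * 3277) >> 16 = x / 20
```

For example, x = 100 gives 327700, whose high word is 5.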