| 
		
| Author | 
Message | 
 
 
	
	d0DgE 
		Member		 | 
	
	
	
		Hi guys! It's this time of the year that I usually start looking again in my collection of primitives with the aim of optimising stuff. This time I thought about my pixel drawing routine. ATM I do this solely with the CPU, no Blitter usage involved and as "wasted years" (the twist-ribbon) showed, it barely ran on the A500. Those were 2x160 pixels to outline the ribbon which then was filled with the Blitter onto the screen. Of course in the Twist-routine I used an in-line modified version of the following code to  avoid unnecessary subroutine branches. This is the 1-Bitplane version....  WORD equ 15 drawPlanarPixel: 	; drawing plane => a0 ...could be a buffer, too 	; X 		=> d0 	; Y 		=> d1 	; SCRAP 	=> d2,d3 	movem.l	d0-d3/a0,-(a7)
  	; manage the Y position 	move.w	d1,d2		; copy y 	lsl.w	#4,d1		; multiply by 40 	lsl.w	#3,d2 	add.w	d1,d1 	add.w	d2,d1 	add.w	d1,a0		; enter y pos first
  	; manage the X position 	move.w	d0,d2		; copy x 	lsr.w	#3,d0		; divide by 8 to get the hardposition 	and.w	#$000f,d2	; mask the 4 lower bits for 0-15 softposition 	btst	#0,d0		; is the hardposition an odd value ? 	beq.s	.even		; nope ...skip the -1 action 	subq.w	#1,d0		; it's odd ...sub 1 to keep even steps (68000!) .even: 	sub.w	#WORD,d2 	neg.w	d2 	add.w	d0,a0		; move to X hardposition 	move.w	(a0),d3		; take current screendata word aligned 	bset	d2,d3		; set the softposition pixel 	move.w	d3,(a0)		; put back the modified word
  	movem.l	(a7)+,d0-d3/a0 	rts
 
 Please note, that it is done for convenience so that I just provide decimal X,Y coordinates and a buffer to write to and fire the thing to get my pixel. I'd very much would appreciate any speed-up/optimising tips on this one. I'm not exactly fluent in all the available instructions - especially regarding bitfields and such stuff -  so there must be ways to do this operation more elegantly and efficiantly. Is it a good idea to let the Blitter do this work ? If yes, is there a guide or an example to peek into regarding pixel plotting with the Blitter ? So far the Line-Drawing Mode gave me a headache -_- Thx in advance		  
	 | 
	 
 
	
	ZEROblue 
		Member		 | 
	
	
	
		By extending your bitplanes to f.ex 64 bytes wide for faster addressing you can do: moveq  #-$80, d2 ror.b  d0, d2 lsr.w  #3, d0 lsl.w  #6, d1 add.w  d0, d1 or.b   d2, (a0, d1.w)
   		 
	 | 
	 
 
	
	dalton 
		Member		 | 
	
	
	 
		If you're referring to screens 11-13 here at ADA, I'd suggest not plotting any pixels at all. Put any static gfx in odd bitplanes, and then put a triangle in the even. The triangle should be 1 pixel wide on row 1, and extend it's size by one pixel on each side downwards. Then you create a copper list that writes to even bitplane modulo on each scanline. Then simply write the modulo that corresponds to a certain row in the triangle to draw a horisontal line of desired width. Colors can of course also be set using the copper list.		 
	 | 
	 
 
	
	d0DgE 
		Member		 | 
	
	
	 
		No, dalton, the ribbon was just an example given on what I used the pixel routine. The colours were set using the copper ;) - it was a 1 bpl effect.
  There are of course a lot more occations you can use a fast pixel routine in.
  ZeroBlue:
  Interesting proposition. I'll try some of this. Thanks :)		 
	 | 
	 
 
	
	dalton 
		Member		 | 
	
	
	
		I suggest something like this for a one-bitplane plot. In principle it's the same as the one you did, only it uses more shortcuts for setting bits and addressing...  ; d0/d1 = x/y ; a0 = bitplane pointer
  asl.w  #6, d1         ; assuming bitplane is 64 bytes wide lea    (a0,d1.w), a0
  move.w d0, d1     ; copy x moveq  #%10000000, d2     ; this is the pixel =) and.w  #7, d0    ; mask out bit position  asr.w  #3, d1     ; get byte offset lsr.b  d0, d2     ; shift pixel on position or.b   d2, (a0,d1.w) ; put in place
 
 There is good tutorial here:  http://www.modermodemet.se/dalton/tut/DOTS.TXT(it's in swedish, but code is still code I guess)		  
	 | 
	 
 
	
	Vektor 
		Member		 | 
	
	
	 
		Another option. Who calculates the cycles per pixel?
  ; a0 = table with pixels to be plotted ; a1 = pre calculated screen multiply table ; a2 = pre calculated x division table ; a3 = screenpointer
  lea Position_table(pc),a0  lea Shift_Table_x(pc),a1 lea Mulu_Table_y(pc),a3 lea screen(pc),a2 moveq #0,d3  moveq #num_of_pixels-1,d7
  .loop
    move.w (a0)+,d0   move.w (a0)+,d1
    add.w  d1,d1	;y=y*2 to get an index   move.w (a2,d1.w),d1	;screen multiplication tables with multiply value
    move.b (a1,d0.w),d3	;add x-word    add.w d3,d1 	;add x word position
    not.w d0 		;shift bit    bset d0,(a3,d1.W) 	;plot the pixel
  dbf d7,.loop  rts		 
	 | 
	 
 
	
	Vektor 
		Member		 | 
	
	
	 
		Found a typing error:
  bset d0,(a3,d1.W) should be  bset d0,(a2,d1.W)		 
	 | 
	 
 
	
	d0DgE 
		Member		 | 
	
	
	 
		Nice hints Dalton, ZeroBlue. I've got it implemented and adapted. Also thanks Vektor but I need a rather flexible on-the-fly multi-purpose plotting routy. Maybe because I've become a custom to higher language methods like drawCircle();  ;)
  Edit:
  ... the Y multiplication table trick from Vektor is really neat :D		 
	 | 
	 
 
	
	dalton 
		Member		 | 
	
	
	 
		I see now that I posted basically the same routine as ZeroBlue, only his was better =) Should read more carefully I guess...		 
	 | 
	 
 
	
	coyote 
		Member		 | 
	
	
	 
		I'm sure you guys noticed that dalton & ZeroBlue wrote routines that won't work on 68000 because of odd address accesses. (probably doesn't matter anyway...)		 
	 | 
	 
 
	
	britelite 
		Member		 | 
	
	
	 
		Umm, I can't see any reading or writing .w or .l at odd addresses...		 
	 | 
	 
 
	
	coyote 
		Member		 | 
	
	
	 
		Yeah britelite. You are right. Sorry, I must have still been sleeping... Mea culpa.  O:-} My apologies to dalton and ZEROBlue.		 
	 | 
	 
 
	
	Vektor 
		Member		 | 
	
	
	 
		@d0DgE: Correct this routine plots a "simple" predefined array now but it can be used icw eg Bresenham to create a quite fast drawcircle routine. If you're interested I must have it somewhere in my old amiga sourcecodes.		 
	 | 
	 
 
	
	d0DgE 
		Member		 | 
	
	
	 
		Britez0r is quite right. ZeroBlue & Dalton's approach workes fine on the 68000. The only downside in the long run is the "or" itself, which makes it useful for  separate bitplane actions only.
  @Vektor: of course I'm interested. Send it to "dodge[ät]rowdyclub[döt].de" whenever you like :)		 
	 | 
	 
 
	
	Vektor 
		Member		 | 
	
	
	 
		Found it, this was the main plotting algo. I justed checked the entire source code with UAE, with A500 speed it runs in about 1/5 of a frame with a 260 pix wide circle.
  @doDgE: I will email you the entire sourcecode
  * a0 = x * a1 = y * a2 = screen * d0 = radius
  Draw_circle: 	moveq	#3,d5 	moveq	#6,d6
  	moveq	#0,d1			;x=0 	move.w	d0,d2
  	subq.w	#1,d2			;d=r-1 .loop: 	tst.w	d2 	bpl.b	.no_ydec
  	subq.w	#1,d0			;y=y-1
  	add.w	d0,d2			;d=d+y
  .no_ydec: 	move.w	a1,d3			;y 	sub.w	d0,d3			;y-r
  	lsl.w	d6,d3			;(y-r)*schermbreedte 	lea	(a2,d3.w),a3		;screen pointer + y-offset
  	move.w	a0,d3			;x 	add.w	d1,d3			;x+int x
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w)
  	move.w	a0,d3 	sub.w	d1,d3
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w)
  	move.w	a1,d3 	sub.w	d1,d3 		 	lsl.w	d6,d3 	lea	(a2,d3.w),a3
  	move.w	a0,d3 	add.w	d0,d3
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w)
  	move.w	a0,d3 	sub.w	d0,d3
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w) 		 	move.w	a1,d3 	add.w	d1,d3
  	lsl.w	d6,d3 	lea	(a2,d3.w),a3
  	move.w	a0,d3 	add.w	d0,d3
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w)
  	move.w	a0,d3 	sub.w	d0,d3
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w)
  	move.w	a1,d3 	add.w	d0,d3
  	lsl.w	d6,d3 	lea	(a2,d3.w),a3
  	move.w	a0,d3 	add.w	d1,d3
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w) 	 	move.w	a0,d3 	sub.w	d1,d3
  	move.w	d3,d4 	lsr.w	d5,d3 	not.w	d4 	bset	d4,(a3,d3.w) 	 	sub.w	d1,d2
  	subq.w	#2,d2 	addq.w	#1,d1
  	cmp.w	d0,d1 	bls.w	.loop 	rts		 
	 | 
	 
 
	
	z5_ 
		Member		 | 
	
	
	 
		go go go, dodge! :)
  @Vektor: any interest in rejoining the amigascene and code some stuff again? would be cool!		 
	 | 
	 
 
	
	d0DgE 
		Member		 | 
	
	
	 
		...by now it finally occured to me that one can create a quite convenient MACRO for this pixel plotting code ... D'OH
  well, you'll stop learning		 
	 | 
	 
 
	
	Vektor 
		Member		 | 
	
	
	 
		@z5_, If you have interesting idea's I'm always open to code / review some things but don't expect too much!		 
	 | 
	 
 
	
	Azure 
		Member		 | 
	
	
	 
		It has been a long time since I did this, but this looks awefully wasteful to me.
  Is this routine supposed to be optimized for 68000 or 68060? I dug around my old backups and found a 3d dotrotator I coded once. I don't think I have ever used it anywhere. It uses a similar approach as the one Mr. Pet did in roots, but may be slightly more optimized. 
  The innerloop performs 3D rotation, transformation into the screen space (perspectve) and pixel plotting.
 
  .bigloop
  	REPT	2 	move.l	(a3)+,d3 	move.l	(a3)+,d2 	         move.l  (a0,d0.w*4),d3  ;a0-a2 precalculated tables with         add.l   (a1,d2.w*4),d3  ;M-entries. 512 longwords each         add.l   (a2,d5.w*4),d3                                ;d0=00000000SyyyyyyySzzzzzzzSxxxxxxx         move.l  (a4,d3.w*4),d1  ;Perspective for x (SzzzzzzzSxxxxxxx)         bfset   (a6){d4:1}      ;setpixel (a6=planepointer)         lsr.l   #8,d3		;12 free cycles...         swap	d2         swap	d0         add.l   (a5,d3.w*4),d1  ;Perspective for y (SyyyyyyySzzzzzzz)                                 ;d1=Dotadress (pixnr)+planeoffset for                                 ;colors                                 ;d1 highword=0 	 	move.l	(a3)+,d5         move.l  (a0,d3.w*4),d3  ;a0-a2 precalculated tables with         add.l   (a1,d5.w*4),d3  ;M-entries. 512 longwords each         add.l   (a2,d2.w*4),d3                                 ;d0=00000000SyyyyyyySzzzzzzzSxxxxxxx         move.l  (a4,d3.w*4),d4  ;Perspective for x (SzzzzzzzSxxxxxxx)         bfset   (a6){d1:1}      ;setpixel (a6=planepointer)         lsr.l   #8,d3         swap	d5         add.l   (a5,d3.w*4),d4  ;Perspective for y (SyyyyyyySzzzzzzz)                                 ;d1=Dotadress (pixnr)+planeoffset for                                 ;colors                                 ;d1 highword=0 	ENDR		 
	 | 
	 
 
	
	Vektor 
		Member		 | 
	
	
	 
		@Azure: my routine is 68000 based. I looked at yours and except for the perspective precalc with the z coordinates in the upper word and the bfset (030+?) the aproach is basically the same, precalc everything, use the coordinates as index (which can be done within the instruction on 020+)  The only thing I don't get are your first (three) longword moves, the third overwrites the first?		 
	 | 
	 
 
	
	Azure 
		Member		 | 
	
	
	 
		...the first move should probably be to D0. I was not able to check whether the sourcecode was functional.
  bfset is very neat, as it allows to avoid separate shifting to calculate the address offset. There is really just a single instrution responsible for the plotting in this routine, the remaining instructions are for 3d calculations.		 
	 | 
	 
 
	
	Rebb 
		Member		 | 
	
	
	
		My version of the pixel plotter. Already got some good tips here (removing the mulu), but as this is my first plotter i guess there's still lot of room for improvement. plot: 	;takes d0=color,d1=x,d2=y,a0=bplane
 
  findy:   	; multiply y with 40 to get add factor for bitplane
  	move.w d2,d3 	lsl.w	#4,d2 	lsl.w   #3,d3 	add.w   d2,d2 	add.w   d3,d2 	add.w	d2,a0 	 checkplane: 	btst.l	#0,d0	; testbit on colorvalue to get planes to plot 	beq	plane2 	jsr	pixset
  plane2: 	lea	bplane,a0 ; bitplane address to a0 	add.l	d2,a0   ; start address for correct line 	add.l	#10240,a0 ; address of plane	 	btst.l  #1,d0	     	beq 	plane3 	jsr	pixset 					 plane3: 	lea	bplane,a0 	add.l	d2,a0   ; start address for correct line 	add.l	#20480,a0	 	btst.l	#2,d0 	beq	plane4 	jsr	pixset
  plane4:
  	lea	bplane,a0 	add.l	d2,a0  	add.l	#30720,a0	     btst.l  #3,d0 	beq     plane5 	jsr	pixset 	
 
  plane5:
  	lea	bplane,a0 	add.l	d2,a0   ; start address for correct line 	add.l	#40960,a0	 	btst.l	#4,d0 	beq 	out 	jsr	pixset
 
  out: 	rts
  pixset: 	move.l	d1,d4		; copy x to d4 	move.l	d1,d5		; and d5 	move.l	d1,d3		; and d3 	lsr.l	#3,d3		; divide with 8 to get number of byte 	add.l	d3,a0		; get to the byte we are changing
  	asl.l	#3,d3		; How many times did x fit in 8? 	cmp	#0,d3		; If zero, x is directly the bits to set 	beq	nolla		; 	sub.l   d3,d4		; Substract multiply of 8 from original x 	move.l	d4,d5		; to get pixel number nolla:   	 	move.l  #7,d6		; substract 7 from pixel number 	sub	d5,d6		; to get right bit 	bset	d6,(a0)		; set the "d6 th bit" on a0  
 
  	rts edit: What is a good way to "time out" routines like this, when optimising?		  
	 | 
	 
 
	
	pmc 
		Member		 | 
	
	
	
		Rebb:  edit: What is a good way to "time out" routines like this, when optimising? Do you mean: what's a good way to see how long the routine takes to execute? If so, then seeing how many raster lines it takes will give a good indication. To do that, before your routine wait for a screen position and change the background colour. At the end of your routine, change the background colour to what it was before you changed it at the start of your routine. The number of coloured lines you can see is now the number of raster lines your routine took. This code will do that for you: .wt_line:	cmp.b	#160,$dff006 		bne.s	.wt_line 		move.w	#$0fff,$dff180
  		<your routine here>
  		move.w	#$0000,$dff180  		 
	 | 
	 
 
	
	Vektor 
		Member		 | 
	
	
	 
		To time a routine the easiest way is just to write a color change (#0 or #$0fff) to the dff180. You will see how many raster lines your routine takes... (I see now PMC has already answered this one..)
  Maybe this gives some ideas!
 
 
  plot:	;takes d0=color,d1=x,d2=y,a0=bplane
  lea bplane(pc),a0 lea screenpointers(pc),a1 lea y_mulitply(pc),a2 lea x_words(pc),a3
  lea dot_to_plot_table(pc),a5
  moveq #0,d0 moveq #0,d1 moveq #0,d2 moveq #0,d3 moveq #0,d4 moveq #0,d7
  move.w (a5)+,d7 ;number of pixels to be plot
  .loop movem.w (a5)+,d0-d2;
  add.w d0,d0 ;(x2 ) add.w d0,d0 ;(twice x2 makes x4 to make an index) move.l (a1,d0),a0                     ;add the right value to the bplane pointer
  add.w d2,d2 ;y=y*2 to get an index move.w (a2,d2.w),d2 ;screen multiplication tables with multiply value
  move.b (a3,d1.w),d3 ;add x-word  add.w d3,d2 ;add x word position to the y position
  not.w d0 ;shift to ensure the right bit is set  bset d0,(a3,d2.W) ;plot the pixel
  dbf d7,.loop
  rts
  screenpointers:              dc.l bplane              dc.l bplane+10240              dc.l bplane+2*10240              dc.l bplane+3*10240              dc.l bplane+4*10240
  x_words: dc.b 0,0,0,0,0,0,0,0 dc.b 1,1,1,1,1,1,1,1 dc.b 2,2,2,2,2,2,2,2 dc.b 3,3,3,3,3,3,3,3 dc.b 4,4,4,4,4,4,4,4 dc.b etc
  y_multiply: dc.w 0 dc.w screen_width ; in bytes dc.w 1*screen_width ; in bytes dc.w 2*screen_width ; in bytes dc.w 3*screen_width ; in bytes dc.w 4*screen_width ; in bytes dc.w etc
  dot_to_plot_table: dc.w 4-1; number of pixels (minus 1) to be plot dc.w 0,0,200; plane,x,y dc.w 1, 200,200 dc.w 2, 200,0 dc.w 3,0,0
  bplane:  dcb.b 5*10240,0		 
	 | 
	 
 
	
	Kalms 
		Member		 | 
	
	
	 
		Rebb:
  your routine will invoke "pixset" multiple times. Most of the code in "pixset" will give the exact same result every time you invoke it. Thus those calculations can be moved out of the "pixset" routine.
  In order to get some simple metrics, consider these:
  * how many instructions do you execute when plotting a pixel with color 1? * how many instructions do you execute when plotting a pixel with color 31?
  Pick one or several metrics of the kind above, decide which are important to you, and try to improve those metrics.		 
	 | 
	 
 
	
	ZEROblue 
		Member		 | 
	
	
	 
		Make sure you have consistent DMA activity across the lines you are measuring over using the above method, or the result might be a completely wrong indication.
  A high amount of DMA activity (many bitplanes, sprites, audio, blitter running etc.) can halt the CPU severely, and going from 200 to 100 colored lines doesn't necessarily mean your routine is now twice as fast, and so this may be a very inexact method.
  However if you're just looking to see if your routine simply becomes faster or slower it will work fine. Typically you would then find f.ex how many dots you can plot in the context of your demo part and still maintain the same frame rate.		 
	 | 
	 
 
	
	noname 
		Member		 | 
	
	
	 
		I would generally try to avoid the use of a setPixel function by all means. In this respect, Azure's post has leading for me. Also, macros might come in handy to inline frequently used subroutines.		 
	 | 
	 
 
	
	d0DgE 
		Member		 | 
	
	
	
		Exactly. The massive amounts of subroutine branches (bsr setPixel) during a drawCircle for example really slowed down my first circle draw routines. JSR is even slower. So building a tiny MACRO with the very essential lines of setting a bit at an  X | Y position is a really neat thing to implement. @Rebb: By scanning through your code example it occured to me, that you only ask for planes to be drawn into, not those where you might clear a bit in order to set the right colour ( 0 - 31). That could result in less available colours or even distort the complete screen result. As you were showing your >1 Bitplane approach I can still give my version for a 5-Bitplane pixeldraw. Please note: this is still the totally bloated- slow as hell version with no improvements implemented that this nice thread offered. I used this thing to pre-render the pixelplasma-animation shown in Wasted Years' end screen and it is the  very reason I had to build a "werkkzeug" loaderbar :/ This routine is very convenient when it comes to the colour values. You just drop f.e. "17" to _drpColour, give the coordinates and screen and fire the damn thing, but it is in no way "real-time" fit  word	equ	15 _drpPlaneSize:	dc.l	plsize 	cnop	0,4 _drpColours: 	dc.b	0		; 180 : 00 	dc.b	%00000001	; 182 : 01 	dc.b	%00000010	; 184 : 02 	dc.b	%00000011	; 186 : 03 	dc.b	%00000100	; 188 : 04 	dc.b	%00000101	; 18a : 05 	dc.b	%00000110	; 18c : 06 	dc.b	%00000111	; 18e : 07 	dc.b	%00001000	; 190 : 08 	dc.b	%00001001	; 192 : 09 	dc.b	%00001010	; 194 : 10 	dc.b	%00001011	; 196 : 11 	dc.b	%00001100	; 198 : 12 	dc.b	%00001101	; 19a : 13 	dc.b	%00001110	; 19c : 14 	dc.b	%00001111	; 19e : 15 	dc.b	%00010000	; 1a0 : 16 	dc.b	%00010001	; 1a2 : 17 	dc.b	%00010010	; 1a4 : 18 	dc.b	%00010011	; 1a6 : 19 	dc.b	%00010100	; 1a8 : 20 	dc.b	%00010101	; 1aa : 21 	dc.b	%00010110	; 1ac : 22 	dc.b	%00010111	; 1ae : 23 	dc.b	%00011000	; 1b0 : 24 	dc.b	%00011001	; 1b2 : 25 	dc.b	%00011010	; 1b4 : 26 	dc.b	%00011011	; 1b6 : 27 	dc.b	%00011100	; 1b8 : 28 	dc.b	%00011101	; 1ba : 29 	dc.b	%00011110	; 1bc : 30 	dc.b	%00011111	; 1be : 31 _drpColour: 	dc.b	0 	cnop	0,4 drawRealPixel: 	; drawing plane => a0 	; X 		=> d0 	; Y 		=> d1 	; colour offset => d3 	; SCRAP 	=> d2-d5 	 	movem.l	d0-d7/a0/a1,-(a7) 	lea	_drpColours(pc),a1 	move.l	_drpPlaneSize(pc),d6
  	moveq	#0,d4 	tst.b	d3		; is there a colour? 	bne.s	.ok 	movem.l	(a7)+,d0-d7/a0/a1 	rts .ok: 	move.b	(a1,d3),d4	; colour byte 	moveq	#0,d5		; first bit in colour byte 	; move to position 	move.w	d1,d2		; copy y 	lsl.w	#4,d1		; multiply by 40 	lsl.w	#3,d2 	add.w	d1,d1 	add.w	d2,d1 	add.w	d1,a0		; y pos first 	; manage the X position 	move.w	d0,d2		; copy x 	lsr.w	#3,d0		; divide by 8 to get the hardposition 	and.w	#$000f,d2	; mask the 4 lower bits for 0-15 softposition 	btst	#0,d0		; is the hardposition an odd value ? 	beq.s	.even		; nope ...skip the -1 action 	subq.w	#1,d0		; it's odd ...sub 1 to keep even steps (A500!) .even: 	sub.w	#word,d2 	neg.w	d2 	add.w	d0,a0		; move to hardposition
  	moveq	#planes-1,d7 .drawlp: 	move.w	(a0),d3		; take current screendata word sized 	btst	d5,d4		; bit 0 or 1 (i.e. clear or draw) 	beq.s	.clear 	bset	d2,d3		; colour bit is lit -> set it lit in the data 	bra.s	.cont .clear: 	bclr	d2,d3		; colour bit is clr -> clear it in the data .cont: 	move.w	d3,(a0)		; write back the modified data 	addq.b	#1,d5		; prepare next colour bit to test 	add.l	d6,a0		; jump to next plane 	dbf	d7,.drawlp .end: 	movem.l	(a7)+,d0-d7/a0/a1 	rts
 
 
  		 
	 | 
	 
 
	
	sp_ 
		Member		 | 
	
	
	 
		Azure's example replaces a matrix multiplication and perspective transformation with a set of small lookuptables.
  9 multiplications and 2 divisions per pixel removed with Dynamic programming.
  In CodeTherory Matrix approximations have proven not to give the optimal codes. Dynamic programming might...
  There is a faster way to solve this problem on the a500. If I ever finnish my a500 demo I will show you. ;)		 
	 | 
	 
 
	
	Azure 
		Member		 | 
	
	
	 
		sp:
  On A500 you can simply hardcore all offsets to the lookup tables and completely unroll the loop. Graham has done something like this on C64 long long ago...		 
	 | 
	 
 
	
	
	
	
			
		 | 
		 |