A.D.A. Amiga Demoscene Archive

Amiga Demoscene Archive Forum / Coding / Planar Pixel Plotting
d0DgE
Member
#1 - Posted: 5 Dec 2009 21:39
Hi guys!

It's that time of the year when I usually look through my collection of primitives again
with the aim of optimising stuff.
This time I thought about my pixel drawing routine.
At the moment I do this solely with the CPU, with no Blitter usage involved, and as "wasted years" (the twist-ribbon) showed, it barely ran on the A500. There were 2x160 pixels to outline the ribbon, which was then filled onto the screen with the Blitter. Of course, in the twist routine I used an inlined, modified version of the following code to
avoid unnecessary subroutine branches.

This is the 1-Bitplane version....


WORD equ 15
drawPlanarPixel:
; drawing plane => a0 ...could be a buffer, too
; X => d0
; Y => d1
; SCRAP => d2,d3
movem.l d0-d3/a0,-(a7)

; manage the Y position
move.w d1,d2 ; copy y
lsl.w #4,d1 ; multiply by 40
lsl.w #3,d2
add.w d1,d1
add.w d2,d1
add.w d1,a0 ; enter y pos first

; manage the X position
move.w d0,d2 ; copy x
lsr.w #3,d0 ; divide by 8 to get the hardposition
and.w #$000f,d2 ; mask the 4 lower bits for 0-15 softposition
btst #0,d0 ; is the hardposition an odd value ?
beq.s .even ; nope ...skip the -1 action
subq.w #1,d0 ; it's odd ...sub 1 to keep even steps (68000!)
.even:
sub.w #WORD,d2
neg.w d2
add.w d0,a0 ; move to X hardposition
move.w (a0),d3 ; take current screendata word aligned
bset d2,d3 ; set the softposition pixel
move.w d3,(a0) ; put back the modified word

movem.l (a7)+,d0-d3/a0
rts


Please note that it is written for convenience, so that I just provide decimal X,Y coordinates and a buffer to
write to and fire the thing to get my pixel.
I would very much appreciate any speed-up/optimising tips on this one.
I'm not exactly fluent in all the available instructions - especially regarding bitfields and such stuff -
so there must be ways to do this operation more elegantly and efficiently.

Is it a good idea to let the Blitter do this work ?
If yes, is there a guide or an example to peek into regarding pixel plotting with the Blitter ?
So far the Line-Drawing Mode gave me a headache -_-

Thx in advance
ZEROblue
Member
#2 - Posted: 6 Dec 2009 01:41
By extending your bitplanes to, for example, 64 bytes wide for faster addressing, you can do:

moveq  #-$80, d2
ror.b d0, d2
lsr.w #3, d0
lsl.w #6, d1
add.w d0, d1
or.b d2, (a0, d1.w)
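
With rows padded to 64 bytes, the display hardware has to skip the unused bytes at the end of every line, which is what the bitplane modulo registers are for. A minimal sketch, assuming a 320-pixel lowres screen that fetches 40 bytes per line and non-interleaved bitplanes; `ROWBYTES` and `FETCH` are names made up for this example:

```asm
; Sketch only: display setup for bitplanes that are 64 bytes wide in memory.
; Assumes a standard 320-pixel lowres data fetch of 40 bytes per line.
CUSTOM   equ $dff000        ; custom chip base address
BPL1MOD  equ $108           ; modulo for the odd bitplanes
BPL2MOD  equ $10a           ; modulo for the even bitplanes
ROWBYTES equ 64             ; hypothetical name: bytes per row in memory
FETCH    equ 40             ; hypothetical name: bytes fetched per line

        lea     CUSTOM,a6
        move.w  #ROWBYTES-FETCH,BPL1MOD(a6)   ; skip 24 bytes after each line
        move.w  #ROWBYTES-FETCH,BPL2MOD(a6)
```

The price of the faster addressing is memory: each plane row takes 64 bytes instead of 40.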
dalton
Member
#3 - Posted: 6 Dec 2009 11:32
If you're referring to screens 11-13 here at ADA, I'd suggest not plotting any pixels at all. Put any static gfx in the odd bitplanes, and then put a triangle in the even ones. The triangle should be 1 pixel wide on row 1 and grow by one pixel on each side with every row downwards. Then you create a copper list that writes to the even bitplane modulo on each scanline. Then simply write the modulo that corresponds to a certain row in the triangle to draw a horizontal line of the desired width. Colors can of course also be set using the copper list.
d0DgE
Member
#4 - Posted: 6 Dec 2009 12:19
No, dalton, the ribbon was just an example of what I used the pixel routine for.
The colours were set using the copper ;) - it was a 1 bpl effect.

There are of course a lot more occasions where you can use a fast pixel routine.

ZeroBlue:

Interesting proposition. I'll try some of this. Thanks :)
dalton
Member
#5 - Posted: 6 Dec 2009 13:02 - Edited
I suggest something like this for a one-bitplane plot. In principle it's the same as the one you did, only it uses more shortcuts for setting bits and addressing...


; d0/d1 = x/y
; a0 = bitplane pointer

asl.w #6, d1 ; assuming bitplane is 64 bytes wide
lea (a0,d1.w), a0

move.w d0, d1 ; copy x
moveq #-128, d2 ; low byte = %10000000, this is the pixel =) (moveq range is -128..127)
and.w #7, d0 ; mask out bit position
asr.w #3, d1 ; get byte offset
lsr.b d0, d2 ; shift pixel on position
or.b d2, (a0,d1.w) ; put in place


There is a good tutorial here: http://www.modermodemet.se/dalton/tut/DOTS.TXT
(it's in Swedish, but code is still code I guess)
Vektor
Member
#6 - Posted: 6 Dec 2009 15:12
Another option. Who calculates the cycles per pixel?

; a0 = table with pixels to be plotted
; a1 = pre calculated x division table
; a2 = screenpointer
; a3 = pre calculated screen multiply table

lea Position_table(pc),a0
lea Shift_Table_x(pc),a1
lea Mulu_Table_y(pc),a3
lea screen(pc),a2
moveq #0,d3
moveq #num_of_pixels-1,d7

.loop

move.w (a0)+,d0
move.w (a0)+,d1

add.w d1,d1 ;y=y*2 to get an index
move.w (a3,d1.w),d1 ;screen multiplication table: y * screen width

move.b (a1,d0.w),d3 ;add x-word
add.w d3,d1 ;add x word position

not.w d0 ;shift bit
bset d0,(a3,d1.W) ;plot the pixel

dbf d7,.loop
rts
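
The two lookup tables are not shown in the listing. One way they could be generated at startup is sketched below, assuming a screen that is 40 bytes (320 pixels) wide and 256 lines high; those dimensions and the routine name `Build_Tables` are assumptions, while the table names are taken from the code above:

```asm
; Sketch only: fill the lookup tables used by the plot loop above.
; Assumes 40 bytes per row, 256 rows, 320 columns.
Build_Tables:
        lea     Mulu_Table_y(pc),a0
        moveq   #0,d0               ; running value of y*40
        move.w  #256-1,d7
.ytab:  move.w  d0,(a0)+            ; entry y = y * 40
        add.w   #40,d0
        dbf     d7,.ytab

        lea     Shift_Table_x(pc),a0
        moveq   #0,d0               ; x counter
        move.w  #320-1,d7
.xtab:  move.w  d0,d1
        lsr.w   #3,d1               ; byte offset within the row = x / 8
        move.b  d1,(a0)+
        addq.w  #1,d0
        dbf     d7,.xtab
        rts

Mulu_Table_y:   ds.w 256            ; one word per screen line
Shift_Table_x:  ds.b 320            ; one byte per screen column
```

The largest y entry is 255*40 = 10200, so it fits comfortably in a word.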
Vektor
Member
#7 - Posted: 7 Dec 2009 22:11
Found a typing error:

bset d0,(a3,d1.W) should be bset d0,(a2,d1.W)
d0DgE
Member
#8 - Posted: 8 Dec 2009 21:13 - Edited
Nice hints, dalton and ZeroBlue. I've got it implemented and adapted.
Also thanks, Vektor, but I need a rather flexible, on-the-fly, multi-purpose plotting routine.
Maybe because I've become accustomed to high-level language methods like drawCircle(); ;)

Edit:

... the Y multiplication table trick from Vektor is really neat :D
dalton
Member
#9 - Posted: 9 Dec 2009 07:12
I see now that I posted basically the same routine as ZeroBlue, only his was better =) Should read more carefully I guess...
coyote
Member
#10 - Posted: 9 Dec 2009 09:36
I'm sure you guys noticed that dalton & ZeroBlue wrote routines that won't work on 68000 because of odd address accesses. (probably doesn't matter anyway...)
britelite
Member
#11 - Posted: 9 Dec 2009 12:38
Umm, I can't see any reading or writing .w or .l at odd addresses...
coyote
Member
#12 - Posted: 9 Dec 2009 18:12 - Edited
Yeah britelite. You are right.
Sorry, I must have still been sleeping...
Mea culpa. O:-}
My apologies to dalton and ZEROBlue.
Vektor
Member
#13 - Posted: 9 Dec 2009 19:33
@d0DgE: Correct, this routine only plots a "simple" predefined array for now, but it can be used in combination with e.g. Bresenham to create a quite fast drawcircle routine. If you're interested, I must have it somewhere in my old Amiga source code.
d0DgE
Member
#14 - Posted: 10 Dec 2009 09:36
Britez0r is quite right.
ZeroBlue & Dalton's approach works fine on the 68000.
The only downside in the long run is the "or" itself, which makes it useful for
separate bitplane actions only.

@Vektor: of course I'm interested. Send it to "dodge[ät]rowdyclub[döt].de" whenever you like :)
Vektor
Member
#15 - Posted: 10 Dec 2009 20:55
Found it, this was the main plotting algo. I just checked the entire source code with UAE; at A500 speed it runs in about 1/5 of a frame with a 260-pixel-wide circle.

@d0DgE: I will email you the entire source code

* a0 = x
* a1 = y
* a2 = screen
* d0 = radius

Draw_circle:
moveq #3,d5
moveq #6,d6

moveq #0,d1 ;x=0
move.w d0,d2

subq.w #1,d2 ;d=r-1
.loop:
tst.w d2
bpl.b .no_ydec

subq.w #1,d0 ;y=y-1

add.w d0,d2 ;d=d+y

.no_ydec:
move.w a1,d3 ;y
sub.w d0,d3 ;y-r

lsl.w d6,d3 ;(y-r)*screen width (d6=6, so 64 bytes per row)
lea (a2,d3.w),a3 ;screen pointer + y-offset

move.w a0,d3 ;x
add.w d1,d3 ;x+int x

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

move.w a0,d3
sub.w d1,d3

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

move.w a1,d3
sub.w d1,d3

lsl.w d6,d3
lea (a2,d3.w),a3

move.w a0,d3
add.w d0,d3

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

move.w a0,d3
sub.w d0,d3

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

move.w a1,d3
add.w d1,d3

lsl.w d6,d3
lea (a2,d3.w),a3

move.w a0,d3
add.w d0,d3

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

move.w a0,d3
sub.w d0,d3

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

move.w a1,d3
add.w d0,d3

lsl.w d6,d3
lea (a2,d3.w),a3

move.w a0,d3
add.w d1,d3

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

move.w a0,d3
sub.w d1,d3

move.w d3,d4
lsr.w d5,d3
not.w d4
bset d4,(a3,d3.w)

sub.w d1,d2

subq.w #2,d2
addq.w #1,d1

cmp.w d0,d1
bls.w .loop
rts
z5_
Member
#16 - Posted: 11 Dec 2009 17:22 - Edited
go go go, dodge! :)

@Vektor: any interest in rejoining the amigascene and code some stuff again? would be cool!
d0DgE
Member
#17 - Posted: 15 Dec 2009 19:28
...by now it has finally occurred to me that one can create a quite convenient MACRO for this pixel plotting code ... D'OH

well, you never stop learning
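
Such a macro might look roughly like the sketch below. It wraps ZEROblue's 64-byte-wide plot from earlier in the thread, so it assumes that bitplane layout; the macro name `PLOT` and its argument order are made up for this example, and the `\1`-style parameters follow the usual Devpac/ASM-One convention:

```asm
; Sketch only: inline pixel plot for a single bitplane with 64-byte rows.
; \1 = data register holding x (trashed)
; \2 = data register holding y (trashed)
; \3 = address register holding the bitplane base
; \4 = scratch data register
PLOT    MACRO
        moveq   #-$80,\4        ; low byte = leftmost pixel of a byte
        ror.b   \1,\4           ; rotate it right by x (effectively x mod 8)
        lsr.w   #3,\1           ; x / 8 = byte offset within the row
        lsl.w   #6,\2           ; y * 64 = row offset
        add.w   \1,\2
        or.b    \4,(\3,\2.w)    ; set the pixel
        ENDM

; usage: x in d0, y in d1, bitplane in a0, d2 as scratch
        PLOT    d0,d1,a0,d2
```

Every use expands to six instructions, so the bsr/rts pair disappears at the cost of code size.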
Vektor
Member
#18 - Posted: 16 Dec 2009 19:31
@z5_: If you have interesting ideas, I'm always open to coding / reviewing some things, but don't expect too much!
Azure
Member
#19 - Posted: 21 Dec 2009 13:19
It has been a long time since I did this, but this looks awfully wasteful to me.

Is this routine supposed to be optimized for 68000 or 68060? I dug around my old backups and found a 3D dot rotator I coded once. I don't think I have ever used it anywhere. It uses a similar approach to the one Mr. Pet used in roots, but may be slightly more optimized.

The inner loop performs 3D rotation, transformation into screen space (perspective) and pixel plotting.


.bigloop

REPT 2
move.l (a3)+,d3
move.l (a3)+,d2

move.l (a0,d0.w*4),d3 ;a0-a2 precalculated tables with
add.l (a1,d2.w*4),d3 ;M-entries. 512 longwords each
add.l (a2,d5.w*4),d3
;d0=00000000SyyyyyyySzzzzzzzSxxxxxxx
move.l (a4,d3.w*4),d1 ;Perspective for x (SzzzzzzzSxxxxxxx)
bfset (a6){d4:1} ;setpixel (a6=planepointer)
lsr.l #8,d3 ;12 free cycles...
swap d2
swap d0
add.l (a5,d3.w*4),d1 ;Perspective for y (SyyyyyyySzzzzzzz)
;d1=Dotadress (pixnr)+planeoffset for
;colors
;d1 highword=0

move.l (a3)+,d5
move.l (a0,d3.w*4),d3 ;a0-a2 precalculated tables with
add.l (a1,d5.w*4),d3 ;M-entries. 512 longwords each
add.l (a2,d2.w*4),d3
;d0=00000000SyyyyyyySzzzzzzzSxxxxxxx
move.l (a4,d3.w*4),d4 ;Perspective for x (SzzzzzzzSxxxxxxx)
bfset (a6){d1:1} ;setpixel (a6=planepointer)
lsr.l #8,d3
swap d5
add.l (a5,d3.w*4),d4 ;Perspective for y (SyyyyyyySzzzzzzz)
;d1=Dotadress (pixnr)+planeoffset for
;colors
;d1 highword=0
ENDR
Vektor
Member
#20 - Posted: 21 Dec 2009 20:53
@Azure: my routine is 68000 based. I looked at yours, and except for the perspective precalc with the z coordinates in the upper word and the bfset (030+?), the approach is basically the same: precalc everything and use the coordinates as an index (which can be done within the instruction on 020+).
The only thing I don't get are your first (three) longword moves: the third overwrites the first?
Azure
Member
#21 - Posted: 22 Dec 2009 01:43
...the first move should probably be to D0. I was not able to check whether the source code was functional.

bfset is very neat, as it avoids the separate shifting needed to calculate the address offset. There is really just a single instruction responsible for the plotting in this routine; the remaining instructions are for the 3D calculations.
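
For comparison with the 68000 plotters above, a stand-alone bfset plot could be sketched like this. It is 68020+ only and assumes a single bitplane with 40-byte (320-pixel) rows and x already zero-extended to a longword; `ROWPIXELS` is a name made up for this example:

```asm
; Sketch only, 68020+: plot one pixel with a bit field instruction.
; d0.l = x (upper word clear), d1.w = y, a0 = bitplane base
ROWPIXELS equ 320               ; hypothetical name: 40 bytes * 8 pixels per row

        mulu.w  #ROWPIXELS,d1   ; d1.l = pixel number of the row start
        add.l   d0,d1           ; d1.l = pixel number of (x,y)
        bfset   (a0){d1:1}      ; set one bit, offset counted from the MSB at (a0)
```

Because the bit field offset counts from the most significant bit of the byte at (a0), the pixel number maps directly onto planar pixel order, with no separate byte/bit split. With a power-of-two row width the mulu could become a shift.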
Rebb
Member
#22 - Posted: 29 Dec 2009 00:53 - Edited
My version of the pixel plotter. I already got some good tips here (removing the mulu), but as this is my first plotter I guess there's still a lot of room for improvement.
plot:
;takes d0=color,d1=x,d2=y,a0=bplane


findy:
; multiply y with 40 to get add factor for bitplane

move.w d2,d3
lsl.w #4,d2
lsl.w #3,d3
add.w d2,d2
add.w d3,d2
add.w d2,a0

checkplane:
btst.l #0,d0 ; testbit on colorvalue to get planes to plot
beq plane2
jsr pixset

plane2:
lea bplane,a0 ; bitplane address to a0
add.l d2,a0 ; start address for correct line
add.l #10240,a0 ; address of plane
btst.l #1,d0
beq plane3
jsr pixset

plane3:
lea bplane,a0
add.l d2,a0 ; start address for correct line
add.l #20480,a0
btst.l #2,d0
beq plane4
jsr pixset

plane4:

lea bplane,a0
add.l d2,a0
add.l #30720,a0
btst.l #3,d0
beq plane5
jsr pixset



plane5:

lea bplane,a0
add.l d2,a0 ; start address for correct line
add.l #40960,a0
btst.l #4,d0
beq out
jsr pixset


out:
rts

pixset:
move.l d1,d4 ; copy x to d4
move.l d1,d5 ; and d5
move.l d1,d3 ; and d3
lsr.l #3,d3 ; divide with 8 to get number of byte
add.l d3,a0 ; get to the byte we are changing

asl.l #3,d3 ; byte number * 8 = x rounded down to a multiple of 8
cmp #0,d3 ; If zero, x is directly the pixel number
beq nolla ;
sub.l d3,d4 ; Subtract the multiple of 8 from original x
move.l d4,d5 ; to get pixel number
nolla:

move.l #7,d6 ; subtract pixel number from 7
sub d5,d6 ; to get right bit
bset d6,(a0) ; set the "d6 th bit" on a0


rts


edit: What is a good way to "time out" routines like this, when optimising?
pmc
Member
#23 - Posted: 29 Dec 2009 21:30
Rebb:
edit: What is a good way to "time out" routines like this, when optimising?

Do you mean: what's a good way to see how long the routine takes to execute?

If so, then seeing how many raster lines it takes will give a good indication. To do that, before your routine wait for a screen position and change the background colour. At the end of your routine, change the background colour to what it was before you changed it at the start of your routine. The number of coloured lines you can see is now the number of raster lines your routine took.

This code will do that for you:

.wt_line:	cmp.b	#160,$dff006
bne.s .wt_line
move.w #$0fff,$dff180

<your routine here>

move.w #$0000,$dff180
Vektor
Member
#24 - Posted: 29 Dec 2009 21:39 - Edited
To time a routine, the easiest way is just to write a colour change (#0 or #$0fff) to $dff180. You will see how many raster lines your routine takes... (I see now that pmc has already answered this one..)

Maybe this gives some ideas!



plot: ;reads plane,x,y triples from dot_to_plot_table

lea bplane(pc),a0
lea screenpointers(pc),a1
lea y_multiply(pc),a2
lea x_words(pc),a3

lea dot_to_plot_table(pc),a5

moveq #0,d0
moveq #0,d1
moveq #0,d2
moveq #0,d3
moveq #0,d4
moveq #0,d7

move.w (a5)+,d7 ;number of pixels to be plot

.loop
movem.w (a5)+,d0-d2;

add.w d0,d0 ;(x2 )
add.w d0,d0 ;(twice x2 makes x4 to make an index)
move.l (a1,d0),a0 ;add the right value to the bplane pointer

add.w d2,d2 ;y=y*2 to get an index
move.w (a2,d2.w),d2 ;screen multiplication tables with multiply value

move.b (a3,d1.w),d3 ;add x-word
add.w d3,d2 ;add x word position to the y position

not.w d1 ;invert x so that bset picks bit 7-(x&7)
bset d1,(a0,d2.w) ;plot the pixel in the selected plane

dbf d7,.loop

rts

screenpointers:
dc.l bplane
dc.l bplane+10240
dc.l bplane+2*10240
dc.l bplane+3*10240
dc.l bplane+4*10240

x_words:
dc.b 0,0,0,0,0,0,0,0
dc.b 1,1,1,1,1,1,1,1
dc.b 2,2,2,2,2,2,2,2
dc.b 3,3,3,3,3,3,3,3
dc.b 4,4,4,4,4,4,4,4
dc.b etc

y_multiply:
dc.w 0*screen_width ; in bytes
dc.w 1*screen_width ; in bytes
dc.w 2*screen_width ; in bytes
dc.w 3*screen_width ; in bytes
dc.w 4*screen_width ; in bytes
dc.w etc

dot_to_plot_table:
dc.w 4-1; number of pixels (minus 1) to be plot
dc.w 0,0,200; plane,x,y
dc.w 1, 200,200
dc.w 2, 200,0
dc.w 3,0,0

bplane:
dcb.b 5*10240,0
Kalms
Member
#25 - Posted: 29 Dec 2009 23:12 - Edited
Rebb:

your routine will invoke "pixset" multiple times. Most of the code in "pixset" will give the exact same result every time you invoke it. Thus those calculations can be moved out of the "pixset" routine.

In order to get some simple metrics, consider these:

* how many instructions do you execute when plotting a pixel with color 1?
* how many instructions do you execute when plotting a pixel with color 31?

Pick one or several metrics of the kind above, decide which are important to you, and try to improve those metrics.
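
A sketch of what hoisting the repeated work could look like: the byte offset and bit number are computed once, and the per-plane loop only sets or clears one bit. It keeps the layout from Rebb's listing (40 bytes per row, planes 10240 bytes apart, 5 planes); the name `PLANESIZE` and the choice to clear bits for zero colour bits are assumptions of this example:

```asm
; Sketch only: d0 = colour (0-31), d1 = x, d2 = y, a0 = first bitplane
; Trashes d0-d3, d7 and a0.
PLANESIZE equ 10240             ; hypothetical name: bytes per bitplane

plot:
        move.w  d2,d3
        lsl.w   #5,d2           ; y * 32
        lsl.w   #3,d3           ; y * 8
        add.w   d3,d2           ; y * 40
        move.w  d1,d3
        lsr.w   #3,d3           ; x / 8
        add.w   d3,d2           ; byte offset inside a plane
        add.w   d2,a0           ; a0 = target byte in plane 1
        not.w   d1              ; bit number 7-(x&7); bset/bclr on memory use it modulo 8
        moveq   #5-1,d7         ; 5 bitplanes
.plane: lsr.w   #1,d0           ; next colour bit into the carry flag
        bcc.s   .clr
        bset    d1,(a0)         ; colour bit set: set the pixel in this plane
        bra.s   .next
.clr:   bclr    d1,(a0)         ; colour bit clear: clear it
.next:  lea     PLANESIZE(a0),a0 ; same byte in the next plane
        dbf     d7,.plane
        rts
```

The instruction count is now the same for every colour, which makes the two metrics above easy to compare against the original.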
ZEROblue
Member
#26 - Posted: 30 Dec 2009 02:51 - Edited
Make sure you have consistent DMA activity across the lines you are measuring over using the above method, or the result might be a completely wrong indication.

A high amount of DMA activity (many bitplanes, sprites, audio, blitter running etc.) can halt the CPU severely, and going from 200 to 100 colored lines doesn't necessarily mean your routine is now twice as fast, and so this may be a very inexact method.

However, if you're just looking to see whether your routine becomes faster or slower, it will work fine. Typically you would then find out, for example, how many dots you can plot in the context of your demo part while still maintaining the same frame rate.
noname
Member
#27 - Posted: 30 Dec 2009 10:33
I would generally try to avoid the use of a setPixel function by all means. In this respect, Azure's post has been leading the way for me. Also, macros might come in handy to inline frequently used subroutines.
d0DgE
Member
#28 - Posted: 30 Dec 2009 14:30 - Edited
Exactly. The massive number of subroutine branches (bsr setPixel) during a drawCircle, for example, really slowed down my first circle drawing routines. JSR is even slower.
So building a tiny MACRO with the very essential lines for setting a bit at an X | Y position is a really neat thing to implement.

@Rebb:

While scanning through your code example it occurred to me that you only handle the planes to be drawn into, not
those where you might have to clear a bit in order to get the right colour (0 - 31).
That could result in fewer available colours or even distort the complete screen result.

Since you showed your >1 bitplane approach, I can still give my version of a 5-bitplane pixel draw.
Please note: this is still the totally bloated, slow-as-hell version with none of the improvements implemented that this
nice thread offered.
I used this thing to pre-render the pixel plasma animation shown in Wasted Years' end screen, and it is the
very reason I had to build a "werkkzeug" loader bar :/

This routine is very convenient when it comes to the colour values.
You just drop e.g. "17" into _drpColour, give the coordinates and screen and fire the damn thing, but
it is in no way fit for "real-time" use


word equ 15
_drpPlaneSize: dc.l plsize
cnop 0,4
_drpColours:
dc.b 0 ; 180 : 00
dc.b %00000001 ; 182 : 01
dc.b %00000010 ; 184 : 02
dc.b %00000011 ; 186 : 03
dc.b %00000100 ; 188 : 04
dc.b %00000101 ; 18a : 05
dc.b %00000110 ; 18c : 06
dc.b %00000111 ; 18e : 07
dc.b %00001000 ; 190 : 08
dc.b %00001001 ; 192 : 09
dc.b %00001010 ; 194 : 10
dc.b %00001011 ; 196 : 11
dc.b %00001100 ; 198 : 12
dc.b %00001101 ; 19a : 13
dc.b %00001110 ; 19c : 14
dc.b %00001111 ; 19e : 15
dc.b %00010000 ; 1a0 : 16
dc.b %00010001 ; 1a2 : 17
dc.b %00010010 ; 1a4 : 18
dc.b %00010011 ; 1a6 : 19
dc.b %00010100 ; 1a8 : 20
dc.b %00010101 ; 1aa : 21
dc.b %00010110 ; 1ac : 22
dc.b %00010111 ; 1ae : 23
dc.b %00011000 ; 1b0 : 24
dc.b %00011001 ; 1b2 : 25
dc.b %00011010 ; 1b4 : 26
dc.b %00011011 ; 1b6 : 27
dc.b %00011100 ; 1b8 : 28
dc.b %00011101 ; 1ba : 29
dc.b %00011110 ; 1bc : 30
dc.b %00011111 ; 1be : 31
_drpColour:
dc.b 0
cnop 0,4
drawRealPixel:
; drawing plane => a0
; X => d0
; Y => d1
; colour offset => d3
; SCRAP => d2-d5

movem.l d0-d7/a0/a1,-(a7)
lea _drpColours(pc),a1
move.l _drpPlaneSize(pc),d6

moveq #0,d4
tst.b d3 ; is there a colour?
bne.s .ok
movem.l (a7)+,d0-d7/a0/a1
rts
.ok:
move.b (a1,d3),d4 ; colour byte
moveq #0,d5 ; first bit in colour byte
; move to position
move.w d1,d2 ; copy y
lsl.w #4,d1 ; multiply by 40
lsl.w #3,d2
add.w d1,d1
add.w d2,d1
add.w d1,a0 ; y pos first
; manage the X position
move.w d0,d2 ; copy x
lsr.w #3,d0 ; divide by 8 to get the hardposition
and.w #$000f,d2 ; mask the 4 lower bits for 0-15 softposition
btst #0,d0 ; is the hardposition an odd value ?
beq.s .even ; nope ...skip the -1 action
subq.w #1,d0 ; it's odd ...sub 1 to keep even steps (A500!)
.even:
sub.w #word,d2
neg.w d2
add.w d0,a0 ; move to hardposition

moveq #planes-1,d7
.drawlp:
move.w (a0),d3 ; take current screendata word sized
btst d5,d4 ; bit 0 or 1 (i.e. clear or draw)
beq.s .clear
bset d2,d3 ; colour bit is lit -> set it lit in the data
bra.s .cont
.clear:
bclr d2,d3 ; colour bit is clr -> clear it in the data
.cont:
move.w d3,(a0) ; write back the modified data
addq.b #1,d5 ; prepare next colour bit to test
add.l d6,a0 ; jump to next plane
dbf d7,.drawlp
.end:
movem.l (a7)+,d0-d7/a0/a1
rts

sp_
Member
#29 - Posted: 30 Dec 2009 23:07
Azure's example replaces a matrix multiplication and perspective transformation with a set of small lookup tables.

9 multiplications and 2 divisions per pixel removed with dynamic programming.

In code theory, matrix approximations have proven not to give the optimal codes. Dynamic programming might...

There is a faster way to solve this problem on the A500. If I ever finish my A500 demo I will show you. ;)
Azure
Member
#30 - Posted: 5 Jan 2010 01:53
sp:

On A500 you can simply hardcode all offsets to the lookup tables and completely unroll the loop. Graham did something like this on the C64 long, long ago...