A.D.A. Amiga Demoscene Archive

  Welcome guest! Please register a new account or log in




log in with SceneID


Demos Amiga Demoscene Archive Forum / Coding / Bresenham line drawing routine


Author Message
#1 - Posted: 24 May 2007 22:24 - Edited
Reply Quote
Because i thought that line drawing would be pretty important at some point and open doors to many effects, i tried to translate the bresenham line algorithm into assembler. I used the optimised (purely integer) version and came up with this result (plots the line on a chunky 256 color screen).

Questions about the routine:
- the routine as i did it, has problems showing lines moving to the left with a slope > 1. It shows them with a slope of 1. Anyone see a reason?
- as this is my first "real" assembler routine, can somebody give some points, hints, good coding habits,... What did i do wrong and what did i do right (if any).
- is somebody brave enough to help rewrite the routine in a better way + optimize + explain the changes done
- this routine seems so long. Ways to do it shorter?

General questions:
- is this a good direction for cpu linedrawing or am i barking up the wrong tree here?
#2 - Posted: 25 May 2007 01:40
Reply Quote
I can throw some thoughts at you:

move.b #0,steep - use one of the data registers you just cleared, e.g. move.b d0,steep

cmp.w #0,d4 - tst.w d4 is smaller/faster. But the subtraction just before it sets the flags you need:

sub.w d0,d4 ; sets carry flag if d0>d4, i.e. if (d4-d0)<0
bcc.b .dy_pos ; "branch if carry clear", i.e. branch if result is not negative

move.w d4,dy - since you have free address registers, you can put these variables into a structure and address it with move.w d4,dy(a5). "dy" would then be a 16-bit offset rather than a 32-bit absolute address.

On a more general note, it's important to use the registers as much as possible, especially in the innerloop. Reading a variable from memory inside the innerloop is very naughty. If you've run out of data registers, you could use one of the address registers, or consider making two versions of the innerloop, since in your case you only have two possible values of ystep.

There's one thing that's worse than memory access in your innerloop of course, and that's calling a subroutine. :) BSR+RTS alone take time to push the program counter onto the stack, then branch, then pop it back up, and branch again. Using a general routine for a general task is fine in principle, but the overhead is enormous in the case of plotting one pixel.

If you start by replacing the call to plot_pixel by the subroutine itself (inlining it), you've not only saved a few cycles, but you'll also notice that some of what plot_pixel does can safely be put outside the innerloop. The next thing to notice is that using a general pixel plotting routine like that doesn't take advantage of the fact that every pixel you plot is right next to the previous pixel. A better way to structure the routine would be:

- do the setup
- get initial x,y
- get chunky-buffer offset of (320*y+x) of first pixel -> offs
- start plotting directly to (offs)+ or -(offs)
- instead of adding ystep of +/- 1, add +/- 320 to offs

This will replace all of what plot_pixel does save for the move.b at the end. Of course, when doing that you won't be able to draw all lines with the same innerloop. So when the slope is larger than 1, you'd have to use a second version rather than just switching x and y. That's OK though, because it also makes the innerloop(s) shorter, which is always good.

Now, first of all, good job getting it to work. I have to say that before revealing the awful truth about Breshenham: it's not a very good line-drawing algorithm. It's good for very limited CPUs, such as the 6502, but not the 680x0. Here's a simpler approach:

if x0>x1, swap (x0,y0) with (x1,y1)

let slope = dy/dx, in 0.16 fixed-point (assumes slope < 1)
let y_frac = 0, in 0.16 fixed-point
let offs = 320*y0+x0
let y_step = 320*sgn(slope)

for x = x0 to x1
- plot pixel at (offs)+
- y_frac += slope (discarding integer part of result, but noting the carry flag)
- if last operation overflowed, offs += y_step
next x

This works in the case where |dx|>=|dy|. If |dx|<|dy| you'd have a similar routine with the roles of x and y reversed. And of course you have to mind special cases such as a zero-length line, but with that out of the way, the innerloop is:

move.b d0,(a1)+
add.w d1,d2
bcc.b .skip
add.w d3,a1
subq.w #1,d7
bne.b .innerloop

It's not the fastest way to draw lines, but it's getting there. Next step could be to predict the length of each horizontal span (between points where ystep is added) and draw it with a one-instruction loop.

That took too long to write so I can't fix your routine now :). But I did notice that this:

add.b ystep(pc),d1 ;y=y+ystep

Isn't doing what you want. ystep is a byte, and adding it as a byte to d1 won't affect d1 as a word. You get this behaviour: If d1 = $00ff, d2=$0002,

- add.w d2,d1 produces d1 = $0101
- add.b d2,d1 produces d1 = $0001

Easy fix is to redefine ystep as a word.

Anyway, hope it all helps. KTHX
#3 - Posted: 25 May 2007 01:59 - Edited
Reply Quote
Many good points by doom in the above post. I would just like to add a few comments about register allocation.

Register allocation is the process of figuring out "which variables shall I keep in which registers?". The reason why you keep variables in registers is because it can make your code execute faster. Register allocation can be thought of as a sort of "caching".

To make register allocation simple, do the following:
1. Look through a section of your code and try to identify some variables that you access many times. During the setup calculations, you seem to touch x0,x1,y0,y1,dx,dy several times.
2. Assign one register to hold the value of one matching variable. Perhaps: d0 - x0, d1 - y0, etc.
3. At some place in the code, where you have loaded the variables into corresponding registers, make a comment which shows exactly which variables are kept in which registers at that place in the code. Example:
	lea	screenbase,a0
	move.w	x0(pc),d0
	move.w	  y0(pc),d1

;	d0.w	x0
;	d1.w	y0
;	a0	screenbase

	. .. use d0, d1 instead of x0, y0 here ...

4. When the registers<->variables mapping is no longer valid (perhaps because have re-used a few registers for other purposes), place a new comment in your code.

I suggest you have one comment near the beginning of the line drawing routine, and another comment before the beginning of the actual pixel-loop.

Once you have done such a register allocation, try to avoid loading/storing those registers unnecessarily. Generally, you would need to store out the value in a register only once: just before you are about to re-use the register for caching another variable (or for performing some temporary calculations).
#4 - Posted: 25 May 2007 12:52
Reply Quote
@kalms & doom: thanks for the, once again, really good explanation! Believe it or not, but i actually tried to map the variables into registers as much as possible but with only 7 registers, it just didn't seems possible in some situations. I'm not sure yet as to how i can use address registers in this case, but i'll have a think about it. Can i move any value into an address register and use it as i would with a dataregister?
#5 - Posted: 25 May 2007 14:17 - Edited
Reply Quote
Operations which use an address register as source and/or destination can only be performed on word- or longword-sized operands (i.e. no "move.b #1,a0").
The MOVE/ADD/SUB etc machine-instructions do not support having address registers as destination.
Instead, the MOVEA/ADDA/SUBA etc instructions have to be used.

Your assembler will do the translation for you, so "move.w d0,a0" will be translated into "movea.w d0,a0" during assembly, without any complaints.

The reason why it is important to know the distinction between xxx and xxxA instructions is:

* xxxA instructions can only work with word- and longword-sized operands:
	move.b	d0,a0		; assembly will fail
	move.w	d0,a0		; OK
	move.l	d0,a0		; OK

* An xxxA.w instruction will sign-extend the 1st operand 16->32bit, and after that the rest of the operation will be performed in 32 bits:
	; a0 = $12345678
	; d0 = $abcdffff
	add.w	d0,a0
	; a0 = $12345677
	move.w	d0,a0
	; a0 = $ffffffff
	move.l	d0,a0
	; a0 = $abcdffff

* xxxA instructions do not affect flags:
	cmp.w	#3,d0
	sub.l	a0,a1
	beq.s	blah	; jump if d0 == 3

Also, there are no xxxA versions of any logical/bitwise operations (such as Scc, AND/OR/EOR/NOT, shifts/rotates). You can generally just do MOVE,ADD,SUB to the address registers.
#6 - Posted: 25 May 2007 14:22
Reply Quote
By the way, did you get the routine to run for lines with slope > 1 yet? If not:

* Pick an example (x0,y0,x1,y1) pair with slope < 1. Work through the algorithm, by hand, and see what the deltas become. Pay attention to how one of (dx,dy) needs to be bigger than the other in the pixel-loop -- otherwise the line will *always* be drawn at a 45 degree angle.

* Pick another example, with slope > 1. Work it through and check if the (dx,dy) still have the desired relationship. If not, work backwards through the code until you figure out where things have gone wrong.

To make debugging easy, make sure that the line is going down-right for both tests. Perhaps use the following two lines:

10,10 .. 30, 15

10, 10 .. 15, 30
#7 - Posted: 25 May 2007 19:27
Reply Quote
You can find a new version here. I've taken onboard most of the comments (the ones i didn't are the ones i don't understand yet :o)).

It seems to work in all directions now but i didn't find my error. It was probably doom's advice on the add.b ystep?

Using dataregisters as much as possible (instead of variables) does reduce the readability of the code a lot in my experience. But if there's one thing i've learned so far, it's that assembler isn't about readability...

Comments are welcome. In the end, it's still a straightforward bresenham conversion though from a newbie coder :)
#8 - Posted: 26 May 2007 10:56
Reply Quote
Updated the routine: discovered "exg" to swap the contents of two dataregisters :o)
#9 - Posted: 26 May 2007 14:14
Reply Quote
You only need to calculate dx and dy once.
#10 - Posted: 26 May 2007 15:24
Reply Quote
add.w a2,d6 ;error=error+dy
tst.w d6 ;error>0?
ble.s .loop

The tst.w instruction isn't needed. ADD always sets a condition code you can use to conditionally branch right afterwards. ADDA doesn't, as Kalms pointed out, but ADDA is only used when the destination is an address register. You can do this:

add.w a2,d6 ;error=error+dy
bpl.b .loop ; error>0?

About readability, there's lots you can do to keep ASM code readable. Much of it is about doing stuff the right way. Like you discovered EXG - 68k in particular has a beautiful instruction set which is more high-level than C in many ways. If you use it right some routines will be more readable in 68k ASM than in any HLL.

Sometimes you need to consider macros too. It's true that you only need to calculate dy and dx once each, but even then, you might increase readability (it's a subjective thing after all) if you define a macro to compute the absolute difference between two registers:

sub.w \1,\2
subx.w \3,\3
eor.w \3,\2
sub.w \3,\2

(Kalms explained the method itself in the other thread.) Now, to use that you'd simply enter the macro like any other instruction, specifying a source register, a destination register and an available scratch register like so:

ABS_W d1,d2,d6 ; d2.w <- abs(d1.w-d2.w), trashes d6.w

With good macros that are properly documented, and conventions like keeping macro names in caps, or prefixing with "M_", or whatever, your code can become much easier to manage. It is an abstraction, of course, so if you end up overdoing it you might as well be using a HLL.

Now, about the innerloop. It should be three-four times as fast now as in the first version, but there are still things you could change. The control structure you have is this:

let x=x0
if x=x1 goto endloop
goto beginloop

The ideal structure would usually be:

let x=x0
let count=x1-x0
if count=0 goto skiploop
if count!=0 goto beginloop

If you consider how that translates to ASM, you should see that the latter is less convoluted (more readable) and makes a shorter innerloop.

Also, you still have one innerloop plotting to either (x,y) or (y,x) depending on a condition that doesn't change during the course of the loop:

if (flag) then
- plot_x = y
- plot_y = x
- plot_x = x
- plot_y = y
end if
plot at 320*plot_y+plot_x

You'd be better off testing the flag right before the loop and jumping to one of two different versions of the loop. That saves three instructions and makes each version more readable than the combined version you have now.
#11 - Posted: 26 May 2007 16:18
Reply Quote
I've taken on board the "dx and dy only need to be calculated once" + bmi instead of tst+ble + ditched dx and dy variables. It's getting shorter :o)

I'll try to optimize the loop now...
#12 - Posted: 26 May 2007 21:14
Reply Quote
#13 - Posted: 27 May 2007 12:40 - Edited
Reply Quote
That's much nicer already. If you want to continue optimizing, the next thing you should look at is removing the screen offset calculation from the innerloop. For a slope<1, you always either:

- step one pixel to the right, or
- step one pixel to the right and one line up/down

So, your first loop could look like this:

;small steep => plot(x,y)

        move.w  d1,d3
	mulu.w  #320,d3
        add.l   d3,a0       ;screenptr+=320*y
        add.w   d0,a0       ;screenptr+=x


        move.b  #1,(a0)+    ;plot pixel/increase  
x add.w a2,d6 ;error=error+dy bmi.s .lp_sm ;error < 0 add.w d4,a0 ;y=y+ystep sub.w a1,d6 ;error=error-dx .lp_sm subq.w #1,d5 ;decrease loop counter bne.s .lp_small_steep

Of course in this case your ystep should be +/- 320 rather than +/- 1.
#14 - Posted: 28 May 2007 18:42
Reply Quote
@doom: changed it here. While the code has grown with the last few steps, the innerloop is much smaller.

Haven't tested it with all directions again but i hope it works. There still is something wrong with both this and the circle routine. When i change the coords (x0,y0,...), save, assemble and run, then sometimes i get trash on screen (grey screen and a few white dots) + asmone returns with an error. When i reset and run again with the same coords, it works.
#15 - Posted: 28 May 2007 18:43 - Edited
Reply Quote
Going a bit further, are there any easy effects where i can apply this routine. Drawing one line is fine but actually doing something with it would be cool :o) ... even if it's not the best line drawing algo.
#16 - Posted: 28 May 2007 19:05
Reply Quote
hmmm, you could try to clone the line tunnel from gevalia/polka brothers?
#17 - Posted: 28 May 2007 19:20
Reply Quote
You can make a volumelight-ish effect if you draw lines from one point through each pixel along the edge of a shape.

- Load up some shape in PPaint and save the outline to a chunky picture.
- Scan through the chunky picture and put the coordinates of every "on" pixel in an array
- Define an origin (point) for your light, A
- For each point B in your edge point array, get the vector C=B-A, then draw a line from B to B+C
- When you draw the lines, use the loop counter as the pixel colour, and add it to the picture instead of just writing it.

Now when you move A, you should get a fun pseudo-3D effect.
#18 - Posted: 28 May 2007 21:03
Reply Quote
@doom: sounds do-able. Any demo where i can see this in action?
#19 - Posted: 29 May 2007 00:06
Reply Quote
Abecedarian has a similar effect:

You may need to improve the line drawing routine along the way, specifically you want clipping if you want to move the light around with a lot of freedom. And you can improve it along the way by normalizing the "C" vector, and stuff like that. But you should be able to get something vaguely similar to that screenshot on screen without too much trouble.
#20 - Posted: 4 Jun 2007 23:40 - Edited
Reply Quote
As a general rule, never thrust or try to learn from the code i put online :o) (there's a bug in the last line drawing version, when i moved the offset screen calcul outside the loop).

@doom: i just started trying this effect. I was wondering: in a case like this, how do i define the size of my array with coordinates? Can i get the number of pixels from a paint program? The array will be 320*200*2 words at max, but it seems pointless to reserve more than the number of pixels*2 (which will always be much smaller).

Also, i trace the picture with a simple asm routine, but is there a way i can save the result of this in a text file or something. Seems pointless to trace the pic "at runtime" when this can be done before.
#21 - Posted: 5 Jun 2007 10:00
Reply Quote
First of all make sure you want to. A 320x200 chunky picture that only has a few white pixels in a simple shape will compress to almost nothing using Stonecracker or some such. Your array will probably take up more space.

Anyway, run the effect once and write the table out as a binary file. It's the "wb" command at the prompt. You specify filename and start and stop addresses subsequently, so to write 1500 points, each two words (4 bytes), of a table called my_points, do:

- wb
- my_points
- my_points+1500*4

That'll save from my_points up to but not including my_points+1500*4. Ie. the output is exactly 6000 bytes. Then remove the code that generated the table and replace your points array buffer with

my_points incbin "thefileyoujustcreated.bin"

To get the number of points after you've first built the table, define a counter as a variable in memory which you can then read off with the "h" command after execution.

Since you may need to generate the table again, it makes sense to put the code for that in a little source file of its own. I'd typically do this:

lea input,a0
lea output,a1
<your code here, reads from (a0) or whatever, writes to (a1)+>
; now a1 points to end of output
lea output,a0 ; now a0 points to beginning of output

input incbin "inputpicture.chk"
output ds.b 1024*1024

Don't worry about sections or efficiency or anything, just make sure the output buffer is more than big enough. And then run it and write out the data between A0 and A1.

For a simple array you can read the no. elements from the filesize. Include the file like this:

no_points = (my_points_end - my_points) / 4


my_points incbin "thefile.bin"
#22 - Posted: 5 Jun 2007 19:31
Reply Quote
#23 - Posted: 5 Jun 2007 19:31 - Edited
Reply Quote
Bottom picture should be the result of tracing the top picture (just a test pic), with A being somewhere in the middle of the pic. Is this still correct?

Thinking a bit further, i have a couple questions:
- if you move A, i presume you need to clear the screen each time to remove the previous lines?
- generally speaking, do you add clipping to the line draw routine or do you make sure the coords are "on-screen" before passing them to the line draw routine?
#24 - Posted: 5 Jun 2007 21:35
Reply Quote
Yes, that looks right. And yes, if you move A, then you need to clear the screen or redraw the background. That's the case with any effect that doesn't redraw every pixel in the area it works on.

You can do clipping either way. The best way (generally) is to append a clipping routine to the beginning of the line drawer, so that by the time you start drawing the line you're sure both endpoints are inside the screen rectangle.
#25 - Posted: 6 Jun 2007 19:28 - Edited
Reply Quote
In the meantime, i've got this running but it needs more work (clipping, color, faster,...). I don't really understand this though:

When you draw the lines, use the loop counter as the pixel colour, and add it to the picture instead of just writing it.

I can't really imagine the result/purpose of that line.

btw. you can see this effect in mnemonics, i think.

" style="width:150px

Looks cool!
#26 - Posted: 6 Jun 2007 20:22
Reply Quote
I've never coded this effect but I'm guessing it's to fade the line out to black as it comes towards the screen.
#27 - Posted: 7 Jun 2007 01:02
Reply Quote
Right now you're drawing a pixel of colour 1 in your innerloop, and d5 is your loop counter. If you do this instead, you'll get a gradient along the line:

move.b d5,(a0)

Quick and simple. Of course, it may not be the effect you want, but it's a place to start. By adding rather than moving, you get each line overlaid on any lines it overlaps. Since you will have overlapping lines, you have to deal with it one way or another and for volumelight, addition is appropriate. So:

add.b d5,(a0)

You may or may not run into some ugly overflow then, but then you just fix that. :)
#28 - Posted: 7 Jun 2007 09:42
Reply Quote
The effect quoted from Mnemonics works slighlty different. It is a rendered in two passes:
1.) render polygons (not lines) in greyscale to an offscreen buffer
2.) combine that buffer with the colour background to generate the final result


  Please register a new account or log in to comment





A.D.A. Amiga Demoscene Archive, Version 3.0