A.D.A. Amiga Demoscene Archive

I like pixel coding.

 

krabob
Member
#1 - Posted: 19 Jan 2005 11:38
.loopx

move.l d7,d5
move.w d1,d6
swap d5 ;_UGg
move.b d7,d6 ; __VU

move.b (a0,d6.l),d5 ; a0 texture image, d5.w=(Gouraud,texturecolor)
move.b (a1,d5.w),(a2)+ ; a1 colortable, a2 chunky screen

add.l d4,d1 ; u_Vv ; increment U,V and G vector
addx.l d3,d7 ; Gg_U

dbf.s d2,.loopx

:-) What is it ? :-)
... A simple texture mapping loop with Gouraud shading on it!
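For readers who do not speak 68k, here is a rough C model of what the loop above computes per pixel. It is only a sketch under assumptions: the function and parameter names are made up, the texture is assumed to be 256x256, the color table is assumed to hold 256 shades of 256 colors, and U, V and G are kept as separate 8.8 fixed-point values instead of being packed into two registers. It also ignores the fact that (a1,d5.w) sign-extends the index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the inner loop: 8.8 fixed point for U, V and G. */
void gouraud_texture_span(uint8_t *dst, size_t count,
                          const uint8_t *texture,    /* assumed 256x256 texels */
                          const uint8_t *colortable, /* assumed 256 shades x 256 colors */
                          uint16_t u, uint16_t v, uint16_t g,
                          uint16_t du, uint16_t dv, uint16_t dg)
{
    for (size_t i = 0; i < count; i++) {
        /* integer parts of V and U form the texture offset (the "__VU" word) */
        uint8_t texel = texture[(size_t)(v >> 8) * 256 + (u >> 8)];
        /* integer part of G and the texel form the color table index */
        dst[i] = colortable[(size_t)(g >> 8) * 256 + texel];
        /* step all three interpolants, wrapping at 16 bits */
        u = (uint16_t)(u + du);
        v = (uint16_t)(v + dv);
        g = (uint16_t)(g + dg);
    }
}
```

The assembly version does the three additions with just add.l and addx.l because the fractions and integer parts are interleaved across d1 and d7; the C version trades that packing for readability.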
rload
Member
#2 - Posted: 21 Jan 2005 20:27
cute :) there are so many dialects in asm.. never seen a "dbf.s" before.. but, is this gouraud smooth? seems like you trash some bits.
rload
Member
#3 - Posted: 21 Jan 2005 20:28
orrr.. maybe not :)
krabob
Member
#4 - Posted: 25 Jan 2005 11:37
Again, some notes that could interest people about inner-loop vectors:

8 bits "after the point" are enough if you don't "zoom" the texture too much. As all my routines are resolution independent, 800x600 means less UV precision than 320x240. Note that these vectors and start values are calculated with 16:16 precision, and are only downgraded to 8:8 for the inner loop, to use as few registers as possible.
(oh yes, dbf must be "decrement and branch, false", equivalent to the other dbcc.)

But this is not interesting. This is interesting :-)
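As a small illustration of the 16:16 to 8:8 downgrade mentioned above, here is a C sketch. The function name is hypothetical; it simply drops the low 8 fraction bits and keeps the low 8 integer bits.

```c
#include <stdint.h>

/* Hypothetical helper: convert a 16.16 fixed-point value to 8.8.
   The low 8 fraction bits are lost and the integer part is cut to 8 bits. */
uint16_t fix16_16_to_8_8(uint32_t x)
{
    return (uint16_t)(x >> 8);
}
```

A step smaller than 1/256 (for example 0x00000040 in 16.16) becomes 0 in 8.8, which is one way to see why too much zoom breaks the 8-bit fraction.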


.loopx

move.b (a0,d1.w),d2 ; a0 start of the texture line
beq.s .nowrite
move.b d2,(a1) ; a1 screen
.nowrite
addx.l d0,d1 ; vector X: xxXX
addq.l #1,a1 ; next screen pixel.

dbf.s d7,.loopx ; d7 = pixel counter (d2 already holds the pixel)

I like this one very much. It is my 2D sprite zoom pixel writing:
if the color of the sprite is 1-255 the pixel is written, if it is 0 it is not.
a0 points to the texture, in fact to the start of a line. a1 points to the screen being written. As only one vector is needed, one addx is enough. The texture can have any width (up to 32767 pixels).

What I like is the fact that the vector needs its own previous extra bit "x". This one is not trashed because the addq is on an address register, so the "x" is kept through the whole loop. In the same way, move ...,d2 updates the zero flag, which means no test is needed before the "beq". Also, my x8 unrolled version only does one addq.l #8,a1 for 8 pixels :-) This kind of loop really makes me think 680x0 rules.
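A C sketch of the same idea, for comparison. The names are invented, and it assumes a plain 16.16 source position instead of the "xxXX" register layout with the fraction carried through the X flag, so it shows what the loop does rather than how the 68k version does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the zoomed sprite line: color 0 is transparent.
   pos and step are 16.16 fixed point offsets into the source line. */
void zoom_sprite_line(uint8_t *dst, size_t count,
                      const uint8_t *src, uint32_t pos, uint32_t step)
{
    for (size_t i = 0; i < count; i++) {
        uint8_t pixel = src[pos >> 16]; /* integer part selects the texel */
        if (pixel != 0) {
            dst[i] = pixel;             /* only non-zero colors are written */
        }
        pos += step;                    /* one add per pixel, like the addx */
    }
}
```

With step = 0x8000 (0.5) each source pixel is drawn twice, so a 4-pixel line {0,5,0,7} covers 8 screen pixels and leaves the background visible where the source is 0.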
rload
Member
#5 - Posted: 14 Feb 2005 03:26
Cool!
how about some mipmaps for 2d sprites? When a large sprite is zoomed a lot there will be plenty of cache thrashing, but with progressively smaller textures, more of the accessed pixels of a line will fit in the cache when zooming out.. .. aight..
krabob
Member
#6 - Posted: 1 Mar 2005 11:38
ah ah ! OK, I'm a bit oldschool with my loops. Good cache idea.
winden
Member
#7 - Posted: 1 Nov 2005 11:31
I can give a nice pixel-code trick which works like a charm... saturating add... it is useful for example when doing a 256-color 2d-metaballs routine and you need to make sure that adding data doesn't go over 255. I'll give the example for adding two pictures:

.lx
move.b (a0)+,d1
add.b (a1)+,d1
subx.b d2,d2
or.b d2,d1
move.b d1,(a2)+
dbra d7,.lx

trick lies squarely on subx... it gives $00 when value didn't overflow and $ff when it did, nice!
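The subx/or pair can be modelled in C roughly as follows. This is a sketch with a made-up function name; the carry out of the 8-bit add plays the role of the X flag, and negating it gives the same $00 or $ff mask that subx.b d2,d2 produces.

```c
#include <stdint.h>

/* Hypothetical C model of the add.b / subx.b / or.b sequence. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + b;  /* 9-bit result, bit 8 is the carry */
    unsigned mask = 0u - (sum >> 8);  /* all ones on carry, else zero (the subx) */
    return (uint8_t)(sum | mask);     /* the or forces 255 when it overflowed */
}
```

For example, 200 + 100 gives 255 instead of wrapping to 44, while 10 + 20 still gives 30.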
rload
Member
#8 - Posted: 1 Nov 2005 18:23
@winden : coolness :)
winden
Member
#9 - Posted: 2 Nov 2005 22:17
adapting the trick for saturating 6bit-per-channel RGB data all-in-one-go is left as an exercise for the reader...

hint: %01000000 - %00000001 == %00111111
rload
Member
#10 - Posted: 2 Nov 2005 22:43 - Edited
@winden : I guess d2 has to be zeroed in the above too?
edit : no it zeroes itself of course...
rload
Member
#11 - Posted: 2 Nov 2005 22:44
or no.. it zeroes itself :)
krabob
Member
#12 - Posted: 12 Dec 2005 16:35
HEY, BUT.... THE FORUM IS BACK ???
-> YAHOO !

winden the genius wrote:

> subx.b d2,d2
> or.b d2,d1

arrrgh ! OK, it took me 2 seconds to understand the subx, and 4 minutes to understand or.b :-)
z5_
Member
#13 - Posted: 12 Dec 2005 17:19
@Krabob:
The forum is back since a while now :) I was wondering where you were. In fact, it seems to have grown somewhat in activity.
StingRay
Member
#14 - Posted: 20 Dec 2005 17:17
Yeah, now that Krabob is back, I'd like to ask a question: which stupid assembler allows you to write dbf.s? I just wonder... :) It's nearly as evil as to write moveq.w (no names mentioned ;D).
krabob
Member
#15 - Posted: 21 Dec 2005 10:42
devpac ! ( and yes, it has some bugs here and there..)

OK, I knew about moveq.w, but I didn't know dbf.s could be "assembled as something else".
However, for all branching instructions, the dot.size refers to the "jumpable domain" (.s -> [-128,127]), not to the register length (always .w for dbX). Am I right? (not sure.) From this point of view, it should be safe. ... And actually, this dbf.s was taken from a web example at the time I wrote this post, if I remember well.
dalton
Member
#16 - Posted: 21 Dec 2005 12:40
@winden

The saturation trick is cool. I've tried to adapt it for 6bit chunky. The fastest I could come up with is also the ugliest: simply shifting up 2 bits, which detects the overflow, then using the 8bit saturation before shifting down again. I was kind of hoping to find a more brilliant solution =)

As far as I can see you have to use at least one extra instruction to even detect the overflow. So how is it possible to do it as fast as the 8bit saturation? Maybe you could give a hint? =)
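The shift-up approach described in this post could look like the following in C. It is a sketch only: the function name is invented and the inputs are assumed to be 6-bit values (0..63).

```c
#include <stdint.h>

/* Hypothetical model of the "shift up 2 bits, saturate at 8 bits, shift down" idea.
   Inputs are assumed to be in the range 0..63. */
uint8_t sat_add_6bit_shift(uint8_t a, uint8_t b)
{
    unsigned sum  = ((unsigned)a << 2) + ((unsigned)b << 2); /* overflow now shows up in bit 8 */
    unsigned mask = 0u - (sum >> 8);                         /* all ones on overflow */
    unsigned sat  = (sum | mask) & 0xFFu;                    /* 8-bit saturated result */
    return (uint8_t)(sat >> 2);                              /* back to 6 bits */
}
```

The two extra shifts per pixel are the cost that the post is hoping to avoid.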
winden
Member
#17 - Posted: 21 Dec 2005 17:38 - Edited
@dalton

if you are doing it for less than 8 bits, you can do 4 pixels at the same time by emulating the subx. I never timed whether it was as fast as the 8bit one, but surely it's really fast, because even if you need about ten instructions, they calculate 4 pixels in one go.

it's easier to explain in 4bit... if you get this result after adding:

$15 == %00010101

you can see 5th bit is the overflowed bit... now you mask it:

%00010101 and %00010000 == %00010000

and get it clean

this can be adapted to 4pixels in one go:

$05151505 == %00000101 00010101 00010101 00000101

%00000101 00010101 00010101 00000101 (pixels after adding)
and
%00010000 00010000 00010000 00010000 (overflow mask)
==
%00000000 00010000 00010000 00000000 (overflow value)

this cleaned-up value is then usable for computing the "or-value":

%00000000 00001111 00001111 00000000 (or value)

converting the overflow value into the or value is then easy as pie ;)

btw, this last trick has a long story: i386 version was by kalms and then peskanov recoded it for ppc, and then I recoded it for m68k.


yes i almost forgot... maybe this shifting way is faster on 060 (1cycle and pairable, so 1cycle per 2 pixels) than on 030 (4cycles)
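Putting the steps of this post together, a C sketch of the packed version might look like this. The function name and the bits parameter are assumptions for illustration: four pixels sit in the four bytes of a 32-bit word, each using only the low `bits` bits (1..7), and the overflow value is turned into the or value with the subtraction from the earlier hint.

```c
#include <stdint.h>

/* Hypothetical packed saturating add: 4 pixels per 32-bit word, one per byte.
   Each input pixel is assumed to fit in `bits` bits, with bits in 1..7. */
uint32_t sat_add_packed(uint32_t a, uint32_t b, unsigned bits)
{
    const uint32_t ones   = 0x01010101u;
    const uint32_t ovmask = ones << bits;   /* overflow bit of each pixel */
    uint32_t sum = a + b;                   /* no carry can cross a byte boundary */
    uint32_t ov  = sum & ovmask;            /* overflow value */
    uint32_t orv = ov - (ov >> bits);       /* e.g. %00010000 - %00000001 = %00001111 */
    return (sum | orv) & (ovmask - ones);   /* saturate, then drop the overflow bits */
}
```

With bits = 4 and a sum of $05151505 this yields $050F0F05, matching the worked example above; with bits = 6 it covers the 6bit case from the hint.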

 


  

  

  

 

A.D.A. Amiga Demoscene Archive, Version 3.0