A.D.A. Amiga Demoscene Archive

Amiga Demoscene Archive Forum / Coding / I like pixel coding.

Author	Message
krabob Member	#1 - Posted: 19 Jan 2005 11:38
.loopx
    move.l  d7,d5
    move.w  d1,d6
    swap    d5                 ; d5 = _UGg
    move.b  d7,d6              ; d6 = __VU
    move.b  (a0,d6.l),d5       ; a0 texture image, d5.w = (gouraud, texture color)
    move.b  (a1,d5.w),(a2)+    ; a1 color table, a2 chunky screen
    add.l   d4,d1              ; u_Vv ; increment U, V and G vectors
    addx.l  d3,d7              ; Gg_U
    dbf.s   d2,.loopx
:-) What is it? :-) ... A simple texture mapping loop with Gouraud shading on it!
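For readers who don't read 68k, here is a rough C model of what a loop like the one above computes. It is only a sketch under assumptions not stated in the post: a 256x256 texture, a 256x256 color table indexed by (shade, texel), and plain 8.8 fixed-point values instead of the packed register layout. The name `gouraud_texture_span` and all its parameters are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical C model of a Gouraud-shaded texture span (not krabob's code).
 * u, v, g and their steps are 8.8 fixed point: integer part in the high byte. */
void gouraud_texture_span(const uint8_t *texture,    /* assumed 256x256 texels */
                          const uint8_t *colortable, /* assumed 256 shades x 256 colors */
                          uint8_t *dst, size_t count,
                          uint16_t u, uint16_t v, uint16_t g,
                          uint16_t du, uint16_t dv, uint16_t dg)
{
    for (size_t i = 0; i < count; i++) {
        /* Texture offset: integer V in the high byte, integer U in the low byte. */
        uint8_t texel = texture[(size_t)(v & 0xFF00u) | (size_t)(u >> 8)];
        /* Color table offset: integer shade in the high byte, texel in the low byte. */
        dst[i] = colortable[(size_t)(g & 0xFF00u) | texel];
        /* Step all three interpolants; they wrap at 16 bits. */
        u = (uint16_t)(u + du);
        v = (uint16_t)(v + dv);
        g = (uint16_t)(g + dg);
    }
}
```

The asm version packs these three 8.8 values into two data registers and lets the X flag carry the U fraction from one register into the other, which is where the speed comes from; the C model only shows the arithmetic.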
rload Member	#2 - Posted: 21 Jan 2005 20:27
cute :) there are so many dialects in asm.. never seen a "dbf.s" before.. but, is this gouraud smooth? seems like you trash some bits.
rload Member	#3 - Posted: 21 Jan 2005 20:28
orrr.. maybe not :)
krabob Member	#4 - Posted: 25 Jan 2005 11:37
Again, a note about inner-loop vectors that could interest some people: 8 bits "after the point" are enough if you don't "zoom" the texture too much. As all my routines are resolution independent, 800x600 means less UV precision than 320x240. Note that these vectors and start values are calculated with 16:16 precision and are only downgraded to 8:8 for the inner loop, to use as few registers as possible.
(Oh yes, dbf must be "decrement and branch, false", equivalent to the other dbxx.) But this is not interesting. This is interesting :-)
.loopx
    move.b  (a0,d1.w),d2   ; a0 start of the texture line
    beq.s   .nowrite
    move.b  d2,(a1)        ; a1 screen
.nowrite
    addx.l  d0,d1          ; vector X: xxXX
    addq.l  #1,a1          ; next screen pixel
    dbf.s   d7,.loopx      ; d7 pixel counter, kept separate from the pixel in d2
I like this one very much. It is my 2D sprite zoom pixel writer: if the sprite color is 1-255 the pixel is written, if it is 0 it is not. a0 points to the texture, in fact to the start of a line. a1 points to the screen being written. As only one vector is needed, one addx is enough. The texture can have any width (up to 32767 pixels).
What I like is that the vector needs its own previous extra bit "x". It is not trashed because the addq is on an address register, so the "x" is kept throughout the loop. In the same way, move ...,d2 updates the zero flag, which means no test is needed before the "beq". Also, my x8 unrolled version only does one addq.l #8,a1 for 8 pixels :-) This kind of loop really makes me think 680x0 rules.
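As a rough illustration of the sprite zoom described above, here is a C model of one zoomed line with color 0 treated as transparent. It is a sketch, not krabob's routine: the function name `zoom_sprite_line` is hypothetical, and the position is assumed to be a single 16.16 fixed-point value, whereas the asm keeps the fraction in the high word and relies on the X flag to carry it into the integer part.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical C model of a zoomed sprite line with a transparent color.
 * pos and step are 16.16 fixed point (an assumption made for this sketch). */
void zoom_sprite_line(const uint8_t *src_line, uint8_t *dst, size_t count,
                      uint32_t pos, uint32_t step)
{
    for (size_t i = 0; i < count; i++) {
        uint8_t texel = src_line[pos >> 16]; /* integer part selects the texel */
        if (texel != 0) {                    /* color 0 leaves the screen untouched */
            dst[i] = texel;
        }
        pos += step;                         /* one add advances along the line */
    }
}
```

In C the transparency test is an explicit compare; in the 68k loop the `move.b` that fetches the texel already sets the zero flag, so the branch costs no extra test.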
rload Member	#5 - Posted: 14 Feb 2005 03:26
Cool! how about some mipmaps for 2d sprites? When a large sprite is zoomed a lot there will be plenty of cache thrashing, but with progressively smaller textures, more of the accessed pixels of a line will fit in the cache when zooming out.. .. aight..
krabob Member	#6 - Posted: 1 Mar 2005 11:38
ah ah! OK, I'm a bit oldschool with my loops. Good cache idea.
winden Member	#7 - Posted: 1 Nov 2005 11:31
I can give a nice pixel-code trick which works like a charm... saturating add... it is useful for example when doing a 256-color 2d-metaballs routine and you need to make sure that adding data doesn't go over 255. I'll give the example for adding two pictures:
.lx
    move.b  (a0)+,d1    ; pixel from the first picture
    add.b   (a1)+,d1    ; add pixel from the second picture, sets X on overflow
    subx.b  d2,d2       ; d2 = $00 if no overflow, $ff if overflow
    or.b    d2,d1       ; force the result to $ff when it overflowed
    move.b  d1,(a2)+
    dbra    d7,.lx
the trick rests squarely on subx... it gives $00 when the value didn't overflow and $ff when it did, nice!
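The same saturating add can be modelled in C, which may help when checking the logic step by step. This is only a sketch: `saturating_add_u8` is a hypothetical name, and the carry that the 68k keeps in the X flag is recovered here from bit 8 of a wider sum.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical C model of the subx/or saturating add on 8-bit pixels. */
void saturating_add_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];  /* add.b: bit 8 is the carry */
        unsigned carry = sum >> 8;             /* what the X flag would hold: 0 or 1 */
        uint8_t mask = (uint8_t)(0u - carry);  /* subx.b d2,d2: 0x00 or 0xFF */
        out[i] = (uint8_t)(sum | mask);        /* or.b d2,d1: clamp to 0xFF on overflow */
    }
}
```

The point of the trick is that the mask is built without any branch: subtracting a register from itself with the extend bit yields either all zeros or all ones.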
rload Member	#8 - Posted: 1 Nov 2005 18:23
@winden : coolness :)
winden Member	#9 - Posted: 2 Nov 2005 22:17
adapting the trick for saturating 6bit-per-channel RGB data all-in-one-go is left as an exercise for the reader... hint: %01000000 - %00000001 == %00111111
rload Member	#10 - Posted: 2 Nov 2005 22:43 - Edited
@winden : I guess d2 has to be zeroed in the above too? edit : no, it zeroes itself of course...
rload Member	#11 - Posted: 2 Nov 2005 22:44
or no.. it zeroes itself :)
krabob Member	#12 - Posted: 12 Dec 2005 16:35
HEY, BUT.... THE FORUM IS BACK??? -> YAHOO!
winden the genius wrote:
> subx.b d2,d2
> or.b d2,d1
arrrgh! OK, it took me 2 seconds to understand the subx, and 4 minutes to understand the or.b :-)
z5_ Member	#13 - Posted: 12 Dec 2005 17:19
@Krabob: The forum has been back for a while now :) I was wondering where you were. In fact, it seems to have grown somewhat in activity.
StingRay Member	#14 - Posted: 20 Dec 2005 17:17
Yeah, now that Krabob is back, I'd like to ask a question: which stupid assembler allows you to write dbf.s? I just wonder... :) It's nearly as evil as writing moveq.w (no names mentioned ;D).
krabob Member	#15 - Posted: 21 Dec 2005 10:42
devpac! (and yes, it has some bugs here and there..) OK, I knew about moveq.w, but I didn't know dbf.s could be "assembled as something else". However, for all branching instructions, the dot-size refers to the "jumpable domain" (.s -> [-128,127]), not to the register length (always .w for dbX). Am I right? (not sure.) From this point of view, it should be safe. ... And actually, this dbf.s was taken from a web example at the time I wrote this post, if I remember well.
dalton Member	#16 - Posted: 21 Dec 2005 12:40
@winden The saturation trick is cool. I've tried to adapt it for 6bit chunky. The fastest I could come up with is also the ugliest: simply shifting up 2 bits, which detects the overflow, then using the 8bit saturation before shifting down again. I was kind of hoping to find a more brilliant solution =) As far as I can see you have to use at least one extra instruction to even detect the overflow. So how is it possible to do it as fast as the 8bit saturation? Maybe you could give a hint? =)
winden Member	#17 - Posted: 21 Dec 2005 17:38 - Edited
@dalton if you are doing it for less than 8 bits, you can do 4 pixels at the same time by emulating the subx. I never timed whether it was as fast as the 8bit one, but surely it's really fast, because even if you need about ten instructions, they calculate 4 pixels in one go.
it's easier to explain in 4bit... if you get this result after adding:
    $15 == %00010101
you can see the 5th bit is the overflow bit... now you mask it:
    %00010101 and %00010000 == %00010000
and get it clean. this can be adapted to 4 pixels in one go:
    $05151505 == %00000101 00010101 00010101 00000101 (pixels after adding)
             and %00010000 00010000 00010000 00010000 (overflow mask)
              == %00000000 00010000 00010000 00000000 (overflow value)
this cleaned-up value is then usable for computing the "or-value":
                 %00000000 00001111 00001111 00000000 (or value)
converting the overflow value into the or value is then easy as pie ;)
btw, this last trick has a long story: the i386 version was by kalms, then peskanov recoded it for ppc, and then I recoded it for m68k.
yes, I almost forgot... maybe this shifting way is faster on 060 (1 cycle and pairable, so 1 cycle per 2 pixels) than on 030 (4 cycles)
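The 4-pixels-at-once idea can be sketched in C as follows. This is an illustration under assumptions, not the kalms/peskanov/winden code: pixels are taken to be 4-bit values stored one per byte in a 32-bit word, `saturating_add_4x4bit` is a hypothetical name, and the overflow-to-or conversion used here (subtracting the overflow value shifted down by 4) is one reading of the hint given in post #9.

```c
#include <stdint.h>

/* Hypothetical C model: saturating add of four 4-bit pixels, one per byte.
 * Each input byte must be in 0x00..0x0F so sums never carry across bytes. */
uint32_t saturating_add_4x4bit(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;                           /* each byte is at most 0x1E */
    uint32_t overflow = sum & 0x10101010u;          /* 5th bit of each byte */
    uint32_t or_value = overflow - (overflow >> 4); /* 0x10 - 0x01 = 0x0F per byte */
    return (sum | or_value) & 0x0F0F0F0Fu;          /* saturate, drop overflow bits */
}
```

For example, adding `0x050F0F05` and `0x00060600` gives the intermediate sum `$05151505` from the post, and the function returns `0x050F0F05`: the two middle pixels are clamped to `$0F`. The same pattern should carry over to 6-bit channels by moving the mask to bit 6 and shifting by 6, but that variant is not shown or tested here.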
