|
Author |
Message |
Kalms
Member |
z5: put up an archive containing all files necessary for assembling, and me or someone else will test it out shortly.
|
z5_
Member |
I'm making an archive with the necessary code to explain my problem/question. If any experienced coder wants to spend a bit of time going through my code and help me out, drop me a mail (see info page on a.d.a. for my email). I don't want to put it up for general download here because i don't think my code sets an example.
The code + demosystem is quite small and there's nothing difficult going on so it should be easy. I'll comment it as much as possible.
I'm just too curious if there isn't a solution.
|
z5_
Member |
One for the "good coding practice": assuming that you have a subroutine (for example a circle drawing subroutine) which uses almost all data and address registers.
Now you want to put this in a loop to draw several circles. You get the info for the different circles from tables and feed it to the subroutine. Since all my data registers are used in my subroutine, can i do something like this:
- init loop counter d7
- start loop
- get necessary info for circle
- put d0-d7/a0-a6 on stack
- jump to subroutine "draw circle" (in which my loop counter d7 is used for other stuff)
- get d0-d7/a0-a6 from stack
- go to "start loop" until loopcounter d7 is done
So basically, i'm using the stack to store my d7 loop counter.
It works but is there a better way to handle this?
|
ZEROblue
Member |
There's nowhere else you can save the registers but in RAM, and the stack is pretty ideal since it's used frequently and will probably provide more cache hits than saving the variables in another nearby memory location will.
Some would say it's better to have the subroutine itself save all registers it touches, or at least the ones that are not used for input, so the user doesn't have to bother doing this at every calling occasion.
I guess you could reduce the cost a bit by only saving the registers whose values are of concern to you, even if the subroutine itself touches all registers.
|
noname
Member |
To support zeroblue's argument, some say that it is the AmigaOS convention that d0/d1/a0/a1 are regarded as scratch registers when calling an OS subroutine and d2-d7/a2-a6 are always safe. You can make your life a bit easier if you just inherit this convention in your own code most of the time and only break it if you have a good reason.
Also, did I already mention the "How To Code" guide?
|
Rebb
Member |
Having problems with Asmone/Asmpro debugger on real Amiga 500 (KS 1.3/2.0, A590 + 2 MB fast).
Entering debugger will give following:
asm-pro v1.14 blank grey screen
asm-one v1.20 Address error at $00240810 Accessing $000149E7 Type WI1 instruction $22cb
Any ideas what is causing this? Shouldn't the debugger also work on 68000 machines?
Also tried asm-one_v1.49 release candidate 2 which crashes after screen mode selector (error #80000003)
|
ZEROblue
Member |
It seems the v1.20 debugger will not run on anything less than a 68020. I really don't know what causes the other errors, but I'm guessing 68020 and/or KS 3.x is required.
|
z5_
Member |
@rebb: if you need more help, you can always make a separate thread to give the question more attention from visitors.
@all:
There seems to be a problem with my palette fade routine. It works on certain occasions, it fails at others. It has always been like that.
Here's the code for blue:
; work on blue
move.b 3(a0),d0 ;source B value
move.b 3(a1),d1 ;destination B value
move.b d1,d2
sub.b d0,d2
subx.b d3,d3
eor.b d3,d2
sub.b d3,d2
cmp.b d2,d4 ;fade speed <= absolute delta?
bls.s .delta_blue_done ;yes => ok to use fade speed
move.b d2,d4 ;use delta as fade speed
.delta_blue_done
cmp.b d1,d0
beq.s .blue_done ;source=dest
blo.s .blue_fade_in ;source<dest
sub.b d4,d0 ;source>dest
bra.s .blue_new_value
.blue_fade_in:
add.b d4,d0
.blue_new_value
move.b d0,3(a0)
.blue_done
a0= source, a1= destination, d4= delta fade
So basically, i'm first calculating the absolute value of the delta between source and destination. If the fade delta that i want is bigger than that delta, then limit the fade delta (for example, if the difference between source and destination is 20 and i need to fade in steps of 30, then limit this to 20 so as not to "overtake" my destination). Then i check what i need to do with that fade delta: add or subtract according to the fade direction i need to take to reach the destination.
I'm doing the same with red and green.
I now have a palette with a couple of colors ranging from lighter to darker and i'm fading in from white. The darkest colors turn white sometimes during the fade while the lighter colors fade ok. In the end, each color reaches its correct destination.
I have the feeling that the routine is far too complicated for what it does :)
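For comparison, here is a minimal C sketch of the same clamp-then-step idea for one 8-bit channel. The name fade_channel is hypothetical and this is only a model of the logic described above, not a translation of the exact routine. The fade speed is passed by value, so a speed that was clamped for one channel cannot carry over to the next.

```c
#include <stdint.h>

/* Move one 8-bit colour channel from src towards dst by at most `speed`.
   fade_channel is a hypothetical helper name used for illustration. */
static uint8_t fade_channel(uint8_t src, uint8_t dst, uint8_t speed)
{
    /* absolute distance between current and target value */
    uint8_t delta = (src > dst) ? (uint8_t)(src - dst) : (uint8_t)(dst - src);

    /* never step further than the remaining distance */
    if (delta < speed)
        speed = delta;

    if (src < dst)
        return (uint8_t)(src + speed);   /* fade in  */
    if (src > dst)
        return (uint8_t)(src - speed);   /* fade out */
    return src;                          /* already there */
}
```

Calling it once per channel per frame with the same speed, e.g. fade_channel(blue, target_blue, 30), should land exactly on the target without overshooting.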
|
Kalms
Member |
The algorithm itself looks sound. d4 is getting modified though; you are reloading that register for each computation, right?
|
z5_
Member |
I am reloading d4. But thanks for pointing out that the routine should work. That narrows down the search scope in finding the bug :)
|
d0DgE
Member |
Hello folk,
my special friend and I have got some communication issues again ;)
This time it's the cookie cut.
The story so far:
I wrote some routines to first draw a dotted circle, fill the little prick
with the blitter to a different chipbuffer, only to finally cookie-blit
the result onto the final screenbuffer...
All worked alright until the cookie setup, which looks as follows:
_ccWidth: dc.w ; object width in multiples of 16
_ccHeight: dc.w ; object height
_ccSize: dc.w ; calculated bltsize value
screenWidth equ 40
COOKI equ $fca ; use & Minterm
cookieCut:
; X in word steps => d0
; Y x screenWidth => d1
; object mask adr => a0
; object source adr => a1
; merge source => a2
; destination => a3
add.l d1,a2 ; add Y to merge src
add.l d1,a3 ; add Y to destination
add.l d0,a2 ; add X to merge source
add.l d0,a3 ; same for destination
; determine source modulo
move.w _ccWidth(pc),d0
move.w d0,d1
moveq #screenWidth,d2
lsr.w #3,d0
sub.w d0,d2
; enter addresses to DMA channels
bsr WaitBlit
move.l a0,$50(a6) ; mask => bltApth
move.l a1,$4c(a6) ; the object => bltBpth
move.l a2,$48(a6) ; merge src => bltCpth
move.l a3,$54(a6) ; destination => bltDpth
; rest of the blitshit
move.w d2,bltAmod(a6)
move.w d2,bltBmod(a6)
move.w d2,bltCmod(a6)
move.w d2,bltDmod(a6)
move.w #COOKI,$40(a6) ; bltcon0
clr.w $42(a6) ; bltcon1
move.l #$ffffffff,$44(a6) ; first & last word mask
move.w _ccSize(pc),d2
; punch it
move.w d2,$58(a6) ; bltsize
rts
The result on the screen works only partially i.e. the filled circle
appears "striped", only every second line of the circle makes it to the screen o_O.
Can someone tell me, what's wrong with the routine ?
Detailed information:
- Mask & object source are the same (is that ok?)
- Merge and destination are the same
- All the buffers are 320 bits wide (that's why the modulos are set)
|
d0DgE
Member |
oops, forgot the values for the labels...
_ccWidth: dc.w 64
_ccHeight: dc.w 64*screenWidth
_ccSize: dc.w (160*64)+(screenWidth/2)
|
ZEROblue
Member |
You've got your modulos mixed up. You're blitting 320 pixel wide source data but your modulos are based on ccWidth which is 64.
Also, since both your source and mask are the exact same and you just want to OR the circle into the bitplane, there's no need to use a third channel for a mask as it will just slow the operation down by another 33%. Instead you could set source to A, B and D to the destination, and use the $FC minterm.
You might want to change clr.w $42(a6) also since it will cause both a read and a write on the 68000, not that I've ever heard of weird effects from reading from the write only blitter registers though.
EDIT: actually it's your bltsize value which is mixed up to be precise. You've set the width to screensize when you were supposed to set it to ccWidth.
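For reference, a small C sketch of how the two values could be derived. BLTSIZE holds the height in bits 15-6 and the width in words in bits 5-0. The helper names blt_size and blt_modulo are hypothetical, and the numbers in the note below assume the 64x64 object and the 40-byte rows from the posts above.

```c
#include <stdint.h>

/* BLTSIZE: height in lines in bits 15-6, width in 16-bit words in bits 5-0.
   blt_size is a hypothetical helper name. */
static uint16_t blt_size(unsigned height_lines, unsigned width_pixels)
{
    unsigned width_words = width_pixels / 16;
    return (uint16_t)(((height_lines & 0x3FFu) << 6) | (width_words & 0x3Fu));
}

/* Modulo: bytes to skip at the end of each blitted row so that the
   pointer lands on the start of the next row of the buffer. */
static uint16_t blt_modulo(unsigned row_bytes, unsigned width_pixels)
{
    return (uint16_t)(row_bytes - width_pixels / 8);
}
```

Under those assumptions blt_size(64, 64) gives 64*64+4 and blt_modulo(40, 64) gives 32, whereas the posted (160*64)+(screenWidth/2) decodes as 160 lines of 20 words.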
|
d0DgE
Member |
...see, another pair of eyes looking at the code you've stared at for hours just can make the difference :)
It was the blitsize of course ... thanks ZEROblue. All shiny again.
|
z5_
Member |
another general question thingie.
When you paste a chunky brush onto a chunky background, you usually want the background color in the brush to be transparent (not plotted). I usually end up with a loop checking if the color in my brush is 0. If not 0, then put the pixel on top of my background. If 0, then skip to the next pixel in the brush.
However, since i need to check each pixel, it usually amounts to a considerable number of loop iterations (= number of pixels in the brush).
Is there any cunning trick to reduce the number of loops and convert the routine into something word or longword sized, taking two or four pixels into consideration?
|
ZEROblue
Member |
You can remove the branches and reduce the number of loops if you read the background and process 4 pixels at once, though it implies some additional overhead for unaligned cases, and if it's faster or not probably depends on what system it runs on.
; brush in D0
; background in D1
rept 4
tst.b d0
seq d2
ror.l #8, d0
ror.l #8, d2
endr
and.l d2, d1
or.l d1, d0 ; final pixels in D0
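The same idea as a C sketch, for anyone who wants to check the logic on a PC first. It treats colour 0 in the brush as transparent and handles four 8-bit pixels packed in one 32-bit value; merge4_zero_key is a hypothetical name.

```c
#include <stdint.h>

/* Merge 4 chunky pixels: brush pixels equal to 0 are transparent and let
   the background show through. merge4_zero_key is a hypothetical name. */
static uint32_t merge4_zero_key(uint32_t brush, uint32_t background)
{
    uint32_t keep = 0;   /* 0xFF in every byte lane where the brush is 0 */

    for (int lane = 0; lane < 4; lane++) {
        if (((brush >> (8 * lane)) & 0xFFu) == 0)
            keep |= 0xFFu << (8 * lane);
    }

    /* background only survives in transparent lanes; brush is 0 there */
    return (background & keep) | brush;
}
```

For example, a brush of 0x11002200 over a background of 0xAABBCCDD keeps the background in the two zero lanes.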
|
dalton
Member |
z5: I've actually been working with the same problem lately.
I think a good solution is to pack the sprite so that blank pixels at the beginning and end of rows are excluded. So for each row you would store a "modulo" and a pixel count (and the actual pixels of course).
|
ZEROblue
Member |
The pixel count works, though it needs to be improved for arbitrary graphics: with a single pixel count per row it will only work for convex shapes.
If you've got large graphics, a good method might be to break the graphics down into rows of one or more offsets with pixel counts, do the necessary masking for the beginning and end of the segment, and do 32 bit transfers in between.
|
Kalms
Member |
Encoding a sprite as a set of "jump forward X bytes, then write the next Y bytes there, and here are the byte values to copy" commands is commonly referred to as Run-Length Encoding (RLE).
It works well for sprites which have large contiguous blocks of non-transparent pixels. Less so if you have a lot of transparent<->nontransparent transitions (worst case: checkerboard pattern). For lots of transitions you are better off doing some sort of masked blit (which is what you were doing previously).
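A minimal C sketch of drawing one row of such a span-encoded sprite. The byte layout is an assumption made up for this example: a span count, then for each span a skip distance, a pixel count and the pixel bytes. draw_span_row is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Assumed row format (hypothetical, for illustration only):
     [span_count] then per span: [skip] [count] [count pixel bytes]
   Returns a pointer to the data of the next row. */
static const uint8_t *draw_span_row(const uint8_t *src, uint8_t *dst_row)
{
    uint8_t spans = *src++;
    uint8_t *dst = dst_row;

    while (spans--) {
        dst += *src++;              /* jump forward over transparent pixels */
        uint8_t count = *src++;     /* length of the opaque run */
        memcpy(dst, src, count);    /* copy the run without per-pixel tests */
        dst += count;
        src += count;
    }
    return src;
}
```

A row with more than one span covers the non-convex case, and the memcpy could be swapped for aligned 32-bit transfers on longer runs.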
Two ways of performing masked blit 4 pixels at a time:
1) Use only 128 colours, both in the source sprite and the destination buffer. In the source sprite, highest bit set in a byte means that the pixel is non-transparent. The 7 lower bits carry the color of the pixel.
To figure out how the algorithm works, convert it to .b operations and study what it does on just a single byte.
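move.l (a0)+,d0 ; fetch 4 sprite pixels
move.l (a1),d1 ; fetch 4 destination pixels
move.l d0,d2
lsr.l #7,d2 ; move each opacity bit (bit 7) down to bit 0
and.l #$01010101,d2 ; $01 = opaque, $00 = transparent
or.l #$80808080,d2 ; guard bit keeps the borrow inside each byte
sub.l #$01010101,d2 ; $80 = opaque, $7F = transparent
and.l d2,d1 ; keep destination where transparent
not.l d2 ; $7F = opaque, $80 = transparent
and.l d2,d0 ; keep sprite colour where opaque
or.l d1,d0 ; merge
move.l d0,(a1)+ ; write back to destination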
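The same bit trick as a C sketch, which makes it easy to step through the intermediate values. It assumes, as described, that destination pixels only use the lower 7 bits; merge4_high_bit is a hypothetical name.

```c
#include <stdint.h>

/* Merge 4 pixels where bit 7 of each sprite byte means "opaque" and the
   lower 7 bits hold the colour. Destination bytes are assumed to be
   7-bit values. merge4_high_bit is a hypothetical name. */
static uint32_t merge4_high_bit(uint32_t sprite, uint32_t dest)
{
    uint32_t m = (sprite >> 7) & 0x01010101u;  /* 0x01 per opaque lane        */
    m |= 0x80808080u;                          /* guard bit: no cross-lane borrow */
    m -= 0x01010101u;                          /* 0x80 opaque, 0x7F transparent */

    /* dest survives in transparent lanes, sprite colour in opaque lanes */
    return (dest & m) | (sprite & ~m);
}
```

With a sprite of 0x8500FF00 over a destination of 0x11223344, lanes 3 and 1 take the sprite colours 0x05 and 0x7F while lanes 2 and 0 keep the destination.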
2) Use 256 full colours both for the sprite and the destination buffer. For each group of 4 pixels, pre-compute a 4-bit mask value which indicates which pixels are transparent and which are not.
Also, create a 16-entry table which expands the mask values into masks.
move.l (a0)+,d0 ; Fetch 4 pixels
moveq #0,d1 ; Clear upper bits of table index
move.b (a1)+,d1 ; Fetch mask value
; (0..15) for pixels
move.l (a2),d2 ; Fetch 4 pixels
; from destination buffer
move.l (a3,d1.l*4),d3 ; Expand mask value
; into actual mask
and.l d3,d0 ; ... apply ...
not.l d3
and.l d3,d2
or.l d2,d0 ; ... done.
move.l d0,(a2)+
maskExpansionTable:
dc.l $00000000
dc.l $000000FF
dc.l $0000FF00
dc.l $0000FFFF
dc.l $00FF0000
dc.l $00FF00FF
dc.l $00FFFF00
dc.l $00FFFFFF
dc.l $FF000000
dc.l $FF0000FF
dc.l $FF00FF00
dc.l $FF00FFFF
dc.l $FFFF0000
dc.l $FFFF00FF
dc.l $FFFFFF00
dc.l $FFFFFFFF
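A C sketch of the second method. Instead of a stored table it computes the expansion on the fly, which should give the same 16 values as the table above; a set bit in the 4-bit value marks an opaque pixel. The names expand_mask4 and merge4_masked are hypothetical.

```c
#include <stdint.h>

/* Expand a 4-bit mask value into a 32-bit byte mask: bit n set means
   byte lane n becomes 0xFF. expand_mask4 is a hypothetical name. */
static uint32_t expand_mask4(unsigned nibble)
{
    uint32_t mask = 0;
    for (int lane = 0; lane < 4; lane++) {
        if (nibble & (1u << lane))
            mask |= 0xFFu << (8 * lane);
    }
    return mask;
}

/* Merge 4 sprite pixels into 4 destination pixels using the 4-bit mask. */
static uint32_t merge4_masked(uint32_t sprite, uint32_t dest, unsigned nibble)
{
    uint32_t mask = expand_mask4(nibble & 15u);
    return (sprite & mask) | (dest & ~mask);
}
```

In a real inner loop the 16 results of expand_mask4 would be stored once in a table, as in the assembly version, so the per-pixel cost is a single lookup.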
|
z5_
Member |
Just a small question. I've got a new pc with amigaforever 2009. Copied all my old amigaforever files over to the new pc. I'm now trying to compile my "old sources" in asm-one but it isn't working. It has been a while so i have forgotten how it's all supposed to work.
All my intro files are under a dir (work:sources/intro2) and i also have a data dir there (work:sources/intro2/data). In the code, there is an incdir "work:sources/intro2/" and at the bottom of the code are the incbin statements (incbin "data/logo.pal") but when i compile, asmone is giving me: file error on "data/logo.pal".
Any idea? It would be nice if i could at least compile those two intros again for nostalgia reasons :)
|
dalton
Member |
this might be a long shot but did you try removing the trailing slash from the incdir statement?
|
Rebb
Member |
Didn't want to start a new thread for this, so putting it here (do we have a "how did they do this" thread already?)
Puzzled about the "twirl" effect, as found in many Speedo-coded haujobb demos/intros. Easily seen in Haupex screenshot 3 and in Radikal screenshot 6. Is this just some table tweaking? Or how is it done?
|
noname
Member |
Hi Rebb, off the top of my head the effect in Haupex screenshot #3 is a movetable and the effect in Radikal screenshot #6 is interference.
|
ZEROblue
Member |
Rebb,
The effect in Radikal is simply 2+2 bitplanes moving around each other. Each group of 2 bitplanes contains a static spiral-like pattern, and the palette is set up so the colors from the two layers combine into an interesting gradient.
The black and white circles from the same demo, the circle effects in State of the Art, and the smooth looking blobs and Commodore logo effect in Lola by Fresh Prince, are all the same effect but with different color combinations and bitplane patterns.
The effect in Haupex is what's called a "plane deformation" and is basically a transformation/displacement of coordinates.
You start with a pixel-by-pixel copy of a texture and then insert new steps to alter the coordinate used for the texture lookup according to some formula, f.ex:
u = sin(x)*y
v = cos(y)*x
screen(x, y) = texture(u+xoffset, v+yoffset)
Usually the transformation is pretty costly so you precalculate it and store in a lookup table like you said. You can combine several transformations to get more interesting effects like the ones in Humus IV, or use an image as a displacement map to f.ex do the fake lightsourced bumpmaps.
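A rough C sketch of the precalculated version. The screen size, the 256x256 texture and the integer formula in build_table are all placeholders chosen for this example; any formula, including the sin/cos one, can be dropped in at precalc time without changing the render loop.

```c
#include <stdint.h>

enum { W = 64, H = 64 };                 /* hypothetical screen size */

typedef struct { uint8_t u, v; } TexCoord;

static TexCoord coord_table[W * H];      /* one texture coordinate per pixel */

/* Precalculate the transformation once. The formula is only a stand-in. */
static void build_table(void)
{
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            coord_table[y * W + x].u = (uint8_t)((x * y) >> 4);
            coord_table[y * W + x].v = (uint8_t)(x + y);
        }
    }
}

/* Per frame: only table lookups plus the moving offsets.
   texture is assumed to be 256x256 bytes, indexed as v*256 + u. */
static void render(uint8_t *screen, const uint8_t *texture,
                   unsigned xoffset, unsigned yoffset)
{
    for (int i = 0; i < W * H; i++) {
        unsigned u = (coord_table[i].u + xoffset) & 255u;
        unsigned v = (coord_table[i].v + yoffset) & 255u;
        screen[i] = texture[(v << 8) | u];
    }
}
```

Animating xoffset and yoffset each frame scrolls the texture through the fixed deformation.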
|
ZEROblue
Member |
Here are source files and exes for two simplified examples of these effects: http://filebin.ca/bnkmhj/dispblob.lha
|
Rebb
Member |
ZEROblue: "Here are source files and exes for two simplified examples of these effects"
Many thanks! Will study these sources!
|
z5_
Member |
Out of curiosity, can somebody explain how this is done: I love this effect, especially in that demo where it is used in various scenes.
|
Kalms
Member |
the particles are rendered with additive blending, and the addition saturates at the max color value.
If you had a 32bit ARGB framebuffer you would do, for each pixel that you draw:
dest.R = min(source.R + dest.R, 255)
dest.G = min(source.G + dest.G, 255)
dest.B = min(source.B + dest.B, 255)
In the case of an 8bit indexed framebuffer, if the palette is monochromatic (it has the same hue/saturation, only the intensity changes ... i.e. it is just a gradient between two colours), then you can do it similarly:
dest = min(source + dest, 255)
but if the palette contains multiple hues/saturations, then you have to precompute all possible combinations to get a good result:
dest = brightnesslookup[src][dest]
That is, you have precomputed, "for every pair of palettized colors, if I would add their corresponding RGBs together and clamp the result against white, which palette colour would match the resulting RGB best?"
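A small C sketch of that precomputation. Using squared RGB distance as the measure of "matches best" is an assumption of this example, and the names add_sat, nearest_colour and build_lookup are hypothetical.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Rgb;

enum { COLOURS = 256 };
static uint8_t brightness_lookup[COLOURS][COLOURS];

/* Saturating 8-bit add: clamps at 255 instead of wrapping around. */
static uint8_t add_sat(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Index of the palette entry closest to `want` (squared RGB distance,
   which is an assumption; other metrics are possible). */
static uint8_t nearest_colour(const Rgb *pal, int count, Rgb want)
{
    int best = 0;
    long best_dist = -1;
    for (int i = 0; i < count; i++) {
        long dr = (long)pal[i].r - want.r;
        long dg = (long)pal[i].g - want.g;
        long db = (long)pal[i].b - want.b;
        long dist = dr * dr + dg * dg + db * db;
        if (best_dist < 0 || dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return (uint8_t)best;
}

/* For every pair of palette indices, store the index whose RGB is closest
   to the clamped sum of the two colours. count must be <= COLOURS. */
static void build_lookup(const Rgb *pal, int count)
{
    for (int s = 0; s < count; s++) {
        for (int d = 0; d < count; d++) {
            Rgb sum = { add_sat(pal[s].r, pal[d].r),
                        add_sat(pal[s].g, pal[d].g),
                        add_sat(pal[s].b, pal[d].b) };
            brightness_lookup[s][d] = nearest_colour(pal, count, sum);
        }
    }
}
```

Once the table is built, drawing a particle pixel is a single lookup: dest = brightness_lookup[src][dest].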
|