Author |
Message |
z5_
Member |
@noname: can we expect patches from your productions in full 320*200? (just trying another cunning plan to get you back into coding :o))
|
Kalms
Member |
@noname:
You probably have FMODE set to 64bit fetches. In that mode, DMA will fetch 64 pixels during one fetch cycle. DDFSTRT/DDFSTOP control which fetch cycles are active. In LORES, a new fetch cycle begins every $20 buscycles and takes $8 buscycles (so max 25% bus utilization). With the given DDFSTRT/DDFSTOP values, there are either 5 (= 320 pixels) or 6 (= 384 pixels) fetch cycles performed.
If you have BPLxMOD=0, you have 5 fetches; if you have BPLxMOD=-8 you have 6 fetches. (The -8 is to have the last fetch on the current line fetch the same data, as the first fetch on the next line.)
Adjust DDFSTRT/DDFSTOP such that you see the entire image even with BPLxMOD set to 0, then you have no extraneous DMA fetches.
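The link between fetch count and modulo is plain arithmetic. A minimal sketch, assuming a 320-pixel-wide visible line and 64-bit fetches (8 bytes per bitplane per fetch); the function name is made up for illustration:

```python
FETCH_BYTES = 8            # one 64-bit fetch = 64 pixels of one bitplane = 8 bytes
VISIBLE_BYTES = 320 // 8   # 320 visible pixels per bitplane row = 40 bytes


def required_modulo(fetches_per_line):
    """BPLxMOD (in bytes) that makes the next line start 40 bytes after this one."""
    return VISIBLE_BYTES - fetches_per_line * FETCH_BYTES


for n in (5, 6):
    print(n, "fetches ->", n * 64, "pixels, BPLxMOD =", required_modulo(n))
```

With 5 fetches the modulo comes out as 0, with 6 fetches as -8, matching the two cases described above.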
|
noname
Member |
@kalms: thanks, will try it one day!
@z5: not really worth patching that, eh? ;)
|
z5_
Member |
So basically, you are fetching 64/8 = 8 bytes into your next line (not shown on screen because of the 320-pixel-wide display window), and thus you need to step back 8 bytes (negative modulo) in your bitplane to start at the second line?
|
Kalms
Member |
z5: correct.
|
z5_
Member |
The last thing in my c2p/double buffering/320*200/p61 "demosystem" seems to be the double buffering aspect. Thinking about it, my solution would be to define two chunky buffers in fast mem and one in chip mem, all 320*200. The bitplane pointers are initialised on the chip mem buffer once at startup (well, written once to the copperlist and then refreshed by said copperlist, because the bitplane pointers change while the data is fetched for display and thus need rewriting each frame) and remain on that buffer. I would then swap the chunky buffer pointers each loop and feed one to the c2p.
So:
chunky (fast mem) -> c2p -> screen (chip mem)
where chunky is pointing at either one of the chunky buffers (each loop swapped) and my code is outputting to the other one.
I'm sure somebody will have a better solution though :o)
|
TheDarkCoder
Member |
I would really like to write a small tutorial on how to set data fetch start/stop without wasting fetches, but I can't find the time to do some coding, let alone to write tutorials! :-((
anyway, here are (I believe) the optimal formulas for the LO-RES case.
Let's say you have a screen W pixels wide, and let X be the value in DIWSTRT
(assuming you want to display all the pixels you fetch, i.e. no scrolling/resizing of the Display Window)
DDFSTRT = floor(X/2 - 8.5)
DDFSTOP = DDFSTRT + 8*(W/16) - Y
where Y=8 for FMODE=0, Y=16 for FMODE=1 or 2, Y=32 for FMODE=3.
The cases of HI and SH res are a bit different.
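The LO-RES formulas above can be tried out with a few lines of code. This is only a sketch of those two formulas, assuming W is a multiple of 16; the function and table names are invented for illustration:

```python
# Y term of the DDFSTOP formula, indexed by FMODE (values from the formulas above)
Y_BY_FMODE = {0: 8, 1: 16, 2: 16, 3: 32}


def lores_ddf(diwstrt_x, width, fmode=0):
    """Return (DDFSTRT, DDFSTOP) for a LO-RES screen.

    diwstrt_x: horizontal start value X from DIWSTRT
    width:     screen width W in pixels (assumed to be a multiple of 16)
    """
    ddfstrt = (diwstrt_x - 17) // 2   # floor(X/2 - 8.5) in integer arithmetic
    ddfstop = ddfstrt + 8 * (width // 16) - Y_BY_FMODE[fmode]
    return ddfstrt, ddfstop


start, stop = lores_ddf(0x81, 320, fmode=0)
print(hex(start), hex(stop))
```

For X=$81, W=320 and FMODE=0 this evaluates to DDFSTRT=$38 and DDFSTOP=$D0.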
To explain those formulas, I would need to write the tutorial :-(
shame on me
|
noname
Member |
@z5: no, that's not right! you want to have 2 buffers in chip mem! you could also sometimes have 2 buffers in fast mem, but that would only be needed by certain effects, not by double buffering itself.
double buffering is needed to prevent visible artifacts that occur when the display hardware fetches data from buffers that are not yet completely updated. the idea is to first completely draw the buffer (in your case: perform the c2p to chip mem) and then tell the display hardware to show it. more precisely, you would do the c2p to chip mem and then swap the pointers to the chip buffers during the next vertical blank.
|
doom
Member |
I usually have about 15 buffers in chipmem. :)
|
doom
Member |
... and about 200 in fastmem.
|
z5_
Member |
I had a look at the amycoders compo entries for double buffering. It's easier to understand due to the name of the pointers:
screen 1 in chipmem (320*200)
screen 2 in chipmem (320*200)
chunky in fastmem (320*200)
screen_logical: dc.l 0
screen_physical: dc.l 0
start program:
move pointer of screen 1 in screen_logical
move pointer of screen 2 in screen_physical
main:
- chunky -> c2p -> screen_logical
- swap pointers
=> screen_physical becomes screen_logical
=> screen_logical becomes screen_physical
- init bitplane pointers to screen_physical (=> shown on screen)
- execute own code and always write to chunky
loop to main
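The pointer juggling in the outline above can be modelled in a few lines. This is a rough sketch only: the c2p is replaced by a plain copy, and `bitplane_source` stands in for the bitplane pointers in the copperlist; both are assumptions made for illustration.

```python
WIDTH, HEIGHT = 320, 200


def c2p_stub(chunky, screen):
    """Stand-in for the real c2p routine: here it simply copies the bytes."""
    screen[:] = chunky


class DoubleBuffer:
    def __init__(self):
        self.screen_logical = bytearray(WIDTH * HEIGHT)    # being drawn to
        self.screen_physical = bytearray(WIDTH * HEIGHT)   # being displayed
        self.bitplane_source = self.screen_physical        # what the copperlist points at

    def present(self, chunky):
        c2p_stub(chunky, self.screen_logical)              # chunky -> c2p -> screen_logical
        # swap pointers: the finished buffer becomes the physical one
        self.screen_logical, self.screen_physical = (
            self.screen_physical, self.screen_logical)
        self.bitplane_source = self.screen_physical        # init bitplane pointers


db = DoubleBuffer()
frame = bytes([7]) * (WIDTH * HEIGHT)
db.present(frame)
print(db.bitplane_source == frame)
```

After `present`, the buffer that was just written is the one on display, and the other buffer is free for the next c2p pass.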
Does this make sense? And how does (or should) this relate to the vertical blank?
|
ZEROblue
Member |
That's the way to do it, though personally I always found the physical-logical terminology a bit confusing, and used front-back to describe what's seen and what's in the back being drawn on.
If you never draw directly to the visible screen and use the copper list to set the bitplane pointers, which you should, then you don't need to wait for the VBL to do the swapping, any line is good with possibly the exception of that one raster line where the copper list will execute the fetches to set the bitplane pointers, in the very remote case that your code is writing one register while another is being fetched, and thus get an erroneous bitplane pointer for that frame.
Waiting for the VBL or any other fixed raster line at some point in your code will drop your effective frame rate, sometimes significantly, but stabilize it.
|
noname
Member |
correct. use the vbl to get the smooth look.
|
doom
Member |
- chunky -> c2p -> screen_logical
- swap pointers
You need a delay in between there (or right afterwards) to avoid potential nasty flicker. But a delay is wasted CPU time, which you really don't want if you're going for a high framerate. Enter triplebuffering:
screen_1 in chipmem
screen_2 in chipmem
screen_3 in chipmem
chunky in fastmem
screen_showing, pointer to screen_1 initially
screen_upcoming, pointer to screen_2 initially
screen_previous, pointer to screen_3 initially
mainloop:
- render effect into chunky
- C2P chunky -> screen_upcoming
- delay (see [1] below)
- rotate buffers (all three at once, i.e. screen_previous gets the old value of screen_showing):
screen_showing <- screen_upcoming
screen_upcoming <- screen_previous
screen_previous <- screen_showing
- effect of rotation will be seen at the beginning of next VBL but you can start writing to the new screen_upcoming straight away
- goto mainloop
At any given time, screen_showing is the address of the next frame that will be fetched by the display hardware. You can do that either by loading the address in the VBL interrupt, or by updating your copperlist with this address right after rotating the buffers.
Since we want to allow buffer swap to occur at any time (except in the middle of register update, good reason for not using the copper), we need to protect the previous screen_showing buffer for up to one frame, which is what screen_previous is for. For this reason the delay [1] needs to wait until the beam is done with screen_previous. The easy way to do that:
.delay
move.l screen_last_started,d0
cmp.l screen_showing,d0
bne.b .delay
And somewhere in the VBL interrupt:
move.l screen_showing,screen_last_started
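The rotation and the delay test can be modelled together. This is a hedged sketch, not the actual demosystem code: buffers are represented by their names, and the method names `vbl`, `can_rotate` and `rotate` are invented for illustration.

```python
class TripleBuffer:
    def __init__(self):
        self.screen_showing = "screen_1"
        self.screen_upcoming = "screen_2"
        self.screen_previous = "screen_3"
        self.screen_last_started = self.screen_showing

    def vbl(self):
        """VBL interrupt: the display now starts fetching screen_showing."""
        self.screen_last_started = self.screen_showing

    def can_rotate(self):
        """The .delay test: true once the beam has moved on from screen_previous."""
        return self.screen_last_started == self.screen_showing

    def rotate(self):
        """Rotate all three pointers at once (tuple assignment uses the old values)."""
        self.screen_showing, self.screen_upcoming, self.screen_previous = (
            self.screen_upcoming, self.screen_previous, self.screen_showing)


tb = TripleBuffer()
tb.rotate()
print(tb.screen_showing, tb.screen_upcoming, tb.screen_previous, tb.can_rotate())
tb.vbl()
print(tb.can_rotate())
```

Right after a rotation the test fails until the next VBL has latched the new screen_showing, which is exactly the window in which the old front buffer may still be on screen.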
Basically doublebuffering will only allow framerates of 50/n where n is integer. Which means if your effect could run at 49 FPS, you'll see it running at 25 FPS instead. If it's 24 FPS it'll be limited to 16.7 FPS, and so on. Triplebuffering doesn't have that issue.
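The 50/n rule is easy to check numerically. A small sketch, assuming a 50 Hz display and that each frame has to wait for a whole number of vertical blanks; the function name is made up:

```python
import math

VBL_HZ = 50  # PAL vertical blank rate, as assumed in the figures above


def double_buffered_fps(raw_fps):
    """Frame rate seen when every frame must wait for a whole number of VBLs."""
    vbls_per_frame = math.ceil(VBL_HZ / raw_fps)
    return VBL_HZ / vbls_per_frame


for fps in (49, 24):
    print(fps, "->", round(double_buffered_fps(fps), 1))
```

An effect that could run at 49 FPS needs two VBLs per frame and lands on 25 FPS; at 24 FPS it needs three and lands on about 16.7 FPS.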
On the other hand with doublebuffering the framerate isn't just lower, it's also steadier.
On the third hand, triplebuffering can become slower and steadier too if you just move the delay to right after the rotation instead of right before it.
So there.
|
z5_
Member |
I compared my code in the c2p/double buffering setup with exactly the same code i had in wickedos (c2p/triple buffering) and it definitely looks smoother/steadier in wickedos... fascinating.
|
hiphop
Member |
ZEROblue
i tried to compile your sample code effects... but i get an error like this: undefined symbol
9 move.w d0,bplpt+6 (for the 3D sierpinski example...)
|
d0DgE
Member |
hmm... I'd say either you have some Case Sensitivity settings breaking in during the pass or the entire label "bplpt" is missing. A close look down to the copperlist where the bitplane pointers are declared or setting the ASM editor to "non case sensitive" in the preferences might help.
|
z5_
Member |
Why doesn't this work:
moveq #0,d5
moveq #0,d7
move.b param,d5
move.b d5,d7
subq.b #1,d7
.loop
blabla
sub.w d5,a0
dbra d7,.loop
param: dc.b 10
Whereas this does work:
move.w d5,d7
subq.w #1,d7
@hiphop: sorry to interrupt your question. Keep asking here or even make a separate topic if you remain stuck.
|
d0DgE
Member |
@z5_: don't know what's wrong with it. I just tested the byte sized version and it worked ... well, it worked after I changed the loop label so that it is not a local anymore. Did you get any error messages while you used byte sized expressions or did the loop run "longer" than you expected ;)
|
Kalms
Member |
hiphop: Yes, dodge is correct; the source needs to be assembled with case sensitivity turned off.
The following line:
... refers to a symbol named "Bplpt"...
... and the following line:
BplPt dc.l $00e00000,$00e20000
... declares a symbol named "BplPt".
Notice the difference in case.
|
z5_
Member |
I forgot the most interesting part (see the code inside the loop) although i still can't understand why this doesn't work.
|
d0DgE
Member |
awww...that's why it DID work for me ...ehehe I forgot to add the fatal line "sub.w d5,a0" xD ... well I'm writing some precalc ATM ... and I think it's time to go to bed now. FWIW, In such a case as you described it I wouldn't mind using words here to be on the safe side.
|
z5_
Member |
I still don't see the error in my example. I could understand it not working when param is negative (due to the sign bit) but not when param remains 10. Anyone?
|
bigJz
Member |
have you tried with dbeq ?
|
Kalms
Member |
z5: Your example should work (both the .b and .w versions). Show us the entire function.
|
z5_
Member |
moveq #64-1,d6
.next_rect
move.l a4,a0
move.b (a1)+,d0
move.w (a2)+,d1
move.w (a3)+,d2
move.w d2,d4
mulu.w #320,d2
add.l d2,a0
add.w d1,a0
cmp.w a5,d4
blt.s .outline
.no_outline
move.w d5,d7
subq.w #1,d7
.next_line
move.w d5,d4
subq.w #1,d4
.next_pixel
move.b d0,(a0)+
dbra d4,.next_pixel
add.w #320,a0
sub.w d5,a0
dbra d7,.next_line
dbra d6,.next_rect
with d0: color, d1: x-coord, d2: y-coord
draws a grid of rectangles btw
arrggg... that pretext thingie sucks
|
Kalms
Member |
The setup code for d5/a1-a3/a5 is not included in there. The issue is probably with some portion of a register that isn't getting cleared at the appropriate time.
You can run this code through the debugger. At the very beginning of your program (even before demosystem-init), load registers with suitable init-values, and then insert a jump to the function. Assemble, and run in debugger. Then you can single-step through the algorithm. You won't be able to see what happens on-screen but you can see what happens in the registers.
|
z5_
Member |
moveq #0,d0
moveq #0,d1
moveq #0,d2
moveq #0,d3
moveq #0,d4
moveq #0,d5
moveq #0,d6
moveq #0,d7
lea grid_color,a1
lea grid_x,a2
lea grid_y,a3
lea chunky,a4
move.w grid_outline_start_y(pc),a5
move.b grid_block_size(pc),d5
This precedes "moveq #64-1,d6".
From the moment that i change move.w d5,d7 into move.b d5,d7, it doesn't work anymore. Btw, what happens with a dbra when the register is negative?
Note that this isn't a really important issue. I could use it as it is. However, i can learn something from finding out. Avoiding stuff isn't always the best solution for learning purposes :o) I'll try the debugger.
|
Blueberry
Member |
The dbra instruction counts word-sized. After the final iteration of a dbra loop, the counter register contains the value $ffff.
If you only write to the lower byte of d7, then the upper byte of the word will, on the next iteration of the d6 loop, still contain $ff, causing the d7 loop to loop $ff0a (65290) times instead of ten.
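The effect can be reproduced with a small register model. This is a sketch under simplifying assumptions: the register is a 32-bit integer, the loop body is empty, and the move/subq pair is folded into a single write of 10-1; the helper names are invented for illustration.

```python
def write_byte(reg, value):
    """move.b-style write: only bits 0-7 of the register change."""
    return (reg & 0xFFFFFF00) | (value & 0xFF)


def write_word(reg, value):
    """move.w-style write: only bits 0-15 of the register change."""
    return (reg & 0xFFFF0000) | (value & 0xFFFF)


def dbra_loop(reg):
    """Run an empty loop closed by dbra; return (iterations, register afterwards)."""
    iterations = 0
    while True:
        iterations += 1                        # loop body runs once per pass
        low = ((reg & 0xFFFF) - 1) & 0xFFFF    # dbra decrements the low word
        reg = (reg & 0xFFFF0000) | low
        if low == 0xFFFF:                      # counter reached -1: fall through
            return iterations, reg


def run_twice(write):
    """Set the counter to 10-1 with the given write and loop, two times over."""
    d7 = 0
    counts = []
    for _ in range(2):
        d7 = write(d7, 10 - 1)
        n, d7 = dbra_loop(d7)
        counts.append(n)
    return counts


print(run_twice(write_word))   # word-sized setup
print(run_twice(write_byte))   # byte-sized setup
```

With the word-sized write both passes run 10 times. With the byte-sized write the first pass runs 10 times, but the second starts from $ff09 and runs $ff0a (65290) times.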
|
d0DgE
Member |
that's what I meant with "did the loop run 'longer' than you expected ;)"
I experienced similar problems (as almost all did sometimes, I guess).
Thus, stick to word sized counters :)
|