A.D.A. Amiga Demoscene Archive

Forum / Coding / WickedOS - public release
z5_
Member
#1 - Posted: 18 Jun 2007 00:26
;--- include headers
incdir includes:
include wos.i

;--- mandatory init macro
INITWOS

;--- select mode 1 320x200
SETMODE #1,#Buffer,#Cols,#100

;--- c2p it twice (because of triple buffering)
DISPLAY2
DISPLAY2

lp
;--- wait for mouse click
CHECKEXIT
beq lp

;--- quit
EXITWOS


This was the first example to display a picture. I must admit never really understanding how the DISPLAY macro works or why it was written twice, nor do i understand how the double buffering is achieved this way (i understand the double buffering principle, but i don't see how i should swap screens within WOS).

The manual says:
- DISPLAY1: c2p with triple buffering, with 50 fps framesync
- DISPLAY2: c2p with triple buffering, with 25 fps framesync
- DISPLAY: c2p with triple buffering, without framesync.

Can't say i understand which one i should take. Should i assume that every c2p effect needs double buffering because of the slowness of c2p? Meaning in short: chunky screenmode => at least double buffering?

On top of that, i don't see how (and if) buffering is done. Shouldn't that need two screenbuffers?

Any help on that matter would be appreciated. I've got the feeling that this is indeed the bottleneck i'm searching for.
z5_
Member
#2 - Posted: 18 Jun 2007 00:30 - Edited
On a last note, i can make out from the docs that a 320*200 screen needs the DISPLAY2 macro. DISPLAY1 is for 320*100 screens (i can imagine that the higher the framerate you want to achieve, the lower the "display area" has to be).

So the remaining question is: do i put one or two DISPLAY2 calls?
doom
Member
#3 - Posted: 18 Jun 2007 02:05
Quote: "Of course, i'm beginning to understand that it is quite ridiculous to use c2p and chunky mode (let alone double, triple buffering...) for such simple effects that would run on an A500."

It's not ridiculous at all if the target platform is 060 Amigas. You could write a non-chunky version and spend a lot of time integrating it into the rest of the (chunky) effects, only to find that the CPU ends up idle 98% of the time. It may be satisfying to know that the effect is optimally coded, but it's even more satisfying to optimize something that needs it. ;)

If it looks ok at 25 fps, leave it at 25 fps for now. My advice anyway.
doom
Member
#4 - Posted: 18 Jun 2007 03:15
The reason for doublebuffering in a hardware-chunky environment (unlike the Amiga) is to let you build up each frame in an offscreen buffer, so you never show an incomplete frame on screen.

With C2P on the Amiga you wouldn't need double-buffering if you could make your C2P work as the mechanism that put the next fully rendered frame on screen. But C2P is much too slow for that of course. Buffer switching/presenting has to outrun the raster beam or there'll be flicker. So instead, C2P is treated as part of the rendering process: The final step in building each frame is to convert it to a format the hardware can work with, and place it in a buffer in chip memory, so it can be switched in by just passing the address to the display hardware.

In the simplest case, it looks like this:

- render your effect to chunky buffer
- do c2p from chunkybuffer to offscreen planar buffer
- switch pointers so offscreen buffer becomes onscreen buffer and vice versa

And the level 3 interrupt handler OR the copperlist then copies the address of the onscreen buffer to the display hardware during every vblank, so the raster will always paint the last frame that was fully built and C2P'd.

What's lacking there is a delay to make sure that we don't switch buffers twice during one video frame. That'd have you modifying the planar picture currently being displayed and that'd cause flicker. So you introduce a wait loop which checks to make sure that the address of the current onscreen buffer has been passed to the display hardware at least once, because then it's safe to assume that the offscreen buffer is no longer being used.

- render your effect to chunky buffer
- wait for onscreen buffer to start showing <----
- do c2p from chunkybuffer to offscreen planar buffer
- switch pointers so offscreen buffer becomes onscreen buffer and vice versa
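The steps above can be modelled in a few lines. This is only a toy sketch in Python, not WOS code: the class name, the `tearing` flag and the byte copy standing in for the real C2P are all made up for illustration. It shows why the wait matters: without it, the second frame is written into the buffer the display hardware is still reading.

```python
class DoubleBuffer:
    """Two planar buffers; the 'hardware' latches a pointer at each vblank."""

    def __init__(self, size):
        self.buffers = [bytearray(size), bytearray(size)]
        self.onscreen = 0      # buffer we want shown next
        self.latched = 0       # buffer the display hardware is really reading
        self.tearing = False   # set if we ever write into the latched buffer

    def vblank(self):
        """Level 3 interrupt or copperlist: pass the onscreen address on."""
        self.latched = self.onscreen

    def present(self, chunky, wait=True):
        if wait and self.latched != self.onscreen:
            self.vblank()      # stand-in for spinning until the next vblank
        offscreen = 1 - self.onscreen
        if offscreen == self.latched:
            self.tearing = True            # modifying the visible picture
        self.buffers[offscreen][:] = chunky  # stand-in for the real C2P
        self.onscreen = offscreen          # switch pointers
```

Calling `present()` twice in a row with `wait=False` sets `tearing`; with the default `wait=True` it stays clear.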

That's all there is to frame-synchronized doublebuffering. The next problem is that waiting for a buffer to start showing can take anywhere from 0s to 1/50 s, during which the CPU is just idle.

This is addressed by triple-buffering which works by adding a second offscreen buffer so rather than switching two buffers, you're rotating three. That way, once the next frame up has been C2P'd, you can immediately start working on a new one without waiting. You still have to synchronize it so the framerate won't exceed 50 fps, but you won't have a 49 fps effect degraded to 25 fps as might happen with doublebuffering.
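The "49 fps degraded to 25 fps" case can be checked with a small simulation. This is a rough sketch under simplifying assumptions that are not from the thread: time is counted in vblank periods (PAL, 50 Hz), rendering plus C2P is treated as one block of work of fixed length, and a finished frame goes on screen at the next vblank. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from math import ceil

VBLANK_HZ = 50  # PAL

def simulate(work, buffers, frames):
    """Average fps when each frame needs `work` vblank periods of CPU time."""
    start = Fraction(0)
    flips = []  # vblank number at which each frame goes on screen
    for _ in range(frames):
        finish = start + work
        flip = ceil(finish)
        if flips:
            flip = max(flip, flips[-1] + 1)  # at most one new frame per vblank
        flips.append(flip)
        if buffers == 2:
            # Double buffering: the only other buffer is still on screen
            # until this frame has been switched in.
            start = Fraction(flip)
        else:
            # Triple buffering: a third buffer is free as soon as the
            # previous frame has been switched in.
            prev_flip = flips[-2] if len(flips) >= 2 else 0
            start = max(finish, Fraction(prev_flip))
    return Fraction(VBLANK_HZ * frames, flips[-1])

work = Fraction(50, 49)          # an effect that could just manage 49 fps
print(simulate(work, 2, 49))     # double buffering: 25
print(simulate(work, 3, 49))     # triple buffering: 49
```

With work shorter than one vblank period, the triple-buffered case is still capped at 50 fps in this model, because only one new frame can be switched in per vblank.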

(WOS then apparently has an option to limit framerate to 25 fps, with triplebuffering. That's because a framerate of 25 fps often looks nicer than framerates between 25 and 50.)

Now, depending on how the triplebuffering is implemented, the display can end up always "lagging" one frame behind. If DISPLAY2 had only been called once in that example, the picture wouldn't be shown. It'd be left in one of the offscreen buffers as the next frame to be shown (as soon as another one is ready). So the only reason to do buffering twice is to get the picture into the right buffer before entering a wait loop.

Since you repeatedly call DISPLAY2 in your effect, you only need to do it once per frame.
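The one-frame lag can be illustrated with a toy model. The rotation order below is an assumption about how such a scheme could work, not something taken from the WOS source, and the class and attribute names are invented.

```python
class LaggingTripleBuffer:
    """Three buffers: one shown, one ready to be shown next, one to work in."""

    def __init__(self):
        self.shown = "blank"
        self.ready = "blank"
        self.work = "blank"

    def display(self, frame):
        self.work = frame  # stand-in for the C2P into the work buffer
        # Rotate: ready goes on screen, the new frame waits its turn,
        # and the old screen buffer becomes the next work buffer.
        self.shown, self.ready, self.work = self.ready, self.work, self.shown

tb = LaggingTripleBuffer()
tb.display("picture")
print(tb.shown)   # blank   (picture is waiting in the ready buffer)
tb.display("picture")
print(tb.shown)   # picture
```

Under this assumption a static picture needs two calls before it is visible, while an effect that calls the macro every frame just runs one frame behind.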

There. Not a whole lot of time to proofread that and confirm that it makes sense. :) Let's assume it does. Goodnight.
z5_
Member
#5 - Posted: 18 Jun 2007 19:25 - Edited
Great explanation! A couple of things still puzzle me though. I'm always drawing to the same screen buffer, unless WOS magically swaps them for me. In fact, SETMODE #1,#screen,... is the only screenbuffer i write to. How does that fit into this story?

Also, i assume that if you draw without clearing the screen first, you are never drawing on the last (newest) version of the screen, but the one before that?

I could notice the "lagging" one frame behind actually. At one point i had to do something eight times. The result of the last "action" wasn't shown on screen when i changed from 2 to 1 DISPLAY2 macros. It was only shown at the beginning of the next effect, after calling the DISPLAY2 macro again.

Now about the timing problem, i did some tests on my Amiga. With two DISPLAY2 commands, i could only clear blocks with 10 vbi counts between the start of each effect (so cmp #10,vbi_timer). With one DISPLAY2, i can clear blocks with 5 vbi counts (cmp #5, vbi_timer). Less than 5 and the routine is stuck. Meaning that i still need 5 vbi counts for my effect to finish, which still seems far too much (shouldn't i achieve 2 vbi counts as the frame rate is limited at 1/25s)?
z5_
Member
#6 - Posted: 18 Jun 2007 20:43 - Edited
Some last test results: i deleted my routine and replaced it with a simple color flash. As i don't need the DISPLAY2 macro for that, i tested it without it first and "cmp #1,vbi_timer" still works (it flashes as in those techno demos :o)). When i add a DISPLAY2, it only works from "cmp #5,vbi_timer".

So regardless of the routine, whenever i use ONE DISPLAY2 command, my effect needs 5 vbi's to finish on my real Amiga. That seems to be the conclusion.

(yes, i'm going on about this but i have a feeling that this is very important and that there is no point in going on without at least understanding what is going on).

Edit: with protracker replayer instead of thx, it goes down to "cmp #4,vbi_timer".
z5_
Member
#7 - Posted: 19 Jun 2007 00:12 - Edited
ok, as a last ultimate test, i deleted all my code and do this:

(assuming that vbi_timer is incremented in vbi)

init:
(shows a picture)
- setmode #1,#screen,...
- DISPLAY2
- DISPLAY2

main:
cmp.w #4, vbi_timer
if equal, then .do_flash
rts

.do_flash
clr.w vbi_timer
flip boolean
if boolean = 0 => make background color black
if boolean = 1 => make background color white
HERE COMES ONE DISPLAY2 FOR MY TEST
rts

go to main

(don't mind the pseudo assembler, whatever syntax)

That's all. No music, no vbi routine other than incrementing the counter, nothing else.

Results:
- without DISPLAY2 in my .do_flash routine, the screen still flashes at "cmp.w #1,vbi_timer"
- with DISPLAY2 in my .do_flash routine, the screen stops flashing at "cmp.w #3,vbi_timer" (so cmp #4 is max).

I don't think i can test anything more. It's probably something i don't get with the triple buffering, but the way it is now: 4 vbi timers seem the minimum time required if DISPLAY2 is involved.
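One possible reading of the "stops flashing" symptom, offered only as a hypothesis and not confirmed anywhere in the thread: the test compares vbi_timer for equality, so if the DISPLAY2 call keeps the main loop busy for more vblanks than the compare value, the counter has already passed that value when the loop polls again and the compare never matches. The sketch below models just that; `display_cost` (vblanks spent inside DISPLAY2) and the function name are hypothetical, and 16-bit wrap-around of the counter is ignored.

```python
def count_flashes(threshold, display_cost, vblanks=200, exact_match=True):
    """Count colour flips in `vblanks` vblanks; threshold must be >= 1."""
    timer = 0      # vbi_timer, incremented once per vblank
    flashes = 0
    t = 0
    while t < vblanks:
        if exact_match:
            hit = timer == threshold      # like cmp.w #n,vbi_timer / beq
        else:
            hit = timer >= threshold      # tolerant variant
        if hit:
            timer = 0                     # clr.w vbi_timer
            flashes += 1
            # The display call blocks the main loop, but the vbi keeps counting.
            t += display_cost
            timer += display_cost
        else:
            t += 1
            timer += 1
    return flashes

print(count_flashes(4, 4))                     # keeps flashing
print(count_flashes(3, 4))                     # flashes once, then stuck
print(count_flashes(3, 4, exact_match=False))  # keeps flashing
```

In this model the smallest compare value that works equals the number of vblanks the display call takes, so a minimum of 4 would point at a slow C2P rather than at the flash routine itself.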
noname
Member
#8 - Posted: 19 Jun 2007 00:45
Without having read all of the previous posts:
- DISPLAY2 takes 2 vertical blanks
- 2x DISPLAY2 takes 2x2=4 vertical blanks

Quote: "4 vbi timers seem the minimum time required if DISPLAY2 is involved."
This is the case in your example as you called the DISPLAY2 macro twice.
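The arithmetic, spelled out as a tiny helper. It assumes a PAL display (50 vblanks per second) and takes the "2 vertical blanks per DISPLAY2" figure from the list above; the function name is made up.

```python
PAL_VBLANK_HZ = 50

def loop_rate(display2_calls, vblanks_per_call=2):
    """Return (vblanks per loop, loops per second) for a given call count."""
    vblanks = display2_calls * vblanks_per_call
    return vblanks, PAL_VBLANK_HZ / vblanks

print(loop_rate(1))   # (2, 25.0)
print(loop_rate(2))   # (4, 12.5)
```

So calling the macro twice per loop halves the achievable rate from 25 to 12.5 updates per second.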

I skimmed over Doom's explanation and it looked spot on. Maybe read it again to understand the principle of triple buffering? It is essential to call DISPLAY2 only once per loop. The reason why I called it twice in the initial example was to show a kind of "optimization":
- Cycle the picture through the offscreen buffers until it gets displayed on the monitor
- Then you don't have to call DISPLAY2 in the loop
This kind of "trick" is probably a bit misleading. I was assuming everybody understood the principle of the triple buffers which is of course not the case.

Apart from that, maybe your problem is buried in the way you control the flow of your routine? I do not understand why you would call the DISPLAY2 macro from a subroutine. Because whatever I did as an effect, I usually ended back at the top level, calling the relevant DISPLAY(1|2) macro, calling CHECKEXIT and doing the loop if allowed to do so.
z5_
Member
#9 - Posted: 19 Jun 2007 00:55 - Edited
I know it's something trivial but i can't find it and it's really important (i'm getting half of 25 frames/s at the moment).

Look at my previous post. It's all i'm doing (but written in proper asm). Whether i insert DISPLAY2 in my subroutine or in my mainroutine seems completely the same to me in this case. And for my "4 vbi test", i'm only inserting one DISPLAY2 just before the rts (note that the DISPLAY2 has no reason to be here as i'm just changing colors. It's just there to test the speed difference).

aaarrrrggghhhhh :o)
doom
Member
#10 - Posted: 19 Jun 2007 01:55 - Edited
Quote: "I don't think i can test anything more. It's probably something i don't get with the triple buffering, but the way it is now: 4 vbi timers seem the minimum time required if DISPLAY2 is involved."

You probably said this already, but what CPU are you running it on? C2P could take 4/50 s on an 030.

On the 060, you can expect C2P of a 320x256 screen to take very close to 1/50 s, if everything is as it should be. Lots of stuff can slow it down, such as odd-aligned buffers or code, bad display DMA settings (FMODE), or gamma radiation.
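For a sense of scale, here is the data volume behind that figure. This is only arithmetic, assuming one byte per chunky pixel (8 bitplanes out) and a 50 Hz PAL display; it says nothing about what the bus can actually sustain, and the helper names are made up.

```python
PAL_VBLANK_HZ = 50

def chunky_bytes(width, height):
    """Size of one 8-bit chunky frame in bytes."""
    return width * height

def c2p_bytes_per_second(width, height, vblanks_per_frame=1):
    """Bytes that must be converted per second at the given frame pacing."""
    return chunky_bytes(width, height) * PAL_VBLANK_HZ // vblanks_per_frame

print(chunky_bytes(320, 256))                # 81920
print(c2p_bytes_per_second(320, 256))        # 4096000
print(c2p_bytes_per_second(320, 200, 2))     # 1600000
```

A 320x256 conversion in 1/50 s therefore means writing roughly 4 MB of planar data per second into chip memory, while a 320x200 screen at 25 fps needs well under half of that.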

About buffers, yes/no. You have to distinguish between the chunky buffer and the 8 bitplanes output by the C2P. The chunky buffer is never shown as it looks in fast memory, it can only be shown from chipmem and after C2P conversion.

Here's a diagram! :)

                     .- -> chip buffer    -.
                     |                     |
eff. -> chunky bfr. -+-    chip buffer -> -+-> raster
                     |                     |
                     `-    chip buffer    -'
                   [C2P]                [RAMDAC]


The fork thingies are supposed to illustrate how the pointers are rotated to switch between buffers. Hope it helps.
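To make the chunky/planar distinction concrete, here is a deliberately naive conversion written in Python. It is nothing like a real optimized C2P routine and is not what WOS does internally; it only shows the data layout: one byte per pixel on the chunky side, and on the planar side one bitmap per bit of the colour index, with the leftmost pixel in the most significant bit of each byte.

```python
def c2p(chunky, width, height, depth=8):
    """Convert a chunky buffer (one byte per pixel) into `depth` bitplanes."""
    assert width % 8 == 0 and len(chunky) == width * height
    row_bytes = width // 8
    planes = [bytearray(row_bytes * height) for _ in range(depth)]
    for i, pixel in enumerate(chunky):
        byte_index = i // 8      # rows are contiguous and width is a multiple of 8
        bit = 0x80 >> (i % 8)    # leftmost pixel lives in bit 7
        for p in range(depth):
            if pixel & (1 << p):
                planes[p][byte_index] |= bit
    return planes

# One row of 8 pixels: colour 1 on the left, colour 0x81 on the right.
planes = c2p(bytes([1, 0, 0, 0, 0, 0, 0, 0x81]), 8, 1)
print(planes[0].hex(), planes[7].hex())   # 81 01
```

The effect only ever writes the chunky side; the planes are what ends up in the chip buffers of the diagram.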
doom
Member
#11 - Posted: 19 Jun 2007 01:58
Also.. text in [ pre ] tags wraps a little too soon, don't you think. :)
z5_
Member
#12 - Posted: 19 Jun 2007 12:03 - Edited
The [pre] tags are a bit bogus but it's the only solution i have.

I tested it on 68060. I will post the entire code tonight. That should clear things up and hopefully put an end to this mis(t)ery.
z5_
Member
#13 - Posted: 19 Jun 2007 19:15 - Edited
Test code.

Can't make head nor tail of it. It seems to depend on where the DISPLAY2 command is or something. The idea behind putting the DISPLAY2 in the subroutine itself was because i remember reading that you only need to do a DISPLAY2 the moment you change something on screen pixelwise (in other words, no DISPLAY2 needed when colors change or nothing changes on screen).

Putting it in the main routine doesn't seem to help either (the ; DISPLAY2).

The strangest thing: sometimes it goes faster, sometimes it doesn't. I managed cmp.w #2 or even #1 at some point.
doom
Member
#14 - Posted: 19 Jun 2007 21:55
Is "screen" your chunky buffer? The C2P routine copies the image while it converts it, so the source (chunky) buffer doesn't have to be in chip memory, and in fact it'll all run a lot better with the chunky buffer in fast memory.

So change your bss_c into a bss_f and see if that doesn't make a big difference.
z5_
Member
#15 - Posted: 20 Jun 2007 12:43
It seems to work now. I manage "cmp.w #2,vbi_timer" on my real Amiga. It seems various things were to blame. Putting the buffer in fast memory definitely is faster (one vbi_timer) and i would never have found it.

It was frustrating but i've learnt a lot from it. Thanks for all the help.
StingRay
Member
#16 - Posted: 21 Jun 2007 01:07
Quote: "So change your bss_c into a bss_f and see if that doesn't make a big difference."

I just want to comment on the bss_f thing. I would not use _f at all. That way, on machines with fastram, your section goes into fastram, and it still works on machines with chipram only. Probably not very important these days as I suppose everyone has fastram, but still nice to know I think. :)
z5_
Member
#17 - Posted: 2 Jul 2007 19:20 - Edited
There's a gigantic bug somewhere in either my code or wickedos (not likely) or a combination of both or combined with running on winuae.

I assume it's in my code but i haven't got the faintest idea where to look. The error code i get is: "illegal instruction raised at $00000006".

Now, the thing is: it doesn't always happen. Usually, when it does, i just need to run the assembled source again, which gives an error as well. Third time, all is fine again.

When do i get it? that is the trouble: i don't know. Sometimes, i get the error after including a cnop 0,2 (for example to align incbin's). Sometimes, i get it when changing a var to var(pc). Sometimes, i just get it. In the end, i just don't know when or why. I have stripped down all code and started from scratch again, but i never managed to determine when it was introduced in my code.

It's extremely annoying and taking away any confidence in what i have done so far.

One thing is sure: i never had this error when i was doing DISPLAY2 twice and in my subroutines. It all started, i think, when introducing DISPLAY2 into my main loop and doing it once. Strange? yep... i just don't know anymore.

Edit: one thing i forgot to mention: i always get this error before anything is displayed on screen. Somehow makes me believe that it is indeed an error at init.
StingRay
Member
#18 - Posted: 2 Jul 2007 23:56
Hmm, sounds like you trash random memory. Or, maybe you just forgot to save registers and so you mess up the stack etc. There can be many reasons... I dunno about the DISPLAY2 macro, but try saving all registers before you use the macro and restore them afterwards. Or, maybe the WOS macro expects to have WOSbase in a6 and you didn't supply that? Hard to say without knowing the code.
z5_
Member
#19 - Posted: 11 Sep 2007 11:19
I've got a question about wos, aimed primarily at noname, but if somebody else could help, then great:

Here is a small routine that is programmed to be used with WickedOS. The routine sets up a screen, makes color 1 white and then moves a square (10*10) around the screen (at the end of the screen, it just continues on the next line at the other side). I always clear the previous block before drawing the next one. I execute the routine 25 times/sec (i count two vbi times).

When i run this, i don't have smooth movement. I see glitches. It is as if the rectangle gets drawn when the screen is being drawn on the monitor. Does anybody know where i'm going wrong?

The routine uses display2, which is a 320*200 screen and if i remember correctly triple buffering.
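For readers without the original listing, the drawing logic described above could look roughly like this. It is a Python sketch on a plain byte buffer, not the posted 68k routine: the one-block-per-step movement, the constants and the function names are all assumptions.

```python
WIDTH, HEIGHT, SIZE = 320, 200, 10   # chunky screen and square size

def fill_block(chunky, x, y, colour):
    """Fill a SIZE x SIZE square at pixel position (x, y)."""
    for row in range(y, y + SIZE):
        start = row * WIDTH + x
        chunky[start:start + SIZE] = bytes([colour]) * SIZE

def step(chunky, pos):
    """Clear the square at block index `pos`, draw it one block further."""
    cols, rows = WIDTH // SIZE, HEIGHT // SIZE
    fill_block(chunky, (pos % cols) * SIZE, (pos // cols) * SIZE, 0)
    pos = (pos + 1) % (cols * rows)   # wrap to the next line, then to the top
    fill_block(chunky, (pos % cols) * SIZE, (pos // cols) * SIZE, 1)
    return pos

screen = bytearray(WIDTH * HEIGHT)
pos = step(screen, 31)   # from the last block of row 0 to the first of row 1
print(pos)               # 32
```

Because clear and draw both happen in the chunky buffer before the conversion, the buffer never holds a half-erased square when it is handed to the display macro.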
noname
Member
#20 - Posted: 11 Sep 2007 21:16
Haven't got time to test your code at the moment. One thing I noticed is that you used a bss_c section which is not supported by WickedOS. Although in the case of your example it shouldn't cause problems if that section went into public memory instead.

Sidenote: you are overdoing it a little with that many different sections. There is no need to put a single variable (effect_timer) into its own section.
z5_
Member
#21 - Posted: 11 Sep 2007 22:44
bss_c isn't supported by WickedOS? I'm at a loss here... i used it in a previous example on the forum, where for example you reserve a planar screen buffer of two planes to superimpose onto a chunky screen. The planes had to be in chipmem because you were writing (in vbi) their address directly into the bpl-registers...?

What does one have to use to define an area in chipmem then?
noname
Member
#22 - Posted: 12 Sep 2007 01:37
You should use the ALLOCCHIP macro for this purpose. Then copy the data over from public memory or decrunch directly into the allocated chip memory buffer.