A.D.A. Amiga Demoscene Archive

AGA only C2P

 

NovaCoder
Member
#1 - Posted: 17 Dec 2009 23:50
Hiya,

Let me start off by saying that I'm a C++ coder, assembler is something I can spell and that's about it.

I'm working on an AGA-only port of a PC VGA game (320x200, 8-bit) and need a fast C2P routine. I'm currently using the one from ADOOM; can someone have a look at it for me and answer my questions?

http://home.iprimus.com.au/novacoder/c2p_030.s

Questions:

1) Is it still considered the best way of doing it?
2) Is it really 030 optimised or will it work just as well with an 020?
3) Is it AGA optimised? Is such an optimisation even possible/relevant?
4) Is it optimised for an 8bit 320x200 screen, does it support different resolutions ok?

The one problem I have with the routine at the moment is that there appears to be no way of updating only part of the screen (e.g. just the dirty rectangles).

I've noticed that some C2P routines use a compare buffer instead of dirty rectangles, what do you guys think about that approach?
Kalms
Member
#2 - Posted: 18 Dec 2009 01:06 - Edited
1) Depends on your target platform.

For 020 & 030, a hybrid CPU+blitter solution is usually preferable to a pure-CPU solution. You might find that you can get better performance by rendering your chunky pixels in a pre-scrambled format (thereby reducing the work needed during the C2P conversion); that will make your other rendering code more complicated though.

For 68040+, a CPU-only solution will run at full speed, and the only result a CPU+blitter solution will yield is that you waste chipmem bandwidth and artificially limit the max framerate (as compared to a CPU-only solution).

When using a CPU+blitter solution, you are supposed to avoid touching chipmem while the blitter pass is running. If you need to do a lot of chipmem access during that period, you would probably be better off using a CPU-only solution even on 020/030.

It would probably be preferable for you to have two C2P routines (one for 020/030, one for 040/060) and select which to use during startup depending on the current CPU.

2) 030 optimized routines will run well on 020 too.

3) It is "AGA optimized" in the sense that it always performs 32bit writes to chipmem. On OCS/ECS, it would be better to have a routine that performs 16bit writes.

4) It is not optimized for any particular screen resolution. It requires the horizontal resolution to be a multiple of 32 pixels. It supports at least 320x256 pixels output resolution. A brief look indicates that it should support much higher output resolutions too.
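
A minimal sketch of the startup selection suggested in point 1, assuming the CPU type is read from the AttnFlags field of ExecBase. The SKETCH_AFF_* constants are assumed to mirror the AFF_* bits in exec/execbase.h and are defined locally so the snippet compiles without the Amiga headers; the two conversion functions are hypothetical stand-ins, not the real routines.

```c
/* Assumed to mirror the AFF_* bits of SysBase->AttnFlags (exec/execbase.h). */
#define SKETCH_AFF_68020 (1u << 1)
#define SKETCH_AFF_68030 (1u << 2)
#define SKETCH_AFF_68040 (1u << 3)
#define SKETCH_AFF_68060 (1u << 7)

typedef void (*c2p_fn)(const unsigned char *chunky, unsigned char *planes);

/* Hypothetical stand-in for a CPU+blitter routine (020/030). */
static void c2p_cpu_blitter(const unsigned char *chunky, unsigned char *planes)
{
    (void)chunky;
    planes[0] = 'B';   /* marker only, no real conversion */
}

/* Hypothetical stand-in for a CPU-only routine (040/060). */
static void c2p_cpu_only(const unsigned char *chunky, unsigned char *planes)
{
    (void)chunky;
    planes[0] = 'C';   /* marker only, no real conversion */
}

/* Pick a routine once at startup: CPU-only on 68040+, CPU+blitter otherwise. */
static c2p_fn select_c2p(unsigned attn_flags)
{
    if (attn_flags & (SKETCH_AFF_68040 | SKETCH_AFF_68060))
        return c2p_cpu_only;
    return c2p_cpu_blitter;
}
```

On a real system the call would look something like select_c2p(SysBase->AttnFlags), with the returned pointer stored and used for every frame.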


As for C2P converting a rectangular area of the screen, download http://amycoding.redline.ru/main/sources/kalmsc2p.lha and give either c2p/special/c2p_rect.s or c2p/bitmap/c2p1x1_8_c5_bm.s a try. Both are CPU-only solutions because it is tricky to handle the bitmap modulos efficiently in a CPU+blitter solution.

I'm not that much a fan of comparison-based C2Ps, because their worst-case (all 32pix segments have changed) is worse than that of an ordinary C2P. Are you sure that you can't do better with a dirty rectangles-based approach?
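
To illustrate the worst-case remark about comparison-based C2Ps, here is a rough sketch of the detection step only, assuming a second chunky buffer that holds the previously converted frame. The function name and the 32-byte segment constant are made up for this example; a real routine would do the compare inside the conversion loop in assembler.

```c
#include <string.h>

#define SEGMENT_PIXELS 32   /* 32 chunky pixels, one byte each */

/* Compare `cur` against `prev` in 32-pixel segments. For every segment that
   differs, set changed[i] to 1 and refresh `prev` with the new pixels.
   Returns the number of changed segments, i.e. how many segments would
   still have to be converted. `num_pixels` must be a multiple of 32. */
static size_t find_changed_segments(const unsigned char *cur,
                                    unsigned char *prev,
                                    size_t num_pixels,
                                    unsigned char *changed)
{
    size_t count = 0;
    size_t segments = num_pixels / SEGMENT_PIXELS;
    size_t i;

    for (i = 0; i < segments; i++) {
        const unsigned char *c = cur + i * SEGMENT_PIXELS;
        unsigned char *p = prev + i * SEGMENT_PIXELS;

        changed[i] = (unsigned char)(memcmp(c, p, SEGMENT_PIXELS) != 0);
        if (changed[i]) {
            memcpy(p, c, SEGMENT_PIXELS);
            count++;
        }
    }
    return count;
}
```

When every segment has changed, the whole buffer has been read and compared and still needs a full conversion, which is the extra cost described above.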
NovaCoder
Member
#3 - Posted: 18 Dec 2009 08:38
Hiya Kalms,

Yep this routine is only meant for 030 or greater.

My code already takes care of detecting which dirty rectangles to draw (or if a full redraw is required) but the problem is that this particular C2P routine only appears to be able to redraw the full screen each time (unless I'm missing something?).

OK. I've re-read your post....looks like you are giving me some routines to do exactly that....thanx for all of the info :)

Thanks,
Chris
NovaCoder
Member
#4 - Posted: 11 Jan 2010 11:48
Hiya Kalms,

I'm having trouble downloading your source code....is something up with your site?

Chris
Kalms
Member
#5 - Posted: 11 Jan 2010 11:55 - Edited
it's not my site. seems the archive is available here as well though: http://jpv.wmhost.com/files/temp/kalmsc2p.lha
NovaCoder
Member
#6 - Posted: 11 Jan 2010 23:57
ok, thanks I've got it :)
NovaCoder
Member
#7 - Posted: 13 Jan 2010 00:42 - Edited
Hiya Kalms,

I had a look at the code, I think the c2p_rect.s is close to what I need.


To give you some background, this is currently my main render loop:


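static byte *chunkyBackBuffer;

// during startup: 320x200 8-bit chunky buffer in fast RAM
chunkyBackBuffer = (byte*)AllocMem(64000, MEMF_FAST);


// copy a w x h rectangle (source pitch 320) into the back buffer at (x, y)
void updateBackBuffer(byte *chunkyPixels, int x, int y, int w, int h) {

    byte *dst = chunkyBackBuffer + y*320 + x;

    // CopyMem instead of CopyMemQuick: CopyMemQuick needs longword-aligned
    // pointers and a size that is a multiple of 4, which x and w don't guarantee
    while (h-- > 0) {
        CopyMem(chunkyPixels, dst, w);
        dst += 320;
        chunkyPixels += 320;
    }
}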



This method can be called multiple times for each update of the game 'world', e.g. sometimes for just the dirty rectangles and sometimes for a full screen redraw (in which case it will obviously only be called once).


I then call this method once per render loop to show the updated screen.


video_which = 1 - video_which; // render to the hidden bitmap
c2p1x1_cpu3blit1_queue_stub(chunkyBackBuffer, video_raster[video_which]);


A flip task is then generated by the C2P routine to show the updated screen.

This is not a very efficient way of doing things when only part of the screen gets updated; it currently gives about 12 fps on my 50MHz 030 machine (320x200, 8-bit).


What I would like to do is something like this:


void updateBackBuffer(byte *chunkyPixels, int x, int y, int w, int h) {

    if (gameEngine->fullScreenRedraw) {
        c2p1x1_cpu3blit1_queue_stub(chunkyPixels, video_raster[video_which]);
    } else {
        C2p_Rect(chunkyPixels, video_raster[video_which], x, y, w, h);
    }
}


And then manually trigger the screen flip task if I need to each loop.
Kalms
Member
#8 - Posted: 13 Jan 2010 01:59
1. Performance - throughput

320x200 pixels with cpu3blit1 will take roughly 0.8 frames of CPU processing + 1.6 frames of Blitter processing.
320x200 pixels with an 030 tuned cpu-only routine will take roughly 1 frame of CPU processing.
320x200 pixels with the c2p_rect routine will probably take around 1.8 frames of CPU processing.

So. If you can overlap CPU work (which does not access chipram) with the blitter pass, then the cpu3blit1 will give you the highest number of frames per second. Otherwise, a CPU-only routine is preferable.

Also, notice that since the CPU+blitter routine takes a longer time to complete, the max framerate is lower for that routine than for the CPU-only routines.

2. Performance - latency

The CPU+blitter routine has higher latency than CPU-only routines. Since you're making a game, if the input latency (time before the player's input actions cause response on-screen) is important to you, it might be worthwhile to go with a CPU-only routine even if that would mean having a slightly lower overall framerate.

3. Using c2p_rect with a custom framebuffer

Create a dummy BitMap structure, where the plane ptrs point into your own custom framebuffer, and pass that to c2p_rect.
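
As a rough worked example of the throughput figures in point 1, here is a simplified steady-state model. It assumes a 50 Hz PAL display, perfect overlap between the blitter pass and CPU work that stays out of chipmem, and a made-up parameter game_frames for the game's own CPU cost per update; none of this is measured data.

```c
#define DISPLAY_HZ 50.0   /* assumption: PAL, 50 display frames per second */

/* CPU-only routine: game work and the 1.0-frame C2P run back to back. */
static double period_cpu_only(double game_frames)
{
    return game_frames + 1.0;
}

/* cpu3blit1 with perfect overlap: per update the CPU is busy for the game
   work plus the 0.8-frame CPU pass, the blitter is busy for 1.6 frames,
   and whichever is slower sets the pace. */
static double period_cpu3blit1(double game_frames)
{
    double cpu = game_frames + 0.8;
    double blitter = 1.6;
    return cpu > blitter ? cpu : blitter;
}

/* Convert a period measured in display frames into updates per second. */
static double fps(double period_frames)
{
    return DISPLAY_HZ / period_frames;
}
```

With no game work at all the model gives 50/1.6 = 31.25 fps for cpu3blit1 against 50 fps for the CPU-only routine, which is the lower max framerate mentioned above. With 3 frames of game work it gives about 13.2 fps against 12.5 fps, so the hybrid routine only pulls ahead once there is enough CPU work to overlap with the blitter pass.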
NovaCoder
Member
#9 - Posted: 13 Jan 2010 02:35 - Edited
Hiya,

Regarding point 3.

I think I am already using a dummy bitmap (none of this is my code btw):


if ((video_raster[i] = (PLANEPTR)AllocRaster (SCREENWIDTH, video_depth * SCREENHEIGHT)) == NULL) {
    error ("AllocRaster() failed");
}

memset (video_raster[i], 0, video_depth * RASSIZE (SCREENWIDTH, SCREENHEIGHT));
InitBitMap (&video_bitmap[i], video_depth, SCREENWIDTH, SCREENHEIGHT);

for (depth = 0; depth < video_depth; depth++) {
    video_bitmap[i].Planes[depth] = video_raster[i] + depth * RASSIZE (SCREENWIDTH, SCREENHEIGHT);
}


I've just noticed that you wrote the C2P routine I'm currently using...


; This routine based on Mikael Kalms' 030-optimised CPU3BLIT1
; Mikael Kalms' email address is kalms@vasa.gavle.se


Not sure if this is the latest version though....would I be better off using one from that link you posted earlier?

The problem with c2p_rect is that it says that it's optimized for 040+....is this going to work ok with just an 030?

Also it says that it does C2P between equally sized chunky and destination buffers. In my case my destination buffer will always be 320x200 but my 'dirty rectangle' will be smaller....will this still work ok?

Sorry to ask silly questions....I'm still learning this stuff (only been Amiga coding for a short while).

Chris
Kalms
Member
#10 - Posted: 13 Jan 2010 18:11 - Edited
point 3:
oops, my bad. the routine just wants a ptr to the beginning of a contiguous set of bitplanes (like the other c2p routines).

040+ optimized c2p routines will work on any 68020+, it's just a matter of performance difference. The perf figures stated above were for running the routines on a 50MHz 68030.

If you wonder more about the performance, measure it yourself.

yes, c2p_rect.s is able to do what you want it to. Given your function prototype, you should call it sorta like this:

void updateBackBuffer(byte *chunkyPixels, int x, int y, int w, int h) {
    if (gameEngine->fullScreenRedraw) {
        c2p1x1_cpu3blit1_queue_stub(chunkyPixels, video_raster[video_which]);
    } else {
        d0 = x & 0xffe0; // round downward to next mod-32 boundary
        d1 = y;
        d2 = ((x + w + 31) & 0xffe0) - d0; // round upward to next mod-32 boundary
        d3 = h;
        d4 = 320;
        d5 = 320/8;
        d6 = 320*200/8;
        a0 = chunkyPixels;
        a1 = video_raster[video_which];
        C2p_Rect(send in register values specified above);
    }
}
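
The rounding arithmetic above can be tried out in plain C. The struct and function below are hypothetical containers for the register values listed in the post; the real c2p_rect entry point takes them in d0-d6/a0-a1, so an assembler stub would still be needed to make the actual call.

```c
/* Hypothetical holder for the data register values listed in the post. */
struct c2p_rect_args {
    int d0;   /* x, rounded down to a multiple of 32 */
    int d1;   /* y */
    int d2;   /* width, widened so the right edge lands on a multiple of 32 */
    int d3;   /* h */
    int d4;   /* 320 */
    int d5;   /* 320/8: bytes in one 320-pixel bitplane row */
    int d6;   /* 320*200/8: bytes in one 320x200 bitplane */
};

/* Fill in the values exactly as in the post, for a 320x200 screen. */
static struct c2p_rect_args make_rect_args(int x, int y, int w, int h)
{
    struct c2p_rect_args a;

    a.d0 = x & 0xffe0;
    a.d1 = y;
    a.d2 = ((x + w + 31) & 0xffe0) - a.d0;
    a.d3 = h;
    a.d4 = 320;
    a.d5 = 320 / 8;
    a.d6 = 320 * 200 / 8;
    return a;
}
```

For a dirty rectangle at x = 50 with w = 20 this gives d0 = 32 and d2 = 64, so the converted span covers pixels 32 to 95 and fully contains the requested 50 to 69.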
NovaCoder
Member
#11 - Posted: 13 Jan 2010 23:46
Cool, thanks for that Kalms :)

 

