A.D.A. Amiga Demoscene Archive


coding tutorial: chunky to planar
z5_
Member
#1 - Posted: 9 Dec 2004 17:24
Heard much about it, know nothing about it :)

Can somebody give a general, easy-to-understand explanation of what c2p is all about?
Cyf
Member
#2 - Posted: 9 Dec 2004 21:04 - Edited
PC/Mac use chunky pixels, the Amiga uses planar graphics (bitplanes). The CD32 has hardware c2p (Akiko).
Some effects are simpler in chunky, others in bitplanes.

"Chunky and planar refer to two ways of storing graphic information in memory. An image in a computer is a grid of pixels, each one represented by a number defining its colour.
For example, take a 4-colour image.

The Amiga stores this image in bitplane mode, i.e. it is represented by several planes of bits (binary digits 0 or 1).
The image has 4 colours, so each colour can be represented by two bits. There are thus 2 bitplanes:
00100110 bitplane 0
00101011 bitplane 1
-----------
00302132 resulting pixel values (bitplane 0 + 2 * bitplane 1, column by column)

Now, here is another way of storing this image: the bits of each pixel are kept together in a small piece (chunk), most significant bit first:
00 00 11 00 10 01 11 10 = 00302132

This is the principle of chunky pixel mode.

Each storage method is perfectly logical, and nobody can say that one is better than the other.
However, each has technical advantages and disadvantages depending on the goal.
Macintosh and PC use chunky pixel mode."
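
A minimal Python sketch of the two layouts described above, using the two bitplane rows from the example. It is not from the original post: the function names are made up for illustration, and it assumes that bitplane 0 holds the least significant bit and that the leftmost pixel is the most significant bit of each row.

```python
def planar_to_chunky(bitplanes, width=8):
    """Combine one row of each bitplane into one colour number per pixel.

    bitplanes[0] is the least significant plane; the most significant
    bit of each row is the leftmost pixel.
    """
    pixels = []
    for x in range(width):
        shift = width - 1 - x
        value = 0
        for plane, row in enumerate(bitplanes):
            value |= ((row >> shift) & 1) << plane
        pixels.append(value)
    return pixels


def chunky_to_planar(pixels, depth):
    """Split colour numbers back into one row of bits per bitplane."""
    width = len(pixels)
    bitplanes = [0] * depth
    for x, value in enumerate(pixels):
        for plane in range(depth):
            bitplanes[plane] |= ((value >> plane) & 1) << (width - 1 - x)
    return bitplanes


planes = [0b00100110, 0b00101011]   # bitplane 0 and bitplane 1 from the example
pixels = planar_to_chunky(planes)
print(pixels)                        # one colour number (0..3) per pixel
print(chunky_to_planar(pixels, 2) == planes)
```

Going pixel by pixel like this is only meant to show what the two formats contain. A real c2p routine works on many bits at once.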
Cyf
Member
#3 - Posted: 19 Dec 2004 11:52
On AmyCoders ( http://membres.lycos.fr/amycoders/ ) there is a good tutorial about chunky to planar (and lots of other tutorials).
krabob
Member
#4 - Posted: 21 Dec 2004 11:49
Mathematically, here is one way to look at it:
the whole problem is about applying a 90 degree rotation to a block of pixels.
Imagine you look at your bitmap from 'above' the screen: the Y coordinate in the scheme below is the bitplane 'DEPTH'. We have 8 bitplanes. If we want the "planar" bitmap to look like this:

00000000
01111110
00000000
00000000
00000000
00000110
00000110
00000000

it means we need the 8 chunky bytes to look like this:

00000000
01000110
01000110
01000000
01000000
01000000
01000000
00000000

...So it IS a 90 degree rotation of the bits. How to do it quickly?

Here is a different problem, which for the moment has nothing to do with the previous one: how do you reverse the bit order in, let's say, a 32-bit integer?

11100000 01010101 11110000 10000001

We can rotate it or swap its bytes, but how do we really reverse the bit order?
It is easy: for a 2^n-bit field, you need "n" swaps at "n" levels. For 32 bits these 5 levels are:

16bit
8bit
4bit
2bit
1bit.

To swap at the 16-bit level, use the asm "swap" instruction; for 8/4/2/1 bits use these masks:
11111111 00000000 11111111 00000000
11110000 11110000 11110000 11110000
11001100 11001100 11001100 11001100
10101010 10101010 10101010 10101010

So to do the swap at one level, AND your value with the mask and with the inverted mask, shift the two halves in opposite directions, and OR the bits back together. After all levels you will get:

10000001 00001111 10101010 00000111

For the record, one level can be done in 6 asm instructions with and, or and shifts, so reversing a 32-bit value takes more or less 5*6 asm lines. Note that the order in which the levels are swapped doesn't matter.
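
The five levels can be sketched in Python like this. This is an illustration only, not the 68k code the post has in mind: the names are made up, the 16-bit level uses a mask and shifts instead of the "swap" instruction, and the example value is read with a full last byte (10000001).

```python
# (shift, mask) for each level: 16, 8, 4, 2 and 1 bit.
LEVELS = [
    (16, 0xFFFF0000),
    (8, 0xFF00FF00),
    (4, 0xF0F0F0F0),
    (2, 0xCCCCCCCC),
    (1, 0xAAAAAAAA),
]


def reverse_bits32(value, levels=LEVELS):
    """Reverse the bit order of a 32-bit value, one swap level at a time."""
    for shift, mask in levels:
        upper = (value & mask) >> shift             # masked groups move down
        lower = (value & (mask >> shift)) << shift  # the other groups move up
        value = upper | lower
    return value


example = 0b11100000_01010101_11110000_10000001
print(format(reverse_bits32(example), "032b"))
```

Passing the levels in a different order, for example `reverse_bits32(example, LEVELS[::-1])`, gives the same result, which matches the remark that the order of the levels doesn't matter.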

So what is a C2P? It is a 2D application of what the bit reversal does in 1D. Chunky to planar does its swaps with this kind of masks:

10
01

1100
1100
0011
0011

11110000
11110000
11110000
11110000
00001111
00001111
00001111
00001111

... and this way the whole block of bits is "rotated by 90 degrees". The differences between the various C2P algorithms then lie in the order of the swaps and in
the synchronisation with the video RAM writes. (Some of Kalms' c2p routines can do all the bit shuffling in the time it takes to write to memory, bringing c2p down to... plain video copy speed!)
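
A small Python sketch of the 2D version, for one block of 8 pixels with 8 bits each. It swaps 4x4, 2x2 and 1x1 sub-blocks using the masks 11110000, 11001100 and 10101010. Strictly speaking this produces a transpose (a rotation plus a mirror), which is what is needed to get from 8 chunky bytes to 8 bitplane bytes. The function names are made up, and this is not how any particular optimised c2p routine is written.

```python
def transpose8(rows):
    """Transpose an 8x8 bit matrix given as 8 bytes (bit 7 = column 0)."""
    rows = list(rows)
    for size, mask in ((4, 0xF0), (2, 0xCC), (1, 0xAA)):
        for i in range(8):
            if i & size:
                continue          # row i was already handled with row i - size
            a, b = rows[i], rows[i + size]
            # Exchange the upper-right sub-blocks with the lower-left ones.
            rows[i] = (a & mask) | ((b & mask) >> size)
            rows[i + size] = ((a << size) & mask) | (b & ~mask & 0xFF)
    return rows


def c2p_8pixels(pixels):
    """Convert 8 chunky bytes into 8 bitplane bytes (index = bitplane number)."""
    # After the transpose, row k holds bit (7 - k) of every pixel.
    return transpose8(pixels)[::-1]


print([format(p, "08b") for p in c2p_8pixels([0, 0, 3, 0, 2, 1, 3, 2])])
```

Fed with the pixel values 0 0 3 0 2 1 3 2 from the 4-colour example earlier in the thread, bitplanes 0 and 1 come out as 00100110 and 00101011 and the other six planes are empty.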
kufa
Member
#5 - Posted: 10 Jan 2005 14:46
i reaaaaaalllllllllyyyy have to update(well upload) the amycoders website..
Will do that this week if i have time with work..
rload
Member
#6 - Posted: 10 Jan 2005 18:32
yeah right.. how about the 4k coop ? :)
z5_
Member
#7 - Posted: 29 Dec 2005 12:39
How do you guys handle fonts in chunky mode? Is there a converter to chunky? It seems like a lot of wasted space to convert a one-bitplane font to chunky.

Or do you use a raw bitplane format and insert it into the chunky screen? The one way i can think of doing this would be to test each bit of the font: if the bit is 1, OR the chunky pixel with #%00010000 (if, in this example, you want to insert the font in bitplane 5). But that would take a loop of 16*8 iterations (if the font is in word format and 8 lines high) for one simple letter?

Any better ideas?
winden
Member
#8 - Posted: 29 Dec 2005 13:37
z5, the easiest and fastest way is to just convert the font to chunky and then use a drawing routine to paste the characters onto the chunky buffer prior to c2p. don't worry about wasted space since you will pack the exe anyway (stonecracker et al. ;) and these bytes will be efficiently packed on disk. in memory you don't really have to worry, even 4 megs of fastmem go a loooooong way for democoding :)
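
A rough Python sketch of that idea: expand a 1-bit glyph to chunky bytes once, then paste it into the chunky buffer before the c2p. The glyph data, the function names and the "colour 0 is transparent" rule are assumptions made for the illustration, not part of any existing font format.

```python
def glyph_to_chunky(rows, width, colour):
    """Expand 1-bit glyph rows (leftmost pixel = highest bit) to chunky bytes."""
    out = []
    for row in rows:
        for x in range(width):
            bit = (row >> (width - 1 - x)) & 1
            out.append(colour if bit else 0)
    return out


def paste_glyph(screen, screen_width, glyph, glyph_width, x0, y0):
    """Copy the non-zero glyph pixels into the chunky screen buffer."""
    glyph_height = len(glyph) // glyph_width
    for gy in range(glyph_height):
        for gx in range(glyph_width):
            pixel = glyph[gy * glyph_width + gx]
            if pixel:                      # colour 0 is treated as transparent
                screen[(y0 + gy) * screen_width + (x0 + gx)] = pixel


# A made-up 8x2 glyph, converted once at start-up and pasted at (2, 1).
glyph = glyph_to_chunky([0b10000001, 0b01000010], 8, colour=5)
screen = [0] * (16 * 4)                    # tiny 16x4 chunky "screen"
paste_glyph(screen, 16, glyph, 8, 2, 1)
```

The conversion runs only once, so the per-frame cost is just the paste loop, with no bit testing.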
dalton
Member
#9 - Posted: 29 Dec 2005 14:15
you could use 6 planes for chunky and 2 for fonts... or something like that. Then you could also have free transparency by palette =)
z5_
Member
#10 - Posted: 29 Dec 2005 15:28 - Edited
@dalton:
I don't think it is possible with WickedOS. I find that chunky mode does complicate some things a lot though.

There are some fonts converted to chunky included in the WickedOS system. What program can i use to convert my fonts to chunky format?
noname
Member
#11 - Posted: 2 Jan 2006 19:12
Happy new year everybody!

The font-format in WickedOS is a custom one, i.e. made by myself. I have a converter to generate the correct files from any Amiga font.
noname
Member
#12 - Posted: 3 Jan 2006 10:46
winden's comment:
winden is absolutely right, you shouldn't care about memory efficiency at this stage. the fonts in chunky pack pretty well. again, please note that chunky in itself is not a file format like iff. to be able to use amiga fonts without having to install any font on the user's machine, i let my converter generate a big "texture" (picture) with all characters of the font printed into it. the u/v coordinates to scissor them out of this texture are of course also stored. combining the texture (in chunky format) and the scissor information in a structured way resulted in the wfont format, which can be used with examples/font.s
please note that the fonts don't need to be in one colour. the font routine probably needs a bit of tweaking (i never used it with multi-colour fonts) but it is absolutely possible, and much easier, to do 256 colour fonts in chunky mode than in bitplane mode.

dalton's comment:
dalton mentions a completely different issue but he is also right. if you limit your background/effect/whatever to x bitplanes (0<x<8), you have 8-x bitplanes free to draw on top with free palette transparency. i used that a lot, also for scrolling. it is fast and efficient and it is of course possible with wickedos. just write the bitplane pointers in your vertical blank interrupt.

example:
; ...put this in your vbi...
; this overwrites two bitplane pointers in a 320x200 screen. the new bitmap data is expected to lie in chip memory, one plane after the other.
; the planespointer must then point to the beginning of that memory.

;--- get the base address of the bitmap data in d0
;--- and the size of one bitplane in d1
;move.l planespointer,d0 ;exercise for you: allocate chip memory (!), copy the bitmap data and store the pointer
move.l #320*200/8,d1

;--- redirect two bitplane pointers to our new data.
lea $dff000,a6
move.l d0,bplpt+$18(a6) ;first new plane
add.l d1,d0 ;advance to the next plane
move.l d0,bplpt+$1c(a6) ;second new plane
z5_
Member
#13 - Posted: 3 Jan 2006 16:07
Wow, this is actually neat. Having one or two planes in bitplane format can be useful in quite a lot of cases. I could even just use my raw format font and copy it straight into one of the upper bitplanes. I suppose that i can't use the blitter anymore for that and have to use the cpu instead for scrollers and copying large blocks of data?

I assume wickedos uses a copperlist for things such as colors and screenmode. Is there actually a way to manipulate this copperlist? I think that some effects are quite a bit simpler with a copperlist. Stuff like rasterbars, and anything color related within the screen itself (meaning changes of color at certain positions on the screen).
z5_
Member
#14 - Posted: 4 Jan 2006 12:07
In the meantime, it's working. I already used vbihook for my demotimer so i added the bitplane pointers there. I added two planes on top of my chunky screen and my scroller is going into the 7th bitplane, while my "effect" (uhm...) is in my chunky planes. Rather cool stuff... :)
z5_
Member
#15 - Posted: 5 Jan 2006 12:59
So, what is the first effect that you guys actually coded, chunky wise?

From where i stand, i don't know where to start. Honestly... I mean, things with fonts, such as scrollers, are quite doable already, but real effects... preferably ones which don't need many external files (like pictures, textures and stuff)... And my math knowledge is ... well, let's just say i haven't done any maths in ages.

Any ideas?
noname
Member
#16 - Posted: 5 Jan 2006 13:25
My first chunky effect was the fire. Try to think of a clever way to average each pixel from its surrounding neighbours (4 or, better, all 8 of them). It doesn't require any advanced maths skills and can (and should) be thought through on a piece of paper before coding it.

A few hints (mini-tutorial :)
- you need two chunky buffers
- a fire is a melt (blur) that moves one line up (adjust pointers)
- do not divide! (only move, add and shift)
- cache as many sums as possible as you go along
- also use the instruction cache efficiently (which is 256 bytes on 020/030. big enough to unroll a bit. stay under 252 bytes to avoid any cache misses!)

It took me a while to get this one done properly. My routine ran at 50 fps on a 68030 (in 2x2, 160x100) back in the day and can run at 25 fps on a 68060 (in 1x1, 320x200).
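
A plain Python sketch of the first three hints, under some assumptions: 8-bit pixel values in flat lists, borders left untouched, and the 8 neighbours summed and divided by 8 with a shift. It deliberately does not reuse any sums between pixels, so the "cache as many sums as possible" hint is still left open, and the function name is made up.

```python
def fire_step(src, dst, width, height):
    """Average the 8 neighbours of each src pixel and store it one line up in dst."""
    for y in range(1, height - 1):
        above = (y - 1) * width
        here = y * width
        below = (y + 1) * width
        for x in range(1, width - 1):
            total = (src[above + x - 1] + src[above + x] + src[above + x + 1]
                     + src[here + x - 1] + src[here + x + 1]
                     + src[below + x - 1] + src[below + x] + src[below + x + 1])
            dst[above + x] = total >> 3    # divide by 8 without a divide


WIDTH, HEIGHT = 10, 10
buf_a = [0] * (WIDTH * HEIGHT)
buf_b = [0] * (WIDTH * HEIGHT)
buf_a[5 * WIDTH + 5] = 64                  # one hot pixel in the middle
fire_step(buf_a, buf_b, WIDTH, HEIGHT)     # read from buf_a, write to buf_b
```

Reading from one buffer and writing to the other keeps the input intact while it is being processed. For the next frame the two buffers change roles.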
winden
Member
#17 - Posted: 5 Jan 2006 17:42
@noname: 256 bytes can be used with no problems provided your loop starts at a cacheline start (i.e. the loop address is a multiple of 16):

routine:
[blablabla setup]
bra.b loop_start
cnop 0,16 ; cachelines are 16 bytes wide on 030, 040 and 060
loop_start:
[blablabla loop]
dbra d7,loop_start
loop_end:
rts


I recall having some routines which needed this trick since otherwise they lost a lot of speed on 030 if loop_end - loop_start > 256.
noname
Member
#18 - Posted: 5 Jan 2006 22:46
@winden: oh cool, didn't know that!
z5_
Member
#19 - Posted: 13 Jan 2006 12:35 - Edited
I had a bit of time to attempt a fire effect and, much to my surprise, i actually ended up with something resembling a fire effect. I soon noticed though that even the crappiest of code can produce a fire effect :) I even ended up with a noise effect at some point :)

Still, most points of the mini-tutorial are still Chinese to me and my fire effect only lasts for a few pixels in height (after that, it turns into big boring grey patterns), so here are a bunch of questions:
- why is the last visible line in a 320*200 chunky screen 197? If i put pixels at screen + 320*198, i don't see them anymore (last is 320*197).
- why do i need two buffers? Has this anything to do with double buffering? The way i did it: set up a palette from black (color 0) to white (color 63), put some random pixels on the lowest line and keep changing them each "frame", and calculate the pixel values starting from the top. So where does the second buffer come into the story?
- what do you mean by "caching as many sums as you can"?
- i read somewhere that if you average 4 pixels, you should divide by a value slightly larger than 4, and that is where the whole story begins: dividing, modulo and fractional numbers don't seem that obvious in assembler :)
- what about using the instruction cache efficiently?
- what's the difference between asl and lsl? It isn't obvious in the docs i have. I have the impression that the operation is the same but that the status flags are different?
- why don't i ever reach black? If i keep shifting the average by 2 (= dividing by 4), at some point i should reach zero, shouldn't i?

Sorry if these questions are a bit lame. Someone has to start somewhere i guess... Getting this effect exactly right could teach me a lot of basics.
noname
Member
#20 - Posted: 13 Jan 2006 14:58 - Edited
why is the last visible line in a 320*200 chunky screen 197? If i put pixels at screen + 320*198, i don't see them anymore (last is 320*197).
is it like that? it has nothing to do with screens being chunky or not (there is no such property as chunky screens always having 197 lines).
it could be a problem in the copperlist (thus a small bug in wickedos) or in your code. don't bother too much about it at the moment.

why do i need two buffers?
because one is the input buffer and the other one is the output buffer. you wouldn't want to read and write to the same buffer at the same time because it would alter your input data as you process it. doing so on purpose can give interesting feedback effects but that's not what you want now.

what do you mean by "caching as many sums as you can"?
that is for you to find out. it is actually the trick to get it fast! generally speaking, when calculating the average value from the 8 surrounding pixels there are certain sums which can be reused just a bit later. use some checkered paper and a pencil to sketch the algorithm graphically.

i read somewhere that if you average 4 pixels, you should divide by a value slightly larger than 4, and that is where the whole story begins: dividing, modulo and fractional numbers don't seem that obvious in assembler :)
you need neither divides nor modulos nor floats here

what about using the instruction cache efficiently?
executing code which is already in the cache is faster than having the cpu fetch it from memory. note that fetching and caching is done transparently for you. but if you prepare your time-critical code so that it leans towards efficient cache usage (i.e. don't let it exceed 256 bytes on 68020/68030), you have just learnt an important principle of code optimization.

what's the difference between asl and lsl
read the docs: MC680x0 Reference 1.1

why don't i ever reach black?
this might be because you are working on the same buffer for input and output.

---

actually i would propose you code a one pixel mouse pointer first. this is actually good fun when making a fire effect and is only a few lines of code with the MOUSEX and MOUSEY macros.

and at some point we would like to see your efforts ;)
come on, make the ada logo burn in your program!
z5_
Member
#21 - Posted: 14 Jan 2006 20:13
I don't think i have either the time or the talent to actually release something someday, but i'm still determined to get at least one effect going :)

I now understand why i didn't need two buffers for the fire effect. In the tutorial that i had read, random pixels were placed at the bottom of the screen and the fire effect was achieved by calculating the average of 4 pixels: 3 beneath the pixel and one two lines beneath it. That way i wasn't overwriting my input. However, the tutorial stated that i had to divide by a value slightly higher than 4, or else the fire would continue to the top of the screen, which is what happens in my case.

However, i will try to calculate 8 pixels with two buffers. So far i think my routine should look something like this:
do for every pixel
- get pixel from input
- calculate average from 8 pixels surrounding it
- place pixel in output (screen)

I was thinking though: my output actually is the input for the next calculation so it would be cool if i could just swap screens. Like this:
- start
- use buffer 1 as input and write result to buffer 2
- show buffer 2 on screen
- use buffer 2 as input and write result to buffer 1
- show buffer 1 on screen
...

But how do i do this within WickedOS? I tried with SETMODE but i get heavy flickering on the screen.
z5_
Member
#22 - Posted: 8 Feb 2006 16:36
Haven't touched any assembler since posting my question above but i do wonder if/why my question was so stupid that nobody wants to answer it :o) (noname, where are you? :))
winden
Member
#23 - Posted: 8 Feb 2006 19:57
a good technique for this case of adjusting values is to make a table. so for example if you make a table "t[x] = max(0,x / 5)" then you can calc the result of averaging with "new = t[old1 + old2 + old3 + old4]", which is faster than doing a divide on 68k machines. as for the problem with double buffers, yes your thinking is accurate and you should be able to do it this way... maybe the code you wrote is showing the same buffer you are updating?
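
The table idea can be sketched in Python as follows. The divisor 5 is the one from the post; 8-bit pixel values are assumed, so the sum of four pixels is at most 1020. The names are made up, and the "max(0, ...)" part is left out because "x / 5" can't go negative here (it would matter if the table also subtracted something).

```python
MAX_SUM = 4 * 255                      # largest possible sum of four 8-bit pixels

# t[x] = x / 5, computed once at start-up.
COOL = [x // 5 for x in range(MAX_SUM + 1)]


def cooled_average(old1, old2, old3, old4):
    """Average four pixels and cool the result with one table lookup."""
    return COOL[old1 + old2 + old3 + old4]


print(cooled_average(10, 10, 10, 10))  # slightly less than the plain average of 10
```

The only work left in the inner loop is three adds and one indexed read, and any cooling curve can be tried just by filling the table differently.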
winden
Member
#24 - Posted: 8 Feb 2006 19:58
btw, maybe the fact that the site was down had any implications in people not answering? :P
noname
Member
#25 - Posted: 9 Feb 2006 10:53
z5, good that you moved the site. the downtimes on the previous server became too long.

concerning your question: your thinking (buffer1, buffer2, buffer1, ..) is right. and as you already noticed, you wouldn't use the SETMODE macro in every frame. you have two choices:

(let a2 and a3 contain pointers to the buffers)

1)
- setmode to buffer 1
loop:
- do effect from a2 to a3
- display (would c2p buffer 1)
- exchange a2,a3 (exg)
- CHECKEXIT, beq loop

doing it this way will correctly calculate the effect from one buffer to the other and back. BUT: it will always display buffer 1 only, which means you would lose half your frames. this is not what you want.

2)
- setmode to buffer 1
loop:
- do effect from a2 to a3
- lea _wosbase,a0
- move.l a3,mode1ptr(a0)
- display (would c2p the latest buffer (buffer1, buffer2, buffer1, ..))
- exchange a2,a3 (exg)
- CHECKEXIT, beq loop

this will allow you to use different buffers without having to use setmode. i have just noticed that there isn't a macro for this in wickedos. if i ever make a new version, i will add this.
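
The control flow of variant 2 can be sketched in Python like this. It only shows the buffer exchange: `effect` and `display` are placeholders passed in by the caller, not WickedOS calls, and the halving "effect" is made up so that there is something to look at.

```python
def run_frames(effect, display, buffer1, buffer2, frames):
    """Ping-pong between two buffers: always display the one just written."""
    src, dst = buffer1, buffer2
    for _ in range(frames):
        effect(src, dst)         # read from src, write to dst
        display(dst)             # show the freshly written buffer
        src, dst = dst, src      # exchange the two pointers for the next frame


def fade(src, dst):
    """Stand-in effect: halve every pixel."""
    for i, value in enumerate(src):
        dst[i] = value >> 1


shown = []
run_frames(fade, lambda buf: shown.append(list(buf)), [64, 32, 16, 8], [0, 0, 0, 0], 3)
print(shown)
```

Each frame shows a new result because the display step always gets the buffer that was written last, which is the difference between variants 1 and 2.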
z5_
Member
#26 - Posted: 19 Jun 2007 13:01
In general, do modern demos work entirely in chunky mode? Meaning all pics, brushes, fonts,... or is there still a mix of for example chunky for the first 6 bitplanes and bitplane 7 in planar mode for overlay and such?
StingRay
Member
#27 - Posted: 19 Jun 2007 17:41
Most of them are 100% chunky, I think. Yet I often mix planar/chunky, especially for scrollers and stuff like that, where I hate to use chunky mode. The disadvantage is that your demo will not support gfx cards if you hack the hardware directly (copper/blitter banging), and I have often heard ppl complaining about that... Then again, I don't really care about that. :)
doom
Member
#28 - Posted: 19 Jun 2007 21:37
Rule #1 of Amiga coding: learn to ignore the people who think the Amiga has a bright and shiny future ahead of it if we all keep upgrading. ;)
z5_
Member
#29 - Posted: 20 Jun 2007 12:53
I remember people saying: "nowadays, amiga coders just open a chunky screen and don't use the hardware anymore". At that time, i didn't understand what they meant, now i do.

With my little experience, i don't see chunky mode as evil but as something that is more convenient for some effects. On the other hand, planar mode has some neat advantages in some cases as well (again, somebody with very little experience talking here). For example, i find it more intuitive to overlay a picture/text/... in the planar format. So why not use the one or the other depending on the effect or on what the programmer prefers?

To me, it's not really that important to know that coders nowadays don't use the amiga hardware anymore. It's not a thing i consider when judging a demo.

On the other hand, i feel strongly about limiting the cpu to 68060 at max. Otherwise, the amiga could just as well be a pc.
Blueberry
Member
#30 - Posted: 20 Jun 2007 19:06
The problem with chunky effects on Amiga is not that the hardware does not have a chunky mode. Chunky to planar conversion in itself doesn't take very much time on 060. The problem is the awfully slow chipmem.

If the chipmem had been fast, we could just make our effects write whatever format was convenient (which is, for most modern effects, chunky) and not worry about the time used for the conversion. As it is, most Amiga demos spend a disproportionately large part of their time wasting precious cpu cycles waiting for the chipmem to accept the next piece of data fed to it by the c2p.

To take full advantage of the available cpu resources, you have to merge more calculations than just the c2p in between the chipmem writes. And that is far from convenient.

The coders that just fill up a chunky buffer and then call a c2p to get it on screen are indeed not using the Amiga hardware. Any computer with a chunky buffer could be used for that. Those that carefully fill the time between each chipmem write to utilize the cpu fully are today's hardware hackers. :)

 


A.D.A. Amiga Demoscene Archive, Version 3.0