A.D.A. Amiga Demoscene Archive

Amiga Demoscene Archive Forum / Coding / coding tutorial: chunky to planar

Author	Message
z5_ Member	#1 - Posted: 9 Dec 2004 17:24
Heard much about it, know nothing about it :) Can somebody give a general, easy-to-understand explanation of what c2p is all about?
Cyf Member	#2 - Posted: 9 Dec 2004 21:04 - Edited
PC/Mac use chunky pixels, the Amiga uses planar (bitplanes), and the CD32 has a hardware c2p (Akiko). Some effects are simpler in chunky, others in bitplanes.

"[Chunky and planar] refer to different ways of storing graphic information in memory. An image in a computer is a grid of pixels, each one represented by a number defining its colour. Take a 4-colour image as an example. The Amiga stores this image in bitplane mode, i.e. it is represented by several planes of bits (binary digits 0 or 1). With 4 colours, each colour can be represented by two bits, so there are 2 bitplanes:

    00100110  bitplane 0
    00101011  bitplane 1
    --------
    00302132  resulting pixel values (bitplane 1 supplies the high bit)

Now, here is another way of storing this image: the data bits are cut into small pieces (chunks), one per pixel:

    00 00 11 00 10 01 11 10 = 00302132

This is the principle of chunky pixel mode. Each storage method is perfectly logical, and neither is better in general. However, each has technical advantages and disadvantages depending on the goal. Macintosh and PC use chunky pixel mode."
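To make the two layouts concrete, here is a minimal C sketch that turns the eight pixel values of the example (0 0 3 0 2 1 3 2) into two bitplane bytes, one bit at a time. The function name is made up for illustration, and the leftmost pixel is assumed to go into the most significant bit, as on Amiga bitplanes. It shows the data layout only and is nowhere near a fast c2p.

```c
#include <stdint.h>

/* Convert 8 chunky pixels (one byte each) into `depth` bitplane bytes.
   Assumption: the leftmost pixel goes to the most significant bit.
   Naive version: one bit at a time. */
void chunky8_to_planes(const uint8_t chunky[8], uint8_t *planes, int depth)
{
    for (int p = 0; p < depth; p++) {
        uint8_t row = 0;
        for (int x = 0; x < 8; x++) {
            uint8_t bit = (chunky[x] >> p) & 1;   /* bit p of pixel x */
            row |= (uint8_t)(bit << (7 - x));     /* pixel 0 -> bit 7 */
        }
        planes[p] = row;
    }
}
```

Feeding it the pixels 0,0,3,0,2,1,3,2 with depth 2 gives 00100110 for bitplane 0 and 00101011 for bitplane 1, the two rows of the example.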
Cyf Member	#3 - Posted: 19 Dec 2004 11:52
On AmyCoders ( http://membres.lycos.fr/amycoders/ ) there is a good tutorial about chunky to planar (and lots of other tutorials).
krabob Member	#4 - Posted: 21 Dec 2004 11:49
Mathematically, here is a way to look at it: the whole problem is about applying a 90° rotation to a block of pixels. Imagine you look at your bitmap from above the screen: the Y coordinate in the diagram below is the bitplane DEPTH. We've got 8 bitplanes. If we want the "planar" bitmap like this:

    00000000
    01111110
    00000000
    00000000
    00000000
    00000110
    00000110
    00000000

it means we want the 8 chunky bytes like this:

    00000000
    01000110
    01000110
    01000000
    01000000
    01000000
    01000000
    00000000

...So it IS a 90° rotation of the bits. How to do it quickly?

Here is a different problem which, for the moment, has nothing to do with the previous one: how do you reverse the bit order in, let's say, a 32-bit integer?

    11100000 01010101 11110000 10000001

We can rotate it or swap its bytes, but how do we really reverse the bit order? It is easy: for a 2^n-bit field, you need n swaps at n levels. For 32 bits the 5 levels are 16-bit, 8-bit, 4-bit, 2-bit and 1-bit. To swap at the 16-bit level, use the asm "swap" instruction; for 8/4/2/1 bits use these masks:

    11111111 00000000 11111111 00000000
    11110000 11110000 11110000 11110000
    11001100 11001100 11001100 11001100
    10101010 10101010 10101010 10101010

To do the swap at one level, AND your value with the mask and with its complement, shift the two halves in opposite directions, and OR the bits back together. After all the levels you get:

    10000001 00001111 10101010 00000111

From memory, one level can be done in 6 asm instructions with and, or and shifts, so more or less 5*6 asm lines are enough to reverse a 32-bit value. Note that the order in which the levels are done doesn't matter.

So what is a C2P? A 2D application of what the reversal does in 1D. Chunky-to-planar does its swaps with this kind of 2D mask:

    10
    01

    1100
    1100
    0011
    0011

    11110000
    11110000
    11110000
    11110000
    00001111
    00001111
    00001111
    00001111

    ...

and this way the whole block of bitplane data is "rotated by 90°". The different C2P algorithms then differ in the order of the swaps and in how they are synchronised with the video RAM writes.
(Some of Kalms' c2p routines can do all the bit shifting in the time it takes to write the memory, bringing a c2p to... classic video copy speed!)
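Both halves of this explanation can be sketched in C. The first function is the 1D bit reversal with the five swap levels and the masks listed above. The second is a well-known masked-swap transpose of an 8x8 bit matrix packed into a 64-bit word. It is a transpose, which differs from the 90° rotation described above only by a mirror flip, and it is not taken from any particular Amiga c2p routine. All function names are made up for illustration, and real c2p code works on many longwords at once instead of one 64-bit value.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value with five swap levels
   (16, 8, 4, 2 and 1 bits). */
uint32_t reverse_bits32(uint32_t v)
{
    v = (v >> 16) | (v << 16);                                /* 68k "swap" */
    v = ((v & 0xFF00FF00u) >> 8) | ((v & 0x00FF00FFu) << 8);
    v = ((v & 0xF0F0F0F0u) >> 4) | ((v & 0x0F0F0F0Fu) << 4);
    v = ((v & 0xCCCCCCCCu) >> 2) | ((v & 0x33333333u) << 2);
    v = ((v & 0xAAAAAAAAu) >> 1) | ((v & 0x55555555u) << 1);
    return v;
}

/* Transpose an 8x8 bit matrix held in a 64-bit word (bit 8*r+c <-> bit 8*c+r)
   with three masked swap levels: single bits inside 2x2 blocks, 2x2 blocks
   inside 4x4 blocks, 4x4 blocks inside the 8x8 block. */
uint64_t transpose8x8(uint64_t x)
{
    uint64_t t;
    t = (x ^ (x >> 7))  & 0x00AA00AA00AA00AAULL;  x ^= t ^ (t << 7);
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL;  x ^= t ^ (t << 14);
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL;  x ^= t ^ (t << 28);
    return x;
}

/* 8 chunky bytes -> 8 bitplane bytes. Pixel 0 ends up in the top bit
   of each plane byte (assumed Amiga-style bit order). */
void c2p_8x8(const uint8_t chunky[8], uint8_t planes[8])
{
    uint64_t x = 0;
    for (int i = 0; i < 8; i++)
        x = (x << 8) | chunky[i];          /* pixel 0 in the top byte */
    x = transpose8x8(x);
    for (int p = 0; p < 8; p++)
        planes[p] = (uint8_t)(x >> (8 * p));
}
```

Reading the last byte of the example value as 10000001, reverse_bits32(0xE055F081) returns 0x810FAA07, which is the reversed pattern given in the post.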
kufa Member	#5 - Posted: 10 Jan 2005 14:46
I reaaaaaalllllllllyyyy have to update (well, upload) the amycoders website... Will do that this week if I have time with work...
rload Member	#6 - Posted: 10 Jan 2005 18:32
yeah right.. how about the 4k coop? :)
z5_ Member	#7 - Posted: 29 Dec 2005 12:39
How do you guys handle fonts in chunky mode? Is there a converter to chunky? It seems like a lot of wasted space to convert a one-bitplane font to chunky. Or do you use a raw bitplane format and insert it into the chunky screen? The one way I can think of doing this would be to test each bit of the font: if the bit is 1, OR the chunky screen byte with #%00010000 (if, in this example, you want to insert the font into bitplane 5). But that would take a loop of 16*8 iterations (if the font is in word format and 8 lines high) for one simple letter. Any better ideas?
winden Member	#8 - Posted: 29 Dec 2005 13:37
z5, the easiest and fastest way is to just convert the font to chunky and then use a drawing routine to paste the characters onto the chunky buffer prior to c2p. don't worry about space waste, since you will pack the exe anyway (stonecracker et al. ;) and these bytes will be packed efficiently on disk. in memory you don't really have to worry either, even 4 megs of fastmem go a loooooong way for democoding :)
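A minimal C sketch of that idea, assuming an 8x8 one-bitplane glyph stored as 8 bytes with the leftmost pixel in the most significant bit: the per-bit test is done once when the font is converted, and the per-frame paste is then a plain byte copy with colour 0 treated as transparent. Both function names and the glyph layout are assumptions for illustration and not the WickedOS font format.

```c
#include <stdint.h>
#include <stddef.h>

/* Expand one 8x8 one-bitplane glyph (8 bytes, MSB = leftmost pixel)
   into 64 chunky bytes: `colour` where the bit is set, 0 elsewhere. */
void glyph_to_chunky(const uint8_t glyph[8], uint8_t colour, uint8_t out[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            out[y * 8 + x] = (glyph[y] & (0x80u >> x)) ? colour : 0;
}

/* Paste an 8x8 chunky glyph at (px, py) into a chunky buffer that is
   `pitch` bytes wide. Colour 0 is transparent. No clipping is done. */
void paste_glyph(uint8_t *screen, size_t pitch, int px, int py,
                 const uint8_t glyph[64])
{
    for (int y = 0; y < 8; y++) {
        uint8_t *dst = screen + (size_t)(py + y) * pitch + (size_t)px;
        for (int x = 0; x < 8; x++)
            if (glyph[y * 8 + x])
                dst[x] = glyph[y * 8 + x];
    }
}
```

The chunky glyph takes 64 bytes instead of 8, which is the space cost discussed above.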
dalton Member	#9 - Posted: 29 Dec 2005 14:15
you could use 6 planes for chunky and 2 for fonts... or something like that. Then you would also get free transparency through the palette =)
z5_ Member	#10 - Posted: 29 Dec 2005 15:28 - Edited
@dalton: I don't think that is possible with WickedOS. I find that chunky mode does complicate some things a lot, though. There are some fonts converted to chunky included in the WickedOS system. What program can I use to convert my fonts to chunky format?
noname Member	#11 - Posted: 2 Jan 2006 19:12
Happy new year everybody! The font format in WickedOS is a custom one, i.e. made by myself. I have a converter to generate the correct files from any Amiga font.
noname Member	#12 - Posted: 3 Jan 2006 10:46
winden's comment:
winden is absolutely right, you shouldn't care about memory efficiency at this stage. the fonts in chunky pack pretty well. again, please note that chunky in itself is not a file format like iff. to be able to use amiga fonts without having to install any font on the user's machine, i let my converter generate a big "texture" (picture) with all characters of the font printed into it. the u/v coordinates to scissor them out of this texture are of course also stored. combining the texture (in chunky format) and the scissor information in a structured way resulted in the wfont format, which can be used with examples/font.s
please note that the fonts don't need to be in one colour. the font routine probably needs a bit of tweaking (i never used it with multi-colour fonts), but it is absolutely possible, and 256-colour fonts are much easier to accomplish in chunky mode than in bitplane mode.

dalton's comment:
dalton mentions a completely different issue, but he is also right. if you limit your background/effect/whatever to x bitplanes (0<x<8), you have 8-x bitplanes free to draw on top, with free palette transparency. i used that a lot, also for scrolling. it is fast and efficient, and it is of course possible to achieve with wickedos. just write the bitplane pointers in your vertical blank interrupt. example:

    ; ...put this in your vbi...
    ; this overwrites two bitplane pointers in a 320x200 screen. the new bitmap
    ; data is expected to lie in chip memory, one plane after the other.
    ; the planespointer must then point to the beginning of that memory.

    ;--- get the base address of the bitmap data in d0
    ;--- and the bitmap size in d1
    ;move.l planespointer,d0    ; exercise for you: allocate chip memory (!),
                                ; copy the bitmap data and store the pointer
    move.l #320*200/8,d1        ; size of one bitplane in bytes

    ;--- redirect some bitplane pointers to our new data.
    lea    $dff000,a6           ; custom chip base
    move.l d0,bplpt+$18(a6)     ; bplpt+$18 = BPL7PT
    add.l  d1,d0                ; advance to the next plane
    move.l d0,bplpt+$1c(a6)     ; bplpt+$1c = BPL8PT
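The "free palette transparency" part can be sketched in C. Assume the effect uses the lower 6 planes (colour indices 0..63) and the top two planes carry the overlay: every palette index with bit 6 or 7 set is then given an overlay colour, so whatever the effect drew underneath is hidden wherever an overlay bit is set. The 0xRRGGBB colour format and the function name are assumptions for illustration, and how the palette is handed to WickedOS is not shown.

```c
#include <stdint.h>

/* Build a 256-entry palette for "6 effect planes + 2 overlay planes".
   Indices 0..63 keep the effect colours. Any index with bit 6 or 7 set
   (a pixel where an overlay plane is set) maps to an overlay colour,
   whatever the low 6 bits contain. Colours are assumed to be 0xRRGGBB. */
void build_overlay_palette(const uint32_t effect[64],
                           const uint32_t overlay[3],
                           uint32_t pal[256])
{
    for (int i = 0; i < 256; i++) {
        int top = i >> 6;              /* state of the two overlay planes: 0..3 */
        pal[i] = (top == 0) ? effect[i & 63] : overlay[top - 1];
    }
}
```

With two overlay planes there are three non-transparent overlay states, hence the three overlay colours.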
z5_ Member	#13 - Posted: 3 Jan 2006 16:07
Wow, this is actually neat. Having one or two planes in bitplane format can be useful in quite a lot of cases. I could even just use my raw-format font and copy it straight into one of the upper bitplanes. I suppose that I can't use the blitter anymore for that and have to use the CPU instead for scrollers and copying large blocks of data?
I assume WickedOS uses a copperlist for things such as colour and screenmode. Is there actually a way to manipulate this copperlist? I actually think that some effects are quite a bit simpler with a copperlist: stuff like rasterbars and anything colour related within the screen itself (meaning a change of colours at certain positions on the screen).
z5_ Member	#14 - Posted: 4 Jan 2006 12:07
In the meantime, it's working. I already used vbihook for my demotimer, so I added the bitplane pointers there. I added two planes on top of my chunky screen and my scroller is going into the 7th bitplane, while my "effect" (uhm...) is in my chunky planes. Rather cool stuff... :)
z5_ Member	#15 - Posted: 5 Jan 2006 12:59
So, what is the first effect that you guys actually coded, chunky wise? From where I stand, I don't know where to start. Honestly... I mean, things with fonts, such as scrollers, are quite doable already, but real effects... preferably ones which don't need many external files (like pictures, textures and stuff)... And my math knowledge is... well, let's just say I haven't done any maths in ages. Any ideas?
noname Member	#16 - Posted: 5 Jan 2006 13:25
My first chunky effect was the fire. Try to think of a clever way to average each pixel from its surrounding neighbours (4 or, better, all 8 of them). It doesn't require any advanced math skills and can (and should) be thought through on a piece of paper before coding it. A few hints (mini-tutorial :)
- you need two chunky buffers
- a fire is a melt that goes one line above (adjust pointers)
- do not divide! (only move, add and shift)
- cache as many sums as possible as you go along
- also use the instruction cache efficiently (which is 256 bytes on 020/030. big enough to unroll a bit. stay under 252 bytes to avoid any cache drops!)
It took me a while to get this one done properly. My routine ran at 50 fps on a 68030 (in 2x2, 160x100) back in the day and can run at 25 fps on a 68060 (in 1x1, 320x200).
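A minimal C sketch of one fire step along the lines of the first three hints: it reads from one buffer, writes into the other, puts each result one line higher so the flames rise, and uses only adds and a shift. It averages 4 neighbours and does no sum caching or unrolling. The extra "minus 1" cooling is an assumption added here so that the fire fades out, and the names and the fixed 320x200 size are made up for illustration; this is not noname's routine.

```c
#include <stdint.h>

enum { W = 320, H = 200 };

/* One fire step: average the 4 neighbours of each source pixel and write
   the result one line higher in the destination buffer. Only adds and a
   shift, no divide. Border pixels of dst are left untouched. */
void fire_step(const uint8_t *src, uint8_t *dst)
{
    for (int y = 1; y < H - 1; y++) {
        const uint8_t *s = src + y * W;
        uint8_t *d = dst + (y - 1) * W;      /* the melt goes one line above */
        for (int x = 1; x < W - 1; x++) {
            unsigned sum = s[x - 1] + s[x + 1] + s[x - W] + s[x + W];
            unsigned v = sum >> 2;           /* divide by 4 with a shift */
            d[x] = (uint8_t)(v ? v - 1 : 0); /* assumed cooling: fade by 1 */
        }
    }
}
```

Each frame the caller would seed the bottom lines of the source buffer with random pixels, call fire_step, and then exchange the two buffers.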
winden Member	#17 - Posted: 5 Jan 2006 17:42
@noname: the full 256 bytes can be used with no problems provided your loop starts at a cacheline start (i.e. the loop address is a multiple of 16):

    routine:
        [blablabla setup]
        bra.b   loop_start
        cnop    0,16            ; <---- cachelines are 16 bytes wide on 030, 040 and 060
    loop_start:
        [blablabla loop]
        dbra    d7,loop_start
    loop_end:
        rts

I recall having some routines which needed this trick, since otherwise they lost a lot of speed on 030 if loop_end - loop_start > 256.
noname Member	#18 - Posted: 5 Jan 2006 22:46
@winden: oh cool, didn't know that!
z5_ Member	#19 - Posted: 13 Jan 2006 12:35 - Edited
I had a bit of time to attempt a fire effect and, much to my surprise, I actually ended up with something resembling a fire effect. I soon noticed though that even the crappiest of code can produce a fire effect :) I even ended up with a noise effect at some point :) Still, most points of the mini-tutorial are still Chinese to me and my fire effect only lasts for a few pixels in height (after that, it turns into boring big grey patterns), so here are a bunch of questions:
- why is the last visible line in a 320*200 chunky screen 197? If I put pixels at screen + 320*198, I don't see them anymore (the last is 320*197).
- why do I need two buffers? Has this anything to do with double buffering? The way I did it: set a palette from black (colour 0) to white (colour 63), put some random pixels on the lowest line and keep changing them each "frame", and calculate the pixel values starting from the top. So where does the second buffer come into the story?
- what do you mean by "caching as many sums as you can"?
- I read somewhere that if you average 4 pixels, you should divide by a value slightly larger than 4, and that is where the whole story begins: dividing, modulo and fractional numbers don't seem that obvious in assembler :)
- what about using the instruction cache efficiently?
- what's the difference between asl and lsl? It isn't obvious in the docs I have. I have the impression that the operation is the same but that the status flags are different?
- why don't I ever reach black? If I keep shifting the average by 2 (= dividing by 4), at some point I should reach zero, shouldn't I?
Sorry if these questions are a bit lame. Someone has to start somewhere, I guess... Getting this effect exactly right could teach me a lot of basics.
noname Member	#20 - Posted: 13 Jan 2006 14:58 - Edited
"why is the last visible line in a 320*200 chunky screen 197? If i put pixels at screen + 320*198, i don't see them anymore (last is 320*197)."
is it like that? it has nothing to do with screens being chunky or not (there is no such property as chunky screens always having 197 lines). it could be a problem in the copperlist (thus a small bug in wickedos) or in your code. don't bother too much about it at the moment.

"why do i need two buffers?"
because one is the input buffer and the other one is the output buffer. you wouldn't want to read and write to the same buffer at the same time because it would alter your input data as you process it. doing so on purpose can give interesting feedback effects, but that's not what you want now.

"what do you mean with "caching as many sums as you can"?"
that is for you to find out. it is actually the trick to get it fast! generally speaking, when calculating the average value from the 8 surrounding pixels there are certain sums which can be reused just a bit later. use some checkered paper and a pencil to sketch the algorithm graphically.

"i read somewhere that if you average 4 pixels, i should divide them with a value slightly larger than 4 and that is where the whole story begins: dividing, modulo and numbers with comma don't seem that obvious in assembler :)"
you need neither divides nor modulos nor floats here.

"what about using the instruction cache efficiently?"
executing code which is already in the cache is faster than having the cpu fetch it from memory. note that fetching and caching is done transparently for you. but if you prepare your time-critical code in a way that leans towards efficient cache usage (i.e. don't let it exceed 256 bytes for 68020/68030), you have just learnt an important principle of code optimization.

"what's the difference between asl and lsl"
read the docs: MC680x0 Reference 1.1

"why don't i ever reach black?"
this might be because you are working on the same buffer for input and output.

---
actually, i would propose you code a one-pixel mouse pointer first. this is actually good fun when making a fire effect and is only a few lines of code with the MOUSEX and MOUSEY macros.
and at some point we would like to see your efforts ;) come on, make the ada logo burn in your program!
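One possible reading of the "cache as many sums as possible" hint, sketched in C: when summing the 8 neighbours of every pixel in a row, the sum of each vertical column of three pixels is shared by three neighbouring 3x3 windows, so it only has to be computed once and can then slide along the row. The function name and interface are made up for illustration; this is not claimed to be noname's trick.

```c
#include <stdint.h>

/* Sum of the 8 neighbours of every inner pixel of one row, reusing column
   sums. Each column sum (above + centre + below) is computed once and
   shared by the three pixels whose 3x3 window covers that column.
   out[1 .. width-2] are written; out[x] >> 3 is the 8-pixel average. */
void neighbour_sums_row(const uint8_t *above, const uint8_t *row,
                        const uint8_t *below, int width, uint16_t *out)
{
    unsigned left = above[0] + row[0] + below[0];
    unsigned mid  = above[1] + row[1] + below[1];
    for (int x = 1; x < width - 1; x++) {
        unsigned right = above[x + 1] + row[x + 1] + below[x + 1];
        out[x] = (uint16_t)(left + mid + right - row[x]); /* 3x3 box minus centre */
        left = mid;      /* slide the window: two of three sums are reused */
        mid = right;
    }
}
```

Per pixel this costs two adds for the new column, two adds and one subtract for the window, instead of seven adds for the naive sum of eight neighbours.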
z5_ Member	#21 - Posted: 14 Jan 2006 20:13
I don't think I have the time or the talent to actually release something someday, but I'm still determined to get at least one effect going :)
I now understand why I didn't need two buffers for the fire effect. In the tutorial that I had read, random pixels were placed at the bottom of the screen and the fire effect was achieved by calculating the average of 4 pixels: 3 beneath the pixel and one two lines beneath it. That way I wasn't overwriting my input. However, the tutorial stated that I had to divide by a value slightly higher than 4, or else the fire would continue to the top of the screen, which is the case.
However, I will try to calculate with 8 pixels and two buffers. So far I think my routine should look something like this:
do for every pixel
- get pixel from input
- calculate average from the 8 pixels surrounding it
- place pixel in output (screen)
I was thinking though: my output actually is the input for the next calculation, so it would be cool if I could just swap screens. Like this:
- start
- use buffer 1 as input and write result to buffer 2
- show buffer 2 on screen
- use buffer 2 as input and write result to buffer 1
- show buffer 1 on screen
...
But how do I do this within WickedOS? I tried with SETMODE but I get heavy flickering on the screen.
z5_ Member	#22 - Posted: 8 Feb 2006 16:36
Haven't touched any assembler since posting my question above, but I do wonder if/why my question was so stupid that nobody wants to answer it :o) (noname, where are you? :))
winden Member	#23 - Posted: 8 Feb 2006 19:57
a good technique for this kind of value adjustment is to make a table. so for example if you make a table "t[x] = max(0,x / 5)", then you can calc the result of the averaging with "new = t[old1 + old2 + old3 + old4]", which is faster than doing a divide on 68k machines.
as for the problem with double buffers, yes, your thinking is accurate and you should be able to do it this way... maybe the code you wrote is showing the same buffer you are updating?
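The table idea as a minimal C sketch, assuming 8-bit pixels so that the sum of four of them is at most 4*255. The names are made up for illustration. The max(0, ...) from the post is left out because x / 5 is never negative for a non-negative sum; it would only matter if the formula also subtracted something.

```c
#include <stdint.h>

enum { MAX_SUM = 4 * 255 };          /* largest possible sum of four 8-bit pixels */
static uint8_t fade_table[MAX_SUM + 1];

/* Precompute t[x] = x / 5 once, outside the inner loop. */
void init_fade_table(void)
{
    for (int x = 0; x <= MAX_SUM; x++)
        fade_table[x] = (uint8_t)(x / 5);
}

/* Inner-loop version: three adds and one table lookup instead of a divide. */
uint8_t fade4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return fade_table[a + b + c + d];
}
```

Dividing the sum of four pixels by 5 instead of 4 is what makes the result slightly darker than the plain average, so the fire dies out towards the top.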
winden Member	#24 - Posted: 8 Feb 2006 19:58
btw, maybe the fact that the site was down had something to do with people not answering? :P
noname Member	#25 - Posted: 9 Feb 2006 10:53
z5, good that you moved the site. the downtimes on the previous server became too long.
concerning your question: your thinking (buffer1, buffer2, buffer1, ...) is right. and as you already noticed, you wouldn't use the SETMODE macro in every frame. you have two choices (let a2 and a3 contain pointers to the buffers):

1)
    - setmode to buffer 1
    loop:
    - do effect from a2 to a3
    - display (would c2p buffer 1)
    - swap a2,a3 (i.e. exg a2,a3)
    - CHECKEXIT, beq loop

doing it this way will correctly calculate the effect from one buffer to the other and back. BUT: it will always display only buffer 1, which means you would lose half your frames. this is not what you want.

2)
    - setmode to buffer 1
    loop:
    - do effect from a2 to a3
    - lea _wosbase,a0
    - move.l a3,mode1ptr(a0)
    - display (would c2p the latest buffer (buffer1, buffer2, buffer1, ...))
    - swap a2,a3 (i.e. exg a2,a3)
    - CHECKEXIT, beq loop

this will allow you to use different buffers without having to use setmode. i have just noticed that there isn't a macro for this in wickedos. if i ever make a new version, i will add this.
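The buffer handling of option 2 can be sketched in C, independent of WickedOS. The effect always runs from the source buffer into the destination buffer, the destination is the one that gets shown, and then the two pointers are exchanged. Everything here is hypothetical: `run_pingpong`, the `shown` array (a stand-in for the display/c2p call) and the `add_one` placeholder effect are made up for illustration.

```c
#include <stdint.h>

typedef void (*effect_fn)(const uint8_t *src, uint8_t *dst);

/* Placeholder effect for the example: dst[0] = src[0] + 1. */
void add_one(const uint8_t *src, uint8_t *dst)
{
    dst[0] = (uint8_t)(src[0] + 1);
}

/* Option 2: compute from src into dst, show dst, then exchange the two
   pointers. shown[f] records which buffer would be displayed in frame f
   (a stand-in for the real display call). */
void run_pingpong(uint8_t *buf1, uint8_t *buf2, effect_fn effect,
                  int frames, const uint8_t **shown)
{
    uint8_t *src = buf1, *dst = buf2;
    for (int f = 0; f < frames; f++) {
        effect(src, dst);
        shown[f] = dst;                          /* display the fresh buffer */
        uint8_t *tmp = src; src = dst; dst = tmp; /* exg a2,a3 */
    }
}
```

Option 1 corresponds to always recording the same buffer in `shown`, which is why only every second computed frame would ever reach the screen.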
z5_ Member	#26 - Posted: 19 Jun 2007 13:01
In general, do modern demos work entirely in chunky mode, meaning all pics, brushes, fonts, ...? Or is there still a mix, for example chunky for the first 6 bitplanes and bitplane 7 in planar mode for overlays and such?
StingRay Member	#27 - Posted: 19 Jun 2007 17:41
Most of them are 100% chunky, I think. Yet I often mix planar/chunky; especially for scrollers and stuff like that I hate to use chunky mode. The disadvantage is that your demo will not support gfx cards if you hack the hardware directly (copper/blitter abuse), and I often heard people complaining about that... Then again, I don't really care about that. :)
doom Member	#28 - Posted: 19 Jun 2007 21:37
Rule #1 of Amiga coding: learn to ignore the people who think the Amiga has a bright and shiny future ahead of it if we all keep upgrading. ;)
z5_ Member	#29 - Posted: 20 Jun 2007 12:53
I remember people saying: "nowadays, amiga coders just open a chunky screen and don't use the hardware anymore". At that time, I didn't understand what they meant; now I do. With my little experience, I don't see chunky mode as evil but as something that is more convenient for some effects. On the other hand, planar mode has some neat advantages in some cases as well (again, somebody with very little experience talking here). For example, I find it more intuitive to overlay a picture/text/... in the planar format. So why not use one or the other depending on the effect or on what the programmer prefers?
To me, it's not really that important to know that coders nowadays don't use the Amiga hardware anymore. It's not a thing I consider when judging a demo. On the other hand, I feel strongly about limiting the CPU to a 68060 at most. Otherwise, the Amiga could just as well be a PC.
Blueberry Member	#30 - Posted: 20 Jun 2007 19:06
The problem with chunky effects on Amiga is not that the hardware does not have a chunky mode. Chunky to planar conversion in itself doesn't take very much time on 060. The problem is the awfully slow chipmem. If the chipmem had been fast, we could just make our effects write whatever format was convenient (which is, for most modern effects, chunky) and not worry about the time used for the conversion. As it is, most Amiga demos spend a disproportionately large part of their time wasting precious CPU cycles waiting for the chipmem to accept the next piece of data fed to it by the c2p. To take full advantage of the available CPU resources, you have to merge more calculations than just the c2p in between the chipmem writes. And that is far from convenient.
The coders that just fill up a chunky buffer and then call a c2p to get it on screen are indeed not using the Amiga hardware. Any computer with a chunky buffer could be used for that. Those that carefully fill the time between each chipmem write to utilize the CPU fully are today's hardware hackers. :)
