A.D.A. Amiga Demoscene Archive

Amiga Demoscene Archive Forum / Coding / coding tutorial: general questions

Author	Message
z5_ Member	#1 - Posted: 2 Dec 2004 20:44 Some very basic questions: how do I clear a screen (for example 8 bitplanes)? I tried with the blitter (with only destination D and minterm $0100) but I can't do all bitplanes in one go (and this shows on the screen).
Cyf Member	#2 - Posted: 2 Dec 2004 21:16 - Edited You can use the blitter or the CPU, but the CPU is faster. Your blitter setup (D and minterm $100) is correct. CPU clear for a 320x256 screen:
        lea     screen+40*256(pc),a3    ; end of screen
        move.w  #255,d7                 ; 256 lines
        movem.l buf(pc),d0-d6/a0-a2     ; 10 longwords = 40 bytes of zeros
.clear  movem.l d0-d6/a0-a2,-(a3)       ; fill one line with 0
        dbra    d7,.clear
        rts
buf     ds.l    10                      ; 1 line
Or use 14 registers per movem (for example movem.l d0-d6/a0-a6, with the screen pointer kept in a register outside that list) and complete with one movem.l d0-d6/a0-a4.
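On the blitter side of the question: a single blit started through BLTSIZE is limited to 1024 lines by 64 words, so eight contiguous 320x256 bitplanes (2048 lines of 20 words) cannot be cleared with one BLTSIZE write. A minimal sketch, assuming the planes sit back to back at a hypothetical chip RAM label `screen` and that blitter DMA is already enabled, splits the clear into two blits of 1024 lines each:

```asm
; Sketch only: clear 8 contiguous 320x256 bitplanes with two blits.
; Assumptions: "screen" is a hypothetical chip RAM label, blitter DMA is on.
CUSTOM  equ     $dff000
DMACONR equ     $002
BLTCON0 equ     $040
BLTCON1 equ     $042
BLTDPTH equ     $054
BLTSIZE equ     $058
BLTDMOD equ     $066

clearscreen:
        lea     CUSTOM,a6
        lea     screen,a0
        bsr.s   blitclear               ; lines 0-1023    (planes 1-4)
        lea     screen+40*1024,a0
        bsr.s   blitclear               ; lines 1024-2047 (planes 5-8)
        rts                             ; the second blit may still be running here

blitclear:
.wait   btst    #6,DMACONR(a6)          ; byte access: tests bit 14 (blitter busy)
        bne.s   .wait
        move.w  #$0100,BLTCON0(a6)      ; D channel only, minterm 0
        move.w  #$0000,BLTCON1(a6)
        move.w  #0,BLTDMOD(a6)          ; no gap between lines
        move.l  a0,BLTDPTH(a6)
        move.w  #20,BLTSIZE(a6)         ; height field 0 = 1024 lines, width 20 words; starts the blit
        rts
```

Because the blitter runs in parallel with the CPU, the clear is only complete once the busy bit has gone low again after the second blit.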
z5_ Member	#3 - Posted: 3 Dec 2004 12:59 - Edited by Admin Another general question: is there anything special that needs to be done when showing 256 color pictures on screen? Do you have to tell the system that you are going to go AGA? The reason I ask: I successfully displayed 16 color pictures but I have trouble displaying a 256 color picture. I've searched and searched but I can't find it. Things I've done:
- reserved a screen buffer of 40*256*8 bytes
- set up my bitplane pointers, looping 8 times
- changed BPLCON0 (bits 14-13-12 = 0, bit 4 = 1)
- filled in all my 8 bitplanes with a raw picture converted by piccon
- filled in my color copperlist with the color copperlist generated by piccon (8bit copper list)
There always seems to be some trash in my last bitplane (bitplane 8). The pictures look perfect (colorwise and pixelwise) when I change BPLCON0 to use 7 bitplanes instead of 8. But when adding that last bitplane, some random (?) trash seems to appear. The picture however is perfect, colorwise and pixelwise (the trash is actually outside of the picture). I even tried clearing all 8 bitplanes before showing the picture but the problem keeps appearing. Any help would be appreciated :)
Cyf Member	#4 - Posted: 3 Dec 2004 13:35 - Edited It would be good if someone could post some AGA copperlists with examples of the different new modes. From the AGA doc:
"Modulos in AGA mode are the same as in normal mode minus 8: with FMODE=3, modulo = -8 (when the normal modulo is 0); with FMODE=2, modulo = -4. Under AGA, bitplanes, sprites and copperlists must be 64-bit aligned under certain circumstances, with CNOP 0,8."
"Bit 0 of BPLCON0 ($dff100) must be set to enable all the ECS+ features. For more than 4 colours, you must set bits 0 and 1 of $dff1fc and align bitmaps to 64-bit addresses."
z5_ Member	#5 - Posted: 3 Dec 2004 16:39 - Edited by Admin Is there any official guide on the AGA chipset (from Commodore)? I have the AGA guide which was included in the asm-one archive, but it is somewhat limited. I too have the feeling that it has something to do with the setup of my bplcon0, screen modulo, size... Can anyone give a short example of what a copperlist should look like regarding the screen (bplcon, modulo, size, diwstart, diwstop,...) when displaying a 256 color screen? I have set up all my screen info the same as with a 16 color screen, apart from the number of bitplanes in bplcon0.
TheDarkCoder Member	#6 - Posted: 3 Dec 2004 17:38 @z5: Unfortunately, Commodore deliberately refused to release a new version of the Hardware Reference Manual covering AGA. They wanted this form of coding to die. Indeed, 10 years after the death of Commodore, people are still learning how to code the hardware!!! :-) About your problem: I would rather write a long tutorial on the subject, since I think I have useful information to share. For example, setting the modulo to -4 is not the best thing to do, in general... (although it works!). But currently I have no time to write long tutorials. However, if you are willing to use a lo-res 8 bitplane PAL display, you can set (for now) fetch mode to 0, by setting register FMODE to 0. If in your startup code you do a pair of LoadView(NULL) before taking control of the hardware (as you should), then fetch mode is already set to 0. In such a condition, you will not take advantage of the increased AGA bandwidth, but you can set DIWxxxx, DDFxxxx and modulo exactly like on OCS. Try it, and if you still have problems, then there is a bug somewhere else. Again, I suggest not putting the palette in the copperlist, but rather setting it using the CPU. It is a good exercise to write the code that does the job. Hope it helps.
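The display setup described above can be sketched as a copperlist. This is an illustration under stated assumptions, not a verified fix for the trash in bitplane 8: it assumes a 320x256 lo-res PAL screen, FMODE forced to 0 so that the usual OCS window and fetch values apply, and a hypothetical label `bplptrs` whose zero words are filled in by the CPU at startup. The palette is left out, as suggested above.

```asm
; Sketch only: copperlist for a 320x256 lo-res, 8-bitplane AGA display
; with FMODE = 0. Must be located in chip RAM.
        section copper,data_c
copperlist:
        dc.w    $01fc,$0000             ; FMODE   = 0 (1x fetch, OCS-style timing)
        dc.w    $008e,$2c81             ; DIWSTRT
        dc.w    $0090,$2cc1             ; DIWSTOP
        dc.w    $0092,$0038             ; DDFSTRT
        dc.w    $0094,$00d0             ; DDFSTOP
        dc.w    $0100,$0211             ; BPLCON0: bit 4 (8 planes), bit 9 (color), bit 0 (ECS+ enable)
        dc.w    $0102,$0000             ; BPLCON1
        dc.w    $0104,$0000             ; BPLCON2
        dc.w    $0108,$0000             ; BPL1MOD
        dc.w    $010a,$0000             ; BPL2MOD
bplptrs:                                ; pointer words (the zeros) are set by the CPU
        dc.w    $00e0,0,$00e2,0         ; BPL1PTH / BPL1PTL
        dc.w    $00e4,0,$00e6,0         ; BPL2
        dc.w    $00e8,0,$00ea,0         ; BPL3
        dc.w    $00ec,0,$00ee,0         ; BPL4
        dc.w    $00f0,0,$00f2,0         ; BPL5
        dc.w    $00f4,0,$00f6,0         ; BPL6
        dc.w    $00f8,0,$00fa,0         ; BPL7
        dc.w    $00fc,0,$00fe,0         ; BPL8PTH / BPL8PTL
        dc.w    $ffff,$fffe             ; end of copperlist
```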
z5_ Member	#7 - Posted: 4 Dec 2004 11:59 Another basic question... I have all my variables grouped in a section called datastuff, data_c. What does data_c mean? Does that mean that the variables are stored in chip ram? It's kind of annoying because I want to declare my variables beneath each subroutine where they are used. Can I do this? And where is my variable stored in this case (chip, fast)?? For example:
mysubroutine:
        move.l  myvariable,d0
        rts
myvariable:
        dc.l    0
Cyf Member	#8 - Posted: 4 Dec 2004 12:04 data_c, code_c, bss_c = chip ram. data, code, bss = fast ram (if you have fast ram, else chip ram or other memory). For simple variables, put them with your code in fast ram. And for pointers to chip data, like a screen pointer, init them at start.
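As an illustration of the section types listed above, here is a minimal sketch in ASM-One style syntax. All section and label names are hypothetical; the point is only which memory each part lands in.

```asm
; Sketch only: three sections with different memory requirements.
        section maincode,code           ; fast RAM if available, otherwise any RAM
start:
        move.l  counter(pc),d0          ; plain variable kept next to the code using it
        lea     copperlist,a0           ; chip data is reached through its address
        lea     screen,a1
        rts
counter:
        dc.l    0                       ; lives in the code section (not forced to chip)

        section chipdata,data_c         ; initialised data forced into chip RAM
copperlist:
        dc.w    $ffff,$fffe

        section chipbuf,bss_c           ; reserved space in chip RAM, not stored in the file
screen:
        ds.b    40*256
```

Only data that the custom chips must read (bitplanes, copperlists, sprites, samples) needs the `_c` variants; everything else can stay with the code.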
z5_ Member	#9 - Posted: 8 Dec 2004 22:35 - Edited by Admin Another newbie question (lamer alert: yes, I need a book on assembler, but the shops are closed now... :)): suppose that I have a variable at var1 and one at var2. I want to do the following calculation: var1*40 + var2, and I want to add this to an address. I did it like this but I think it's bollocks...
        move.l  var1,d0
        move.l  var2,d1
        muls    #40,d0
        add.l   d1,d0
        lea     screen,a1
        add.l   d0,screen
TheDarkCoder Member	#10 - Posted: 9 Dec 2004 09:58 @z5: it's almost correct, except for the very last instruction, which should be:
        add.l   d0,a1
Of course you can be more efficient: use var1 and var2 as .W (I guess they're just video coordinates, so .W is enough):
        move.w  var1,d0
        lea     screen,a1
        mulu    #40,d0          ; (why muls??)
        add.w   var2,a1         ; maybe on the 030 add.l is faster?? check the tables!
        add.l   d0,a1
Also note how I interleaved the instructions to support superscalar execution (trying not to use the same register in 2 consecutive instructions). A very classic optimization is using a lookup table instead of the MULU. But on the 060 I think it is faster to use the MULU (just 2 clock cycles on the 060!!). No need to go to a shop, just go to the Freescale web site and download all the 680x0 manuals!!!! ;-)
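The lookup-table variant mentioned above could look roughly like this. It is a sketch under assumptions: var1 is a line number in the range 0..255, var2 is a byte offset within the line, and the table is built at assembly time with `rept`/`set` (directive spelling may differ between assemblers).

```asm
; Sketch only: a1 = screen + var1*40 + var2 using a word table instead of MULU.
calcaddr:
        lea     mul40(pc),a0
        lea     screen,a1
        move.w  var1(pc),d0             ; y, assumed 0..255
        add.w   d0,d0                   ; table entries are words
        move.w  (a0,d0.w),d0            ; d0 = y*40 (at most 255*40 = 10200)
        add.w   d0,a1                   ; adda.w sign-extends; fine since d0 < 32768
        add.w   var2(pc),a1             ; + x byte offset
        rts

var1:   dc.w    0
var2:   dc.w    0

mul40:                                  ; 256 words: 0, 40, 80, ...
y       set     0
        rept    256
        dc.w    y*40
y       set     y+1
        endr

        section screenmem,bss_c
screen: ds.b    40*256
```

Whether the table actually beats MULU depends on the CPU, as noted above for the 060.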
Cyf Member	#11 - Posted: 9 Dec 2004 10:05 It also depends on which 680x0 you target and on the new capabilities of the 68020/030. For example, on the 68000:
        add.w   d0,d0           ; *2
        move.w  (a1,d0.w),d1
can be written on the 68020 as:
        move.w  (a1,d0.w*2),d1
Or:
        move.l  d0,(a0)+
        move.l  d1,(a0)+
        add.l   d2,d0
        add.l   d3,d1
is slower than:
        move.l  d0,(a0)+
        add.l   d2,d0
        move.l  d1,(a0)+
        add.l   d3,d1
You can use lsl to replace muls (40 = 8 + 32):
        move.l  var1,d0
        move.l  d0,d1
        lsl.l   #3,d0           ; *8
        lsl.l   #5,d1           ; *32
        add.l   var2,d1         ; var1*32 + var2
        add.l   d1,d0           ; var1*40 + var2
...or with asl, but that is slower than lsl.
z5_ Member	#12 - Posted: 20 Dec 2004 12:55 What does bss mean (seen in section chipmem,bss_c for example)?
Cyf Member	#13 - Posted: 20 Dec 2004 13:37 - Edited BSS = Block Started by Symbol. bss is one of the AmigaDOS executable hunks ($3eb). It's used to reserve large empty zones for the program without increasing the size of the file: the hunk contains only the size of the zones. More information from: http://encyclopedia.thefreedictionary.com/BSS
Block Started by Symbol (BSS) was a pseudo-op in UA-SAP (United Aircraft Symbolic Assembly Program), the assembler developed in the mid-1950's for the IBM 704 by Roy Nutt, Walter Ramshaw, and others at United Aircraft Corporation. This pseudo-op was later incorporated into FAP (Fortran Assembly Program), the standard IBM assembler for the IBM 709, 7090 and 7094 computers. It defined its label and reserved space for a given number of words. Most modern assemblers produce a BSS section in their output object module containing all reserved but uninitialized space.
d0DgE Member	#14 - Posted: 24 Dec 2004 23:49 You can imagine this BSS thingie as a section where you declare something like empty byte arrays in chip/fast ram. That's the way I try to port this from OO languages like Java.
        section stuff,bss_f     ; _c = force to chipmem, _f = force to fastmem
buffer: ds.b    4096
This means you now have 4096 empty bytes in fast ram at label "buffer" where you can store data very easily.
z5_ Member	#15 - Posted: 27 Dec 2004 23:17 Cool, thanks for all the explanations! I have another question about a weird bug I still have (they are diminishing though :)). When I assemble a project and run it (asm-one), sometimes the screen I'm working on for my effects isn't what it should be. It is as if the modulos aren't correct. I see the picture, but in a distorted kind of way, just as you would with wrong screen modulos. I get the same when trying a simple effect in a 1 bitplane screen. Sometimes it is like it should be, sometimes it isn't (with exactly the same code). Sometimes it works 3 times in a row, sometimes it bugs 2 times or 1 time. All very random... It is as if my 'demo' is still working with the modulo values of another (workbench?) screen. The strange thing of course is that sometimes it works and sometimes it doesn't. Seems something in my startup isn't correct... I'm initialising my bitplane pointers to the screen before I write my new copper address in cop1lc and write anything to the strobe register to activate my copper. All my bitplane info (bplcon0, bplcon1, screen modulos,...) is in my copperlist. Does anyone have an idea where I should start looking for this problem?
dalton Member	#16 - Posted: 28 Dec 2004 09:21 Be sure that you have ditched the wb coplist and irq before you write the registers to set up the display. That's the only thing I can think of, I'm afraid =/
z5_ Member	#17 - Posted: 28 Dec 2004 12:51 @dalton: hmmm... I haven't got the source here (I'm at work). I don't write to the registers to set up the display, they are already in my copperlist. The only thing I do is initialise my bitplane pointers (= point them at my screen buffer) by filling them in in my copperlist. Then I tell the system what copperlist to use and write to the strobe register to use the new copperlist. Before that, I disable all dma and interrupts...
Cyf Member	#18 - Posted: 28 Dec 2004 14:40 What size and depth do you use for the screen, and which modulo?
dalton Member	#19 - Posted: 28 Dec 2004 18:40 Maybe some other routine writes to the copperlist by mistake? I once wrote a program with the coplist and bitplanes in separate memory sections. There was no problem, unless the program was run directly after a clean boot. That was because the coplist turned up directly after the bitplanes in memory, and my cls routine cleared a few bytes too many =)
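One thing worth ruling out for an intermittently distorted picture is a display register that the copperlist never writes, because such a register keeps whatever the previous screen left in it. The sketch below is one possible arrangement, not a diagnosis of the code discussed here: a 1-bitplane 320x256 screen whose copperlist sets the window, fetch, control and modulo registers explicitly. It assumes the system has already been taken over (DMA and interrupts saved and disabled), and all labels are hypothetical.

```asm
; Sketch only: install a copperlist that sets every display register itself.
        section maincode,code
setup:
        lea     $dff000,a6
        lea     bplptr,a0
        move.l  #screen,d0
        move.w  d0,6(a0)                ; low word  -> value for BPL1PTL
        swap    d0
        move.w  d0,2(a0)                ; high word -> value for BPL1PTH
        move.l  #copperlist,$080(a6)    ; COP1LC
        move.w  d0,$088(a6)             ; COPJMP1 strobe, any value restarts the copper
        move.w  #$8380,$096(a6)         ; DMACON: set master, bitplane and copper DMA
        rts

        section copper,data_c
copperlist:
        dc.w    $01fc,$0000             ; FMODE = 0 (matters on AGA)
        dc.w    $008e,$2c81             ; DIWSTRT
        dc.w    $0090,$2cc1             ; DIWSTOP
        dc.w    $0092,$0038             ; DDFSTRT
        dc.w    $0094,$00d0             ; DDFSTOP
        dc.w    $0100,$1200             ; BPLCON0: 1 bitplane, color on
        dc.w    $0102,$0000             ; BPLCON1
        dc.w    $0104,$0000             ; BPLCON2
        dc.w    $0108,$0000             ; BPL1MOD
        dc.w    $010a,$0000             ; BPL2MOD
bplptr: dc.w    $00e0,0,$00e2,0         ; BPL1PTH / BPL1PTL, filled in by setup
        dc.w    $0180,$0000             ; COLOR00
        dc.w    $0182,$0fff             ; COLOR01
        dc.w    $ffff,$fffe             ; end of copperlist

        section screenmem,bss_c
screen: ds.b    40*256
```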
z5_ Member	#20 - Posted: 30 Dec 2004 19:15 In the meantime, the problem sort of disappeared after I added more definitions to my bpl coplist (stuff like bpl1mod, bpl2mod). Hopefully this is solved for good. I have another general question. Sometimes, when testing a variable, I get "illegal address size":
        tst.b   sc1_speed
with sc1_speed defined as:
sc1_speed:
        dc.b    0
Sometimes it works, on other variables it doesn't. I know that every word has to start at an even address, so after my last byte declaration I do add 'even' (otherwise asm-one would complain about that anyway). So what's wrong here? And by the way, I just finished my first little intro :)
Cyf Member	#21 - Posted: 30 Dec 2004 21:11 Try to align to 32 bits before it with: CNOP 0,4
z5_ Member	#22 - Posted: 4 Jan 2005 12:56 @cyf: I still have the same problem. I did have cnop 0,4 before every routine though (at least, if I didn't forget any). For now, I have solved it with a comparison instead of a tst.b. It's quite strange. Can you explain what the cnop is about and why it is needed to solve this problem? And another general question: what is the fastest way to fill in a table? Let's say I have to fill in a table of 256 longwords with the value $11111111. The way I do it, I just loop through it 256 times like this:
        lea     table,a0
        move.w  #255,d0
.doloop:
        move.l  $11111111,(a0)+
        dbra    d0,.doloop
Is there a faster way? Also, am I correct that in this case (a0)+ will advance my pointer by 4 bytes because the move is of type l(ongword)?
noname Member	#23 - Posted: 4 Jan 2005 14:12 Replace move.l $11111111,(a0)+ with move.l #$11111111,(a0)+! (You don't want to copy from memory address $11111111 but want to set a fixed value of #$11111111.) Otherwise the code is fine. It is clear to read and fast enough (really, I mean it!) in execution, since it is not part of any time-critical loop. There is no use in optimizing initialization code beyond recognition. If you still want to "optimize" that bit, use movem.l d0-d7,-(a0), working backwards from the end of the table (movem to memory has no (a0)+ form), which writes 32 bytes with one instruction. Don't forget to set the registers to the same value before starting the loop. But as I said, there is no speed advantage here, just educational value. cnop ensures that the following code or data lies on a given boundary (cnop 0,4 makes the address divisible by 4; even makes an address divisible by 2).
krabob Member	#24 - Posted: 4 Jan 2005 15:27 [quote]
        lea     table,a0
        move.w  #255,d0
.doloop:
        move.l  #$11111111,(a0)+
        dbra    d0,.doloop
Is there a faster way? [/quote]
Of course:
        move.l  #$11111111,d1
.doloop:
        move.l  d1,(a0)+
        dbra    d0,.doloop
The move.l d1,(a0)+ opcode is shorter than move.l #$11111111,(a0)+, so on a 68000 for example, which has no instruction cache, less memory must be read for the code. If your table size is a multiple of 4, unroll it like this (with the loop count divided by 4):
.doloop:
        move.l  d1,(a0)+
        move.l  d1,(a0)+
        move.l  d1,(a0)+
        move.l  d1,(a0)+
        dbra    d0,.doloop
...then you gain 3/4 of the loss due to the load and execution of the dbra. But the best for the 68000 is: check the size of your table. If it is bigger than 7*4=28 bytes, do this:
        move.l  #$11111111,d1
        move.l  d1,d2
        ...
        move.l  d1,d7
...then do a loop with size = size/7, and point to the END of your table. Then do:
.doloop:
        movem.l d1-d7,-(a0)
        dbra    d0,.doloop
...and finish the "rest" with another classic loop. It should be the fastest 68000-like filling routine: the movem opcode is incredibly small compared to a list of moves. -(a0) is used because (a0)+ is not possible with movem when writing to memory.
However... it also depends A LOT on the 680x0 version you use. For each one, according to the cache configuration, one routine or the other will be faster. This is really a mess with the 680x0 family: each 68k means a different optimisation.
68000: no instruction cache, no data cache, and most of the time only CHIP ram: use the blitter as much as possible for big memory accesses.
68020: 256-byte instruction cache, no data cache. Even with a small instruction cache, things change a lot: the code of a copying loop can fit in the cache, so once it has been read there are no more memory accesses for the instructions. That frees the bus for the data copy. And you can unroll and align the loop until the cache is filled, so that the dbra execution time is minimal.
68030: 256-byte instruction cache, 256-byte data cache
68040: ? instruction cache, ? data cache
68060: 8kb instruction cache, 8kb data cache
You can also change the cache modes and play with 16-byte alignment of your data. Speed then varies according to the copy routine and the cache configuration.
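Putting the movem idea together for the 256-longword table from the question, a minimal sketch could look like this. It assumes eight fill registers (d0-d6 plus a1) so that 256 divides evenly and no remainder loop is needed, with d7 kept free as the loop counter; register choice and labels are illustrative only.

```asm
; Sketch only: fill 256 longwords with $11111111, 8 longwords per movem.
filltable:
        move.l  #$11111111,d0
        move.l  d0,d1
        move.l  d0,d2
        move.l  d0,d3
        move.l  d0,d4
        move.l  d0,d5
        move.l  d0,d6
        move.l  d0,a1                   ; address registers can carry fill data too
        lea     table+256*4,a0          ; movem to memory only predecrements, so start at the end
        moveq   #256/8-1,d7             ; 32 iterations
.fill   movem.l d0-d6/a1,-(a0)          ; writes 32 bytes
        dbra    d7,.fill
        rts

table:  ds.l    256
```

As noted above, for one-off initialisation this buys nothing noticeable; it mainly shows the technique.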
z5_ Member	#25 - Posted: 29 Dec 2005 12:33 Can anyone explain this move instruction:
        move.b  (a3,d5.l),(a1,d5.l)
I have even seen something like move.b (x,y,z),w. I have never seen any explanation of such move instructions, with several parameters in source and/or destination.
winden Member	#26 - Posted: 29 Dec 2005 13:28
        q = a3 + d5;
        w = a1 + d5;
        x = read_byte_at_(q);
        write_byte_at_(w)_with_value(x);
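The same idea written out for a few variants of indexed addressing, as a sketch. The effective address is always base register + index register (+ an optional small displacement); the three-parameter form is the same mode with the displacement written inside the brackets. The scaled form needs a 68020 or later, and the bracketed displacement syntax is assumed to be accepted by the assembler in use.

```asm
; Sketch only: effective address = base + index (+ displacement).
        move.b  (a3,d5.l),(a1,d5.l)     ; read byte at a3+d5, write it to a1+d5
        move.b  4(a3,d5.l),d0           ; byte at a3+d5+4, older syntax
        move.b  (4,a3,d5.l),d0          ; the same instruction, newer syntax
        move.w  (a1,d0.w),d1            ; index is the low word of d0, sign-extended
        move.w  (a1,d0.w*2),d1          ; 68020+: index scaled, address = a1 + d0.w*2
```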
z5_ Member	#27 - Posted: 24 Apr 2007 19:09 - Edited How does a compare/branch work after a move instruction? The 68k reference guide isn't giving me any answer...
        move.l  d0,d1
        beq
Can you do a higher than, lower than,... and how does the comparison work?
dalton Member	#28 - Posted: 24 Apr 2007 19:30 The neg and zero flags are set as you would expect. Overflow and carry are cleared and x is unchanged. Thus move d0,?? followed by beq means jump if d0 is equal to zero.
Kalms Member	#29 - Posted: 24 Apr 2007 23:34 Another way to think about it is: after a move/add/sub/shift/rotate/whatever operation, the flags will be set as if the result of the operation has been compared against zero (i.e. "lsr.w #1,d1" would be equivalent to "lsr.w #1,d1; cmp.w #0,d1"). [Note that operations against address registers (movea, adda, suba etc) do not affect the flags.]
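A small sketch of what this means in practice: after a move, the signed branches behave like a comparison against zero, and for a comparison against any other value an explicit cmp is needed. The routine name, labels and the constant 100 are made up for the illustration; the result code in d2 is just a way to show which path was taken.

```asm
; Sketch only: classify d0 using the flags left by move and cmp.
classify:
        move.l  d0,d1                   ; N and Z reflect the moved value, V and C are cleared
        beq.s   .zero                   ; value = 0
        bmi.s   .negative               ; value < 0 (signed)
        cmp.l   #100,d1                 ; flags now reflect d1 - 100
        bgt.s   .big                    ; d1 > 100 (signed; bhi is the unsigned test)
        moveq   #1,d2                   ; 1..100
        rts
.big    moveq   #2,d2
        rts
.zero   moveq   #0,d2
        rts
.negative
        moveq   #-1,d2
        rts
```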
rload Member	#30 - Posted: 9 May 2007 00:34 In asmone is it possible to do something like this?
        rept    5
        move.l  d0,d0
        add.l   #5,d1
        bpl     .label@
        clr.l   d1
.label@
        endr
and have the assembler generate a new .label@ per repeat? (This code doesn't make any sense btw, it's just a syntax question.) Is there some other preprocessor macro trick I can use? I also tried to branch over the clr with
        bpl.b   *+2
which I thought would branch to the address "current line's address + 2 bytes offset", but that doesn't seem to work. I've used the star with some success before, in phxass at least.
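Two possible approaches, sketched under assumptions rather than tested in ASM-One. The first relies on `\@`, which in Devpac-style macro assemblers expands to a different suffix on every macro call; support for it in ASM-One is assumed here. The second uses `*` (the address of the current instruction): the short branch occupies 2 bytes and `clr.l d1` another 2, so the instruction after the clr is at `*+4`. `*+2` would be a displacement of zero, which the short branch form cannot encode.

```asm
; Sketch only, variant 1: a macro gives each repetition its own label.
step    macro
        add.l   #5,d1
        bpl.s   .skip\@                 ; \@ = unique suffix per macro call (assumed supported)
        clr.l   d1
.skip\@
        endm

        rept    5
        step
        endr

; Variant 2: no label at all, branch relative to the current instruction.
        rept    5
        add.l   #5,d1
        bpl.s   *+4                     ; skip the 2-byte clr.l d1 that follows
        clr.l   d1
        endr
```

The `*+4` form silently breaks if the skipped instruction changes size, so the macro variant is the more robust of the two.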
