A.D.A. Amiga Demoscene Archive


Forum / Coding / Running A500 executable in faster machines/AGA machines

 

merry
Member
#1 - Posted: 20 Aug 2013 01:32 - Edited
Hi fellows, here comes my first post! Hopefully not the last :)

Over the years I've tried multiple times to write something for my old Amiga 500. Unfortunately, I never managed it in the 90s, and since then I've done most of my stuff on PC and 8-bit machines.

But I am *really* keen to get back into Amiga coding! I want to finish at least one intro for it :)

As my favourite is the A500 (probably because I still own my precious) I am quite interested in writing something for it. I'm using a variation of xxxxx's WinUAE toolchain (to be released someday in the future), as I'm more used to writing code on a PC. My target is therefore the OCS/ECS chipsets running on a stock A500 (512KB ChipRAM).

So, my question is about how to make my code multiplatform: I'm assuming the A1200 is able to run ECS/OCS programs, but what about speed? Am I just depending on VSYNC, or is there any way to "downgrade" the machine on start and then "upgrade" it on exit?

I know this question can sound really weird to people used to writing Amiga code, but I'm just a newbie at it :P
Thank you so much for your help :)
Angry Retired Bastard
Member
#2 - Posted: 20 Aug 2013 15:44
If the code runs at full speed (50fps) on the A500 then it's just about respecting vsync (and blitter waits, if you're using the blitter).

If it runs < 50fps on the A500 then it might of course be faster on faster machines and you'll want to deal with that (it doesn't need to be very difficult either). That said, since the A500 is your favourite (as it should be!), focus on getting stuff right there first. :) I'm not going to say "only do 50 FPS effects" because that's just stupid (if your effect looks cool anyway).

Some nitpickers may also start complaining that the speed differences are really due to lots of factors (CPU types and clock frequencies, memory types and so on), but for what you're talking about it's good enough to say "stock A500 vs anything that's faster".

Lastly: In some extreme cases you _could_ get into situations where stuff runs faster on an A500 than on a stock A1200 as well, but you'll really have to try hard to get there. Don't worry about that one for now. :)
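
One possible way to deal with the sub-50fps case, as a rough sketch only: let the vertical blank interrupt count frames, and have the main loop advance the effect by the number of frames that actually elapsed, so the motion speed is the same on any CPU. All label names here (VBlankInt, FrameCount, LastFrame, UpdateEffect) are made up for the example, and it assumes VBlankInt has been installed as the level 3 handler with only the VERTB interrupt enabled.

```asm
; Level 3 interrupt handler: count vertical blanks.
VBlankInt:
    addq.w  #1,FrameCount
    move.w  #$0020,$dff09c      ; clear the VERTB request bit
    move.w  #$0020,$dff09c      ; and once more for slow interrupt lines
    rte

MainLoop:
    move.w  FrameCount,d0
    move.w  d0,d1
    sub.w   LastFrame,d1        ; d1 = frames elapsed since the last update
    beq.s   MainLoop            ; none yet: wait for the next vertical blank
    move.w  d0,LastFrame
    bsr.s   UpdateEffect
    bra.s   MainLoop

; Hypothetical effect routine: advance the animation by d1 frames, then render.
UpdateEffect:
    rts

FrameCount: dc.w    0
LastFrame:  dc.w    0
```

On a stock A500 the effect may see d1 = 2 or 3 per update, while a faster machine sees d1 = 1 more often; either way the animation covers the same distance per second.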
merry
Member
#3 - Posted: 20 Aug 2013 23:46
Great! Thank you so much for your answer and time.

I will then focus on coding for a stock A500 with mostly 50 FPS effects, and deal with the A1200 et al. later if required. I don't think my first attempt will contain anything that can't run at 50 FPS, so I will deal with those cases when they arise (if they do).

Thank you so much again! Hope to have something to release sooner or later :)
Blueberry
Member
#4 - Posted: 22 Aug 2013 19:59 - Edited
There are a couple of things you should keep in mind when making your A500 demos to make them run on other Amigas. Off the top of my head:

1. Chipset state: The AGA chipset contains a number of control registers which change the behavior of existing OCS/ECS features. Since the OS uses these new features (unless you select an older chipset in the boot menu), some of these registers could be in a "bad" state when your intro starts.

You are usually home free if you include these three commands in your copper list (in addition to the setup for all OCS registers):

    dc.l    $01060000    ; Color register bank, black border, sprite resolution
    dc.l    $010c0011    ; Color remapping, sprite palette
    dc.l    $01fc0000    ; Fetch mode, bitplane/sprite scandoubling


2. CPU caches: If you use any kind of self-modifying code (writing code to memory, then executing it), you need to either:
- Flush caches between the write and the execute. CacheClearU (offset -636 in Exec, only present on KS 2.04+) does the trick.
- Disable caches. This is more involved. CacheControl is the Exec function to look up.
- Write the code to chip memory. Chip memory is always uncacheable. This does not have any performance impact on A500, but it might make your code run slower on faster Amigas than it does on A500.
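
The first option could be sketched roughly like this. The routine name FlushCaches is made up, and using the Exec library version (LIB_VERSION at offset 20 of the library base, V37 being Kickstart 2.04) as the guard is an assumption of this sketch. It does nothing at all on older Kickstarts, so an accelerated machine running an old Kickstart is not covered.

```asm
; Call after writing or decrunching code and before jumping to it.
; Trashes d0/d1/a0/a1 (the usual scratch registers for library calls).
FlushCaches:
    move.l  a6,-(sp)
    move.l  4.w,a6              ; ExecBase
    cmp.w   #37,20(a6)          ; LIB_VERSION: V37 = Kickstart 2.04
    blt.s   .done               ; older Kickstart: CacheClearU does not exist
    jsr     -636(a6)            ; CacheClearU
.done:
    move.l  (sp)+,a6
    rts
```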

Note that decrunching counts as writing code to memory. If you need to crunch your executable, use a cruncher which properly flushes the cache. Crunchmania is known not to do this. I of course recommend my own cruncher - it compresses better than all the others, but decompresses very slowly. :)


3. Vector base. If you install your own interrupt by writing its address to the interrupt vector, you need to take into account that the interrupt vector table might not start at address 0. Its base address is in the VBR register, which can be read using the privileged movec instruction. So, for example, to install a VBlank interrupt:

    lea     GetVBR(pc),a5
    jsr     -30(a6)             ; Supervisor (a6 = ExecBase), VBR ends up in a2
    lea     OldInterrupt(pc),a0
    lea     Interrupt(pc),a1
    move.l  $6c(a2),(a0)        ; save the old level 3 vector
    move.l  a1,$6c(a2)          ; install our own

    ...

GetVBR:
    movec   vbr,a2
    rte

OldInterrupt:
    dc.l    0
Interrupt:
    ; Interrupt code here
    rte

Keep in mind that the VBR register is only present on 68010+, so the above code needs to be guarded by a CPU check.
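
A minimal sketch of such a CPU check, assuming the AttnFlags word in ExecBase (offset 296, bit 0 set on 68010 and above) is used for detection. The label names are illustrative, and the assembler has to be told to accept 68010 instructions for the movec line.

```asm
; Returns a2 = base of the interrupt vector table (0 on a plain 68000).
GetVectorBase:
    move.l  4.w,a6              ; ExecBase
    sub.l   a2,a2               ; default: vectors start at address 0
    move.w  296(a6),d0          ; AttnFlags
    btst    #0,d0               ; AFB_68010: set on 68010 and above
    beq.s   .done               ; 68000: there is no VBR to read
    lea     .getvbr(pc),a5
    jsr     -30(a6)             ; Supervisor
.done:
    rts

.getvbr:
    movec   vbr,a2              ; runs in supervisor mode
    rte
```

After the call, $6c(a2) addresses the level 3 vector on any CPU.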


4. Interrupt request bits: At the end of an interrupt, you need to clear the corresponding interrupt request bit. On some A4000s, the actual disabling of the interrupt lines to the processor happens so slowly that the CPU sometimes manages to fully return from the interrupt beforehand, which means the interrupt gets triggered again. The solution is to clear the interrupt request bits twice:

    move.w  #$0020,$dff09c
    move.w  #$0020,$dff09c
    rte


For an example of some of these things in practice, take a look at my startup code. :)
ZEROblue
Member
#5 - Posted: 26 Aug 2013 17:11 - Edited
A couple of other things:

1. If the machine has a graphics board you may have to open graphics.library and call LoadView(NULL) followed by two calls to WaitTOF() to make sure the native video is enabled and/or gets passed through the graphics board.

LoadView is an asynchronous call, and calling WaitTOF twice is to make sure the new Copper program installed by LoadView gets run at least once, so you want to do these calls before disabling any DMA, multitasking and interrupts.
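
In assembly, that sequence could be sketched as follows. The label names are made up; the library offsets used are OpenLibrary (-552), LoadView (-222) and WaitTOF (-270), and the currently active view is read from gb_ActiView (offset 34 in GfxBase) so it can be restored later.

```asm
; Returns d0 = 0 if graphics.library could not be opened, non-zero otherwise.
TakeOverDisplay:
    move.l  4.w,a6              ; ExecBase
    lea     GfxName(pc),a1
    moveq   #0,d0               ; any version will do
    jsr     -552(a6)            ; OpenLibrary
    tst.l   d0
    beq.s   .done
    move.l  d0,a6               ; a6 = GfxBase
    lea     GfxBase(pc),a0
    move.l  d0,(a0)+            ; remember GfxBase
    move.l  34(a6),(a0)         ; remember gb_ActiView in OldView
    sub.l   a1,a1               ; NULL view
    jsr     -222(a6)            ; LoadView
    jsr     -270(a6)            ; WaitTOF
    jsr     -270(a6)            ; WaitTOF
    moveq   #1,d0
.done:
    rts

GfxBase:    dc.l    0
OldView:    dc.l    0
GfxName:    dc.b    "graphics.library",0
    even
```

On exit, the saved OldView would be passed back to LoadView in the same way, again followed by two WaitTOF calls, before handing the machine back to the system.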

2. It's easy to assume some code will always need a certain minimum amount of time to finish. If you're not using any kind of raster interrupts and keep things simple, for example like this:

wait:
    cmp.b   #100,$dff006    ; VHPOSR: the high byte holds the beam line
    bne.s   wait
    ; (some code)
    bra.s   wait


and (some code) executes quickly, then on faster CPUs with caches it may finish before the next scanline and end up running twice per frame. So always guard against this, for example by waiting first for line 99 and then for line 100.
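
A sketch of the guarded version, with DoFrame as a made-up stand-in for the per-frame work. It assumes a PAL display, where lines 99 and 100 each occur only once per frame, and that nothing interrupts the polling loop for longer than a scanline.

```asm
MainLoop:
.wait99:
    cmp.b   #99,$dff006         ; VHPOSR high byte = low 8 bits of the beam line
    bne.s   .wait99
.wait100:
    cmp.b   #100,$dff006
    bne.s   .wait100
    bsr.s   DoFrame
    bra.s   MainLoop

; Hypothetical per-frame work.
DoFrame:
    rts
```

Even if DoFrame returns while the beam is still on line 100, the loop has to see line 99 again first, which only happens in the next frame.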
merry
Member
#6 - Posted: 27 Aug 2013 01:44 - Edited
Thank you both so much for your answers :) I appreciate them a lot!!! As expected, some things need to be sorted out to make A500 code run smoothly (but not faster) on faster machines.

@Blueberry: regarding the CPU cache, I'm not planning on self-modifying code initially, but you never know :) I know about CPU caches (mostly x86) from my job, and I mostly understand why Chipmem is not cached. What I can't really figure out is why code running in Chipmem is slower on faster machines... are you talking *exclusively* about self-modifying code, or actually about any code running in Chipmem?

To anyone who could help: do you think I should focus from the very beginning on keeping my code compatible with faster machines? I mean, is writing the original code for the A500 and then "porting" it in a second sprint to make it compatible with faster machines going to be really painful, or is it, in your opinion and experience, the way to go?

Thanks!
Blueberry
Member
#7 - Posted: 30 Aug 2013 17:59
I am not entirely sure what makes chip code run slower on faster machines, but Paradroid reported seeing this behavior here.

As for your compatibility strategy, I would suggest using a well-tested, compatible startup code which handles most of the issues mentioned above. Keep the chipset state and CPU cache issues in mind while coding, and you are mostly set. Of course problems will show up anyway (they always do), but at least you will have the basics covered. :)
Lonewolf10
Member
#8 - Posted: 27 Dec 2013 20:18 - Edited
Blueberry:
1. Chipset state: The AGA chipset contains a number of control registers which change the behavior of existing OCS/ECS features. Since the OS uses these new features (unless you select an older chipset in the boot menu), some of these registers could be in a "bad" state when your intro starts.

You are usually home free if you include these three commands in your copper list (in addition to the setup for all OCS registers):
    dc.l    $01060000    ; Color register bank, black border, sprite resolution
    dc.l    $010c0011    ; Color remapping, sprite palette
    dc.l    $01fc0000    ; Fetch mode, bitplane/sprite scandoubling

Is it sufficient to run this in the copperlist once, e.g. on the main menu and/or logo screen, or would these have to be included in all copperlists used in the demo/game? (I'm working on a new 2D platform game and would like to have the basics covered so I have compatibility across the chipsets.)
Blueberry
Member
#9 - Posted: 28 Dec 2013 00:15
If you close the system, call LoadView(0), wait at least 2 vblanks and then set the registers once, it should be sufficient.
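
Read as plain CPU writes rather than copper instructions, setting the registers once might look like the sketch below. The routine name is made up, and the BPLCON3 value of $0c00 follows losso's correction further down in this thread.

```asm
; Call once, after LoadView(0) and the two vblank waits.
InitAGARegs:
    lea     $dff000,a5          ; custom chip base
    move.w  #$0c00,$106(a5)     ; BPLCON3
    move.w  #$0011,$10c(a5)     ; BPLCON4
    move.w  #$0000,$1fc(a5)     ; FMODE
    rts
```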
d0DgE
Member
#10 - Posted: 28 Dec 2013 12:59 - Edited
Lonewolf10:
Is it sufficient to run this in the copperlist once, e.g. on the main menu and/or logo screen, or would these have to be included in all copperlists used in the demo/game?

AFAIK setting BPLCON3, BPLCON4 ($0106, $010c) and FMODE ($01fc) once is just fine, UNLESS you want to, for instance, change the colour table addresses for the sprites, which are set in the lower byte of BPLCON4 (default is $11). For standard OCS use (i.e. 5 bitplanes, 32 colour palette $0180 - $01be, 4-bit colour values) there is no need to do anything to BPLCON3 other than to initialize it with 0.
For example, if you're dealing with AGA colour palettes you still only have the 32 colour registers handy. Hence, in order to display 8-bit colour components on screen, you have to write the high and the low 4-bit nibbles of each colour to the 32 colour registers separately, and after each pass "switch" the nibble select or the bank in BPLCON3 like this:

copperlist:

    ; ...all the init stuff

    dc.w    $0106,$0000     ; BPLCON3: bank 0, high nibbles
    dc.w    $0180,$0111
    ; ...
    dc.w    $01be,$0fff

    dc.w    $0106,$0200     ; BPLCON3: bank 0, low nibbles
    dc.w    $0180,$0222
    ; ...
    dc.w    $01be,$0123

    dc.w    $0106,$2000     ; BPLCON3: bank 1, high nibbles
    ; ...colour regs & values

    dc.w    $0106,$2200     ; BPLCON3: bank 1, low nibbles
    ; ... colour regs & values

    ; up until $0106,$e200 ... resulting in a monstrous palette listing
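
To generate those values, each 24-bit colour has to be split into one word of high nibbles and one word of low nibbles. A small sketch of how that split could be done (the routine names and register usage are made up for the example):

```asm
; In:  d0.l = $00RRGGBB
; Out: d1.w = high nibbles as $0RGB, d2.w = low nibbles as $0rgb
; Trashes d3.
SplitRGB:
    move.l  d0,d1
    and.l   #$000f0f0f,d1       ; keep the low nibble of each component
    bsr.s   PackNibbles
    move.w  d1,d2               ; d2 = low-nibble word
    move.l  d0,d1
    lsr.l   #4,d1               ; move the high nibbles into the low positions
    and.l   #$000f0f0f,d1
    bsr.s   PackNibbles         ; d1 = high-nibble word
    rts

; In:  d1.l = $000X0Y0Z
; Out: d1.w = $0XYZ
PackNibbles:
    move.l  d1,d3
    lsr.l   #4,d3               ; d3 = $0000X0Y0
    or.b    d3,d1               ; low byte of d1 = $YZ
    lsr.l   #4,d3               ; d3 = $00000X0Y
    and.w   #$0f00,d3           ; d3.w = $0X00
    and.w   #$00ff,d1           ; d1.w = $00YZ
    or.w    d3,d1               ; d1.w = $0XYZ
    rts
```

For example, d0 = $00123456 gives d1 = $0135 for the high-nibble pass and d2 = $0246 for the low-nibble pass.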


Doing classic AGA chipset stuff is a royal pain in the arse. Working on the Stealthranger demo this year taught me valuable lessons :)
Lonewolf10
Member
#11 - Posted: 28 Dec 2013 16:07
Thanks for the help guys :)
losso
Member
#12 - Posted: 27 Mar 2014 10:15
A little addition to the AGA-compatible initialization copperlist (because some months ago I ran into this myself): The correct initialization value for BPLCON3 should be $0c00, not $0000:

 dc.w    $0106,$0c00 ; set color table offset for playfield 2 to 8


Of course, that does not really matter unless you happen to be using dual-playfield mode.
Blueberry
Member
#13 - Posted: 31 Mar 2014 20:35 - Edited
Good catch! Sorry for leading you astray. ;)

For good measure, here is the corrected set of compatibility copper instructions:

    dc.l    $01060c00    ; Color register bank, black border, sprite resolution
    dc.l    $010c0011    ; Color remapping, sprite palette
    dc.l    $01fc0000    ; Fetch mode, bitplane/sprite scandoubling

Lonewolf10
Member
#14 - Posted: 3 Apr 2014 23:33

Thanks losso, and you too Blueberry ;)

 
