Author |
Message |
rloaderror
Member |
When making demos for the A1200 060 class Amigas I've always relied on having lots of memory available so that all data could be loaded before the demo starts. For our recent demos it tends to get a bit ridiculous with large textures/tables chewing up 64MB of memory budget.
I'd like to know more about how other coders handle memory management and streaming data in from HD?
(On A500 trackmos there are lots of efficient memory management and dynamic loading of data.. but I think us A1200 coders have not really picked up the gauntlet!)
|
todi
Member |
How about having the textures compressed in memory and unpacking them before each part? In our demo system we have a big data file (wad style) which we open at the beginning and read from the right offset just before each part needs its data. But if you want to stream while a routine is running, I think you have to read in chunks so as not to take up too much CPU...
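Something like this minimal sketch (all names invented, not our actual system): the pack file is one big concatenation of assets, and a small directory of offsets is read once at startup, so each part just seeks to its offset.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical directory entry for a wad-style pack file: all assets
   concatenated into one file, with a table of offsets loaded once. */
typedef struct {
    char name[16];      /* asset name, zero-padded */
    long offset;        /* byte offset into the pack file */
    long size;          /* size in bytes */
} WadEntry;

/* Look up an asset by name; returns NULL if not present.
   Real code would then Seek() to entry->offset and Read() entry->size. */
static const WadEntry *wad_find(const WadEntry *dir, int count, const char *name)
{
    int i;
    for (i = 0; i < count; i++)
        if (strncmp(dir[i].name, name, sizeof(dir[i].name)) == 0)
            return &dir[i];
    return NULL;
}
```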
|
rloaderror
Member |
That sounds like a good strategy. Which compression scheme do you use? Is it close to copy-speed decompression?
I'm trying to stream music in from disk now. This seems very unstable. So far I'm getting stuttering whenever the demo is also loading other data while the music is streaming, or when a heavy effect is running. I've only just implemented it though, so maybe I have some bugs in the buffering code.
*edit* yes, there were stupid bugs.. Now it works, although I need to select the buffer size carefully to avoid stuttering.
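For reference, the buffering boils down to something like this sketch (names invented, I/O stubbed out): audio plays from one half of a buffer while the other half is refilled from disk, and once per frame you check which half the playback position is in.

```c
#include <stddef.h>

#define STREAM_BUF 32768             /* total buffer, two halves */
#define HALF       (STREAM_BUF / 2)

typedef struct {
    unsigned char buf[STREAM_BUF];
    int filled_half;                 /* which half was refilled last: 0 or 1 */
} MusicStream;

/* Stand-in for a dos.library Read() into the given half. */
typedef void (*ReadHalfFn)(unsigned char *dst, int len);

/* Call once per frame with the current playback offset into buf.
   Refills the half NOT being played if it is stale.
   Returns 1 if a refill was issued, 0 otherwise. */
static int stream_update(MusicStream *s, int play_pos, ReadHalfFn read_cb)
{
    int playing_half = (play_pos >= HALF) ? 1 : 0;
    int other = 1 - playing_half;
    if (s->filled_half != other) {
        if (read_cb)                 /* NULL in tests: skip actual I/O */
            read_cb(s->buf + (size_t)other * HALF, HALF);
        s->filled_half = other;
        return 1;
    }
    return 0;
}
```

The buffer size is the tuning knob: too small and a slow frame lets the play cursor catch up with the refill.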
|
todi
Member |
You should be able to read about 2MB/s on an A1200 with accelerator, but I found this (for some strange reason it's only 1.6MB/s on a Blizzard PPC/060 board): http://www.elbox.com/tests/fastata_speed.html
I would pick some algo like LZ4, which is close to copy speed, or ADPCM for audio. We use Shrinkler or uncompressed data, but we are not "streaming"...
|
rloaderror
Member |
Mmmh mhm * notes down lz4 *
|
jar
Member |
" if you would like to stream when a routine is running I think you have to read in chunks to not take up to much CPU... "
this most likely requires using a decompression routine that can work in chunks instead of depacking the whole source buffer in one go. Not sure if typically used amiga (de)compression routines or LZ4 support this.
If you don't need a very good compression ratio you could take a look at ye olde LZW. An LZW decompressor can easily be made to provide an arbitrary number of decompressed bytes per call. This way you could even create an fread-style API which decompresses directly while reading from disk. Read performance is probably not so hot (if you use AmigaDOS maybe it buffers/prefetches into RAM) - but if you read ahead stuff for the next part in the background while the current part is running it might be good enough.
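The fread-style idea looks roughly like this sketch. To keep it self-contained a trivial RLE scheme (count byte, value byte) stands in for LZW; the point is the persistent state struct, which lets each call hand back exactly the requested number of bytes and suspend mid-run.

```c
#include <stddef.h>

typedef struct {
    const unsigned char *src;   /* compressed data (RLE stand-in for LZW) */
    size_t src_len, src_pos;
    int run_left;               /* bytes left in the current run */
    unsigned char run_byte;
} DecStream;

static void dec_init(DecStream *d, const unsigned char *src, size_t len)
{
    d->src = src; d->src_len = len;
    d->src_pos = 0; d->run_left = 0; d->run_byte = 0;
}

/* Like fread(): decompress up to `want` bytes into dst, return the
   number actually produced. State survives between calls, so the
   caller can pull any amount per call. */
static size_t dec_read(DecStream *d, unsigned char *dst, size_t want)
{
    size_t out = 0;
    while (out < want) {
        if (d->run_left == 0) {
            if (d->src_pos + 2 > d->src_len)
                break;                        /* end of compressed data */
            d->run_left = d->src[d->src_pos++];
            d->run_byte = d->src[d->src_pos++];
        }
        while (d->run_left > 0 && out < want) {
            dst[out++] = d->run_byte;
            d->run_left--;
        }
    }
    return out;
}
```

A real LZW version keeps its dictionary and partially emitted match in the struct instead of the run counter.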
|
todi
Member |
If I understand right (haven't tried), LZ4 has support for "streaming" when using the LZ4 frame format (multiple LZ4 blocks): https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md. Of course LZW is better compression, but a close-to-copy-speed algo was requested...
|
jar
Member |
If you decompress everything between parts then yes, you need a decompressor that is as fast as possible (hard to beat LZ4 in this case), because you don't want ugly delays between parts. However, for the use case I mentioned ("if you read ahead stuff for the next part in background while the current part is running"), decompression speed probably doesn't matter that much - provided the current part runs long enough that you can decompress everything before it ends. I.e. you reserve a certain budget per frame (in your main loop) to cooperatively feed the decompression routine with the next chunks (or do it in an IRQ). You will most likely write some kind of simplified async I/O functions that queue up files/buffers to be processed by the decompression "thread".
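The per-frame budget idea can be sketched like this (all names invented; `decompress_step()` stands in for any chunk-capable depacker): the main loop pumps a pending job by at most `budget` bytes per frame.

```c
#include <stddef.h>

typedef struct {
    size_t done, total;          /* progress through the output */
} LoadJob;

/* Pretend chunk decompressor: advances up to n bytes, returns how
   many it actually processed. Real code would depack into a buffer. */
static size_t decompress_step(LoadJob *job, size_t n)
{
    size_t left = job->total - job->done;
    if (n > left) n = left;
    job->done += n;
    return n;
}

/* Call once per frame from the main loop (or an IRQ).
   Returns 1 when the job is finished, 0 while work remains. */
static int job_pump(LoadJob *job, size_t budget)
{
    while (budget > 0 && job->done < job->total) {
        size_t n = decompress_step(job, budget);
        budget -= n;
    }
    return job->done >= job->total;
}
```

The queue of files/buffers then just becomes an array of such jobs, pumped in order.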
|
winden
Member |
I did some tests like these maaaany years ago on 030/50 w/ traditional disk -- let me try to recall...
I'd have two exec tasks, one running the demo at prio 1 and another doing the background loading at prio 2.
The loading task, if working on a request, would do a WaitTOF and then call dos.library to read a chunk of data; being the higher-priority task it would block quite a bit of CPU each frame.
I'm not sure, but I seem to recall that reading 10KB per frame would consume 20% of a frame (that's about 500KB/s at 50 fps).
If you want to double check, I recall Xenophobia / Subspace ( https://www.pouet.net/prod.php?which=4712 ) had a part with an anim streamed from disk; you could run it under SnoopDos to measure the amount of reads they did.
|
rloaderror
Member |
I tried an experiment using two exec tasks. The goal was to see whether I could run effects in the second task at a free framerate while the main task would always run at 50 fps. The main task would be light and handle palette fades/sync/composition etc. However, the two tasks ended up cutting performance down a lot. Could be I made some mistake with the priorities or something else, but I ended up abandoning the idea.
I also seem to remember seeing some language disallowing I/O or memory allocation outside the main task, which made this hard to use in practice.
Tried using SnoopDos 3 while running Xenophobia, but didn't spot any list of Read calls made during the animation. Could be I'm running a high-RAM config and it just loads the whole thing upfront or something. Either that or my SnoopDos version doesn't log individual Read-type calls.
10KB costing 20% of a frame might be a bit of a showstopper though, but then again some of these chunky effects are 2-frame at best and usually worse than that.
In a moment of madness I thought about making something like a co-routine approach to allow partially progressing loading routines. The loading routine would have no local stack variables and instead keep all its state in a struct (including a "Program Counter" type of state).. so that it can be suspended and continued from the main loop.. There, now I've said it! Of course it won't work! :D
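Actually, the madness is workable: the co-routine collapses into a plain state machine, with the "program counter" kept in a struct so the main loop can resume it each frame. A sketch with invented names (the open/read steps would be dos.library Open()/Read() in real code):

```c
#include <stddef.h>

typedef enum { LD_OPEN, LD_READ, LD_DONE } LoadState;

typedef struct {
    LoadState pc;                /* where to resume next call */
    size_t done, total, chunk;   /* read progress and chunk size */
} Loader;

/* Advance the loader by one small step; call once per frame from the
   main loop. Returns 1 while work remains, 0 when finished. */
static int loader_resume(Loader *ld)
{
    switch (ld->pc) {
    case LD_OPEN:
        /* real code: Open() the file here */
        ld->done = 0;
        ld->pc = LD_READ;
        return 1;
    case LD_READ:
        /* real code: Read() one chunk here */
        ld->done += ld->chunk;
        if (ld->done >= ld->total)
            ld->pc = LD_DONE;    /* real code: Close() */
        return ld->pc != LD_DONE;
    case LD_DONE:
    default:
        return 0;
    }
}
```

No second task, no stack juggling: all state lives in the struct, so the routine can be suspended after any step.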
|
krabob
Member |
Hello all, a pleasure to read and write here. I should come back to classic Amiga coding soon. "Catabasis" (2017, Cocoon) looks like a trackmo, but I use dos.library calls for all data reading. I override the system interrupts without blocking them, and OwnBlitter()/DisownBlitter() so dos.library calls can fully work. In the demo, the first effect and one in the middle run completely in the VBL interrupt doing copper things, while the main task is actually reading with dos/uncompressing. (The plasma at 1m11 with the big face sprite scrolling down is also a dual-playfield trick + sprites, no need for CPU or blitter, so the main task loads.) Still, I also wanted to do the tests streaming data with 2 tasks, so thanks winden and loaderror, this is interesting.
|
rloaderror
Member |
Implemented music streaming now. It seems to work OK when keeping a fairly large streaming buffer and updating it from HD after each frame, depending on the playback status. Saved about 9MB of memory on The Martini Effect (it now requires 35MB; the goal is to get it below 32MB).
According to an old 80s movie, spilling a can of coke into a C64 should grant me unlimited RAM. I'm about to find out if this trick still works on modern computers as I just spilled a beer into my Macbook Air M1. If it works, perhaps no point in digging into this memory management problem any further.
Can anyone remember the name of this movie? I'd like to watch it again
|
hellfire
Member |
rloaderror: "According to an old 80s movie, spilling a can of coke into a C64 should grant me unlimited RAM. Can anyone remember the name of this movie? I'd like to watch it again"
Reminds me of "Electric Dreams": https://www.imdb.com/title/tt0087197/plotsummary
|
rloaderror
Member |
Thanks Hellfire! That's the one. My memory was a bit off: I remembered it as coke + C64 granting unlimited RAM, but in the movie it was some kind of PC + champagne that gave the PC sentience, and it already had unlimited RAM because the protagonist simply asked for it while downloading all the data on his boss's computer :D Was a great nostalgia trip to watch it again, and isn't it kind of an 80s Alexa movie?
|
jar
Member |
But.. is your code fast enough now to stream that movie from floppy? :)
|
rloaderror
Member |
Maybe Algotech could do it!? They got some awesome compression/streaming tech I think.
|