A.D.A. Amiga Demoscene Archive


  

Forum / Coding / Streaming in data while the demo runs

 

rloaderror
Member
#1 - Posted: 31 Oct 2021 10:46
When making demos for the A1200 060 class Amigas I've always relied on having lots of memory available so that all data could be loaded before the demo starts. For our recent demos it tends to get a bit ridiculous with large textures/tables chewing up 64MB of memory budget.

I'd like to know more about how other coders handle memory management and streaming in data from HD.

(On A500 trackmos there is a lot of efficient memory management and dynamic loading of data, but I think we A1200 coders have not really picked up the gauntlet!)
todi
Member
#2 - Posted: 31 Oct 2021 17:19
How about keeping the textures compressed in memory and unpacking them before each part?
In our demo system we have one big data file (WAD style) which we open at the beginning, and we read from an offset before each part needs its data. If you would like to stream while a routine is running, I think you have to read in chunks so it doesn't take up too much CPU...
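A minimal sketch of the "one big data file" idea, not todi's actual system: the file layout (a big-endian entry count followed by offset/size pairs) and the names `Pack`, `pack_open`, `pack_load` and `pack_close` are all assumptions made for illustration. It uses portable C stdio; on AmigaOS the same pattern would presumably map to dos.library `Open()`, `Seek()` and `Read()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PACK_MAX_ENTRIES 64

/* Hypothetical layout: u32 count, then count x (u32 offset, u32 size),
   all big-endian, followed by the raw entry data. */
typedef struct {
    FILE    *fp;
    uint32_t count;
    uint32_t offset[PACK_MAX_ENTRIES];
    uint32_t size[PACK_MAX_ENTRIES];
} Pack;

static int read_u32be(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, fp) != 4) return 0;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 1;
}

/* Open the pack once at startup; only the directory is kept in memory. */
int pack_open(Pack *p, const char *path)
{
    uint32_t i;
    p->fp = fopen(path, "rb");
    if (!p->fp) return 0;
    if (!read_u32be(p->fp, &p->count) || p->count > PACK_MAX_ENTRIES)
        goto fail;
    for (i = 0; i < p->count; i++) {
        if (!read_u32be(p->fp, &p->offset[i]) ||
            !read_u32be(p->fp, &p->size[i]))
            goto fail;
    }
    return 1;
fail:
    fclose(p->fp);
    p->fp = NULL;
    return 0;
}

/* Load one entry just before the part that needs it.
   Returns the number of bytes read, or 0 on error. */
size_t pack_load(Pack *p, uint32_t index, void *dst, size_t dst_size)
{
    assert(p->fp != NULL);
    if (index >= p->count || p->size[index] > dst_size) return 0;
    if (fseek(p->fp, (long)p->offset[index], SEEK_SET) != 0) return 0;
    return fread(dst, 1, p->size[index], p->fp);
}

void pack_close(Pack *p)
{
    if (p->fp) fclose(p->fp);
    p->fp = NULL;
}
```

Keeping the file open and seeking by offset avoids a directory lookup per asset; a part would call `pack_load` into a scratch buffer and then depack from there.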
rloaderror
Member
#3 - Posted: 7 Nov 2021 08:35 - Edited
That sounds like a good strategy. Which compression scheme do you use? Is it close to copy-speed decompression?

I'm trying to stream in music from disk now. This seems very unstable. So far I'm getting stuttering when the demo is also loading other data while the music is streaming, and when a heavy effect is running. I've only just implemented it though, so maybe I have some bugs in the buffering code.

*edit* Yes, there were stupid bugs. Now it works, although I need to select the buffer size carefully to avoid stuttering.
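To illustrate why the buffer size matters, here is a hedged sketch of a ring buffer between a loader and an audio consumer. It is not rloaderror's code: `Ring`, `RING_SIZE`, `ring_write`, `ring_read` and `ring_bytes_for_stall` are hypothetical names, and padding an underrun with zero bytes assumes signed 8-bit samples. In a real player the reader would typically run from an interrupt, so access to `fill` would need protecting; that part is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 4096  /* hypothetical; must cover the longest loader stall */

typedef struct {
    unsigned char data[RING_SIZE];
    size_t head;   /* next write position (loader side) */
    size_t tail;   /* next read position (audio side) */
    size_t fill;   /* bytes currently buffered */
} Ring;

/* Loader side: store up to n bytes, returns how many actually fitted. */
size_t ring_write(Ring *r, const unsigned char *src, size_t n)
{
    size_t space = RING_SIZE - r->fill;
    size_t first;
    assert(r != NULL && src != NULL);
    if (n > space) n = space;
    first = RING_SIZE - r->head;          /* bytes until the wrap point */
    if (first > n) first = n;
    memcpy(r->data + r->head, src, first);
    memcpy(r->data, src + first, n - first);
    r->head = (r->head + n) % RING_SIZE;
    r->fill += n;
    return n;
}

/* Audio side: always fills dst with n bytes; anything the ring could not
   supply is padded with silence. Returns the number of real bytes. */
size_t ring_read(Ring *r, unsigned char *dst, size_t n)
{
    size_t got = n < r->fill ? n : r->fill;
    size_t first = RING_SIZE - r->tail;
    if (first > got) first = got;
    memcpy(dst, r->data + r->tail, first);
    memcpy(dst + first, r->data, got - first);
    memset(dst + got, 0, n - got);        /* underrun: this is the stutter */
    r->tail = (r->tail + got) % RING_SIZE;
    r->fill -= got;
    return got;
}

/* Bytes that must be buffered to survive stall_ms without any loading. */
size_t ring_bytes_for_stall(unsigned long bytes_per_sec, unsigned long stall_ms)
{
    return (size_t)((bytes_per_sec * stall_ms + 999UL) / 1000UL);
}
```

A return value from `ring_read` smaller than `n` means the loader fell behind; counting those events is a simple way to find out whether the buffer is too small for the worst stall.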

todi
Member
#4 - Posted: 8 Nov 2021 19:05 - Edited
You should be able to read about 2 MB/s on an A1200 with an accelerator, but I found this; for some strange reason it's only 1.6 MB/s on a Blizzard PPC/060 board: http://www.elbox.com/tests/fastata_speed.html
I would pick an algorithm like LZ4, which is close to copy speed, or ADPCM for audio. We use Shrinkler or uncompressed data, but we are not "streaming"...
rloaderror
Member
#5 - Posted: 11 Nov 2021 15:38
Mmmh mhm * notes down lz4 *
jar
Member
#6 - Posted: 15 Nov 2021 12:19
"if you would like to stream when a routine is running I think you have to read in chunks to not take up too much CPU..."

This most likely requires a decompression routine that can work in chunks instead of depacking the whole source buffer in one go. I'm not sure whether the typically used Amiga (de)compression routines or LZ4 support this.

If you don't need very good compression ratio you could take a look at ye olde LZW. An LZW decompressor can be easily made to work in a way to provide you an arbitrary number of decompressed bytes with each call.
This way you could even create an fread-style API which directly decompresses while reading from disk. Read performance is probably not so hot (if you use AmigaDOS maybe it buffers/prefetches into RAM) - but if you read ahead stuff for the next part in background while the current part is running it might be good enough.
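As a rough illustration of the "arbitrary number of decompressed bytes with each call" idea, here is a sketch of a resumable LZW decoder. It simplifies heavily and is not taken from any particular Amiga packer: the input is assumed to be an array of already unpacked codes (no bit packing), the dictionary is capped at 4096 entries and never reset, and `Lzw`, `lzw_init` and `lzw_read` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZW_MAX_CODES 4096   /* 12-bit code space (assumption) */

typedef struct {
    const uint16_t *codes;            /* input codes, already unpacked */
    size_t   ncodes, pos;
    uint16_t prefix[LZW_MAX_CODES];   /* dictionary: code of the prefix string */
    uint8_t  suffix[LZW_MAX_CODES];   /* dictionary: last byte of the string */
    uint8_t  stack[LZW_MAX_CODES];    /* pending output, stored reversed */
    size_t   sp;
    int      prev;                    /* previous code, -1 before the first */
    int      next;                    /* next free dictionary slot */
    uint8_t  first;                   /* first byte of the previous string */
} Lzw;

void lzw_init(Lzw *z, const uint16_t *codes, size_t ncodes)
{
    z->codes = codes; z->ncodes = ncodes; z->pos = 0;
    z->sp = 0; z->prev = -1; z->next = 256;
}

/* Expand exactly one code onto the stack and update the dictionary. */
static void lzw_step(Lzw *z)
{
    int code = z->codes[z->pos++];
    int cur = code;
    assert(code < z->next || (z->prev >= 0 && code == z->next));
    if (z->prev >= 0 && code == z->next) {
        /* Code not in the dictionary yet: previous string + its first byte */
        z->stack[z->sp++] = z->first;
        cur = z->prev;
    }
    while (cur >= 256) {              /* walk the prefix chain backwards */
        z->stack[z->sp++] = z->suffix[cur];
        cur = z->prefix[cur];
    }
    z->stack[z->sp++] = (uint8_t)cur;
    z->first = (uint8_t)cur;          /* first byte of the current string */
    if (z->prev >= 0 && z->next < LZW_MAX_CODES) {
        z->prefix[z->next] = (uint16_t)z->prev;
        z->suffix[z->next] = z->first;
        z->next++;
    }
    z->prev = code;
}

/* fread-style call: produce up to n bytes, resuming where the last call
   stopped. Returns fewer than n bytes only at the end of the stream. */
size_t lzw_read(Lzw *z, uint8_t *dst, size_t n)
{
    size_t done = 0;
    while (done < n) {
        if (z->sp == 0) {
            if (z->pos == z->ncodes) break;
            lzw_step(z);
        }
        dst[done++] = z->stack[--z->sp];
    }
    return done;
}
```

All decoder state lives in the `Lzw` struct, so a caller can pull a few hundred bytes per frame and stop mid-string without losing anything.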
todi
Member
#7 - Posted: 15 Nov 2021 20:59 - Edited
If I understand it right (I haven't tried it), LZ4 supports "streaming" when you use the LZ4 frame format (multiple LZ4 blocks): https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md
Of course LZW gives better compression, but a close-to-copy-speed algorithm was requested...
jar
Member
#8 - Posted: 16 Nov 2021 14:28
If you decompress everything between parts then yes, you need a decompressor that is as fast as possible (hard to beat LZ4 in this case), because you don't want ugly delays between parts.
However, for the use case I mentioned ("if you read ahead stuff for the next part in background while the current part is running"), decompression speed probably doesn't matter that much - provided the current part runs long enough so you can decompress everything before the part ends.
I.e. you reserve a certain budget per frame (in your mainloop) to cooperatively feed the decompression routine with the next chunks (or do it in an IRQ). You will most likely write some kind of simplified async I/O functions that queue up files/buffers to be processed by the decompression "thread".
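The per-frame budget with a queue of pending buffers could look roughly like the sketch below. It is only one possible reading of the idea: `Loader`, `loader_queue`, `loader_tick` and `loader_idle` are hypothetical names, the budget is counted in bytes rather than measured time, and a plain `memcpy` stands in for the real read or depack step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_JOBS 8

typedef struct {
    const unsigned char *src;   /* stand-in for a file or packed buffer */
    unsigned char *dst;
    size_t size, done;
} Job;

typedef struct {
    Job    jobs[MAX_JOBS];      /* FIFO ring of pending jobs */
    size_t head, count;
} Loader;

/* Queue a buffer for background processing. Returns 0 if the queue is full. */
int loader_queue(Loader *l, const void *src, void *dst, size_t size)
{
    Job *j;
    assert(l != NULL);
    if (l->count == MAX_JOBS) return 0;
    j = &l->jobs[(l->head + l->count) % MAX_JOBS];
    j->src = src; j->dst = dst; j->size = size; j->done = 0;
    l->count++;
    return 1;
}

/* Call once per frame from the main loop with the byte budget for this
   frame. Returns the number of bytes actually processed. */
size_t loader_tick(Loader *l, size_t budget)
{
    size_t used = 0;
    while (l->count > 0 && used < budget) {
        Job *j = &l->jobs[l->head];
        size_t n = j->size - j->done;
        if (n > budget - used) n = budget - used;
        memcpy(j->dst + j->done, j->src + j->done, n);  /* read/depack here */
        j->done += n;
        used += n;
        if (j->done == j->size) {       /* job finished: pop it */
            l->head = (l->head + 1) % MAX_JOBS;
            l->count--;
        }
    }
    return used;
}

int loader_idle(const Loader *l) { return l->count == 0; }
```

A part would queue everything the next part needs when it starts, call `loader_tick` every frame, and only hand over once `loader_idle` reports that the queue has drained.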
winden
Member
#9 - Posted: 16 Nov 2021 20:21
I did some tests like these maaaany years ago on 030/50 w/ traditional disk -- let me try to recall...

I'd have two exec tasks, one running the demo at prio 1 and another doing the background loading at prio 2.

The loading thread, if working on a request, would do a WaitTOF and then call dos.library to read a chunk of data. It would get the highest priority and block quite a bit of CPU each frame.

I'm not sure, but I seem to recall that reading 10 KB per frame would consume 20% of a frame.

If you want to double check, I recall Xenophobia / Subspace ( https://www.pouet.net/prod.php?which=4712 ) had a part with an anim streamed from disk; you could run it under SnoopDos to measure the amount of reads they did.
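To put the numbers in this thread side by side, here is a small arithmetic sketch. It assumes a 50 Hz PAL display and 1 KB = 1024 bytes, and the helper names are made up; winden's 10 KB per frame and todi's 2 MB/s are the only inputs, both quoted from memory by their posters.

```c
#include <assert.h>

/* Sustained throughput for a fixed per-frame read size. */
unsigned long bytes_per_second(unsigned long bytes_per_frame, unsigned long fps)
{
    return bytes_per_frame * fps;
}

/* Frames needed to load total_bytes, rounding up the last partial chunk. */
unsigned long frames_to_load(unsigned long total_bytes,
                             unsigned long bytes_per_frame)
{
    assert(bytes_per_frame > 0);
    return (total_bytes + bytes_per_frame - 1) / bytes_per_frame;
}

/* Largest per-frame read a given disk throughput could sustain. */
unsigned long budget_per_frame(unsigned long disk_bytes_per_sec,
                               unsigned long fps)
{
    assert(fps > 0);
    return disk_bytes_per_sec / fps;
}
```

With those assumptions, 10 KB per frame is 512000 bytes per second, so 1 MB takes 103 frames, a little over two seconds. A 2 MB/s drive would cap the per-frame read at roughly 41 KB regardless of how much CPU is spent.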
rloaderror
Member
#10 - Posted: 19 Nov 2021 15:30
I tried an experiment using two exec tasks. The goal was to see whether I could run effects in the second task at a free framerate while the main task would always run at 50 fps. The main task would be light and handle palette fades/sync/composition etc. However, I ended up with the two live tasks cutting down the performance a lot. It could be that I made some mistake with the priorities or something else, but I ended up abandoning the idea.

I also seem to remember seeing some language disallowing I/O or memory allocation outside the main task, which made this hard to use in practice.

Tried using Snoopdos3 while running Xenophobia, but didn't spot any list of Read calls made during the animation. Could be I'm running from a high ram config and it just loads the whole thing upfront or something. Either that or my snoopdos version doesn't provide individual Fread type of call logs.

10KB costing 20% of a frame might be a bit of a showstopper though, but then again some of these chunky effects are 2-frame at best and usually worse than that.

In a moment of madness I thought about making something like a co-routine approach to allow partially progressing loading routines. The loading routine would have no local stack variables and instead keep all its state in a struct (including a "Program Counter" type of state).. so that it can be suspended and continued from the main loop.. There, now I've said it! Of course it won't work! :D
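For what it's worth, the co-routine idea can be sketched as a plain state machine. This is only an illustration under assumptions: `LoadCo`, `loadco_init` and `loadco_resume` are hypothetical names, the source is an in-memory buffer standing in for disk reads, and the second phase is a checksum standing in for whatever post-processing the data needs. All loop state lives in the struct, including the "program counter".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PC_START, PC_COPY, PC_CHECKSUM, PC_DONE };

typedef struct {
    int pc;                       /* the "program counter" */
    const unsigned char *src;     /* stand-in for the data on disk */
    unsigned char *dst;
    size_t size, pos, chunk;      /* loop state survives between calls */
    unsigned long sum;
} LoadCo;

void loadco_init(LoadCo *c, const void *src, void *dst,
                 size_t size, size_t chunk)
{
    assert(chunk > 0);
    c->pc = PC_START; c->src = src; c->dst = dst;
    c->size = size; c->chunk = chunk; c->pos = 0; c->sum = 0;
}

/* Runs one bounded slice of work and returns to the caller.
   Returns 1 while more work remains, 0 once the load is complete. */
int loadco_resume(LoadCo *c)
{
    size_t n;
    switch (c->pc) {
    case PC_START:                /* "open": set up, then yield */
        c->pos = 0;
        c->pc = PC_COPY;
        return 1;
    case PC_COPY:                 /* "read": one chunk per call */
        n = c->size - c->pos;
        if (n > c->chunk) n = c->chunk;
        memcpy(c->dst + c->pos, c->src + c->pos, n);
        c->pos += n;
        if (c->pos == c->size) { c->pos = 0; c->pc = PC_CHECKSUM; }
        return 1;
    case PC_CHECKSUM:             /* "post-process": one chunk per call */
        n = c->size - c->pos;
        if (n > c->chunk) n = c->chunk;
        while (n > 0) { c->sum += c->dst[c->pos++]; n--; }
        if (c->pos == c->size) c->pc = PC_DONE;
        return c->pc != PC_DONE;
    default:
        return 0;
    }
}
```

The main loop would call `loadco_resume` once (or a few times) per frame after rendering. Since nothing lives on the C stack between calls, no second task or stack switching is involved.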
krabob
Member
#11 - Posted: 3 Dec 2021 15:20 - Edited
Hello all.
A pleasure to read and write here. I should come back to classic Amiga coding soon.
For "Catabasis" (2017, Cocoon): it looks like a trackmo, but I use dos.library calls for all data reading. I override the system interrupts without blocking them, and use OwnBlitter()/DisownBlitter() so dos.library calls can fully work.
Then in the demo, the first effect and one in the middle run completely in the VBL interrupt doing copper things, while the main task is actually reading with dos.library and uncompressing. (The plasma at 1m11 with the big face sprite scrolling down is also a dual playfield trick plus sprites, with no need for CPU or blitter, so the main task loads.)

Yet, I also wanted to do the tests to stream data with 2 tasks, so thanks winden and loaderror, this is interesting.

 
