A.D.A. Amiga Demoscene Archive

CACR register on the MC68060

sp_ Member	#1 - Posted: 17 Feb 2012 10:30 I found out that it is possible to freeze cache lines by changing the CPU's cache control register (CACR). Has anybody tried to use this for optimizing? For a texture mapper with an 8 KB texture, a small loop could pull the whole texture into the cache and then freeze the cache lines (in supervisor mode). The renderer would then run at memory write speed. You would probably need to create a loop that pushes longwords to fast mem, since that memory is then not cached: move.l d0,(a0)+ renders 4 pixels for free. Could it be used for 50 fps effects?
dalton Member	#2 - Posted: 17 Feb 2012 11:28 I think it's possible using the NAD (no allocate data) bit. But if you lock an 8 KB texture in there, wouldn't all other memory accesses be dead slow? Like writing to the chunky buffer...
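For reference, here is a minimal sketch of how the NAD bit sits next to the other data cache bits in the 68060 CACR. The bit positions are given as recalled from the MC68060 User's Manual and should be checked against it; the helper names are made up for illustration. On real hardware the value would be written with a privileged movec to CACR, which this Python model can only imitate.

```python
# Data cache bits of the 68060 CACR, as recalled from the MC68060 User's
# Manual. Assumption: verify the positions against the manual before use.
CACR_EDC = 1 << 31  # enable data cache
CACR_NAD = 1 << 30  # no allocate mode: misses do not allocate new lines
CACR_ESB = 1 << 29  # enable store buffer

def freeze_data_cache(cacr: int) -> int:
    """Return a CACR value with the data cache enabled but frozen (NAD set)."""
    return (cacr | CACR_EDC | CACR_NAD) & 0xFFFFFFFF

def unfreeze_data_cache(cacr: int) -> int:
    """Return a CACR value with NAD cleared, so misses allocate again."""
    return cacr & ~CACR_NAD & 0xFFFFFFFF

if __name__ == "__main__":
    normal = CACR_EDC | CACR_ESB          # hypothetical starting value
    frozen = freeze_data_cache(normal)
    print(f"normal: ${normal:08X}")
    print(f"frozen: ${frozen:08X}")
```

With NAD set, lines already in the cache keep producing hits, which is the "locked texture" behaviour discussed in this thread.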
sp_ Member	#3 - Posted: 17 Feb 2012 13:17 Yes, they will be slower, so you always have to write longwords. However, the 060 is able to execute instructions while writing to RAM, so if your render code is complicated, it might not be slower. You can also read from the texture buffer without causing a stall, since everything is cached.
    move.l d0,(a0)+      ;push 4 pixels to the chunky buffer
    move.b (a1,d4.w),d1  ;a cache miss here will cause the CPU to wait for (a0) and then wait for a1
sp_ Member	#4 - Posted: 17 Feb 2012 14:19 - Edited Copy speed from fast to chip on the Amiga is normally around 4.5 MB/s, but write speed to chip (cache->chip) can be as high as 6.5 MB/s. Let's take a 1x1 zoom rotate. If you keep the texture in the cache, lock it, and then combine the interpolation logic with the c2p, I think you can perhaps get a zoom rotate that executes at write speed (6.5 MB/s). That gives you a bandwidth of 130,000 bytes per frame, which should be enough to do a fullscreen 1x1 effect at 50 fps in 640*200 with 8 bitplanes.
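The arithmetic behind those figures can be checked with a short sketch. It assumes 1 MB = 10^6 bytes and a 50 Hz PAL frame rate; the function names are invented for this example.

```python
PAL_FPS = 50
CHIP_WRITE_BYTES_PER_SEC = 6_500_000   # cache->chip figure quoted in the post
FAST2CHIP_BYTES_PER_SEC = 4_500_000    # fast->chip copy figure quoted in the post

def bytes_per_frame(bytes_per_sec: int, fps: int = PAL_FPS) -> int:
    """Bytes that can be written to chip memory during one frame."""
    return bytes_per_sec // fps

def planar_frame_bytes(width: int, height: int, bitplanes: int) -> int:
    """Size of one planar screen: one bit per pixel per bitplane."""
    return width * height * bitplanes // 8

if __name__ == "__main__":
    budget = bytes_per_frame(CHIP_WRITE_BYTES_PER_SEC)
    needed = planar_frame_bytes(640, 200, 8)
    print("budget per frame:", budget)
    print("640x200x8 frame: ", needed)
    print("fits at write speed:", needed <= budget)
    print("fits at copy speed: ",
          needed <= bytes_per_frame(FAST2CHIP_BYTES_PER_SEC))
```

The 640*200 screen in 8 bitplanes needs 128,000 bytes, so it only fits if the renderer really sustains the full write speed; at the plain copy speed the budget is 90,000 bytes per frame.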
dalton Member	#5 - Posted: 17 Feb 2012 15:09 That would be awesome! But can you really get cache->chip speed if the cache is locked? I would assume that if the CPU cannot put the value you're writing in the cache, it has to wait for it to be stored in RAM before execution continues, i.e. you pay the full cache miss penalty on each access. I'm just guessing though.
sp_ Member	#6 - Posted: 17 Feb 2012 16:01 - Edited The instruction cache is not locked, only the data cache. Chip RAM is not cacheable. You get a big wait penalty, but while waiting you can execute 40 superscalar instructions, which should be enough for the c2p and renderer. I have most of the code ready to test. I will create a small sample to try it out (on the Natami).
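A rough way to see where a figure like 40 instructions could come from is to count CPU cycles per longword written to chip memory. The sketch below assumes a 50 MHz 68060 and the 6.5 MB/s write speed mentioned earlier in the thread; the peak of two instructions per cycle is a theoretical upper bound that real code will not reach, and the function names are made up.

```python
CPU_HZ = 50_000_000                    # assumption: 50 MHz 68060
CHIP_WRITE_BYTES_PER_SEC = 6_500_000   # write speed quoted in the thread
BYTES_PER_LONGWORD = 4
PEAK_ISSUE_PER_CYCLE = 2               # superscalar peak, rarely sustained

def cycles_per_longword(cpu_hz: int, bytes_per_sec: int) -> int:
    """Whole CPU cycles that pass for each longword written to chip RAM."""
    return cpu_hz * BYTES_PER_LONGWORD // bytes_per_sec

def peak_instruction_slots(cpu_hz: int, bytes_per_sec: int) -> int:
    """Upper bound on instructions issued while one chip write completes."""
    return cycles_per_longword(cpu_hz, bytes_per_sec) * PEAK_ISSUE_PER_CYCLE

if __name__ == "__main__":
    print("cycles per longword:",
          cycles_per_longword(CPU_HZ, CHIP_WRITE_BYTES_PER_SEC))
    print("peak instruction slots:",
          peak_instruction_slots(CPU_HZ, CHIP_WRITE_BYTES_PER_SEC))
```

Under these assumptions there are about 30 cycles per longword, so roughly 40 instructions per write is plausible only if a good share of them pair up.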
jamie2010 Member	#7 - Posted: 17 Feb 2012 16:05 I'm not sure it's a good direction to explore. An effect built into the c2p is hard to maintain and takes more time to develop. It's OK for a rotozoom or a simple effect, but I'm sure you can find a better use for the free cycles :) I can't believe that we're talking about c2p in 2012, but the challenge isn't finished: I use a new method specific to the 060 that gives a boost.
sp_ Member	#8 - Posted: 17 Feb 2012 21:09 Zoom & rotate is boring... With some work you can make perspective-correct rotation around 3 axes in 640x400 (interlaced) at full framerate. The only demo I have seen in this resolution was made by Sonic Clique in the nineties.
Raylight Member	#9 - Posted: 6 Apr 2012 14:33 sp_ wrote: "With some work you can make perspective-correct rotation around 3 axes in 640x400 (interlaced) at full framerate. The only demo I have seen in this resolution was made by Sonic Clique in the nineties." Do you remember the name of their demo? Sounds really interesting, I'm surprised I haven't seen it (I think)! :)
Blueberry Member	#10 - Posted: 10 Apr 2012 13:44 You don't need to lock the cache to achieve this. If your code is not accessing any other (significantly sized portion of) memory, your texture will mostly stay cached anyway, resulting in very few cache misses along the way. Most of the oneframe effects in Hotstyle Takeover are done like this: render a small section of the screen (typically a scanline or a block, depending on the effect) to a buffer, then c2p that buffer (now in cache), writing only a few of the bitplanes to chip memory and the rest to another buffer, then copy those remaining bitplanes to chip memory while calculating the effect for the next section. The effects done this way include flatshaded 3D (rendered per scanline) and grid expansion using a 64x64 texture (rendered per 32x16 block, IIRC). Texture-mapped 3D using a 64x64 texture should be possible in theory.
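The section-by-section scheme described above can be sketched as an order of operations. This is only an illustration of the interleaving, not the Hotstyle Takeover code: the function and the tuple layout are invented here, and no pixels are actually rendered.

```python
def schedule(num_sections: int) -> list:
    """List the steps for rendering a screen one section at a time.

    ("render", s, p): render section s into the small buffer; if p is not
                      None, the bitplanes of section p that were parked in
                      the side buffer are copied to chip memory meanwhile.
    ("c2p", s):       convert section s, writing a few bitplanes straight to
                      chip memory and the rest to the side buffer.
    ("copy", s):      final copy of the parked bitplanes of the last section.
    """
    ops = []
    pending = None  # section whose leftover bitplanes sit in the side buffer
    for s in range(num_sections):
        ops.append(("render", s, pending))
        ops.append(("c2p", s))
        pending = s
    if pending is not None:
        ops.append(("copy", pending))
    return ops

if __name__ == "__main__":
    for op in schedule(3):
        print(op)
```

The point of the interleaving is that the slow chip writes for one section overlap with the CPU-bound rendering of the next.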
sp_ Member	#11 - Posted: 14 Apr 2012 15:52 If you don't lock the cache, 4 KB is wasted on caching the write buffer instead of the texture. So your textures can be 2 times bigger at the same speed.
Blueberry Member	#12 - Posted: 15 Apr 2012 10:52 Not if your write buffer is small (say, one or two scanlines). Even with 8 KB of texture data, I think you will be better off caching most of the texture plus the write buffer and other temporary buffers than caching only the texture and getting cache misses everywhere else. If you were to avoid all memory accesses apart from the texture, you would need to keep all state in registers throughout both the effect and the c2p. That is doable in principle (in 16 colors, at least), but it doesn't allow for a lot of flexibility in the effects you can do. At one point I experimented with marking the texture memory as non-cacheable. If your cache hit rate is very low (which it can easily be in a texture mapper), it can actually increase performance, since you get a predictable delay corresponding to reading one longword instead of the occasional (sometimes frequent) whole-cacheline delay.
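To put a number on how much of the texture a small write buffer displaces, here is a capacity-only sketch of the 68060 data cache, taken as 8 KB, 4-way set-associative with 16-byte lines (128 sets). The addresses and buffer sizes are hypothetical, and the model ignores the replacement policy, so it says how many lines cannot fit, not which ones get evicted.

```python
LINE_BYTES = 16
WAYS = 4
SETS = 128  # 8 KB / (16 bytes per line * 4 ways)

def lines_per_set(regions):
    """Count how many cache lines of the given (start, size) regions map to each set."""
    counts = [0] * SETS
    for start, size in regions:
        first = start // LINE_BYTES
        last = (start + size - 1) // LINE_BYTES
        for line in range(first, last + 1):
            counts[line % SETS] += 1
    return counts

def overflow_bytes(regions):
    """Bytes of the working set that cannot be resident at the same time."""
    extra_lines = sum(max(0, c - WAYS) for c in lines_per_set(regions))
    return extra_lines * LINE_BYTES

if __name__ == "__main__":
    texture_8k = (0x10000, 8192)   # hypothetical address of an 8 KB texture
    texture_4k = (0x10000, 4096)   # hypothetical 4 KB texture
    scanline = (0x20000, 640)      # hypothetical one-scanline write buffer
    print("8 KB texture alone:     ", overflow_bytes([texture_8k]))
    print("8 KB texture + scanline:", overflow_bytes([texture_8k, scanline]))
    print("4 KB texture + scanline:", overflow_bytes([texture_4k, scanline]))
```

In this model a 640-byte buffer next to a full 8 KB texture pushes out 640 bytes of working set, far less than the 4 KB figure, which is the gist of the argument for leaving the cache unlocked.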
jamie2010 Member	#13 - Posted: 11 May 2012 01:57 When you say it's equal to reading one longword, do you know the number of cycles? I'm sure it's in the 060 manual, but I can't find the information.
Blueberry Member	#14 - Posted: 13 May 2012 15:25 According to the 060 manual, the delay for a noncached read is the same as for the first longword to be available during a line fill. This is, according to my measurements, 8 cycles on my system (Blizzard 1260, 50 MHz).
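The trade-off between a cached texture with a low hit rate and a non-cacheable texture can be written as a simple expected-cost model. The 8-cycle noncached read is the measurement quoted above; the full line-fill stall is not given in the thread, so the 20 cycles used below is purely a placeholder to be replaced with a measured value, and the function names are made up.

```python
NONCACHED_READ_CYCLES = 8   # measured figure from the post above

def cached_stall_per_read(miss_rate: float, line_fill_cycles: float) -> float:
    """Average stall per texel read when the texture is cacheable.

    Assumes a hit costs no stall and every miss costs the full line fill.
    """
    return miss_rate * line_fill_cycles

def break_even_miss_rate(line_fill_cycles: float) -> float:
    """Miss rate above which marking the texture non-cacheable wins."""
    return NONCACHED_READ_CYCLES / line_fill_cycles

if __name__ == "__main__":
    line_fill = 20.0  # hypothetical stall per miss, NOT a measured number
    for miss_rate in (0.1, 0.5, 0.9):
        cached = cached_stall_per_read(miss_rate, line_fill)
        better = "noncached" if cached > NONCACHED_READ_CYCLES else "cached"
        print(f"miss rate {miss_rate}: cached stall {cached} cycles, better: {better}")
```

The model is deliberately pessimistic about misses; if the CPU can continue after the first longword of a line fill arrives, the break-even point moves towards higher miss rates.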
jamie2010 Member	#15 - Posted: 13 May 2012 18:15 f... it's a lot, but it can be useful for rendering with 2 texture stages. Thanks for the info.
