Author |
Message |
sp_
Member |
I finished some code to make magic now. Still missing the trackloader source. Stingray (can you finish it before Assembly)? The AGA version will run in one frame, 1x1 at 320x200. 3 extra bitplanes will be used for free hardware motion blur. The texture mapper is ideal on the 020 since it fits in the 256-byte cache. The free bus cycles are used by the blitter c2p. I don't have real hardware to speed-test on, but it looks like it's going to run faster than any demo I have seen so far.
64-bit fetch mode and hires give a smooth 1x1 display for the AGA version. Perfect speed on a plain A1200.
Looks like I will get some help from my old friends to put it all together.
|
Ralf
Member |
Nice to hear that you are still working on it and that there is a chance to see it at Assembly! ;-)
|
coyote
Member |
Hi sp_,
It's nice to see that Amiga is still not dead! :-)
(especially A500!)
Anyway, I wanted to contribute a bit... It's just a small undocumented hardware thingy that might help you save several video DMA accesses on A500.
If you use 5 bitplanes then you should set it up as 7 bitplanes on A500. This will give you 4 bitplanes read by the dma from the memory, however the 5th bpldat register will also be used but for free (without dma accesses) - you simply need to fill it up yourself. This is perfect if some pattern is needed to be in the 5th bitplane.
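A minimal sketch of the register values this trick involves, written in Python purely as a calculator. The BPLCON0 bit layout (BPU in bits 14-12, COLOR in bit 9) and the register offsets come from the OCS hardware documentation; the helper names and the $AAAA pattern are assumptions chosen for illustration, and nothing here has been checked on real hardware.

```python
# Custom-chip register offsets relative to $DFF000.
BPLCON0 = 0x100
BPL5DAT = 0x118


def bplcon0(planes, hires=False, ham=False):
    """Build a BPLCON0 value: HIRES bit 15, BPU bits 14-12, HAM bit 11, COLOR bit 9."""
    if not 0 <= planes <= 7:
        raise ValueError("BPU is a 3-bit field")
    value = (planes << 12) | 0x0200  # 0x0200 = COLOR
    if hires:
        value |= 0x8000
    if ham:
        value |= 0x0800
    return value


def copper_move(reg, data):
    """A copper MOVE is two words: register offset, then the data word."""
    return (reg, data & 0xFFFF)


def seven_plane_setup(pattern=0xAAAA):
    """Hypothetical copper fragment: ask for 7 planes, hand-fill BPL5DAT."""
    return [
        copper_move(BPLCON0, bplcon0(7)),  # BPU = 7, the undocumented setting
        copper_move(BPL5DAT, pattern),     # pattern that DMA never overwrites
    ]
```

With these helpers, `bplcon0(7)` gives $7200, and the fragment is just two copper MOVEs. Whether the pattern has to be rewritten during the frame depends on the effect and is not covered here.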
After all these years have passed, I am wondering if I did all the proper testing when I discovered this... For example, now the following crosses my mind - what exactly happens if you set up 1 bitplane and fill the other bpldat registers? Hmmm...
Anyway, good luck with your project and I hope that you will make the code of your c2p public. ;-)
|
winden
Member |
!!!
|
ultra
Member |
just did a small test... hm unfortunately the trick doesn't work on WinUAE it seems... most people will watch the demo on it... doh...
|
dodke
Member |
not necessarily if there's a proper video available
|
coyote
Member |
I suppose it should not be hard to implement this behaviour in winuae.
Actually at the moment I am waiting for HAM5 to be implemented (ham6 mode with only 5 bitplanes through dma) so one of my old demos would work correctly under winuae. Btw HAM5 works correctly on fellow.
|
ultra
Member |
sp... btw removing the extra blitter pass in the c2p and using the native chunky format is not faster ... ok if you do a cube it's a bit faster ... but many faces produce more overhead in the section draw loop and the speed gain is gone... i'm using unrolled loops ... and of course 14 cycles per pixel...
ok maybe my routine sucks... but i don't see much to optimise there anymore... so i'll use the extra blitter pass...;)
|
ultra
Member |
new idea how to speed it up... hm that's funny... i always have the best ideas in bed ... shortly before sleep ;)
but hm... for many small polys i guess the extra pass is still faster... maybe a good idea to include both...
|
Crumb
Member |
@Coyote
which demos? perhaps we could send them to Toni Wilen so he adds HAM5 support :-)
|
dodke
Member |
also since this thread came up... when are sp and ultra releasing their demos? :)
|
coyote
Member |
To Crumb:
Group was "Lazy Bones". Two released demos in 1993. Links to demos can be found at this forum thread:
http://eab.abime.net/showthread.php?p=418448
"Move Any Mountain" - Simple copper voxel routine. I used fake fast as chip memory in this demo. They finally implemented it in WinUAE, but still one thing is not working properly - there is a twisted logo which in WinUAE shows only as a twisted band without the logo on it. (voxel routine runs indefinitely, but there is an easter egg - click the mouse button for a copper screen picture)
KS 1.3, ECS, 0.5 Chip, 0.5 Slow (exactly this memory configuration or it won't work!)
"B2" - Again, something was recently fixed in WinUAE and something still is not. They fixed the HAM5 (for the spinning/morphing globe) but zoomer/rotator still does not work perfectly (right half of it has some copper timing problems) and also I think that I only once saw that the Croatian flag was correct. Usually with WinUAE it has some problems in the middle of the flag.
One released intro in 1994 (actually only one routine, a 60 blocks wide 4096 color zoomer/rotator) called "First Anniversary". This intro also doesn't work as expected because it uses the still unimplemented, above-mentioned undocumented "A500 7 bitplanes" trick. I can't find this intro on the net. I really should search through my Amiga disks one of these days.
Anyway, I know it doesn't work correctly in WinUAE because I saw a screenshot (http://arabuusimiehet.com/break/amiga/details.php?id=9761) which must have been taken from some emulator, because instead of 3x3 px blocks there are 8x3 px blocks of color. (In this particular zoomer/rotator, instead of the regular 4x4 px blocks there should be 3x3 px blocks with 1 px black lines dividing them both horizontally and vertically.)
|
sp_
Member |
Ultra,
A fullscreen blitter pass ABC + D will use 8 cycles per word written (slower with 5 bpl DMA).
With a 2*2 (320*256) display using my c2p, that's 10240 words (81920 cycles).
When rendering without the byteswap pass, my unrolled inner loop will in the worst case plot 8 extra pixels per scanline,
on average 4 extra pixels.
That's 4 * 14 cycles + a bit more per scanline as the scrambling penalty.
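The trade-off in these figures can be put into a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes a 160x128 chunky buffer with one byte per pixel (which happens to reproduce the 10240 words quoted above), it ignores the "a bit more" overhead, and the function names are made up for illustration.

```python
def blitter_pass_cycles(width, height, cycles_per_word=8):
    """Cost of one full-buffer blitter pass, assuming one byte per pixel."""
    words = width * height // 2
    return words, words * cycles_per_word


def scramble_penalty_cycles(spans, extra_pixels=4, cycles_per_pixel=14):
    """Extra CPU cycles spent plotting unwanted pixels when skipping the pass."""
    return spans * extra_pixels * cycles_per_pixel


def break_even_spans(pass_cycles, extra_pixels=4, cycles_per_pixel=14):
    """Number of polygon scanlines at which both approaches cost the same."""
    return pass_cycles // (extra_pixels * cycles_per_pixel)


words, cycles = blitter_pass_cycles(160, 128)
print(words, cycles)                        # 10240 81920
print(scramble_penalty_cycles(1))           # 56 cycles per scanline on average
print(break_even_spans(cycles))             # 1462 scanlines at the average penalty
```

Under these assumptions the blitter pass only pays off once a frame contains well over a thousand polygon scanlines, which fits the later remarks in the thread that many small polys favour the extra pass.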
When I know the longest line in the polygon I can generate exactly as many SMC instructions as the loop needs.
Currently the realtime mapper is 3 muls and 0 divs per polygon. The muls can be precalculated away.
..
I might release a 4k with my mapper. but progress is slow.
Nice to see activity on the OCS platform with nice releases. Gives me inspiration :D
|
sp_
Member |
I said in another thread that the new Assembly A500 demo, Hardknee Lotus, was slow. It's probably the fastest texture mapper released for the MC68000 on Amiga to date. But still, my mapper is much faster.
Hardknee Lotus renders to a byte buffer using an unrolled SMC loop as shown below. The offsets are linear, so no scrambling is done in the plotting. I think the texture is scrambled to remove a merge in the c2p. This byte-per-pixel buffer can be converted to planar by using 3 blitter merges (maybe four). The outer loop is important too. I see many memory reads that could be improved. Everything should be put in registers. The code that generates the SMC can be improved: do subq.w #2,a4 outside the loop, and change move.w d2,-2(a4) to move.w d2,(a4). When scrolling through the mapper I counted 13(?) muls per polygon.
move.b 0000(a1),-(a4)
move.b 0000(a1),-(a4)
move.b 0000(a1),-(a4)
move.b 0000(a1),-(a4)
...
My c2p converts half the data and I only use one blitter merge. In addition, my loop is 14 cycles per pixel, and this is 16. I have 0 muls and 0 divs per polygon; this routine has 13(?) muls.
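To make the "unrolled SMC loop" idea concrete, here is a small sketch that builds the machine code for a span of `move.b d16(a1),-(a4)` instructions and counts its cost. It is not the generator from either demo: the function names are hypothetical, the opcode word follows the standard 68000 MOVE encoding, and 16 cycles is the documented 68000 timing for this addressing-mode pair (the 14-cycle figure is simply the one claimed in the post).

```python
import struct


def encode_move_b(src_an, dst_an):
    """Opcode word for move.b d16(A<src>),-(A<dst>).

    MOVE layout: 00 | size (01 = byte) | dst reg | dst mode | src mode | src reg.
    Mode 100 is -(An), mode 101 is d16(An).
    """
    return (0b0001 << 12) | (dst_an << 9) | (0b100 << 6) | (0b101 << 3) | src_an


def generate_span(offsets):
    """Emit one instruction per pixel; the texture offset is patched in as d16."""
    opcode = encode_move_b(1, 4)  # move.b d16(a1),-(a4)
    code = bytearray()
    for offset in offsets:
        code += struct.pack(">Hh", opcode, offset)  # big-endian opcode + signed d16
    return bytes(code)


def span_cycles(pixels, cycles_per_pixel=16):
    """CPU cycles for a span, ignoring loop setup and the rts at the end."""
    return pixels * cycles_per_pixel


print(hex(encode_move_b(1, 4)))                  # 0x1929
print(generate_span([0, 2]).hex())               # 1929000019290002
print(span_cycles(8), span_cycles(8, 14))        # 128 112
```

Each instruction is four bytes, so only the second word of each pair changes from polygon to polygon; that is the word the generator rewrites.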
|
britelite
Member |
I now have a new version of the mapper that removes most of the muls. I've also completely rewritten the c2p now, and it's quite a lot faster.
Anyway, someone had to set the bar, and we did it with Hardknee Lotus. Now it's time for everyone else to improve on that, as unreleased routines don't count :)
|
sp_
Member |
Nice job. Keep up the good work..
I agree that unreleased routines don't count. I plan to release a 4k as proof of concept.
|
klipper
Member |
without the frames in the ADA database, it didn't happen! So come on sp_... :)
(sorry, I'm using z5 motivational techniques here! :)
|
ultra
Member |
moin sp,
@8 cycles...
a) yes the byteswap uses abc d... but one of the sources is a constant, which means 6 cycles, so the blitter needs around 136 rasterlines...
b) the blitter pass is not running in blitter nasty... as far as i remember
a and d are for free, which means b starts to slow down the cpu... but during the 136 lines the 68k is still doing things... hard to say how much it is effectively... depends of course on the code the 68k runs during the pass...
but one is for sure... the cpu is not locked out for 136 lines
c) bitplane dma... if it is 200 lines big... there are 112 lines left to use
the blitter there... of course works only for effects which are not running in 1 or 2 vbls
but 3 or 4 vbl 3d is ok for me...
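The rasterline figures in a) and c) can be roughly reproduced with a little arithmetic. The sketch below rests on assumptions that are not stated in the post: that the quoted cycles are 7 MHz CPU cycles, that one DMA slot equals two of them, and that a PAL line has about 227 slots across 312 lines. It also ignores bitplane, refresh and other DMA competing for slots.

```python
import math

SLOTS_PER_LINE = 227  # assumed DMA slots per PAL rasterline
PAL_LINES = 312       # assumed rasterlines per PAL frame


def blitter_lines(words, cpu_cycles_per_word):
    """Rasterlines a blitter pass occupies if it gets every slot it wants."""
    slots = words * cpu_cycles_per_word // 2  # 2 CPU cycles per DMA slot
    return slots / SLOTS_PER_LINE


def lines_outside_display(display_lines, total_lines=PAL_LINES):
    """Rasterlines per frame with no bitplane DMA."""
    return total_lines - display_lines


print(math.ceil(blitter_lines(10240, 6)))  # 136, with one constant source
print(math.ceil(blitter_lines(10240, 8)))  # 181, full ABC + D
print(lines_outside_display(200))          # 112
```

That the numbers land on 136 and 112 suggests these are the assumptions behind the post, but that is an inference, not something the thread confirms.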
@8 pixels in worst case ?
hm ... in worst case i count 14
00221133 /../ 00221133
01452367 /../ 01452367
-------1 /../ 1-------
(damn its not shown correctly because of the font... should be last pixel of the first
4 bytes and first pixel of the last 4 bytes)
correct me if i'm wrong.
so... you have more pixels you mostly draw...
+saving and restoring the wrong pixels for left and right
+additional logic in the line loop to handle all
hard to say what is really faster... of course it depends how many polys you have
for bigger polys yours is surely faster... for many and smaller i still think mine is faster...
sure... to know the longest line is essential... otherwise prestepping would
be a bit stupid ;)
greetz ultra...
|
ultra
Member |
hm... byte swap needs around 70 lines i just checked...
with a quickcheck i saw during the pass the cpu is running with 44% (counting up a mem value and some rasterline check) so cpu is locked for 39 lines...
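The 39-line figure follows from the two measurements: if the CPU keeps 44% of its speed during a 70-line pass, the remainder is equivalent to being locked out completely for the rest. A tiny sketch of that conversion, with a hypothetical function name:

```python
def cpu_locked_lines(pass_lines, cpu_share):
    """Rasterlines effectively lost to the CPU while the blitter pass runs."""
    return pass_lines * (1.0 - cpu_share)


print(round(cpu_locked_lines(70, 0.44), 1))  # 39.2
```

The measured share depends on what the CPU executes during the pass, so the result is only as good as the quick check it is based on.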
well not perfect ... true... the loss is not very heavy... but it's only one effect in the demo so... room for improvement later ;)
hell i'm tired... i need coffee !
|
sp_
Member |
hey Ultra.
In the worst case my mapper will calculate 7 extra pixels, plus the and-mask and logic in the line loop. I could calculate exactly how many extra cycles are needed, but we need to keep some secrets before we release something :D When using many polys I think your blitter pass might be faster.
|
ultra
Member |
yeps ... @7... the next day i saw why ;)
|
sp_
Member |
Watching the excellent new C64 demo "Edge of Disgrace" inspired me to do some more Amiga 500 coding. Apparently these guys worked on their demo for 7 years, and I think I will manage to finish something by 2015. :D
Today I made a depth-shaded, perspective-correct zoom rotator around 2 axes. Big textures are supported. The routine is similar to the tilt-zoomer in Roots by Sanity, but mine runs on an A500 at 7 MHz with 1 MB, in 2x2 at 25 fps. The zoomer in Roots had 16x16 textures and ran at 50 fps in 1x1 on a 14 MHz Amiga 1200.
The next routine to code is a dual-playfield copper plasma in interlaced mode, then an infinite fractal zoomer with a new approach. After that I will work more on the texture mapper and the world routine.
Finally, I will start putting it all together with a trackloader, modules and gfx.
|
z5_
Member |
sp: should i change the topic of this thread into Breakpoint 2009 then? :) Come on, amaze us :)
|
sp_
Member |
I still have some unreleased code to show :) Progress is slow :)) It's been 4-5 years since my announcement of a demo. Hehe. I use a special trick to get 4 bpl DMA with a 5 bpl screen. I set 4 bpl and set a mask of %101010101... in BPL5DAT. Then the A500 will show a 5 bpl screen with DMA for 4 bpl. The last time I worked on the SMC mapper was in 2010, fixing some bugs and optimizing some more. Anyway, there is an old Rebels coder in town who might help me to release it :)
Have there been any recent OCS productions I should check out?
|
d0DgE
Member |
Last year two nice A500 4kB intros got released at Revision. One coded by Hitchhikr and one by Britelite.
|
slayer
Member |
sp_: I repeat what I wrote somewhere here. The only good place / time for A500 demo release is Revision 2012 and Oldskool Demo Compo :)
|
Angry Retired Bastard
Member |
As I'm not releasing my A500 demo at Revision (it's simply nowhere near releasable) I strongly disagree with the "only good place/time"-bit! ;) However, it would be great to see some new stuff from sp soon, so he should absolutely aim for easter! :)
|
z5_
Member |
Some time ago, there was an interesting discussion about code optimising versus releasing something (or rather how continuously optimising code means not releasing something). Since this thread was started in 2007, this seems to be a good example :)
|
HM Kaiser
Member |
That reminds me of a discussion I had with the leader of the group a few days ago. To keep a platform alive, the platform needs releases, even if world records are not broken every time... It's just nice to see what people created! They spent time creating things, and often did their best, so let's enjoy the work and effort :-) It seems some people don't want to create stuff if they don't think they could win a competition? But we can do it just for fun :-)
It could be nice to take a census: who is active in the demoscene? :-)
|
Angry Retired Bastard
Member |
I'm active in the demoscene, released 2 demos last year featuring some (supposedly) "new/advanced" stuff, and won both compos. I think winning compos is really fun, so sometimes I try to do just that. ;) This obviously doesn't mean that all my releases have to be dead serious (well, duh!), but if I start a project I believe has some cool potential then I sure as hell won't dumb it down just to have a release at a party I'm not even attending. :)
|