|
Author |
Message |
LaBodilsen
Member |
Hi all, I've slowly been chipping away at relearning assembler programming on the Amiga, and can proudly say that I have now made my first effect :D
A DOTFLAG... yeah I know, it was just to do something.
It was inspired by this thread: http://eab.abime.net/showthread.php?t=50595 , where PMC did the same thing because he was bored, but I thought it would be a good exercise.
But: PMC mentioned he plotted 1134 dots with his plotting routine, without doing anything else (clearing, sine etc.), and my current routine sits at 1472 sine plots in a dotflag with clearing and all. (I know, it's not a record of any kind.)
So my question is really: can I trust WinUAE to be cycle precise? My settings when testing were AGA chipset, cycle exact, 68020 CPU, with "Approximate A500/A1200 speed", and CPU frequency set at 2 x (A500) 7.09 MHz.
Would that be enough to estimate the speed of a real A500, or do I have to set it to 68000, KS1.3 etc. to be more precise?
I'm targeting A500 OCS when programming at the moment, but KS3.1 and Asm-Pro is just a nicer environment to be working in, hence I set it to A1200 :)
|
Blueberry
Member |
Dotflags are cool. :)
The settings you mention would correspond to something in between A500 and A1200 (an A1200 with underclocked CPU). For an A500, you have to set chipset to "ECS Agnus" and CPU to 68000. The kickstart probably doesn't matter much if you shut down the system while your demo is running anyway.
If you want a nicer development environment, you can add lots of Chip and Slow RAM (Fast RAM is faster, so it skews your timings if your demo ends up there). Then set the CPU speed to "Fastest possible" while you edit and then switch it to cycle-exact when you speed test.
The A500 emulation in WinUAE is extremely accurate. The A1200 emulation accuracy has improved lately, but I don't know how good it is.
I have sometimes seen the A500 emulation running slightly slower (that is, slower than a real A500) when I have a HD configured in WinUAE, even if the system is completely disabled. Has anyone else seen something like this?
|
LaBodilsen
Member |
Thank you for the answer... and yeah, dotflags actually do look pretty sweet, even though it's a very oldschool effect.
I tried setting it as you suggested, and was surprised how much of a performance hit it took. :(
I had optimized the routine even further, so it did 2048 dotflag plots on 020, 7.09 MHz, AGA, Chip RAM only. But loading it up with 68000, OCS, KS1.3, Chip + fake fast showed only about 1216 dots possible. (At least it still beats PMC :D)
I was under the impression that AGA was not faster than OCS/ECS, and that WinUAE didn't run faster or slower depending on CPU choice, but it seems I was very much wrong :)
So back to the optimizing, as I really would like it to do 2048 dots on an A500. I still have a few ideas on how to optimize it further, but it will require me to rewrite the routine (again).
|
Blueberry
Member |
With CPU speed set to "Fastest possible", the speed probably doesn't differ much between the different CPUs. But in cycle-exact mode, WinUAE tries to match the actual speed of the CPU, and a 68020 can do a lot more than a 68000 per clock. Not least because the 68020 has an instruction cache, which means it is less hampered by the chip ram for instruction execution.
|
LaBodilsen
Member |
That would explain the difference I see. No wonder I liked my A1200 so much more than my A500 back in the day.
BTW: now at 1664 dotflag plots, almost there.. :)
|
LaBodilsen
Member |
Mission accomplished!!! 2048 plots in a dotflag ... o/
I even managed to implement some variation in the sine wave, so it doesn't look too boring. I had it working at 1728 for quite some time, and a little loop unrolling pushed it above 2048. Unfortunately there is not much raster time left to do anything else, like music etc.
So I ask: is there a way to unroll a loop with REPT .. code .. ENDR where I can add a changing variable? Example: and for every repeat, it will change #0 to something else, like
    rept 10
    Add #4*@Rept,D1
    EndR
so it will keep increasing the value that is added to D1, for every repeat step?
|
Angry Retired Bastard
Member |
Asm-One notation:
rval set 0
    rept 10
    add.w #4*rval,d1
rval set rval+1
    endr
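To make the effect of the `rval set` / `rept` pattern above easier to see, here is a minimal Python sketch that imitates what the assembler would emit. `expand_rept` is a hypothetical helper written only for illustration; it is not part of Asm-One or Asm-Pro, and the `step` parameter is an assumption standing in for the constant 4.

```python
def expand_rept(count, step=4):
    """Imitate: rval set 0 / rept count / add.w #step*rval,d1 / rval set rval+1 / endr."""
    lines = []
    rval = 0                                       # rval set 0
    for _ in range(count):                         # rept count
        lines.append(f"add.w #{step * rval},d1")   # operand is evaluated at assembly time
        rval += 1                                  # rval set rval+1
    return lines                                   # endr


if __name__ == "__main__":
    for line in expand_rept(10):
        print(line)
```

Each repeat ends up with its own immediate value baked in, so nothing has to be computed for the offset at run time.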
|
LaBodilsen
Member |
Hi, thank you so much, I've been looking all over Google for this. Unfortunately it did not give me any speed increase as: is just as fast as (can't use Addq as the value will at some point be bigger than 128 :( ) but big thanks anyway, as I'm sure it will come to great use at some other point. /Regards
|
dodke
Member |
The REPTN keyword is also handy in rept loops like that:
_greypal
    rept 16
    dc.w $111*REPTN
    endr
About the routine, maybe there is a somewhat expensive calculation that could be interpolated between a certain number of frames or dots?
|
LaBodilsen
Member |
Hi Dodke, thanks for the hint, but it seems like REPTN doesn't work in Asm-Pro. Slummy's information worked in Asm-Pro, and even though it did not give any improvement in speed, it did free up a register.
Regarding the routine: it uses a rather naive approach of getting Y and X sin for every dot, and then adding the screen position to it. The inner loop looks like this (the first X=D3 and Y=D2 are loaded before this point, A0 = Y sin table pre*40, A1 = X sin table, A6 = Drawbuffer):
    moveq #YSize-1,D0 ;Number of Y-Rows
Next_YRow:
;.Draw_Dot:
Rval Set 0
    REPT 64
    Add.w #$4*Rval,D3 ;Add X-Pos offset to X
    Move.w D3,D4
    Asr.w #3,D4
    Add.w D4,D2 ;Add X-Pos byte to Y-Sin
    Not.w D3 ;Reverse Bits
    Bset D3,(A6,D2) ;Set bit in DrawBuffer
;.Next_Dot
    Move.w (A0)+,D2 ;Get next Y-Sin
    Move.w (A1)+,D3 ;Get next X-Sin
Rval Set Rval+1
    ENDR
    Lea 40*4(A6),A6 ;Add #40*4 to A6 drawbuffer (Y-Pos Offset)
    Lea -(XSize-1)*2(A0),A0 ;Set Y Sintable to Start + 4
    Move.w (A0)+,D2 ;Get next Y-Sin
    Lea -(XSize)*2(A1),A1 ;Set X Sintable to Start + 2
    Move.w (A1)+,D3 ;Get next X-Sin
    dbf D0,Next_YRow
As there are a lot of Adds and Asrs going on, I'm currently thinking of using the Blitter to pre-add and shift the values every frame, but that might be going a bit overboard. And as this was mostly done as an exercise for me, and I don't intend to rehash every oldschool effect and release it, I might just let it be in its current form and move on to something more interesting :)
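For readers who want to follow the address math in the inner loop above, here is a small Python model of a single plot. It is only a sketch under stated assumptions: a single bitplane with 40 bytes per row (inferred from the "pre*40" Y table and the `40*4` row step), and the names `plot` and `BYTES_PER_ROW` are made up for illustration.

```python
BYTES_PER_ROW = 40  # assumption: 320-pixel-wide bitplane, inferred from "pre*40"


def plot(buffer, x, y_offset):
    """Set pixel x in the row that starts at byte y_offset (already y * 40)."""
    byte_index = y_offset + (x >> 3)  # Asr.w #3 gives the byte within the row
    bit = ~x & 7                      # Not.w; Bset on memory uses the bit number modulo 8
    buffer[byte_index] |= 1 << bit    # bit 7 is the leftmost pixel of the byte
    return byte_index, bit


if __name__ == "__main__":
    screen = bytearray(BYTES_PER_ROW * 2)
    print(plot(screen, 9, 1 * BYTES_PER_ROW))
```

The `Not.w` is what turns "pixel number counted from the left" into "bit number counted from the right", which is why no separate mask or lookup is needed before the `Bset`.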
|
LaBodilsen
Member |
After my last post I discovered a quick speed-up: instead of getting the Y-Sin value into D2 and adding D4 to it, why not just add the value directly to D4? Yay, 2176 dots.
    moveq #YSize-1,D0 ;Number of Y-Columns
Next_YRow
;.Draw_Dot:
Rval Set 0
    REPT 64
    Move.w (A1)+,D3 ;Get next X-Sin
    Add.w #$4*Rval,D3 ;Add X-Pos offset to X
    Move.w D3,D4
    Asr.w #3,D4
    Add.w (A0)+,D4 ;Add Y-Sin to X-Pos byte
    Not.w D3 ;Reverse Bits
    Bset D3,(A6,D4) ;Set bit in DrawBuffer
Rval Set Rval+1
    ENDR
;.Next_Dot
    Lea 40*4(A6),A6 ;Add #40*4 to A6 drawbuffer
    Lea -(XSize-2)*2(A0),A0 ;Set Y Sintable to Start + 2
    Lea -(XSize-1)*2(A1),A1 ;Set X Sintable to Start + 2
    dbf D0,Next_YRow
This exercise turned out to be quite rewarding, starting off with a rather slow routine that used all data and address registers, and ending up with the above snippet. Oh the joy of programming assembler again, it sure beats the Excel macros I've been doing lately :D
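As a rough sanity check on the 2176 figure, the per-dot cost of the second inner loop can be added up from the standard MC68000 instruction timing tables and compared with a PAL frame. This is only a back-of-the-envelope sketch: it assumes no DMA contention, ignores the screen clear and the per-row `Lea`/`dbf` overhead, and takes a PAL frame as 313 lines of 227 colour clocks with 2 CPU cycles each. The names used are hypothetical.

```python
# Idealised 68000 cycle counts for one unrolled dot (no bus contention assumed).
INNER_LOOP = [
    ("Move.w (A1)+,D3", 8),
    ("Add.w #imm,D3", 8),
    ("Move.w D3,D4", 4),
    ("Asr.w #3,D4", 12),      # 6 + 2 cycles per shift count
    ("Add.w (A0)+,D4", 8),
    ("Not.w D3", 4),
    ("Bset D3,(A6,D4)", 18),  # 8 for Bset Dn,<mem> + 10 for the indexed address
]

CYCLES_PER_LINE = 227 * 2   # assumption: PAL, 227 colour clocks per line
LINES_PER_FRAME = 313       # assumption: PAL non-interlaced long frame


def cycles_per_dot():
    return sum(cycles for _, cycles in INNER_LOOP)


def frame_budget():
    return CYCLES_PER_LINE * LINES_PER_FRAME


def dots_cost(dots):
    return dots * cycles_per_dot()


if __name__ == "__main__":
    used = dots_cost(2176)
    print(cycles_per_dot(), used, frame_budget(), used / frame_budget())
```

Under these assumptions the dots alone take most of a frame, which would fit the remark that there is not much raster time left for music.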
|
dodke
Member |
Nice! And good to know the other workaround. I haven't used asm-pro but perhaps sometime.
I was going to suggest unrolling the whole loop. :) I suppose for 2k dots it wouldn't be too massive still. Another optimisation could be if your tables aren't too large you could have a specialised one that has the data already in the right format that doesn't need to be shifted or anything. But not sure if that would help in this case if you needed to add another mem read.
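One possible reading of the "specialised table" idea, sketched in Python: precompute the byte offset and bit number for every X position once, so the per-dot shift and invert disappear at the cost of a table read. The layout below is an assumption for illustration only (`build_plot_table` and the 320-pixel width are hypothetical), and whether the extra memory read pays off on a 68000 is exactly the open question raised above.

```python
def build_plot_table(width=320):
    """For each x, store (byte offset within the row, bit number for Bset)."""
    return [(x >> 3, ~x & 7) for x in range(width)]


def plot_with_table(buffer, table, x, y_offset):
    byte_offset, bit = table[x]            # one lookup replaces the shift and the invert
    buffer[y_offset + byte_offset] |= 1 << bit


if __name__ == "__main__":
    table = build_plot_table()
    screen = bytearray(40 * 2)
    plot_with_table(screen, table, 9, 40)
    print(table[9], hex(screen[41]))
```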
|