A.D.A. Amiga Demoscene Archive

My first effect / Winuae precision?

 

LaBodilsen
Member
#1 - Posted: 7 Feb 2016 17:04
Hi All
i've slowly been chipping away at relearning assembler programming on Amiga again, and can proudly say that i have now made my first effect :D

a DOTFLAG .. yeah i know, it was just to do something.

it was inspired by this thread: http://eab.abime.net/showthread.php?t=50595 , where PMC did the same thing because he was bored, but i thought it would be a good exercise.

But: PMC mentioned he plotted 1134 dots with his plotting routine, without doing anything else (clearing, sinus etc.), and my current routine sits at 1472 sine plots in a dotflag with clearing and all. (i know, it's not a record of any kind)

So my question is really: can i trust WinUAE to be cycle precise? My settings when testing were AGA chipset, cycle exact, 68020 CPU, with approximate A500/A1200 speed, and CPU frequency set to 2 x (A500) 7.09 MHz.

Would that be enough to estimate the speed of a real A500, or do i have to set it at 68000, KS1.3 etc. to be more precise?

i'm targeting A500 OCS when programming at the moment, but KS3.1 and Asm-Pro are just a nicer environment to be working in, hence i set it to A1200 :)
Blueberry
Member
#2 - Posted: 8 Feb 2016 22:23
Dotflags are cool. :)

The settings you mention would correspond to something in between A500 and A1200 (an A1200 with underclocked CPU). For an A500, you have to set chipset to "ECS Agnus" and CPU to 68000. The kickstart probably doesn't matter much if you shut down the system while your demo is running anyway.

If you want a nicer development environment, you can add lots of Chip and Slow RAM (Fast RAM is faster, so it skews your timings if your demo ends up there). Then set the CPU speed to "Fastest possible" while you edit and then switch it to cycle-exact when you speed test.

The A500 emulation in WinUAE is extremely accurate. The A1200 emulation accuracy has improved lately, but I don't know how good it is.

I have sometimes seen the A500 emulation running slightly slower (that is, slower than a real A500) when I have a HD configured in WinUAE, even if the system is completely disabled. Has anyone else seen something like this?
LaBodilsen
Member
#3 - Posted: 9 Feb 2016 18:54
Thank you for the answer.. and yeah dotflags actually do look pretty sweet, even though it's a very oldschool effect.

I tried setting it as you suggested, and was surprised how much of a performance hit it took. :(

I had optimized the routine even further, so it did 2048 dotflag plots on 020, 7.09 MHz AGA, chip RAM only. But loading it up with 68000, OCS, KS1.3, Chip+fakefast showed only about 1216 dots possible. (at least it still beats PMC :D)

i was under the impression that AGA was not faster than OCS/ECS, and that WinUAE didn't run faster or slower depending on CPU choice, but it seems i was very much wrong :)

So back to the optimizing, as i really would like it to do 2048 dots on an A500, and i still have a few ideas on how to optimize it further, but it will require me to rewrite the routine (again).
Blueberry
Member
#4 - Posted: 10 Feb 2016 09:58 - Edited
With CPU speed set to "Fastest possible", the speed probably doesn't differ much between the different CPUs. But in cycle-exact mode, WinUAE tries to match the actual speed of the CPU, and a 68020 can do a lot more than a 68000 per clock. Not least because the 68020 has an instruction cache, which means it is less hampered by the chip ram for instruction execution.
LaBodilsen
Member
#5 - Posted: 10 Feb 2016 19:24
That would explain the difference i see. no wonder i like my A1200 so much more than my A500 back in the day

btw: now at 1664 dotflag plots, almost there.. :)
LaBodilsen
Member
#6 - Posted: 13 Feb 2016 16:46
Mission accomplished!!!

2048 plots in a dotflag ... o/
I even managed to implement some variation in the sinewave, so it doesn't look too boring. Had it working at 1728 for quite some time, and a little loop unrolling pushed it above 2048.

unfortunately there is not much raster time left to do anything else, like music etc.

so i ask, is there a way to unroll a loop with REPT .. code .. ENDR, where i can add a changing variable?

example:

rept 10
Add #0,D1
EndR


and for every repeat, it will change #0 to something else, like

rept 10
Add #4*@Rept,D1
EndR


so it will keep increasing the value that is added to D1, for every repeat step?

Angry Retired Bastard
Member
#7 - Posted: 13 Feb 2016 17:46
Asm-one notation

rval set 0
rept 10
add.w #4*rval,d1
rval set rval+1
endr
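
As an illustration of what this does: assuming the assembler re-evaluates `4*rval` on every pass of the REPT block (which is how SET symbols normally behave), a version shortened to `rept 4` should assemble to roughly the following. The count of 4 is only picked to keep the listing short.

```asm
; Sketch: expected expansion of "rept 4" with "rval set rval+1".
; The repeat count is shortened from 10 to 4 purely for illustration.
	add.w	#4*0,d1		; rval = 0 -> adds 0
	add.w	#4*1,d1		; rval = 1 -> adds 4
	add.w	#4*2,d1		; rval = 2 -> adds 8
	add.w	#4*3,d1		; rval = 3 -> adds 12
```

Depending on the assembler's optimisation settings, small immediates like these may be emitted in a shorter form than a full ADDI.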
LaBodilsen
Member
#8 - Posted: 13 Feb 2016 18:26
Hi
Thank you so much, i've been looking all over google for this.

unfortunately it did not give me any speed increase as:
Addq #$4,D1
Add.w d1,d3


is just as fast as
Addi.w #$0004,d1

(can't use Addq as the value will at some point be bigger than 8 :( )
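
For reference, a sketch of why the two variants come out the same on a plain 68000. The cycle counts are taken from the standard 68000 instruction timing tables and ignore any chip RAM or DMA contention, so treat them as a lower bound.

```asm
; Variant 1: step an offset register, then add it
	addq.w	#4,d1		; 4 cycles, 1 word fetched
	add.w	d1,d3		; 4 cycles, 1 word fetched
				; total: 8 cycles, 2 words

; Variant 2: add a 16-bit immediate directly
	addi.w	#4,d1		; 8 cycles, 2 words (opcode + immediate)
```

Both cost 8 cycles and two instruction words, so the gain from the REPT version is the freed register rather than speed.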

but big thanks anyway, as i'm sure it will come to great use at some other point.

/Regards
dodke
Member
#9 - Posted: 14 Feb 2016 11:16
The REPTN keyword is also handy in rept loops like that

_greypal
rept 16
dc.w $111*REPTN
endr
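
As a sketch of the result, assuming REPTN counts up from 0 on the first pass (worth checking in your assembler's documentation), the block above should be equivalent to writing the grey ramp out by hand:

```asm
; Expected expansion, assuming REPTN runs 0..15:
_greypal
	dc.w	$000,$111,$222,$333,$444,$555,$666,$777
	dc.w	$888,$999,$aaa,$bbb,$ccc,$ddd,$eee,$fff
```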


About the routine, maybe there would be a somewhat expensive calculation that could be interpolated between certain number of frames or dots?
LaBodilsen
Member
#10 - Posted: 14 Feb 2016 12:32
Hi Dodke
thanks for the hint, but it seems like REPTN doesn't work in Asm-Pro. Slummy's information worked in Asm-Pro, and even though it did not give any improvement in speed, it did free up a register.

Regarding the routine. It uses a rather naive approach of getting Y and X sin for every dot, and then add the Screen position to it.

Innerloop looks like this (first X=D3 and Y=D2 is loaded before this point, A0 = Y sin table pre*40, A1 = X sin table, A6 = Drawbuffer)
	moveq	#YSize-1,D0		;Number of Y-Rows

Next_YRow: ;.Draw_Dot:
Rval Set 0
REPT 64
Add.w #$4*Rval,D3 ;Add X-Pos offset to X

Move.w D3,D4
Asr.w #3,D4
Add.w D4,D2 ;Add X-Pos byte to Y-Sin
Not.w D3 ;Reverse Bits
Bset D3,(A6,D2) ;Set bit in DrawBuffer
;.Next_Dot
Move.w (A0)+,D2 ;Get next Y-Sin
Move.w (A1)+,D3 ;Get next X-Sin
Rval Set Rval+1
ENDR

Lea 40*4(A6),A6 ;Add #40*4 to A6 drawbuffer (Y-Pos Offset)

Lea -(XSize-1)*2(A0),A0 ;Set Y Sintable to Start + 4
Move.w (A0)+,D2 ;Get next Y-Sin

Lea -(XSize)*2(A1),A1 ;Set X Sintable to Start + 2
Move.w (A1)+,D3 ;Get next X-Sin

dbf D0,Next_YRow
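
To make the plot step in the loop above concrete, here is a worked example for a single dot. The values X = 13 and a Y offset of 0 are hypothetical, and A6 is assumed to already point at the draw buffer.

```asm
; Worked example: plot the dot at X = 13 on the first row.
	moveq	#0,d2		; Y offset = 0 (hypothetical: first row)
	moveq	#13,d3		; X = 13 (hypothetical value)
	move.w	d3,d4
	asr.w	#3,d4		; 13 >> 3 = 1 -> dot lies in byte 1 of the row
	add.w	d4,d2		; byte offset = 0 + 1
	not.w	d3		; low 3 bits: not %101 = %010 = 2
	bset	d3,(a6,d2.w)	; on memory, BSET uses bit number mod 8 -> bit 2
```

Pixel 13 is the sixth pixel of byte 1 (which covers pixels 8 to 15). The leftmost pixel of a byte is bit 7, so the sixth pixel is bit 2, and this is why the NOT is needed before the BSET.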


As there are a lot of Adds and Asrs going on, i'm currently thinking of using the Blitter to pre-add and shift the values every frame, but that might be going a bit overboard. And as this was mostly done as an exercise for me, and i don't intend to rehash every oldschool effect and release it, i might just let it be in its current form and move on to something more interesting :)
LaBodilsen
Member
#11 - Posted: 15 Feb 2016 17:23

after posting last i discovered a quick speed-up: instead of getting the Y-Sin value into D2 and adding D4 to it, why not just add the value directly to D4? yay, 2176 dots.

	moveq	#YSize-1,D0	;Number of Y-Rows
Next_YRow
;.Draw_Dot:
Rval Set 0
REPT 64
Move.w (A1)+,D3 ;Get next X-Sin
Add.w #$4*Rval,D3 ;Add X-Pos offset to X
Move.w D3,D4
Asr.w #3,D4
Add.w (A0)+,D4 ;Add Y-Sin to X-Pos byte
Not.w D3 ;Reverse Bits
Bset D3,(A6,D4) ;Set bit in DrawBuffer
Rval Set Rval+1
ENDR
;.Next_Dot
Lea 40*4(A6),A6 ;Add #40*4 to A6 drawbuffer
Lea -(XSize-2)*2(A0),A0 ;Set Y Sintable to Start + 2
Lea -(XSize-1)*2(A1),A1 ;Set X Sintable to Start + 2
dbf D0,Next_YRow
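
As a rough sanity check on the dot count, here is one unrolled dot from the loop above with 68000 cycle counts. The counts come from the standard 68000 timing tables and assume no DMA contention, and the offset `4*5` is just an example value of `4*Rval`, so this is an estimate rather than a measurement.

```asm
; One unrolled dot, annotated with 68000 cycles (no contention assumed)
	move.w	(a1)+,d3	;  8  get X-sin
	add.w	#4*5,d3		;  8  add X-pos offset (Rval = 5 as an example)
	move.w	d3,d4		;  4
	asr.w	#3,d4		; 12  6+2n cycles with n = 3
	add.w	(a0)+,d4	;  8  add Y-sin
	not.w	d3		;  4
	bset	d3,(a6,d4.w)	; 18  8 + 10 for the indexed address
				; --
				; 62 cycles per dot
; 2176 dots * 62 = 134,912 cycles, against roughly 141,900 CPU cycles
; in one 50 Hz PAL frame at 7.09 MHz.
```

By the same tables the previous version of the loop comes to 66 cycles per dot, and 2048 * 66 = 135,168 cycles, so both versions fill about the same share of the frame. That fits the observation that little raster time is left over.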


This exercise turned out to be quite rewarding, starting off with a rather slow routine that used all data and address registers, and ending up with the above snippet.

oh the joy of programming assembler again, it sure beats the Excel macros i've been doing lately :D

dodke
Member
#12 - Posted: 15 Feb 2016 20:10
Nice!
And good to know the other workaround. I haven't used asm-pro but perhaps sometime.

I was going to suggest unrolling the whole loop. :) I suppose for 2k dots it wouldn't be too massive still.
Another optimisation could be if your tables aren't too large you could have a specialised one that has the data already in the right format that doesn't need to be shifted or anything. But not sure if that would help in this case if you needed to add another mem read.

A.D.A. Amiga Demoscene Archive, Version 3.0