Author |
Message |
dalton
Member |
I've recently started working on some 3D routines, and I don't want the code to depend on the FPU, so I use fixed point maths. It's just that I can't really decide where to split the numbers.
16:16 is nice, but muls give 64-bit results, and some juggling with words and registers is required every time.
8:8 is also nice, but almost any mul will make the result go out of the word.
So, what do you people use? I guess after some time you find a good compromise. Perhaps 24:8 or 10:6? What are your experiences?
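To show the kind of juggling I mean for 16:16 (just a sketch, assuming a 020-040 where the 64-bit form of muls.l is in hardware; the registers are only an example):
; a*b for two 16:16 numbers in d0/d1, result in d2
muls.l d1,d2:d0 ; d2:d0 = full 64-bit product with 32 fraction bits
swap d0 ; d0.w = product bits 16-31 = fraction of the result
swap d2 ; upper word of d2 = bits 32-47 = integer part of the result
move.w d0,d2 ; d2 = the 16:16 result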
|
rload
Member |
I use 18:14
|
kUfa
Member |
I wouldn't go for something below xx:10, unless you don't care too much about precision. I usually use 22:10.. FPU is cool
|
rload
Member |
fpu roolz.. no matter how structured you are with the fixed point, you'll end up in bitshifter hell.
|
dalton
Member |
Bitshifter hell indeed... how fast is the FPU in comparison to using the 060 and shifts?
|
rload
Member |
I'm not really sure.. but it is slower.. that is what I'm sure of.. and btw.. I heard floating point ADD is slower than MUL actually.. any idea on that?
|
krabob
Member |
karate uses 16:16 :-)
swap is faster than shifting.
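For example, int <-> 16:16 conversion needs no shifting at all (a tiny sketch, d0 is just an example register):
swap d0 ; the integer in d0.w moves up to the integer word
clr.w d0 ; zero the fraction: d0 now holds the same value as 16:16
Going back the other way is just a swap again (plus an ext.l if you need the integer part as a longword).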
|
Pezac
Member |
A general question for formats like 18:14 and 16:16.
How do you handle multiplications in a good way, granted you use the full range? We get a >32-bit result, so we need to use a 64-bit multiply (emulated on the 060, mind you!). Even if we use the 64-bit multiply we still need to bitshift to get back to the original format, because the product has twice as many fraction bits as the original.
Edit:
This was quite a silly post :) However I can't delete it!
Explanation why it was silly: if you use the 64-bit instructions there are no additional instructions compared to the usual way. You make a 64-bit multiply and then shift the 64-bit result down to the original format. This is the same as you do with all fixed point calculations, i.e. mul+shift.
But then again, if you have a format like 16:16 and make a multiply, the fraction will be 32 bits and will all end up in the low 32-bit half of the 64-bit result. Not nice ;)
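For completeness, a format like 18:14 would then go something like this (just a sketch, again assuming a 020-040 where the 64-bit muls.l is in hardware; the registers are made up):
; d0 = a, d1 = b, both 18:14; result in d0
muls.l d1,d2:d0 ; d2:d0 = 64-bit product with 28 fraction bits
moveq #14,d3
lsr.l d3,d0 ; low half shifted down by 14
moveq #18,d3
lsl.l d3,d2 ; low 14 bits of the high half moved up by 32-14
or.l d2,d0 ; d0 = the 18:14 result (assuming it fits in 32 bits)
With 16:16 the final shift by 16 can be replaced by swaps, which is about the only convenience of that particular split.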
|
krabob
Member |
Hey Pezac, actually I had the same questions with karate. There are some "muls.l" left in the code, but most of the time I multiply 16:16 with:
; here d0 and d1 are the 16:16 values we will multiply
asr.l #8,d0 ; d0 becomes 24:8
asr.l #8,d1 ; d1 becomes 24:8
muls.l d0,d1 ; d1 is magically a 16:16 again
This way, I get enough precision before and after the "point".
... It took a long time to come to my mind (how much of an idiot I can be sometimes), but when you do a 64-bit result = 32-bit x 32-bit, and then recast the 64-bit (32:32) to a 32-bit (16:16), you lose as much "bit meaning" as you do with the (asr, asr, muls) I typed before :-)
So I think I will just kick out all the muls.l d0,d1:d2 in karate3D.
What is more interesting: I've done a function to multiply 4x4 matrices with this muls, and I test for the 0.0 and 1.0 cases to avoid the muls.l (actually in 3D matrices, there are a lot of 0.0 and 1.0 entries).
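Roughly like this for one element (only a sketch; the register use and the ONE = $00010000 test are just for illustration):
; one term of a row*column sum, everything 16:16
move.l (a0)+,d1 ; next matrix entry
move.l (a1)+,d0 ; matching vector component
tst.l d1
beq.s .skip ; entry is 0.0: contributes nothing
cmp.l #$00010000,d1
beq.s .one ; entry is 1.0: the contribution is the component itself
asr.l #8,d0 ; 24:8
asr.l #8,d1 ; 24:8
muls.l d1,d0 ; back to a 16:16 product (the asr/asr/muls from above)
.one:
add.l d0,d2 ; accumulate into the result
.skip: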
|
winden
Member |
In my latest 3D engine (unpublished yet), I recall using .X for the vertices and .Y for the rotation matrix, such that X+Y=16... this way I could kill the decimal places of the Z coord with just a swap and then a divs.w.
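Something like this, if I remember right (only a sketch; the exact split and the registers are just examples):
; say the vertices are 10:6 and the matrix entries 6:10, so 6+10 = 16
muls.w d1,d0 ; x * m20 -> 32-bit product with 16 fraction bits
muls.w d3,d2 ; y * m21
muls.w d5,d4 ; z * m22
add.l d2,d0
add.l d4,d0 ; rotated z, still with 16 fraction bits
swap d0 ; decimal places gone, d0.w holds the integer z
; ...d0.w then goes straight into the divs.w of the perspective divide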
Also I recall using 32x32->64 and 64/64->32 operations for synthesis on the floormapper, and it ran great on 030 and 040... but the 060 had to trap and emulate those in software, which was REALLY slow... like going 2 frames on 030 and 8 frames on 060... so the night before kalms left for Assembly, wasi, kalms, peskanov and me were sitting on IRC at 3am trying to guess how to make it run without recoding the whole routine... finally we decided to detect the 060 and then use the FPU for that part of the calculation:
mul64by64:
fmove.l d0,fp0 ; convert both 16:16 operands to float
fmove.l d1,fp1
fmul.x fp0,fp1 ; full-precision product
fmul.s .divide_by_65536,fp1 ; scale back down to 16:16
fmove.l fp1,d1 ; back to an integer register, result is 16:16
rts
.divide_by_65536 dc.s 1.0/65536.0
|
rload
Member |
How about using emulated floating point arithmetic...
If you use a separate mantissa and exponent you can have great range and enough precision..
Say we have 15 bits of mantissa and a 16-bit exponent; a mul becomes:
muls.w d_mantissa1,d_mantissa2 ; 15x15 bits -> up to a 30-bit product
asr.l #8,d_mantissa2
asr.l #7,d_mantissa2 ; >>15 in total, back to a 15-bit mantissa
add.w d_exponent1,d_exponent2 ; the exponents simply add
And we get an effective range of +-2^32767 to +-2^-32768 with 15 bits of precision :)
For adds and subs you must first make sure that the exponents are equal, or else there will be errors..
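And for the add itself, something like this (a rough sketch with the same made-up names, d2 as a scratch register, and ignoring renormalization of the result):
move.w d_exponent1,d2
sub.w d_exponent2,d2 ; how far apart are the exponents?
bmi.s .other_way ; mirror case (not shown): shift the other mantissa instead
asr.w d2,d_mantissa2 ; express mantissa2 at the larger exponent
add.w d_mantissa1,d_mantissa2 ; (the sum may need renormalizing afterwards)
move.w d_exponent1,d_exponent2 ; the result keeps the larger exponent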
|
Pezac
Member |
krabob wrote:
... It took a long time to come to my mind (how much of an idiot I can be sometimes), but when you do a 64-bit result = 32-bit x 32-bit, and then recast the 64-bit (32:32) to a 32-bit (16:16), you lose as much "bit meaning" as you do with the (asr, asr, muls) I typed before :-)
I see what you mean, but isn't there a difference between cutting some bits before the operation and cutting some bits after the operation? I mean, if we talk about what is more computationally correct.
|
rload
Member |
cutting before reduces precision
|
Pezac
Member |
Yes, that was my point loaderror. But you just had to spell it out, didn't you? :)
|
rload
Member |
Pezac is from Sweden :)
|
Pezac
Member |
Care to explain what you mean? At least in a fun way :)
Or are you just grumpy because you have no soccer team in the world cup?
|
rload
Member |
soccer is unimportant!!!
|
StingRay
Member |
Loady: you are soooo right! :)
|