A.D.A. Amiga Demoscene Archive

Amiga Demoscene Archive Forum / Coding / coding tutorial: general questions
rload
Member
#1 - Posted: 9 May 2007 00:52
Err... the bpl.b *+2 should really be *+2+2, which seems to work. I didn't take the size of the bpl.b instruction itself into account. Blah.
dalton
Member
#2 - Posted: 9 May 2007 09:55
You can use .label\@ in macros and preprocessor loops. I think you left out the backslash in your previous post.
Kalms
Member
#3 - Posted: 9 May 2007 11:09
In some assemblers, \@ is only given a new (unique) value in each macro instantiation; not inside REPT loops, so you might still have to hand-code jump offsets inside those.
doom
Member
#4 - Posted: 9 May 2007 23:31
What about a macro in a REPT loop?
rload
Member
#5 - Posted: 10 May 2007 00:08
Tried it just now. That works perfectly. Hooray!

Tricks 1,
Readability 0
z5_
Member
#6 - Posted: 16 May 2007 21:56
Let's say that I want the delta (difference) between two unsigned bytes, but I need its absolute value. There seems to be no absolute-value instruction for integers (only for floats), so what is the easiest (most elegant) way to do this? I would subtract, compare the result with 0 and, if it is less, reverse my subtraction, but I have a feeling that there is a nicer way :)
Kalms
Member
#7 - Posted: 17 May 2007 01:19 - Edited
[The discussion below assumes that d0 and d1 contain a pair of unsigned values, and that the signed delta lies within the range -$7f .. +$7f.]

The most common way of doing this without branches is to generate a bit pattern which is $00 when the delta is positive and $ff when the delta is negative. That 'mask value' can then be used in logic operations to conditionally flip the sign of the delta value.

The Scc instructions (cc == all different condition code combinations, like with the Bcc conditional jumps) can be used to take CPU flags (zxnvc) and convert them into straight values in registers. ADDX/SUBX/ROXR/ROXL can also be used for grabbing the X flag.

Here is one way of getting the mask value:
sub.b	d0,d1	; Compute delta, with sign
smi	d2	; If result of previous operation was
		; negative, d2 = $ff
		; otherwise, d2 = $00

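subx is less flexible (it only copies the X flag, which after a SUB holds the borrow, so strictly it corresponds to SCS; within the range assumed above that gives the same result as SMI), but it can generate mask values which are word- or longword-sized:
sub.b	d0,d1	; Compute delta, with sign
subx.l	d2,d2	; d2 = 0 - X: if the subtraction
		; borrowed, d2 = $ffffffff
		; otherwise, d2 = $00000000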

Now, let's say that we have a mask value which is $00 or $ff depending on whether the delta is positive or negative. What shall we do with it?
Look at how you negate a number on a two's complement machine:
neg.b	d0	; Compute two's complement

This is equivalent to flipping all bits in d0, and then adding 1 to the result:
not.b	d0	; Flip all bits in d0 (compute one's 
		; complement)
addq.b	#1,d0	; compute two's complement

Flipping all bits can be done by XORing with $ff. Adding 1 is the same as subtracting $ff:
eor.b	#$ff,d0	; Flip all bits in d0 (compute
		; one's complement)
sub.b	#$ff,d0	; Compute two's complement

If you do the above EOR/SUB combination with the value $00 instead, the value in d0 will remain unchanged.
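The EOR/SUB identity is easy to check outside the assembler. Below is a minimal C sketch, assuming byte registers are modelled with uint8_t so that arithmetic wraps modulo 256 like .b operations; cond_negate is a hypothetical helper name, not anything from the thread:

```c
#include <stdint.h>

/* Models "eor.b mask,x / sub.b mask,x" on one byte.
   mask = 0xff: flip all bits, then subtract 0xff (== add 1 mod 256) -> -x.
   mask = 0x00: both steps are no-ops -> x unchanged. */
uint8_t cond_negate(uint8_t x, uint8_t mask)
{
    uint8_t t = (uint8_t)(x ^ mask);   /* eor.b: one's complement if mask = 0xff */
    return (uint8_t)(t - mask);        /* sub.b: completes the two's complement   */
}
```

For example, cond_negate(5, 0xff) yields 0xfb (that is, -5 as a byte), while cond_negate(5, 0x00) yields 5.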

So, what to do with our mask value? Use it as input into the EOR/SUB operation to conditionally flip the sign of the delta! The following piece of code will compute the absolute delta between d0 and d1:
sub.b	d0,d1		; Compute delta, with sign
subx.b	d2,d2		; Generate sign flag
eor.b	d2,d1		; If negative, compute one's
			; complement
sub.b	d2,d1		; ... and two's complement
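Here is the same four-instruction sequence modelled in C, assuming d0/d1 hold unsigned bytes. The comparison d0 > d1 stands in for the X (borrow) flag that sub.b sets, and abs_delta_u8 is a hypothetical name:

```c
#include <stdint.h>

/* Models:
     sub.b  d0,d1   ; d1 = d1 - d0, X set on borrow
     subx.b d2,d2   ; d2 = 0 - X  -> $ff on borrow, else $00
     eor.b  d2,d1   ; conditional one's complement
     sub.b  d2,d1   ; ... and two's complement              */
uint8_t abs_delta_u8(uint8_t d0, uint8_t d1)
{
    uint8_t delta  = (uint8_t)(d1 - d0);      /* wraps like sub.b        */
    uint8_t borrow = (uint8_t)(d0 > d1);      /* X flag after the sub.b  */
    uint8_t mask   = (uint8_t)(0u - borrow);  /* subx.b d2,d2            */
    delta = (uint8_t)(delta ^ mask);          /* eor.b d2,d1             */
    return (uint8_t)(delta - mask);           /* sub.b d2,d1             */
}
```

Because this model derives the mask from the borrow rather than from the N flag, it also gives the right answer when the true difference exceeds $7f; for example, abs_delta_u8(0x10, 0xf0) returns 0xe0.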

Another example of usage of Scc/SUBX is to clamp the results of an addition/subtraction to $ff/$00 respectively.

This sequence of code will limit the result of an unsigned addition to $ff (so if the value wraps around, it will directly get reset to $ff again):
add.b	d0,d1	; Compute sum of values; X and C
		; get set if the result is > $ff
scs	d2	; "Set if Carry Set".
		; Carry flag set -> d2  = $ff
		; Otherwise, d2 = $00
or.b	d2,d1	; If carry set, force d1 to contain
		; the value $ff
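A C model of the clamped addition, assuming unsigned bytes. sat_add_u8 is a hypothetical name, and the mask expression stands in for SCS (it is $ff exactly when the 8-bit add would carry):

```c
#include <stdint.h>

/* Models add.b d0,d1 / scs d2 / or.b d2,d1. */
uint8_t sat_add_u8(uint8_t d0, uint8_t d1)
{
    unsigned sum  = (unsigned)d0 + d1;           /* > 0xff means carry set   */
    uint8_t  mask = (sum > 0xffu) ? 0xff : 0x00; /* scs d2                   */
    return (uint8_t)((uint8_t)sum | mask);       /* or.b d2,d1 forces $ff    */
}
```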

And the following sequence of code will limit the result of an unsigned subtraction to $00:
sub.b	d0,d1	; Compute difference of values
		; X and C get set if the result
		; goes below $00
scc	d2	; Set if value has NOT wrapped around
and.b	d2,d1	; If value did wrap around, reset
		; value to $00
		; Otherwise, leave value as-is
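The clamped subtraction modelled the same way (sat_sub_u8 is a hypothetical name; the mask stands in for SCC, which is $ff when no borrow occurred):

```c
#include <stdint.h>

/* Models sub.b d0,d1 / scc d2 / and.b d2,d1. */
uint8_t sat_sub_u8(uint8_t d0, uint8_t d1)
{
    uint8_t diff = (uint8_t)(d1 - d0);          /* sub.b d0,d1 (may wrap)      */
    uint8_t mask = (d0 <= d1) ? 0xff : 0x00;    /* scc d2: carry clear = no wrap */
    return (uint8_t)(diff & mask);              /* and.b d2,d1 resets to $00    */
}
```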

There are more elaborate things that can be done. Perhaps you have two long sequences of bytes (two images) which you want to add together, but you want to clamp the result at $ff for each individual byte? You can often do the addition/clamping 4 bytes at a time, but it takes a bit more code to do the clamping correctly.
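One possible way to do the "4 bytes at a time" clamped addition is the SIMD-within-a-register trick sketched below in C. This is an assumption about the kind of code meant, not necessarily the approach Kalms had in mind; sat_add_4x8 is a hypothetical name:

```c
#include <stdint.h>

/* Adds four packed unsigned bytes and clamps each byte at 0xff. */
uint32_t sat_add_4x8(uint32_t a, uint32_t b)
{
    const uint32_t lo7 = 0x7f7f7f7fu, hi = 0x80808080u;

    /* Add the low 7 bits of each byte; no carry can cross a byte boundary. */
    uint32_t s = (a & lo7) + (b & lo7);
    /* Fold in bit 7 of each byte without propagating into the next byte. */
    s ^= (a ^ b) & hi;
    /* Carry out of bit 7 of each byte = majority(a7, b7, carry-in). */
    uint32_t c = ((a & b) | ((a | b) & ~s)) & hi;
    /* Turn each 0x80 carry bit into a 0xff byte and clamp those bytes. */
    uint32_t mask = (c >> 7) * 0xffu;
    return s | mask;
}
```

On a 68000 the multiply would of course be replaced by shifts and subtractions; the C version only shows the bit logic.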
z5_
Member
#8 - Posted: 26 May 2007 11:01
What is "bpl"? I see it used as an instruction, but I can't find it in the 68k reference manual.
Kalms
Member
#9 - Posted: 26 May 2007 14:06
BPL = branch if plus (positive). It's one variation of the "Bcc" (i.e. conditional branch) instruction.
doom
Member
#10 - Posted: 26 May 2007 15:29
To avoid confusion, "Bcc" is the template for the general conditional branch where "cc" stands for condition code. "BCC" is the specific version of that instruction that branches when the carry flag is clear.
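To make the "Bcc template" idea concrete, here is a small C model of a few condition codes, each evaluated from the flags in the CCR (bit layout X=4, N=3, Z=2, V=1, C=0). The struct and function names are hypothetical:

```c
#include <stdbool.h>

typedef struct { bool x, n, z, v, c; } Flags;

/* Unpacks the condition code register into individual flags. */
Flags flags_from_ccr(unsigned ccr)
{
    Flags f = { (ccr >> 4) & 1u, (ccr >> 3) & 1u, (ccr >> 2) & 1u,
                (ccr >> 1) & 1u, ccr & 1u };
    return f;
}

bool cond_pl(Flags f) { return !f.n; }  /* BPL: branch if plus (N clear)  */
bool cond_mi(Flags f) { return f.n; }   /* BMI: branch if minus (N set)   */
bool cond_cc(Flags f) { return !f.c; }  /* BCC: branch if carry clear     */
bool cond_cs(Flags f) { return f.c; }   /* BCS: branch if carry set       */
bool cond_eq(Flags f) { return f.z; }   /* BEQ: branch if equal (Z set)   */
bool cond_ne(Flags f) { return !f.z; }  /* BNE: branch if not equal       */
```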
z5_
Member
#11 - Posted: 24 Jun 2007 19:20 - Edited
For about a week now, I've been getting a lot of errors when running my source. The strange thing is: sometimes it works, but once it stops working, it never works again until I quit the assembler (and/or reset WinUAE).

When running my code, I get the following errors (not all at the same time):
- illegal instruction at blabla
- CHK exception at blabla
- lineF emulator

I really need to find the cause, as it's getting very frustrating. So how does one go about it? I've tried experimenting with the debugger, but I'm not exactly sure where I should start looking...
StingRay
Member
#12 - Posted: 24 Jun 2007 19:49
Easiest way to do that would be using a macro:

YOUR_ROUTINE MACRO
move.l d0,d0
add.l #5,d1
bpl .\@label
clr.l d1
.\@label
ENDM

Then you could use:
REPT 5
YOUR_ROUTINE
ENDM

and it's a much cleaner solution than these naughty fixed-size branches. Imagine you add code between your bpl *+2 instruction and your branch target: your code will be nicely screwed up if you forget to fix the branch distance. :)

The bpl.b *+2 doesn't work in Asm-Pro/One; dunno if that's a bug or a feature, as I never used this approach to branch. It seems, in Asm-Pro at least, that bpl.b *+2 branches to "current address+2", where the current address is the location of your bpl opcode. So instead of skipping the clr.l d1 instruction, it would execute it. Well, another reason why I would not use this approach, as it seems every assembler handles these things differently. And in my opinion, there is no need to use this kind of construct at all. /end of rant :D
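For reference, this is how a short (.b) branch displacement relates to "*". The C sketch below assumes the usual Motorola convention that "*" is the address of the branch opcode itself, while the CPU adds the displacement to that address + 2; short_branch_disp is a hypothetical helper that returns 0 when the .b form cannot encode the branch:

```c
#include <stdint.h>

/* Displacement byte a Bcc.b at address pc needs to reach target.
   Returns 0 if the short form cannot encode it:
   - 0 is reserved ("16-bit displacement follows"),
   - $FF (-1) is reserved on 68020+ ("32-bit displacement follows"),
   - branch targets must be even,
   - only -128..127 fits in the byte. */
int short_branch_disp(uint32_t pc, uint32_t target)
{
    int32_t disp = (int32_t)(target - (pc + 2u));
    if (disp == 0 || disp == -1 || (disp & 1) != 0)
        return 0;
    if (disp < -128 || disp > 127)
        return 0;
    return (int)disp;
}
```

So skipping a single 2-byte instruction such as clr.l d1 means targeting *+4 (displacement 2), which matches rload's *+2+2.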
StingRay
Member
#13 - Posted: 24 Jun 2007 19:51 - Edited
Hmm, weird, my post above was an answer to a question z5 had, but for some weird reason that very question disappeared while I was writing. Z5, enlighten me please. :)

Edit: Turns out I was answering an old question. :D Next time I should look more carefully, I guess. :-)
StingRay
Member
#14 - Posted: 24 Jun 2007 19:59 - Edited
Sounds as if you are trashing memory somewhere in your source. So first I'd disable all memory writes and check whether the crashes disappear. Alternatively, you could add some "sanity space" :D around your screens and any other buffers you use, and check afterwards whether something was written there. Example:

dcb.l 320*200/4,"STR!" ; sanity buffer with ID :)
CHUNKYBUFFER ds.b 320*200 ; your regular chunkybuffer
dcb.l 320*200/4,"STR!" ; look above

Then execute your source and look at CHUNKYBUFFER-64000 and check the area there. If all is ok (i.e. you didn't trash any memory), you should find the "STR!" longs there. That's how I usually do it.
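The same guard-zone idea can be sketched in C, assuming a static layout of [guard][chunky buffer][guard] mirroring the dcb.l lines above. All names (arena, chunky_buffer, fill_guards, guards_intact) are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_BYTES   (320 * 200)   /* the chunky buffer                 */
#define GUARD_BYTES (320 * 200)   /* guard zone on each side, as above */

static const char guard_id[4] = { 'S', 'T', 'R', '!' };
static uint8_t arena[GUARD_BYTES + BUF_BYTES + GUARD_BYTES];

uint8_t *chunky_buffer(void) { return arena + GUARD_BYTES; }

/* Fill both guard zones with the "STR!" pattern. */
void fill_guards(void)
{
    for (size_t i = 0; i < GUARD_BYTES; i += 4) {
        memcpy(arena + i, guard_id, 4);
        memcpy(arena + GUARD_BYTES + BUF_BYTES + i, guard_id, 4);
    }
}

/* True if nothing has overwritten either guard zone. */
bool guards_intact(void)
{
    for (size_t i = 0; i < GUARD_BYTES; i += 4) {
        if (memcmp(arena + i, guard_id, 4) != 0) return false;
        if (memcmp(arena + GUARD_BYTES + BUF_BYTES + i, guard_id, 4) != 0) return false;
    }
    return true;
}
```

Call fill_guards() once at startup and guards_intact() after your effect has run; a false result means something wrote outside the buffer.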
z5_
Member
#15 - Posted: 25 Jun 2007 19:22
Next question: when declaring variables, one can't start at an odd address. Can somebody explain the reason why?

On top of that, what about things like included binaries? Can i do this:
test dc.b 5
pic incbin bla
pic2 incbin bla2

I learned that bitplane addresses need to be 2/4/8-byte aligned. But pic and pic2 aren't my screen buffers. Also, pic2 will be at an odd or even address depending on the length of pic.
winden
Member
#16 - Posted: 25 Jun 2007 20:30
Devpac notation is to write "cnop 0,16" to request padding so that whatever follows goes to a 16-byte aligned address. Just use 16 everywhere and you will be safe :)
StingRay
Member
#17 - Posted: 25 Jun 2007 21:36
Of course you can have variables at odd addresses. Though on 68000, you can only have byte variables at odd addresses; on 68020+ you can also have words/longs at odd addresses. Your example above is totally valid, even on 68000, as long as you only access pic/pic2 bytewise (if you want your code to be 68000-compatible, that is). But I'd still use properly aligned variables, just because it's faster! :)
z5_
Member
#18 - Posted: 25 Jun 2007 21:52
Asm-One always complains about variables at odd addresses (for example, test dc.b 5 followed by test2 dc.w 10 needs an "even" after the dc.b). I assume it's like StingRay mentioned: to keep compatibility with 68000? Maybe it's an option?

What do you mean by accessing pic bytewise? Does it mean that this wouldn't work:
lea pic,a0
move.w (a0),d0

The last question: how do you properly align? :o) Examples with variables (.b,.w,.l) + incbin mixed please :o)
StingRay
Member
#19 - Posted: 25 Jun 2007 22:06
Indeed, accepting odd data is an option you have to enable in Asm-One: press ALT+Cursor down to access the settings and check the "68020++ Odd data" box. :)

As for lea pic,a0 followed by move.w (a0),d0 with pic at an odd address: as said, that's valid on 68020+, but on 68000 you will get a nice exception, i.e. your program crashes! :)

To align to the next even address, you simply use CNOP 0,2. To align to the next long, i.e. an address divisible by 4, you'd use CNOP 0,4. Here's an example:

SCROLLTEXT dc.b "yes, once every intro had a scroller",0
CNOP 0,2 ; so PICTURE will be at next even address
PICTURE incbin ram:bla ; bla is the best filename ever! :D
CNOP 0,4 ; longword align DATA
DATA dc.l 0,1,2,3,4 ; some table

Hope that helps. :)
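What CNOP 0,n does to the location counter can be sketched in C, assuming n is a power of two and that the offset is measured from the start of the current section (which is what the assembler actually aligns against). cnop0 is a hypothetical name:

```c
#include <stdint.h>

/* Rounds a section offset up to the next multiple of n (n a power of two),
   i.e. the effect of "CNOP 0,n" on the location counter. */
uint32_t cnop0(uint32_t offset, uint32_t n)
{
    return (offset + n - 1u) & ~(n - 1u);
}
```

With SCROLLTEXT at offset 0 (36 characters plus the terminating 0, so 37 bytes), CNOP 0,2 moves PICTURE to offset 38; an offset that is already aligned is left unchanged.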
lvd
Member
#20 - Posted: 25 Jun 2007 22:12
The 68000 has a fixed 16-bit data bus, divided into two 8-bit halves. Instead of an A0 line on its address bus, it has two strobe signals, UDS and LDS, one for each half of the data bus. When it accesses a 16-bit word at an even address, it asserts both UDS and LDS simultaneously. When accessing a byte, it asserts either UDS or LDS, depending on whether the address is even or odd. So it cannot access a word at an odd address, which would require two consecutive byte accesses at different addresses (in A23-A1). Instead, it simply raises an address error exception.

On the 68020/030 the situation is totally different. Both have a 32-bit data bus, as well as a full A31-A0 address bus (on the 68EC020, only A23-A0 are available). Moreover, the bus is dynamically sized, which means the CPU can automatically split any word- or long-sized access according to the port width of the memory it accesses (for example, it reads 32-bit longs byte by byte from 8-bit ROM, while 32-bit fast RAM is read in 32-bit portions). The 020/030 can also split longword transactions into 8- and 16-bit portions when accessing 32-bit memory at an odd address. Obviously, this slows things down, since instead of reading 32 bits at once, it has to do 2 or 3 transactions.

So the rules:

1. If you want 68000 compatibility, NEVER access words and longs at odd addresses; otherwise an exception will ruin everything.
2. If using 020+, place words at even addresses and longs at addresses divisible by 4 for the fastest speed, though accesses at odd addresses are still possible.
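The two rules can be written as C predicates. This is a sketch with hypothetical helper names; size is the access width in bytes (1, 2 or 4):

```c
#include <stdbool.h>
#include <stdint.h>

/* Rule 1 (68000): word/long accesses must use an even address,
   otherwise the CPU raises an address error. Bytes may go anywhere. */
bool m68000_access_ok(uint32_t addr, unsigned size)
{
    return size == 1u || (addr & 1u) == 0u;
}

/* Rule 2 (68020+): any address works, but natural alignment
   (words at even addresses, longs at multiples of 4) is fastest. */
bool naturally_aligned(uint32_t addr, unsigned size)
{
    return (addr & (size - 1u)) == 0u;
}
```

Note that a long at an address that is even but not divisible by 4 is legal on the 68000; it is only "slow but legal" territory on 68020+.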
StingRay
Member
#21 - Posted: 25 Jun 2007 22:20
Regarding rule 1: well, actually, you can access words and longs at odd addresses on 68000 if you patch the "address error" exception vector. :D So it's not impossible to use odd addresses even on 68000. :) Whether it's useful is another question, though.
lvd
Member
#22 - Posted: 25 Jun 2007 22:25
And then write a complete 68000 emulator inside the exception handler to emulate EACH word-accessing instruction :)
Kalms
Member
#23 - Posted: 25 Jun 2007 23:43 - Edited
Even on 68020+ systems, there are still some alignment rules that need to be adhered to.

All the classic custom-chipset DMA requires the data to be 2-, 4- or 8-byte aligned (the exact alignment depends on the type of DMA). Audio DMA, for instance, needs 2-byte alignment.

You might encounter this if you attempt to include a module into your program. All samples inside of the module are located at an even number of bytes from the beginning of the module, so if the module itself begins at an even address -> all samples will be located at even addresses.

Align by doing a CNOP 0,2 before including the module.

DMA will generally not fail if a non-aligned address is specified -- it simply ignores the lowest bit(s) of the address (assumes that they are zero).
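Assuming the DMA simply ignores the low bits as described, the address it actually uses can be modelled in C like this (dma_effective_addr is a hypothetical name; align is the required power-of-two alignment):

```c
#include <stdint.h>

/* Address a DMA channel effectively fetches from if it ignores
   the low bits needed for its alignment requirement. */
uint32_t dma_effective_addr(uint32_t addr, uint32_t align)
{
    return addr & ~(align - 1u);
}
```

So a sample that ends up at the odd address $20001 would be played from $20000, one byte early, which is exactly the kind of glitch the CNOP 0,2 before the module avoids.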


I should also point out that just doing CNOP 0,1024 will not guarantee that the following chunk of code/data will be 1024-byte aligned in memory: the CNOP command aligns the code/data with respect to the start of the current section, not with respect to memory address 0. To get 1024-byte alignment, the section itself must be 1024-byte aligned. What is the situation under AmigaOS?

AmigaOS's executable-loader will load the program into memory one section at a time. Before loading a section into memory, a buffer large enough to hold the section is allocated using an AllocMem() or similar call. AllocMem() happens to always return 8-byte aligned chunks under all versions of AmigaOS so far.

So: Sections will get 8-byte aligned, and therefore a CNOP 0,2 / CNOP 0,4 / CNOP 0,8 will behave as expected.

If you need more than 8-byte alignment, you have two choices:

1. Implement your own executable loader.
2. After your program has started, move it around a bit in memory so that it ends up N-byte-aligned.
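For a single buffer (rather than a whole executable), a common related technique is to over-allocate and round the pointer up. The C sketch below uses malloc as a stand-in for an AmigaOS allocator; alloc_aligned is a hypothetical helper, and align must be a power of two:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates size + align - 1 bytes and returns a pointer inside that
   block aligned to `align`. *raw receives the pointer to pass to free(). */
void *alloc_aligned(size_t size, size_t align, void **raw)
{
    unsigned char *p = malloc(size + align - 1);
    if (p == NULL) {
        *raw = NULL;
        return NULL;
    }
    *raw = p;
    uintptr_t a = ((uintptr_t)p + align - 1) & ~(uintptr_t)(align - 1);
    return (void *)a;
}
```

The cost is up to align - 1 wasted bytes per buffer, which is usually acceptable for a few large screen or table buffers.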
StingRay
Member
#24 - Posted: 26 Jun 2007 00:36
Regarding option 1 (implementing your own executable loader): hmm, shouldn't a simple LoadSeg() call be enough?
Kalms
Member
#25 - Posted: 26 Jun 2007 00:45 - Edited
No, that will give you 8-byte alignment. There is no place within the executable where an alignment requirement is specified. If you want coarser alignment than what the AmigaOS memory allocator happens to give you, you need to perform the memory allocations yourself.

It might be possible to do this by using InternalLoadSeg() instead.

Still, this means that your executable needs its own loader stub as the first hunk (similar to the depack header of a crunched executable), and that makes source-level debugging a bit messier.
winden
Member
#26 - Posted: 27 Jun 2007 17:50
Hmm, maybe it's time to hack a patch for AllocMem() to always align to 16 bytes ;)
z5_
Member
#27 - Posted: 30 Jun 2007 17:09 - Edited
Easy question time. I have read that mulu is slower because it takes two cycles (I assume these are CPU clock cycles). Do most other instructions take one cycle or less?

For example, which one is faster (on slower hardware):
mulu.w #3,d0

or

add.w d0,d0
lsl.w #1,d0

If other instructions take one cycle, then I would assume that there is no point in replacing the mulu?
StingRay
Member
#28 - Posted: 30 Jun 2007 17:28 - Edited
First, your code to multiply by 3 is wrong. ;) You multiply by 2 (add.w d0,d0) and then multiply by 2 again (lsl.w #1,d0), so you multiply by 4. :) To multiply by 3, you have to do this:
move.w d0,d1
add.w d1,d1
add.w d1,d0

:)

On CPUs below the 68060, multiplication is pretty slow, so it's always worth optimizing it (unless you code for 060 only, that is).
z5_
Member
#29 - Posted: 30 Jun 2007 17:37
I should think more when writing code in forums :) You could do this as well:
move.w d0,d1
lsl.w #1,d1
add.w d0,d1

But that doesn't really answer my question. In this case, I'm using 3 instructions to replace one mulu instruction. How many clock cycles will each case take? Are these 3 instructions still faster than one mulu (on slower machines)?
StingRay
Member
#30 - Posted: 30 Jun 2007 17:49 - Edited
Yes, still faster! And my version is faster than yours (add.w d1,d1 takes 4 cycles vs. 8 for lsl.w #1,d1). :) On 68000, mulu takes about 40 cycles (38+2n, where n is the number of set bits in the source operand, plus effective-address time); it's one of the slowest instructions you can have (only divu/divs is even slower). So again, as I already said, if you don't plan to code for 68060 ONLY, then it ALWAYS makes sense to optimize multiplications! :)

Edit: to answer your question clearly, these 3 instructions are not just faster than mulu, they are MUCH faster! My version has just 1 move and 2 add instructions, which together are much faster than one single mulu instruction (move.w Dn,Dn: 4 cycles, add.w Dn,Dn: 4 cycles -> 4+2*4 = 12 cycles; only valid for 68000).
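The arithmetic above can be double-checked with a small C sketch. mul3 mirrors StingRay's move.w/add.w/add.w sequence with 16-bit wraparound; the cycle functions hard-code the 68000 register-form timings as listed in Motorola's 68000 manual (MOVE.W Dn,Dn = 4, ADD.W Dn,Dn = 4, LSL.W #k,Dn = 6+2k, MULU = 38+2n, effective-address time excluded). All function names are hypothetical:

```c
#include <stdint.h>

/* d1 = d0; d1 += d1; d0 += d1  ->  d0 * 3 (mod 65536) */
uint16_t mul3(uint16_t d0)
{
    uint16_t d1 = d0;               /* move.w d0,d1 */
    d1 = (uint16_t)(d1 + d1);       /* add.w  d1,d1 */
    return (uint16_t)(d0 + d1);     /* add.w  d1,d0 */
}

/* Number of set bits, used by the MULU timing formula. */
unsigned popcount16(uint16_t x)
{
    unsigned n = 0;
    while (x) { n += x & 1u; x >>= 1; }
    return n;
}

unsigned cycles_add_version(void) { return 4 + 4 + 4; }           /* move, add, add */
unsigned cycles_lsl_version(void) { return 4 + (6 + 2 * 1) + 4; } /* move, lsl #1, add */
unsigned cycles_mulu(uint16_t src) { return 38 + 2 * popcount16(src); }
```

With these numbers the add version takes 12 cycles, the lsl version 16, and mulu.w #3 at least 42 before its effective-address time is added.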
A.D.A. Amiga Demoscene Archive, Version 3.0