|
Author |
Message |
Krishna
Member |
Hi all,
With some friends, after seeing the prods at Breakpoint, we have decided to code an Amiga demo for the next Breakpoint. It's a bit hard to test this kind of thing on WinUAE, so maybe someone can help me :)
It's a little question about accesses: I have a buffer in fastmem, and the address of the buffer is longword-aligned (check the code ...)
example :
Section Buffer,BSS_F
cnop 0,8
Buffer: ds.b 320*256
And inside the code I have :
lea Buffer,a0
move.l d0,(a0)+ ; (1)
add.l #2,a0
move.l d0,(a0)+ ; (2)
add.l #1,a0
move.l d0,(a0)+ ; (3)
Does it take the same CPU time on the 060 for (1), (2) and (3), or will I have penalties for (2) and (3) compared to (1)?
Thanx in advance
Krishna
By the way, we are looking for graphic artists (2D & 3D) ^^
|
krabob
Member |
What, still no one to give a hand??
> move.l d0,(a0)+ ; (1)
> add.l #2,a0
It looks strange to me: "move.l d0,(a0)+" stores the 4-byte value at the memory pointed to by a0, and THEN does a0=a0+4, so that a0 points to the memory just after it. In that case, the a0=a0+4 is CPU-time-free.
Usually there is no need for adds afterwards. If you want to write memory without the automatic increment, don't use the '+'.
Be warned: on the 680x0, all read/write accesses with .w (2 bytes) or .l (4 bytes) must be done on an *EVEN* address, or it will crash. So your add.l #1,a0 is strange.
If you target a 68060, it has an 8 KB data cache and an 8 KB code cache, with a 16-byte cache line. That means: if you have to write 16 bytes of data, write them contiguously at a 16-byte-aligned address. If you have to read (and not write) 16 bytes, read them from a contiguous 16-byte-aligned address, and don't write any of them.
|
Kalms
Member |
Just a note -- it is only on 68000/010 that .w and .l accesses to odd addresses generate exceptions. 68020+ handle all unaligned accesses properly (albeit slowly) by splitting the access into multiple aligned accesses.
|
winden
Member |
Hmmm... if this memory is being cached in the datacache, unaligned should go as fast as aligned, shouldn't it??? Hmmm... 060 book time... chapter 10, page 11 says that misaligned reads take 1 extra clock cycle, and misaligned writes (or read-modify-writes) take 2 extra clock cycles.
|
Krishna
Member |
> Just a note -- it is only on 68000/010 that .w and .l accesses to odd addresses generate exceptions. 68020+ handle all unaligned accesses properly (albeit slowly) by splitting the access into multiple aligned accesses.
OK, so I will have penalties for (2) and (3). Thank you.
|
Kalms
Member |
The 68060 datacache can only perform one access per cycle.
When you read a byte/word/longword from datacache, the datacache will deliver one aligned longword to the core, and then the core picks out the byte(s) that it is interested in.
If the word/longword crosses a longword-boundary (read word from N+3 or read longword from N+1,N+2,N+3), then the datacache must deliver two longwords (first longword N, then longword N+4) to core. This will keep the datacache busy for 2 cycles instead of 1 cycle. Also, the core will stop execution of pOEP & sOEP for 1 cycle.
Aligned read performance:
move.l (a0),d0 ; cycle 1 pOEP
move.l d1,d2 ; cycle 1 sOEP
Misaligned read performance:
move.l 1(a0),d0 ; cycle 1 pOEP - datacache fetches bytes 1(a0) thru 3(a0)
; stall -- cycle 1 sOEP
; stall -- cycle 2 pOEP - datacache fetches byte 4(a0), move.l completes
move.l d1,d2 ; cycle 2 sOEP
Misaligned writes within a longword seem to be OK, while writing across the longword boundary gives a penalty of 2 cycles.
Aligned write performance:
move.l d0,(a0) ; cycle 1 pOEP
move.l d1,d2 ; cycle 1 sOEP
Misaligned write performance:
move.l d0,1(a0) ; cycle 1 pOEP
; stall -- cycle 1 sOEP
; stall -- cycle 2 pOEP
; stall -- cycle 2 sOEP
; stall -- cycle 3 pOEP
move.l d1,d2 ; cycle 3 sOEP
I don't know how read-modify-write instructions (such as "add.l d0,(a0)") work with misaligned accesses.
|