A.D.A. Amiga Demoscene Archive


Amiga Demoscene Archive Forum / Coding / writing and reading in fast mem

 

Krishna
Member
#1 - Posted: 29 May 2006 11:43
Hi all,

After seeing the prods at Breakpoint, some friends and I have decided to code an Amiga demo for the next Breakpoint. It's a bit hard to test this kind of thing on WinUAE, so maybe someone can help me :)
It's a little question about memory accesses: I have a buffer in fast mem and the address of the buffer is longword aligned (check the code...)

example :
Section Buffer,BSS_F

cnop 0,8
Buffer: ds.b 320*256


And inside the code I have :
lea Buffer,a0

move.l d0,(a0)+ ; (1)
add.l #2,a0

move.l d0,(a0)+ ; (2)
add.l #1,a0

move.l d0,(a0)+ ; (3)

Do (1), (2) and (3) take the same CPU time on the 060, or will I have some penalties for (2) and (3) compared to (1)?

Thanx in advance

Krishna

By the way, we are looking for graphists (2D & 3D) ^^
krabob
Member
#2 - Posted: 30 May 2006 10:23
What, still no one to give a hand??

> move.l d0,(a0)+ ; (1)
> add.l #2,a0

It looks strange to me: "move.l d0,(a0)+" puts the 4-byte value in the memory pointed to by a0, and THEN does a0=a0+4, so that a0 points to the memory just after. In that case, the a0=a0+4 is CPU-time-free.
Usually there is no need for adds afterwards. If you just want to write memory without the automatic increment, leave out the '+'.
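
To make the post-increment behaviour concrete, here is a minimal sketch. It reuses the Buffer label and section from the first post. The displacement form in the last write is an extra illustration and was not part of the original question.

```asm
        lea     Buffer,a0       ; a0 = Buffer
        move.l  d0,(a0)+        ; writes Buffer+0..3, then a0 = Buffer+4
        move.l  d0,(a0)+        ; writes Buffer+4..7, then a0 = Buffer+8
        move.l  d0,(a0)         ; writes Buffer+8..11, a0 stays at Buffer+8
        move.l  d0,4(a0)        ; writes Buffer+12..15 via a displacement, a0 unchanged
        rts

        Section Buffer,BSS_F
        cnop    0,8
Buffer: ds.b    320*256
```

With only (a0)+ writes the pointer always stays on a longword boundary, so no extra add is needed between them.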

Be warned: on 680x0, all read/write accesses with .w (2 bytes) or .l (4 bytes) must be done on an * EVEN * address, or it will crash. So your add.l #1,a0 is strange.

If you target a 68060, it has an 8 KB data cache and an 8 KB code cache, with 16-byte cache lines. That means: if you have to write 16 bytes of data, write them contiguously at a 16-byte aligned address. If you have to read 16 bytes and not write them, read them from a contiguous 16-byte aligned address, and don't write any of them.
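
One possible way to get a 16-byte aligned buffer is sketched below. RawBuffer and BUFSIZE are hypothetical names used only for this example. The sketch assumes that cnop only aligns relative to the start of the section, so it over-allocates by 15 bytes and rounds the address up at runtime instead of relying on where the section gets loaded.

```asm
BUFSIZE equ     320*256

        lea     RawBuffer,a0    ; start of the over-sized block
        move.l  a0,d0
        moveq   #15,d1
        add.l   d1,d0           ; add 15 so that rounding down cannot land below RawBuffer
        moveq   #-16,d1         ; mask $fffffff0
        and.l   d1,d0           ; clear the low 4 bits: d0 = first 16-byte aligned address
        move.l  d0,a0           ; a0 = aligned buffer, BUFSIZE bytes usable from here
        rts

        Section Buffer,BSS_F
RawBuffer:
        ds.b    BUFSIZE+15      ; 15 spare bytes leave room for the round-up
```

Every group of four (a0)+ longword writes starting from that a0 then covers exactly one 16-byte line.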
Kalms
Member
#3 - Posted: 30 May 2006 12:17
Just a note -- it is only on 68000/010 that .w and .l accesses to odd addresses generate exceptions. 68020+ handle all unaligned accesses properly (albeit slowly) by splitting the access into multiple aligned accesses.
winden
Member
#4 - Posted: 30 May 2006 13:02
Hmmm... if this memory is being cached in the data cache, unaligned should go as fast as aligned, shouldn't it??? Hmmm... 060 book time... chapter 10, page 11 says that misaligned reads take 1 extra clock cycle, and misaligned writes (or read-modify-write) 2 extra clock cycles.
Krishna
Member
#5 - Posted: 30 May 2006 19:05
> Just a note -- it is only on 68000/010 that .w and .l accesses to odd addresses generate exceptions. 68020+ handle all unaligned accesses properly (albeit slowly) by splitting the access into multiple aligned accesses.

OK, so I will have penalties for (2) and (3), thank you.
Kalms
Member
#6 - Posted: 1 Jun 2006 00:55
The 68060 datacache can only perform one access per cycle.


When you read a byte/word/longword from datacache, the datacache will deliver one aligned longword to the core, and then the core picks out the byte(s) that it is interested in.
If the word/longword crosses a longword-boundary (read word from N+3 or read longword from N+1,N+2,N+3), then the datacache must deliver two longwords (first longword N, then longword N+4) to core. This will keep the datacache busy for 2 cycles instead of 1 cycle. Also, the core will stop execution of pOEP & sOEP for 1 cycle.
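
The boundary rule above boils down to "offset inside the longword plus access size is greater than 4". A small sketch of that check follows. CrossesLongword is a hypothetical helper name, and the register convention is only an assumption for this example.

```asm
; in:  d0.l = address, d1.l = access size in bytes (1, 2 or 4)
; out: d0.l = 1 if the access spans two aligned longwords, else 0
CrossesLongword:
        and.l   #3,d0           ; offset inside the aligned longword (0..3)
        add.l   d1,d0           ; offset + size
        subq.l  #4,d0           ; positive result means the access runs past the boundary
        sgt     d0              ; d0.b = $ff if (offset + size) > 4, else $00
        and.l   #1,d0           ; normalise to 0 or 1
        rts
```

For example, a word at N+3 gives 3+2=5 and a longword at N+1 gives 1+4=5, so both cross, while a word at N+1 gives 3 and stays inside one longword.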

Aligned read performance:

move.l (a0),d0 ; cycle 1 pOEP
move.l d1,d2 ; cycle 1 sOEP

Misaligned read performance:

move.l 1(a0),d0 ; cycle 1 pOEP - datacache fetches bytes 1(a0) thru 3(a0)
; stall -- cycle 1 sOEP
; stall -- cycle 2 pOEP - datacache fetches byte 4(a0), move.l completes
move.l d1,d2 ; cycle 2 sOEP


Misaligned writes within a longword seem to be OK, while writing across the longword boundary gives a penalty of 2 cycles.

Aligned write performance:

move.l d0,(a0) ; cycle 1 pOEP
move.l d1,d2 ; cycle 1 sOEP

Misaligned write performance:

move.l d0,1(a0) ; cycle 1 pOEP
; stall -- cycle 1 sOEP
; stall -- cycle 2 pOEP
; stall -- cycle 2 sOEP
; stall -- cycle 3 pOEP
move.l d1,d2 ; cycle 3 sOEP
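
Going back to the three writes from the first post, they can be traced like this, as a sketch. The offsets follow from the post-increments and adds in that code, assuming Buffer is longword aligned as stated there. The penalty remarks only reuse the figures given in this thread and were not measured again. The odd address in (3) is only valid on 68020+.

```asm
        lea     Buffer,a0       ; a0 = Buffer+0

        move.l  d0,(a0)+        ; (1) writes Buffer+0..3: aligned, a0 = Buffer+4
        add.l   #2,a0           ; a0 = Buffer+6

        move.l  d0,(a0)+        ; (2) writes Buffer+6..9: crosses the boundary at Buffer+8, a0 = Buffer+10
        add.l   #1,a0           ; a0 = Buffer+11

        move.l  d0,(a0)+        ; (3) writes Buffer+11..14: crosses the boundary at Buffer+12, a0 = Buffer+15
        rts

        Section Buffer,BSS_F
        cnop    0,8
Buffer: ds.b    320*256
```

So by the rule above, (1) should run at full speed while (2) and (3) each write across a longword boundary and should take the misaligned-write penalty.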


I don't know how read-modify-write instructions (such as "add.l d0,(a0)") work with misaligned accesses.
krabob
Member
#7 - Posted: 1 Jun 2006 17:25 - Edited