|
Author |
Message |
Krishna
Member |
Hi all,
With some friends, after seeing the prods at Breakpoint, we have decided to code an Amiga demo for the next Breakpoint. It's a bit hard to test this kind of thing on WinUAE, so maybe someone can help me :)
It's a little question about accesses: I have a buffer in fastmem, and the address of the buffer is longword-aligned (check the code ...)
example :
Section Buffer,BSS_F
cnop 0,8
Buffer: ds.b 320*256
And inside the code I have :
lea Buffer,a0
move.l d0,(a0)+ ; (1)
add.l #2,a0
move.l d0,(a0)+ ; (2)
add.l #1,a0
move.l d0,(a0)+ ; (3)
Does it take the same CPU time on the 060 for (1), (2) and (3), or will I have penalties for (2) and (3) compared to (1)?
Thanx in advance
Krishna
By the way, we are looking for graphic artists (2D & 3D) ^^
|
krabob
Member |
What, still no one to give a hand??
> move.l d0,(a0)+ ; (1)
> add.l #2,a0
It looks strange to me: "move.l d0,(a0)+" stores the 4-byte value at the memory pointed to by a0, and THEN does a0=a0+4, so that a0 points to the memory just after it. In that case, the a0=a0+4 is CPU-time-free.
Usually there is no need for adds afterwards. If you want to write memory without the automatic increment, don't use the '+'.
Be warned: on the 680x0, all read/write accesses with .w (2 bytes) or .l (4 bytes) must be done on an *EVEN* address, or it will crash. So your add.l #1,a0 is strange.
If you target a 68060, it has an 8 KB data cache and an 8 KB code cache, with a 16-byte cache line. That means: if you have to write 16 bytes of data, write them contiguously at a 16-byte-aligned address. If you have to read (and not write) 16 bytes, read them from a contiguous 16-byte-aligned address, and don't write any of them.
|
Kalms
Member |
Just a note -- it is only on 68000/010 that .w and .l accesses to odd addresses generate exceptions. 68020+ handle all unaligned accesses properly (albeit slowly) by splitting the access into multiple aligned accesses.
|
winden
Member |
Hmmm... if this memory is being cached in the datacache, unaligned should go as fast as aligned, shouldn't it??? Hmmm... 060 book time... chapter 10, page 11 says that misaligned reads take 1 extra clock cycle, and misaligned writes (or read-modify-writes) take 2 extra clock cycles.
|
Krishna
Member |
> Just a note -- it is only on 68000/010 that .w and .l accesses to odd addresses generate exceptions. 68020+ handle all unaligned accesses properly (albeit slowly) by splitting the access into multiple aligned accesses.
OK, so I will have penalties for (2) and (3). Thank you.
|
Kalms
Member |
The 68060 datacache can only perform one access per cycle.
When you read a byte/word/longword from datacache, the datacache will deliver one aligned longword to the core, and then the core picks out the byte(s) that it is interested in.
If the word/longword crosses a longword-boundary (read word from N+3 or read longword from N+1,N+2,N+3), then the datacache must deliver two longwords (first longword N, then longword N+4) to core. This will keep the datacache busy for 2 cycles instead of 1 cycle. Also, the core will stop execution of pOEP & sOEP for 1 cycle.
Aligned read performance:
move.l (a0),d0 ; cycle 1 pOEP
move.l d1,d2 ; cycle 1 sOEP
Misaligned read performance:
move.l 1(a0),d0 ; cycle 1 pOEP - datacache fetches bytes 1(a0) thru 3(a0)
; stall -- cycle 1 sOEP
; stall -- cycle 2 pOEP - datacache fetches byte 4(a0), move.l completes
move.l d1,d2 ; cycle 2 sOEP
Misaligned writes within a longword seem to be OK, while writing across the longword boundary gives a penalty of 2 cycles.
Aligned write performance:
move.l d0,(a0) ; cycle 1 pOEP
move.l d1,d2 ; cycle 1 sOEP
Misaligned write performance:
move.l d0,1(a0) ; cycle 1 pOEP
; stall -- cycle 1 sOEP
; stall -- cycle 2 pOEP
; stall -- cycle 2 sOEP
; stall -- cycle 3 pOEP
move.l d1,d2 ; cycle 3 sOEP
I don't know how read-modify-write instructions (such as "add.l d0,(a0)") work with misaligned accesses.
|