I was thinking of writing generalized memcopy and memclear routines designed to perform well on the 060. I have some ideas, but I'm not sure whether they're correct.
Using movem on 16-byte chunks should be optimal, since a cache line on the 060 is 16 bytes. But movem register-to-memory transfers can't use post-increment; the only auto-updating mode they allow is pre-decrement. Does that mean the longwords within a cache line are written in reverse order? If that's the case, I suspect movem (without pre-decrement,
So I have these two options for clearing memory:
Alternative 1 (a0 initialized to the end of the buffer):
.loop movem.l d1-d4,-(a0) ;d1-d4 = 0
subq.l #1,d7 ;d7 = numbytes/16
bgt.b .loop
Alternative 2 (a0 initialized to the beginning of the buffer):
.loop movem.l d1-d4,(a0) ;d1-d4 = 0
adda.l d0,a0 ;d0=16
subq.l #1,d7 ;d7 = numbytes/16
bgt.b .loop
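To compare the two loops on paper, here is a portable C model of both. It is only a sketch under a few assumptions: d1-d4 hold zero, d7 is a signed 32-bit count of 16-byte chunks, and the function names are hypothetical. It reproduces the architectural order in which movem stores registers (A7 down to D0 at decreasing addresses for -(An), D0 up to A7 at increasing addresses for a control mode such as (An)). It says nothing about 060 cache or store-buffer timing, which is what actually decides the speed question.

```c
#include <stdint.h>

/* Hypothetical model of Alternative 1: a0 points just past the end of the
   buffer. movem.l d1-d4,-(a0) stores d4 first, at the highest address,
   and d1 last, at the lowest, leaving a0 16 bytes lower. */
static void clear_alt1(uint32_t *end, int32_t d7)
{
    const uint32_t d1 = 0, d2 = 0, d3 = 0, d4 = 0; /* assumed zeroed registers */
    uint32_t *a0 = end;
    do {
        *--a0 = d4;
        *--a0 = d3;
        *--a0 = d2;
        *--a0 = d1;
    } while (--d7 > 0); /* subq.l #1,d7 / bgt.b .loop */
}

/* Hypothetical model of Alternative 2: a0 points at the start of the buffer.
   movem.l d1-d4,(a0) stores d1..d4 at ascending addresses without changing
   a0, so adda.l d0,a0 (d0 = 16) has to advance it by one chunk. */
static void clear_alt2(uint32_t *start, int32_t d7)
{
    const uint32_t d1 = 0, d2 = 0, d3 = 0, d4 = 0; /* assumed zeroed registers */
    uint32_t *a0 = start;
    do {
        a0[0] = d1;
        a0[1] = d2;
        a0[2] = d3;
        a0[3] = d4;
        a0 += 4;        /* 4 longwords = 16 bytes, like adda.l d0,a0 */
    } while (--d7 > 0); /* subq.l #1,d7 / bgt.b .loop */
}
```

Both clear the same 16 × d7 bytes, and both execute the movem once before testing d7, so as written they clear one chunk even when d7 starts at 0. In Alternative 2, lea 16(a0),a0 would advance the pointer without tying up d0.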
Which of these two alternatives would be faster?
Also, would there be much to gain from writing 32 or 48 bytes per iteration instead of just 16? My gut feeling is that the gain would be negligible, but I really don't know.
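One way to experiment with larger chunks is to separate the chunk size from the remainder handling. In the C sketch below, clear_chunked and chunk_longs are hypothetical: chunk_longs stands in for the number of registers in the movem list (4 for 16 bytes, 8 for 32, 12 for 48), and the tail loops show the extra work a generalized memclear needs when numbytes is not a multiple of the chunk size. It assumes a longword-aligned buffer and models only which bytes get written, not 060 timing, so the 32- and 48-byte cases would still have to be measured on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generalized clear. chunk_longs mimics the size of the
   movem register list (4 -> 16 bytes, 8 -> 32, 12 -> 48); must be > 0. */
static void clear_chunked(uint32_t *buf, size_t numbytes, size_t chunk_longs)
{
    size_t chunk_bytes = chunk_longs * sizeof(uint32_t);
    size_t chunks = numbytes / chunk_bytes;  /* iterations of the main loop */
    size_t rest = numbytes % chunk_bytes;    /* bytes left for the tail */
    uint32_t *p = buf;

    for (size_t c = 0; c < chunks; c++)      /* one movem per iteration */
        for (size_t i = 0; i < chunk_longs; i++)
            *p++ = 0;

    for (; rest >= sizeof(uint32_t); rest -= sizeof(uint32_t))
        *p++ = 0;                            /* leftover longwords */

    unsigned char *b = (unsigned char *)p;
    while (rest-- > 0)
        *b++ = 0;                            /* leftover bytes */
}
```

For example, clear_chunked(buf, 100, 8) clears three 32-byte chunks and then one leftover longword (96 + 4 = 100 bytes). A larger chunk means fewer subq/bgt pairs per byte cleared; whether that saving is visible on the 060 is the part that needs benchmarking.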