Which of the following three solutions do you think is preferable if only d0 may be trashed by the loop logic? Or is there a better solution that I haven't thought of?
Both the inner and the outer loop should run three times (3*3, so nine passes through the inner body in total).
Option 1, use d1 too but preserve it on the stack:
        move.l  d1,-(sp)        ; save d1
        moveq   #3,d0           ; outer counter
outer:
        moveq   #3,d1           ; inner counter
inner:
        ; some stuff
        subq.l  #1,d1
        bgt.b   inner
        ; more stuff
        subq.l  #1,d0
        bgt.b   outer
        move.l  (sp)+,d1        ; restore d1
Option 2, keep the outer loop counter on the stack:
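        pea.w   3               ; push longword 3 as the outer counter
outer:
        moveq   #3,d0           ; inner counter
inner:
        ; some stuff
        subq.l  #1,d0
        bgt.b   inner
        ; more stuff
        subq.l  #1,(sp)         ; decrement the outer counter in place
        bgt.b   outer
        addq.l  #4,sp           ; drop the counter from the stack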
Option 3, put both loop counters in d0:
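        move.w  #3*256,d0       ; high byte = outer counter, low byte = 0
outer:
        addq.w  #3,d0           ; low byte is 0 here, so this sets the inner counter to 3
inner:
        ; some stuff
        subq.b  #1,d0           ; byte-sized, so only the inner counter changes
        bgt.b   inner
        ; more stuff
        sub.w   #256,d0         ; decrement the outer counter in the high byte
        bgt.b   outer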
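To make the packing in Option 3 easier to reason about, here is a minimal sketch of the same idea with the two counts pulled out as symbols. INNER and OUTER are hypothetical names that I introduced for illustration, and the sketch assumes an assembler that accepts the equ directive. Since bgt is a signed test, I would expect each count to have to stay in the range 1..127 for this to work.

```asm
INNER   equ     3               ; hypothetical name: inner pass count (1..127)
OUTER   equ     3               ; hypothetical name: outer pass count (1..127)

        move.w  #OUTER*256,d0   ; high byte = outer counter, low byte = 0
outer:
        move.b  #INNER,d0       ; reload the low byte only, high byte untouched
inner:
        ; some stuff
        subq.b  #1,d0           ; byte-sized, so the high byte is never affected
        bgt.b   inner           ; loop while the low byte is still > 0 (signed)
        ; more stuff
        sub.w   #256,d0         ; low byte is 0 here, so only the high byte drops
        bgt.b   outer           ; loop while the word is still > 0 (signed)
```

The move.b replaces the addq.w #3 from Option 3 because addq only takes immediates from 1 to 8; for a count of 3 the two behave the same, since the low byte is always 0 when control reaches outer.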