Author |
Message |
Jamie2021
Member |
Our previous demos were compiled with an old gcc. I always wanted to test Bartman's new version but never managed to get it running. I finally decided to spend a little more time on it, and here are the results for Chapter7.
All versions use 77k of assembler:
GCC 3.2: 453 KB (~77k ASM, ~376k CPP)
GCC 6.5: 233 KB (~77k ASM, ~156k CPP)
GCC 10.1: 137 KB (~77k ASM, ~60k CPP)
|
todi
Member |
Nice, but how is the speed, same?
|
Jamie2021
Member |
The executable compiles and links but does not work yet. Since I am converting the code that runs during realtime frames to assembler, the speed is probably going to be the same.
|
todi
Member |
Ok, I was thinking GCC 3.2/6.5 was doing some unrolling that 10.1 didn't do, that's why I was asking about the speed. This looks pretty promising then! Is this for 68060 FPU or 68020?
|
Jamie2021
Member |
The code produced by gcc 10 is much higher in quality than the older versions, big thanks to Bartman/Abyss for the work. I use it for the 68060 + fpu but it can do the whole 68k family.
|
hellfire
Member |
Hi Jamie, nice to see you're doing things again! :) Can you maybe show some examples, where GCC10 produced better code and what compiler-flags you use? For what I have tried so far, it unfortunately doesn't know much about 060-specific optimization (just like VBCC, too). You can easily experiment with it in compiler explorer btw. Last time I checked, the standard c libs were still missing for GCC10/68k. Did you build your own?
|
Jamie2021
Member |
Hello HellFire,
I never stopped, I'm just really slow :) Making an Unreal/DemoMaker for amiga in DX12 from scratch was not one of my best ideas.
I had used compiler explorer to see the differences, and they don't show much on small bits of code. What I saw and remembered is that gcc3 was adding a lot of unnecessary instructions and I never managed to get it to strip them, so I had to do it manually. GCC 6.5 removed all the unnecessary instructions, but the code size was still big in my case (I had converted Chapter7 from GCC 3.2 to 6.5). The big strength of gcc 10 is the optimization of a complex/full program with a lot of function calls, as well as the reduction of code duplication (especially with templates).
Some information on recent optimizations: https://gcc.gnu.org/wiki/LinkTimeOptimization https://www.phoronix.com/scan.php?page=news_item&px=GCC-11-m68k-Is-Safe
I use almost the same flags as Bartman, I had to add a few but not really much. It was a little sportier to put all this in visual studio, I wanted to keep vscode only for debugging.
ASM Command: m68k-amiga-elf-as --register-prefix-optional -m68060 -c -o OBJECTFILE ASMFILE
CPP Command: m68k-amiga-elf-gcc -g -MP -MMD -m68060 -fast -nostdlib -Wno-unused-function -Wno-volatile-register-var -fomit-frame-pointer -fno-tree-loop-distribution -flto -fwhole-program -fno-exceptions -ffast-math -c -o OBJECTFILE CPPFILE
I used very few functions from the standard library, and I was already planning to remove it completely. The only functions I have to rewrite are fopen, fseek, ftell, fread, fclose, printf and fabs.
|
neoman
Member |
Nice, it's time to try GCC10 now :-) I use Bebbo's GCC in my engine atm, but sometimes it emits buggy code with some of his optimizations.
|
rloaderror
Member |
"Making an Unreal/DemoMaker for amiga in DX12 from scratch was not one of my best ideas." <- this makes me excited! Looking forward to see the fruits of this. I'm also trying to make our demo making process more tool based, but so far it is a bit of a failure. Always end up with every request from the designer turning into a code change instead of something that can be facilitated by the existing "tools" :/
|
Jamie2021
Member |
I remember the time when I was doing everything in assembler without architecture; a demo took a few months. Now with C++, cross compiling and tooling it's years, for almost the same result.
|
rloaderror
Member |
"I remember the time when I was doing everything in assembler without architecture, a demo took a few months. Now with c ++, cross compiling and tooling it's years, for almost the same result"
Indeed! Something to be said for just going directly for "that one demo" instead of trying to make code that will shoulder several demos without falling apart. I'm trying to do it mostly to teach myself some lessons about software architecture. Hopefully the amiga scene will still exist by the time this stuff becomes mature.
I see you are looking at palette generation. That's also something I spent a bit of time on. Amiga effect performance can depend on palette layout and I don't think there exist f.ex make_8_hues_on_the_y_axis_with_32_shades_on_the_x_axis_forming_a_256_color_palette_from_a_24_bit_image type of palette generators in the wild.
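For illustration, a minimal sketch of the "8 hues on the y axis, 32 shades on the x axis" layout. It assumes the 8 base colours are already chosen (picking them from a 24-bit image is the hard part and is not shown), and the names `Rgb`, `build_ramp_palette` and `palette_index` are made up for this example. Each row simply ramps linearly from black to its base colour.

```c
#include <stdint.h>

#define HUES   8   /* rows: one per hue (y axis)     */
#define SHADES 32  /* columns: dark to bright (x axis) */

typedef struct { uint8_t r, g, b; } Rgb;

/* Palette index for a given hue row and shade column. */
static int palette_index(int hue, int shade)
{
    return hue * SHADES + shade;
}

/* Fill a 256-entry palette. Row h ramps linearly from black
   (shade 0) up to base[h] (shade SHADES - 1). */
static void build_ramp_palette(const Rgb base[HUES], Rgb out[HUES * SHADES])
{
    for (int h = 0; h < HUES; ++h) {
        for (int s = 0; s < SHADES; ++s) {
            Rgb *c = &out[palette_index(h, s)];
            c->r = (uint8_t)(base[h].r * s / (SHADES - 1));
            c->g = (uint8_t)(base[h].g * s / (SHADES - 1));
            c->b = (uint8_t)(base[h].b * s / (SHADES - 1));
        }
    }
}
```

With this kind of layout, darkening or brightening a pixel is plain index arithmetic inside a row, which is presumably why the arrangement matters for effect speed.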
|
todi
Member |
Maybe I should expand my shadetable tool with this :) https://github.com/tditlu/shadetabler
|
rloaderror
Member |
That looks neat! Many options to accommodate for in shade-table generation. In the end we may need a shadetable shader
|
rloaderror
Member |
It's been some time and I'm getting more curious about GCC10. Would be interesting to hear people's experience with using it. I'm still using Bebbo's great GCC 6 port for my A1200 AGA demos and build the demos on MacOS (Arm based Apple M1).
I've heard Bartman's setup is somewhat Windows dependent. Is there anyone using GCC10 based toolchains on other platforms such as Apple M1?
|
hellfire
Member |
I'm using Bartman's tool chain and I love it. It's bundled with a Windows build of GCC10 and has a patched version of WinUAE which contains a few extensions for debugging, so out-of-the-box it's Windows only. However, in the VSCode project configs you can adjust paths and settings to run a different GCC and a different UAE. Then there's not much left of the original package, though. Also not sure if VSCode is the editor of choice on Apple.
|
rloaderror
Member |
I guess I would be more than happy if I could build and run a GCC10 version on M1 that compiles Amiga compatible code. No need for me to have all the luxury features as I just want to check speedup from GCC6 to GCC10.
To make a GCC10 build, I suppose it is not enough to download the gcc repository and execute "make amigahh" on the command line on my mac and it would create a useful compiler right? :D
With Bebbo's offering one gets an "all in one" solution tailored to Amiga. He seems to have put in much effort in making things stable and making sure stdlib and all that stuff works on Amiga. I don't know if I have enough GCC skills to make GCC10 work. Although perhaps I should just try.
I'm happy to replace all stdlib usage with AmigaOS equivalents in my code if that would make it simpler to get GCC10 compiled code. In Bartman's talk from Evoke he said that the changes to basic GCC were just a few lines?! Not sure if I buy that, but then again I'm not familiar with GCC internals.
|
rloaderror
Member |
I have downloaded the gcc repository. It has 122499 files :)
|
todi
Member |
I made a Homebrew formula for Bartman's patched GCC when it was 8.3. I haven't updated it to Bartman's latest GCC 10, but maybe I should...
https://github.com/tditlu/homebrew-amiga/blob/master/amiga-gcc.rb
https://github.com/tditlu/homebrew-amiga
|
rloaderror
Member |
Thanks Todi! Going to try that first. BTW I finally implemented the ordered palette generator today. The input is a 24-bit blender scene and it outputs all textures remapped to an autogenerated, but nicely arranged palette. Look here: for an example scene with its palette
|
carrion
Member |
Hi guys, Carrion here. I'm new here ;) I started using Bartman's toolset/extension and it's great. OCS stuff works well, but I have trouble running AGA/060 stuff on real hardware. It simply crashes immediately with a Guru when I start even my simplest programs. What are the correct compiler options I should use, or in general, what am I doing wrong? TIA
btw: I use Bartmans Makefile with and without -m68060 option for GCC
And yup... I plan to do next demo ;)
|
Jamie2021
Member |
Hello Carrion,
I had the same problem with the -O3 optimization option. I changed to -O1 or -Os and now it works.
Good luck..
|
carrion
Member |
Hi, I think I solved it, and it looks like it was raised before (on the EAB forum). The problem was the wrong way of getting the VBR. I used the template provided with Bartman's extension for vscode. After the changes it looks like this:
__attribute__((section("text"))) __attribute__((aligned(4)))
static const UWORD getvbr[] = {0x4e7a, 0x0801, 0x4e73}; // movec vbr,d0 ; rte

static APTR GetVBR(void) {
    APTR vbr = 0;

    if (SysBase->AttnFlags & AFF_68010)
        vbr = (APTR)Supervisor((ULONG (*)())getvbr);

    return vbr;
}
OK, I move on to the next challenges. Looks like I can't get either Kalms' or Britelite's ADPCM player to work... :/
|
rloaderror
Member |
On my road to using GCC10 I want to eliminate stdlib from the amiga build. Passing -nostdlib to gcc gives me these undefined references:
__decimalpoint __locale_ctype_ptr __sF __vfprintf_total_size _ctype_ _impure_ptr atoi cimag creal exit fclose feof fflush fgets fopen fread free fseek ftell fwrite getc gets malloc memcmp memcpy memset printf putc putchar puts qsort rand realloc srand strchr strcmp strcpy strlen strncmp strncpy strrchr strstr strtok ungetc
Some are surprising. I don't think I'm using any complex numbers (cimag,creal?).. Anyway..
Does the thought of reimplementing any of these trigger nightmares in you? I noticed the printf() function in dos.library got this notice in the autodocs:
BUGS This function will crash if the resulting stream after parameter substitution is longer than 140 bytes.
*nightmare triggered*
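For what it's worth, the numeric bits of that list are small. Below is a minimal sketch of an atoi replacement and an unsigned-to-decimal helper whose output could be handed straight to dos.library Write(), sidestepping formatted output altogether. The names `mini_atoi` and `mini_utoa` are made up here, and overflow handling and locale support are deliberately left out.

```c
#include <stddef.h>

/* Parse optional whitespace, an optional sign, then decimal digits.
   No overflow checking: a sketch, not a drop-in atoi. */
static int mini_atoi(const char *s)
{
    int sign = 1;
    int value = 0;

    while (*s == ' ' || *s == '\t')
        ++s;
    if (*s == '-') {
        sign = -1;
        ++s;
    } else if (*s == '+') {
        ++s;
    }
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0');
        ++s;
    }
    return sign * value;
}

/* Write v as decimal text into buf (caller provides at least 21 bytes).
   Returns the number of characters written, excluding the terminator. */
static size_t mini_utoa(unsigned long v, char *buf)
{
    char tmp[20];
    size_t n = 0;
    size_t len;

    do {                       /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    len = n;
    while (n > 0)              /* reverse into the caller's buffer */
        *buf++ = tmp[--n];
    *buf = '\0';
    return len;
}
```

The returned length is what a Write()-style call wants as its byte count, so no strlen is needed afterwards.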
|
Jamie2021
Member |
You better upgrade to gcc 12.2 directly. You use the stdlib much more than I do. For strings and allocations it's always good to use your own implementations even if the std has made huge progress in the last few years. Don't forget the move16 for memory ops :)
|
rloaderror
Member |
Now I’ve eliminated or (re)implemented malloc, free, realloc, memset, memcpy, memcmp, exit. I don’t know gcc sources very well, but I’m wondering why wouldn’t any existing stdlib implementation for Amiga just work out of the box for a new gcc version? It’s not like the OS is changing a lot.
Edit: *Seems that is not trivial after browsing the code!*
|
rloaderror
Member |
A bit of wrestling was required to get a hello world with no stdlib up and running on gcc12 so I post my progress.
int hello(void);

// Seems this needs to be the first function encountered.
// If it ends up below hello() it will crash on startup.
int _start(void) { return hello(); }

#include <proto/dos.h>
#include <proto/exec.h>

struct DosLibrary* DOSBase = NULL;
struct ExecBase* SysBase = NULL;

int hello(void) {
    SysBase = (*((struct ExecBase **) 4));
    DOSBase = (struct DosLibrary *)OpenLibrary((STRPTR)"dos.library", 33);
    if (!DOSBase) return 0;

    Write( Output(), "Hello\n", 6 );
    CloseLibrary((struct Library*)DOSBase);
    return 0;
}
Compile with MacOSX build of GCC bundled in Bartman's VSCode plugin.
git clone --depth=1 https://github.com/BartmanAbyss/vscode-amiga-debug.git
cd vscode-amiga-debug/bin/darwin/opt/bin
./m68k-amiga-elf-gcc -r -nostartfiles -nostdlib hello.c
../../elf2hunk a.out $DEMODIR/hello
This has been an evening of grinding and testing various parameters and disassembling the result so hope this makes it a 5 minute task for the next person to have a go at it :)
|
rloaderror
Member |
After implementing all of these stdlib calls using AmigaOS calls the "no-stdlib build" finally works!
A quite entertaining last bug where my own memset implementation got recognized as "memset-like" by some GCC optimization and it inserted a call to memset within my memset making the memset call itself recursively. lol. I tried using -fno-builtin-memset, but this still couldn't prevent the memset injection optimization.
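A possible explanation, offered as an assumption rather than a diagnosis: GCC's loop-idiom recognition (`-ftree-loop-distribute-patterns`, on by default at -O2 and above) turns fill loops into memset calls, and that is separate from the builtin handling that `-fno-builtin-memset` controls. The sketch below switches the pass off for a single function with the `optimize` attribute. The function is named `my_memset` so it can be tried on a host; in a no-stdlib build it would be called `memset`. The `NO_LOOP_IDIOM` macro is a made-up name.

```c
#include <stddef.h>

/* GCC only: keep the fill loop below from being rewritten
   into a call to memset (which would then be a call to itself). */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_IDIOM \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_IDIOM
#endif

NO_LOOP_IDIOM
void *my_memset(void *dst, int value, size_t count)
{
    unsigned char *p = (unsigned char *)dst;

    while (count--)            /* plain byte-wise fill */
        *p++ = (unsigned char)value;
    return dst;
}
```

Passing `-fno-tree-loop-distribute-patterns` on the command line for the file that holds the mem* routines should have the same effect without the attribute.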
Anyway the sad thing is that my demo loading stage performance is significantly slower when using my own stdlib calls. So next up will be to identify where the sluggish performance is originating from.
|
rloaderror
Member |
Trying out the gcc12 found in the vscode plugin I found that passing arguments through registers appears to be different compared to bebbo-gcc. On bebbo-gcc a declaration could go like this:
float my_acos( register float val __asm("fp0") );
However this throws an error on that gcc12. In the example in the vscode plugin there is this method for passing arguments to specific registers. Below is how ThePlayer gets initialized through a wrapper and register keywords outside of the function declaration:
// Demo - Module Player - ThePlayer 6.1a: https://www.pouet.net/prod.php?which=19922
// The Player® 6.1A: Copyright © 1992-95 Jarno Paananen
// P61.testmod - Module by Skylord/Sector 7
INCBIN(player, "player610.6.no_cia.bin")
INCBIN_CHIP(module, "testmod.p61")

int p61Init(const void* module) { // returns 0 if success, non-zero otherwise
    register volatile const void* _a0 ASM("a0") = module;
    register volatile const void* _a1 ASM("a1") = NULL;
    register volatile const void* _a2 ASM("a2") = NULL;
    register volatile const void* _a3 ASM("a3") = player;
    register int _d0 ASM("d0"); // return value
    __asm volatile (
        "movem.l %%d1-%%d7/%%a4-%%a6,-(%%sp)\n"
        "jsr 0(%%a3)\n"
        "movem.l (%%sp)+,%%d1-%%d7/%%a4-%%a6"
        : "=r" (_d0), "+rf"(_a0), "+rf"(_a1), "+rf"(_a2), "+rf"(_a3)
        :
        : "cc", "memory");
    return _d0;
}
I guess that works, but it is a bit long. Anyone found a way to pass arguments in specific registers that is less verbose?
|
Jamie2021
Member |
I use this method, yours seems more efficient.
INLINE void audioInit(u8* stream, u8* soundBufferHigh0, u8* soundBufferLow0) {
    __asm ("move.l %0,a0\n\t"
           "move.l %1,a1\n\t"
           "move.l %2,a2\n\t"
           "jsr _audioInit"
           :
           : "r"(stream), "r"(soundBufferHigh0), "r"(soundBufferLow0)
           : "a0", "a1", "a2");
}
|