[z88dk-dev] sdcc primitives made callee

Bridge to the z88dk-developers mailing list
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm


Post by alvin »

I added a peephole step to the sdcc compile (zcc needs to be updated to use it). This step is on by default because it sits at -O1, but it can be turned off by adding -O0 to the compile line. If your program crashes, try compiling with -O0 and let me know about it.

The peephole rule file can be seen here:

http://z88dk.cvs.sourceforge.net/viewvc ... sdcc_opt.1

As of now it only contains rules that turn all calls to sdcc primitives into callee format. This removes the stack cleanup that follows these calls and should reduce compiled program size. The rule file is applied after sdcc's own peephole step has run, and that step rearranges some of the stack cleanup following some of the primitive calls. So the best substitution occurs with optimization turned up and, it looks like, with --reserve-regs-iy on. Without the latter, sdcc sometimes seems to insert code between the "pop af"s used to clear the stack, which presumably stops the rules from matching.
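
For anyone who wants to see the shape of the callee substitution without reading the rule file, here is a minimal sketch in C. It is not the copt rule itself, only a line-by-line model of it. The primitive name "__mulint", the "_callee" suffix and the assumption that exactly two "pop af" lines follow the call are chosen for illustration and may not match the names in the rule file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
#define PRIMITIVE_CALL "call __mulint"
#define CALLEE_CALL    "call __mulint_callee"
#define CLEANUP        "pop af"
#define CLEANUP_COUNT  2   /* assumed: two 16-bit arguments on the stack */

/*
 * Rewrite an array of asm lines in place. A primitive call that is
 * immediately followed by CLEANUP_COUNT "pop af" lines becomes a callee
 * call with the pops removed. Every other line is copied unchanged.
 * Returns the new number of lines.
 */
size_t make_callee(const char *lines[], size_t n)
{
    size_t in = 0, out = 0;

    while (in < n) {
        /* the call must be followed by enough lines to hold the cleanup */
        int match = strcmp(lines[in], PRIMITIVE_CALL) == 0
                    && in + CLEANUP_COUNT < n;

        for (size_t k = 1; match && k <= CLEANUP_COUNT; k++)
            match = strcmp(lines[in + k], CLEANUP) == 0;

        if (match) {
            lines[out++] = CALLEE_CALL;
            in += 1 + CLEANUP_COUNT;    /* skip the call and its pops */
        } else {
            lines[out++] = lines[in++];
        }
    }
    assert(out <= n);   /* the rewrite can only shrink the listing */
    return out;
}
```

In this model each match drops two one-byte "pop af" instructions. If another instruction sits between the two pops, nothing matches and the code is left alone, which is consistent with the substitution working best when sdcc keeps the cleanup together.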

Note that this is separate from the peephole rules used for sccz80, since those rules would create bugs in sdcc-generated code.

Test compile:


#include <math.h>     /* standard C declares double_t here */
#include <stdio.h>
#include <stdlib.h>

char buf[100];
double_t a, b;

int main(void)
{
    while (1)
    {
        printf("\n\nEnter number (a): ");

        /* discard pending input; relies on the library defining
           fflush() for input streams (standard C does not) */
        fflush(stdin);
        scanf(" %50[^\n]", buf);

        a = strtod(buf, NULL);

        printf("Enter number (b): ");

        scanf(" %50[^\n]", buf);

        b = strtod(buf, NULL);

        printf("\n");

        /* arithmetic primitives */
        printf("%g + %g = %g\n", a, b, a + b);
        printf("%g - %g = %g\n", a, b, a - b);
        printf("%g * %g = %g\n", a, b, a * b);
        printf("%g / %g = %g\n", a, b, a / b);

        printf("\n");

        /* comparison primitives */
        printf("%g > %g : %s\n", a, b, (a > b) ? "true" : "false");
        printf("%g >= %g : %s\n", a, b, (a >= b) ? "true" : "false");
        printf("%g < %g : %s\n", a, b, (a < b) ? "true" : "false");
        printf("%g <= %g : %s\n", a, b, (a <= b) ? "true" : "false");
        printf("%g == %g : %s\n", a, b, (a == b) ? "true" : "false");
        printf("%g != %g : %s\n", a, b, (a != b) ? "true" : "false");
    }
}



With optimization on (default or -O1 and higher):
13539 bytes

With optimization off (-O0 added to compile line):
13577 bytes

So even on a short program there is a savings of 38 bytes.




Post by alvin »

Just a quick note since I should be working. There is a lot of room to improve sdcc code with peephole optimization. I introduced a level 3 optimization that can be specified on the compile line with "-O3". It is not enabled by default because it contains rules that might break some sdcc-generated code: the rules cannot know whether a register is dead after the substitution. Rules of this sort should probably be applied by sdcc's own peephole optimizer, which does have that information. But I put it there to experiment with a bit.

With the 3d graph program shown earlier:

sdcc on its own:
6692 bytes

sdcc with -O1 (sdcc primitives made callee):
6615 bytes

sdcc with -O3 (play with code improvement):
6497 bytes

That's a saving of 195 bytes (about 2.9%) on a small program.

The opt file I applied looks like this:
http://z88dk.cvs.sourceforge.net/viewvc ... sdcc_opt.3

There is one error in the code, probably a substitution applied when a register wasn't dead, but it does show how much sdcc output can be improved. It may be able to beat sccz80 on code size in most circumstances if we made an effort to come up with peephole rules for sdcc's popt.



stefano
Well known member
Posts: 2151
Joined: Mon Jul 16, 2007 7:39 pm

Post by stefano »

> I introduced a level 3 optimization that can be specified on the compile line with "-O3".

I'm a bit confused. Didn't this third level exist already?




Post by alvin »

> > I introduced a level 3 optimization that can be specified on the compile line with "-O3".
> I'm a bit confused. Didn't this third level exist already?

No, copt is not invoked when sdcc is the compiler. The copt rules we have create bugs in sdcc-generated code because sdcc sometimes keeps registers live far outside the substitution block.

So what I have done now is enable three copt levels for the sdcc compile (or rather level 1 and level 3, with level 2 the same as level 1), using rules files that are completely separate from those of the sccz80 compile.

The rules files are in:
http://z88dk.cvs.sourceforge.net/viewvc ... VELOPMENT/

sdcc_rules.1
http://z88dk.cvs.sourceforge.net/viewvc ... cc_rules.1
is always applied directly to the asm output generated by sdcc. This changes the asz80 syntax to more standard zilog syntax and changes all the section names to what we are using.

If the opt level is at least 1 then:
sdcc_opt.1
http://z88dk.cvs.sourceforge.net/viewvc ... sdcc_opt.1
is applied. This replaces the calls to the sdcc primitives with a callee version that eliminates the stack cleanup following the call.

If the opt level is at least 3 then:
sdcc_opt.3
http://z88dk.cvs.sourceforge.net/viewvc ... sdcc_opt.3
is applied. These are experimental rules to replace the sometimes odd code that sdcc generates in some circumstances. Because sdcc can keep registers live outside small blocks, rules like these really belong in sdcc's peephole optimizer, where information on live registers is available. But copt is good for quick experiments. As seen, we can get a significant code size reduction if we gather a collection of rules after inspecting the code output for a collection of programs.



zcc's compile sequence when sdcc is used is this:

1. zcpp to do #define, #include etc
2. zpragma to do #pragma
3. sdcc including sdcc peephole optimizer
4. copt using sdcc_rules.1 to change to zilog syntax, etc
[5. if opt level is >=1 (-O1, -O2, -O3, etc) copt using sdcc_opt.1]
[6. if opt level is >=3 (-O3) copt using sdcc_opt.3]

All the config files set -O2 as the default, so unless "-O0" is specified on the compile line, sdcc compiles will automatically get the -O1 level with the callee transformation for sdcc primitives.
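
The sequence above can be written down as a small C function. This is only a sketch of the ordering described in this post, not zcc's actual source; the function name "sdcc_pipeline" and the step strings are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STEPS 6

/*
 * Fill `out` with the steps run for an sdcc compile at the given -O level,
 * in order. `out` must have room for MAX_STEPS entries.
 * Returns the number of steps.
 */
size_t sdcc_pipeline(int opt_level, const char *out[])
{
    size_t n = 0;

    out[n++] = "zcpp";               /* #define, #include etc */
    out[n++] = "zpragma";            /* #pragma */
    out[n++] = "sdcc";               /* includes sdcc's peephole optimizer */
    out[n++] = "copt sdcc_rules.1";  /* always: zilog syntax, section names */

    if (opt_level >= 1)
        out[n++] = "copt sdcc_opt.1";    /* callee versions of primitives */
    if (opt_level >= 3)
        out[n++] = "copt sdcc_opt.3";    /* experimental rules */

    assert(n <= MAX_STEPS);
    return n;
}
```

With the default of -O2 this yields five steps, the same as -O1, because level 2 adds nothing of its own in this scheme.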




Post by alvin »

Ok, I think this will be the last change and it's in before the update:

There are three optimization levels and three sdcc rules files:

sdcc_opt.1 (minimum -O1)
http://z88dk.cvs.sourceforge.net/viewvc ... sdcc_opt.1
* change asz80 syntax to standard zilog syntax (more or less)
* change sdcc section names to z88dk section names
* fix initializer / initialized sections
* extern sdcc primitives so they can be found by assembler

sdcc_opt.2 (minimum -O2)
http://z88dk.cvs.sourceforge.net/viewvc ... sdcc_opt.2
* modify code to use callee versions of sdcc primitives

sdcc_opt.3 (minimum -O3)
http://z88dk.cvs.sourceforge.net/viewvc ... sdcc_opt.3
* copt rules for improving sdcc code. You must be careful with these rules, because sdcc can keep registers live and changing the code might alter register values that are still needed after the substitution. For this reason most peephole rules should be applied by the sdcc peepholer, where information like register lifetime is known.


The default optimization level is "-O2" so sdcc_opt.1 and sdcc_opt.2 will normally be applied.

If you specify "-O0" on the compile line, the asm generated by sdcc will be left untouched. This can't be used by z80asm to compile a program but it can be used with the "-a" flag to generate a raw asm file that can be read by the user.

Eg:

zcc +zx -vn -a -clib=sdcc_ix --max-allocs-per-node200000 --reserve-regs-iy graph.c

will generate the file "graph.asm" which is the raw file generated by sdcc & its peephole optimizer. Using this file you can find opportunities to generate more sdcc peephole rules, and that's the reason this is supported.
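
As a summary of the three levels in this post, here is a table-driven sketch in C. It is an illustration, not code from zcc; the "RuleFile" struct and the function names are invented, and only the file names and level thresholds come from the description above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int         min_level;  /* lowest -O level that applies the file */
    const char *file;
} RuleFile;

static const RuleFile RULES[] = {
    { 1, "sdcc_opt.1" },    /* zilog syntax, section names, externs */
    { 2, "sdcc_opt.2" },    /* callee versions of sdcc primitives */
    { 3, "sdcc_opt.3" },    /* experimental code improvements */
};

enum { RULE_COUNT = sizeof RULES / sizeof RULES[0], DEFAULT_OPT_LEVEL = 2 };

/* Copy the rule files applied at opt_level into `out` (room for
   RULE_COUNT entries) and return how many there are. */
size_t select_rule_files(int opt_level, const char *out[])
{
    size_t n = 0;

    for (size_t i = 0; i < RULE_COUNT; i++)
        if (opt_level >= RULES[i].min_level)
            out[n++] = RULES[i].file;

    assert(n <= RULE_COUNT);
    return n;
}

/* At -O0 the sdcc output is left untouched, so z80asm cannot build it. */
int assembles_with_z80asm(int opt_level)
{
    return opt_level >= RULES[0].min_level;
}
```

At the default level the function returns sdcc_opt.1 and sdcc_opt.2; at -O0 it returns nothing, which matches the raw output being useful only for reading with "-a".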




Post by alvin »

> If you specify "-O0" on the compile line, the asm generated by sdcc will be left untouched. This can't be used by z80asm to compile a program but it can be used with the "-a" flag to generate a raw asm file that can be read by the user.
>
> Eg:
>
> zcc +zx -vn -a -clib=sdcc_ix --max-allocs-per-node200000 --reserve-regs-iy graph.c
>
> will generate the file "graph.asm" which is the raw file generated by sdcc & its peephole optimizer. Using this file you can find opportunities to generate more sdcc peephole rules, and that's the reason this is supported.

I left out the "-O0". Just so there is no confusion, the above compile line should read:

zcc +zx -vn -a -O0 -clib=sdcc_ix --max-allocs-per-node200000 --reserve-regs-iy graph.c


