Suggested optimizations

einar
Member
Posts: 47
Joined: Fri Sep 06, 2013 4:23 pm

Post by einar »

Sample program test.c:

unsigned char counter = 7;

void func(unsigned char i) {
    if (i == 1)
        counter++;
    if (i == 2)
        counter--;
}

main() {
    func(0);
}

Compile options:

zcc +zx -vn -startup=31 -clib=new test.c -O3 -a

z88dk-win32-20151215.zip generates the following code for function func():

._func
        ld        hl,2        ;const
        add        hl,sp
        ld        a,(hl)            ; A = i;

        cp        #(1 % 256)        ; if (i == 1)
        jp        nz,i_4

        ld        hl,_counter       ;   counter++;
        ld        a,(hl)
        inc        (hl)

.i_4
        ld        hl,2        ;const
        add        hl,sp
        ld        a,(hl)            ; A = i;

        cp        #(2 % 256)        ; if (i == 2)
        jp        nz,i_5

        ld        hl,_counter       ;   counter--;
        ld        a,(hl)
        dec        (hl)

        ld        l,a               ; ???
        ld        h,0
.i_5
        ret

Comments:

1.) Notice that the instruction ld a,(hl) is unnecessary in the code generated for counter++ and counter--. Could zcc omit this instruction when the expression result is not used?

2.) If ld a,(hl) were eliminated there, A would still hold the parameter at label i_4 (inc (hl) does not change A), so hopefully zcc could then notice that there is no need to load the parameter from the stack a second time.

3.) Function func() is declared as returning no result. Why does zcc generate code to return a result value in HL (the ld l,a / ld h,0 marked ??? above)?

Post by einar »

Another sample program test.c:

unsigned char counter = 7;

void func(unsigned char i) {
    if (i == 1)
        ++counter;
    if (i == 2)
        --counter;
}

main() {
    func(0);
}

This time, the generated code for function func() was worse:

        ld        hl,2        ;const
        add        hl,sp
        ld        a,(hl)          ; A = i;

        cp        #(1 % 256)      ; if (i == 1)
        jp        nz,i_4

        ld        hl,(_counter)   ;   ++counter;
        ld        h,0
        inc        hl
        ld        a,l
        ld        (_counter),a

.i_4
        ld        hl,2        ;const
        add        hl,sp
        ld        a,(hl)          ; A = i;

        cp        #(2 % 256)      ; if (i == 2)
        jp        nz,i_5

        ld        hl,(_counter)   ;   --counter;
        ld        h,0
        dec        hl
        ld        a,l
        ld        (_counter),a

.i_5
        ret

Comments:

1.) The generated code for ++counter is very inefficient. It should be just ld hl,_counter / inc (hl) / ld a,(hl), with the last instruction omitted when the expression result is not used. The same applies to --counter.

2.) Again, if ld a,(hl) were eliminated as in the previous case, hopefully zcc would notice that there is no need to load the parameter from the stack twice.

3.) This time, zcc didn't generate code to return a result value!
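
The claim in comment 1 rests on the 16-bit sequence and a single in-place inc (hl) or dec (hl) always storing the same byte. A minimal C sketch of that check follows; the function names are made up for illustration, and it assumes unsigned char is 8 bits wide, as it is on the Z80.

```c
/* What the listing above does for ++counter: widen, add 1, keep the low byte. */
unsigned char inc_via_16bit(unsigned char v)
{
    unsigned int wide = v;                    /* ld hl,(_counter) / ld h,0 */
    wide = (wide + 1u) & 0xFFFFu;             /* inc hl */
    return (unsigned char)(wide & 0xFFu);     /* ld a,l / ld (_counter),a */
}

/* Same idea for --counter. */
unsigned char dec_via_16bit(unsigned char v)
{
    unsigned int wide = v;                    /* ld hl,(_counter) / ld h,0 */
    wide = (wide - 1u) & 0xFFFFu;             /* dec hl */
    return (unsigned char)(wide & 0xFFu);     /* ld a,l / ld (_counter),a */
}

/* What inc (hl) does: an 8-bit increment that wraps from 255 to 0. */
unsigned char inc_8bit(unsigned char v)
{
    return (v == 0xFFu) ? 0u : (unsigned char)(v + 1u);
}

/* What dec (hl) does: an 8-bit decrement that wraps from 0 to 255. */
unsigned char dec_8bit(unsigned char v)
{
    return (v == 0u) ? 0xFFu : (unsigned char)(v - 1u);
}

/* Returns 1 if both forms store the same byte for every possible value. */
int inplace_forms_agree(void)
{
    unsigned int v;
    for (v = 0; v <= 0xFFu; v++) {
        unsigned char b = (unsigned char)v;
        if (inc_via_16bit(b) != inc_8bit(b)) return 0;
        if (dec_via_16bit(b) != dec_8bit(b)) return 0;
    }
    return 1;
}
```

Since the two forms agree for all 256 values, replacing the five-instruction sequence with the in-place form would not change what ends up in counter. The sketch only compares stored values; it says nothing about flags or the contents of HL and A afterwards.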

Post by einar »

Yet another sample program test.c:

unsigned char counter = 7;

void func() {
    counter = 11;
}

main() {
    func();
}

Now the generated code for function func() is as follows:

._func
        ld        hl,11 % 256        ;const
        ld        a,l
        ld        (_counter),a
        ret

Comment:

1.) When assigning a constant expression to a variable, it would be more efficient for zcc to convert the constant to the variable's type at compile time rather than at runtime, whenever possible. This would eliminate the need to load the constant into the "integer" HL first, before converting it to "char" in A. In this case, ld a,11 would suffice in place of ld hl,11 % 256 / ld a,l.
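
To make the suggestion concrete, here is a rough C sketch of how a code generator might narrow a constant to the destination width before emitting the store. Both fold_constant() and emit_store_constant() are hypothetical helpers invented for this illustration; they are not taken from sccz80, and the emitted text is only a guess at what such a pass might print.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: truncate a constant to the destination width
   while compiling, so no conversion is left for the generated code. */
unsigned long fold_constant(long value, unsigned width_bits)
{
    unsigned long mask = (width_bits >= 32u)
                             ? 0xFFFFFFFFUL
                             : ((1UL << width_bits) - 1UL);
    return (unsigned long)value & mask;
}

/* Hypothetical emitter: choose the load that matches the variable's size.
   An 8-bit destination goes straight through A; anything else uses HL.
   Returns the number of characters written, as snprintf does. */
int emit_store_constant(char *out, size_t out_size,
                        const char *symbol, long value, unsigned width_bits)
{
    if (width_bits == 8u)
        return snprintf(out, out_size, "ld a,%lu\nld (%s),a\n",
                        fold_constant(value, 8u), symbol);
    return snprintf(out, out_size, "ld hl,%lu\nld (%s),hl\n",
                    fold_constant(value, 16u), symbol);
}
```

Calling emit_store_constant(buf, sizeof buf, "_counter", 11, 8) fills buf with ld a,11 followed by ld (_counter),a, which is the sequence suggested above.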
alvin
Well known member
Posts: 1872
Joined: Mon Jul 16, 2007 7:39 pm

Post by alvin »

Thanks einar. Some of these can be corrected with the peephole optimizer, but others are more difficult. sccz80 doesn't have a global view of what is going on, so it really depends on what context the compiler is carrying at the moment it emits code. I'm not familiar enough with the innards of sccz80 to make those sorts of changes, if they are possible at all.

This is one of the main differences between sdcc and sccz80: sdcc is able to make these sorts of global optimizations, and its peepholer can perform some simple code analysis (such as determining whether a register is dead so that it can eliminate instructions). sdcc will almost always win on speed because of this. On code size, however, sccz80 tries to do things with subroutines, so it can often come out ahead of sdcc, which inlines everything.
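
As an illustration of the dead-register idea, here is a toy C sketch of a peephole pass that drops ld a,(hl) when A is overwritten before anything reads it. It is not how the sdcc or z88dk peepholers are written: the classify() table is a made-up one covering only the instructions in einar's listings, instructions are plain normalized strings (cp 1 instead of cp #(1 % 256)), and anything the table does not recognize is conservatively treated as a use of A.

```c
#include <stddef.h>
#include <string.h>

/* How one instruction affects register A, as far as this toy analysis knows. */
enum effect { NEUTRAL, READS_A, WRITES_A, UNKNOWN };

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Hypothetical classifier: only knows the instructions used in this thread. */
enum effect classify(const char *ins)
{
    static const char *const neutral[] = {
        "inc (hl)", "dec (hl)", "inc hl", "dec hl", "add hl,sp"
    };
    size_t i;

    if (ins[0] == '.') return NEUTRAL;                     /* label */
    if (ends_with(ins, ",a") || starts_with(ins, "cp "))   /* ld l,a / cp n */
        return READS_A;
    if (starts_with(ins, "ld a,")) return WRITES_A;        /* old A is lost */
    if (starts_with(ins, "ld hl,") || starts_with(ins, "ld h,"))
        return NEUTRAL;
    for (i = 0; i < sizeof neutral / sizeof neutral[0]; i++)
        if (strcmp(ins, neutral[i]) == 0) return NEUTRAL;
    return UNKNOWN;                                        /* jp, ret, call, ... */
}

/* A is dead after pos if it is overwritten before anything reads it. */
int a_is_dead_after(const char **code, size_t n, size_t pos)
{
    size_t i;
    for (i = pos + 1; i < n; i++) {
        enum effect e = classify(code[i]);
        if (e == WRITES_A) return 1;
        if (e != NEUTRAL) return 0;  /* a read, or something we cannot see past */
    }
    return 0;                        /* ran off the end: assume A is live */
}

/* Remove every "ld a,(hl)" whose result is never used; returns the new length. */
size_t peephole(const char **code, size_t n)
{
    size_t in, out = 0;
    for (in = 0; in < n; in++) {
        if (strcmp(code[in], "ld a,(hl)") == 0 && a_is_dead_after(code, n, in))
            continue;                /* drop the dead load */
        code[out++] = code[in];
    }
    return out;
}
```

Run on the first listing in this thread, the sketch removes the ld a,(hl) in front of inc (hl), because the next thing that touches A is the reload of the parameter. It keeps the one in front of dec (hl), since the ld l,a that follows still reads A; that load could only go once the ld l,a / ld h,0 pair is removed as well.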