Flipping Bytes & Bits!

ZX80, ZX 81, ZX Spectrum, TS2068 and other clones
Post Reply
Zetr0
Member
Posts: 63
Joined: Mon Aug 22, 2011 12:38 pm

Flipping Bytes & Bits!

Post by Zetr0 »

Hello there my fellow z88dk'ers

I was wondering, after having a look around on the interwebs, on how to horizontally flip a bitmap image so that you get a mirror - now with the oodles of power a PC has this is quite a trivial matter and can easily be brute forced - however implement this ( even for a small 5x13 cells [ 520 bytes ] ) on the humble z80 takes too long using such a method.

I have looked at off-loading this by simply including an already flipped bitmap image - however this eats away at the memory so I was wondering if there is a either a clever method involving a few instructions ( possibly an assembly rotate or two ) that can help me flip a byte horizontally such that
0x"10011100" would yield "00111001" or perhaps a lookup table of sorts to off-load the amount of ram require as well as the amount of instructions.

Currently I have a combination of ASM and C where bitmaps are written to a buffer and then that buffer is written to the ZX Spectrum "Display File"

just wondering if anyone out there has any idle thoughts, ideas or insight on how I can reduce the Ram requirements and not overly increase the render to buffer time.

Thanks for reading.
stefano
Well known member
Posts: 2309
Joined: Mon Jul 16, 2007 7:39 pm

Re: Flipping Bytes & Bits!

Post by stefano »

It is in my wish/todo list to do something similar on the monochrome sprites.
The ideal approach depends on your needs, it could be necessary to swap also the colour attributes, have an knowh number (or proportion) of bytes to flip, etc..

Supercode provides the routines to reflect the characters on the X or Y axis or rotate them:

https://github.com/z88dk/techdocs/blob/ ... .asm#L5339
Zetr0
Member
Posts: 63
Joined: Mon Aug 22, 2011 12:38 pm

Re: Flipping Bytes & Bits!

Post by Zetr0 »

That's really helpful!

Thank you stefano!

Its given me an idea with a 32 byte index, breaking a byte into nibbles - using the lower nibble ( 0 - 3 = bits 1,2,3,4 ) as the index value... hmm something like this...

Code: Select all

Nibble X Mirror	      index	  flipped value	     Change State

0000 = 0000		0		0		   NC*
0001 = 1000		1		8		   +7
0010 = 0100		2		4		   HV
0011 = 1100		3		12                 +9
0100 = 0010		4		2		   HV
0101 = 1010		5		10		   HV
0110 = 0110		6		6		   NC*
0111 = 1110		7		14		   HV
1000 = 0001		8		1		   -7
1001 = 1001		9		9		   NC*
1010 = 0101		10		5		   HV
1011 = 1101		11		13		   +2
1100 = 0011		12		3		   -9
1101 = 1011		13		11                 -2
1110 = 0111		14		7		   HV
1111 = 1111		15		15		   NC*

NC = no change
HV = Half Value
using the nibble patern value it can be used as an index = this value can then be "left shifted" 4 times to become the high nibble within a new byte

We can use a similar reverse method to "right shit" the "high nibble" bits 5,6,7 and 8, to then be bits 1,2,3,4 to use the index again and add these to the flipped byte.....

could it be that easy i wonder..... hmm... looks like I am coding tonight lol
Zetr0
Member
Posts: 63
Joined: Mon Aug 22, 2011 12:38 pm

Re: Flipping Bytes & Bits!

Post by Zetr0 »

I should point out this is based on simple black/white bitmap image data as that's what I am working with...
Timmy
Well known member
Posts: 423
Joined: Sat Mar 10, 2012 4:18 pm

Re: Flipping Bytes & Bits!

Post by Timmy »

The usual way of flipping all 8 bits in an image is to use a lookup table for all the 256 combinations. It costs 256 bytes + a few bytes of code.

The alternative way to do it is to use assembly. This is usually too slow if you want to flip sprites in real time. Uses less bytes but slower.

A third way to to store both sides separately. Some games do this.

A 4th way is to draw direction independent sprites.

PS. I was surprised that sp1 couldn't do this, as it already used so many bytes for storage already.

PPS. On machines like Game Boy people use hardware flipping.
stefano
Well known member
Posts: 2309
Joined: Mon Jul 16, 2007 7:39 pm

Re: Flipping Bytes & Bits!

Post by stefano »

SP1 works on a fixed sprite size. The picture data is provided by the programmer, thus the flip/rotate routines I mentioned could be applied to such data to save space.
I'm not very good with SP1 but a data manipulation routine should be easy to implement.

EDIT: your idea is very interesting. Dealing with nibbles is not very fast on the Z80 (try using RLD on the entire screen) but often it's just a matter of finding the right way to do things..
User avatar
feilipu
Member
Posts: 55
Joined: Tue Nov 15, 2016 5:02 am

Re: Flipping Bytes & Bits!

Post by feilipu »

I was wondering if there is a either a clever method involving a few instructions ( possibly an assembly rotate or two ) that can help me flip a byte horizontally such that
0x"10011100" would yield "00111001" or perhaps a lookup table of sorts to off-load the amount of ram require as well as the amount of instructions.
There is a function available to mirror a byte, mainly used with the z180 CSIO for SPI interfaces.

https://github.com/z88dk/z88dk/blob/mas ... mirror.asm

It might be too slow for what you need though.
Timmy
Well known member
Posts: 423
Joined: Sat Mar 10, 2012 4:18 pm

Re: Flipping Bytes & Bits!

Post by Timmy »

For those people who are still skeptical about a lookup table, here's a link from a random disassembly of a commercial game back in the day:

https://ultrabolido.github.io/skRex/Rex ... 44381.html

I've dissected many games before, and there is a similar table in many commercial games. Usually there's the same pattern when you look at the bytes displayed as pixels.

This is really the fastest way to save many bytes especially when you have very large sprites (unfortunately people often won't use large sprites with SP1, but that could be because one has to store flipped versions of these sprites).
Timmy
Well known member
Posts: 423
Joined: Sat Mar 10, 2012 4:18 pm

Re: Flipping Bytes & Bits!

Post by Timmy »

stefano wrote: Fri Dec 01, 2023 6:26 pm SP1 works on a fixed sprite size. The picture data is provided by the programmer, thus the flip/rotate routines I mentioned could be applied to such data to save space.
I'm not very good with SP1 but a data manipulation routine should be easy to implement.

EDIT: your idea is very interesting. Dealing with nibbles is not very fast on the Z80 (try using RLD on the entire screen) but often it's just a matter of finding the right way to do things..
Just for your information, Alvin and I had a small discussion about the problem of flipped sprites before: viewtopic.php?t=8912

My current (2023) opinion is that sp1 is brilliant and is optimised for speed and is _the_ library to use in most cases. Unfortunately, the focus on speed and versatility and 1 pixel movement means that every sprite takes a huge chunk of memory (for example the shifting tables). The fact that most games use mirrored sprites, and as a result the need for authors to store separate versions of it, means that all practical uses of sprites in SP1 will end up with few small 16x16 sprites.

Supporting Mirroring would help, but now that I understand that SP1 is basically a JIT sprite compiler, means that adding support for mirroring could be a lot of work, and is possibly not backward compatible.

This (=the huge memory footprint) is also one of the main reasons that I haven't used SP1 lately.
Zetr0
Member
Posts: 63
Joined: Mon Aug 22, 2011 12:38 pm

Re: Flipping Bytes & Bits!

Post by Zetr0 »

Hello my fellow z88dk'ers

A big thanks for all the input!!

So i have been experimenting with various results - so here are some "solutions"


bare with me.... this is a long one ( you have been cautioned ).

Preprocessor

Code: Select all

// typedef

typedef unsigned char ubyte;
typedef unsigned int uword;
so first we need a function that can actually mirror the byte so that "10111011" = "11011101" / "10000000" = "00000001"

so its time to get shifty

Code: Select all

ubyte mbyte( ubyte byte );
ubyte mbyte( ubyte byte ) 
{
    // bit shift the byte 
    byte = ( byte & 0xF0) >> 4 | ( byte & 0x0F) << 4;
    byte = ( byte & 0xCC) >> 2 | ( byte & 0x33) << 2;
    byte = ( byte & 0xAA) >> 1 | ( byte & 0x55) << 1;

    return ( ubyte ) byte;
}
*please note that I cast the return of byte. this is because "byte promotion to int" may exist here, this is more my habit than it being a requirement

So the first test is to create an index of all possible mirrored bytes, 256 entities to be sure.

Code: Select all

// global vectors

ubyte mbyte_buffer[256];
Thats the retainer file, while initially 256bytes sounds like a lot for a 128k machine, truth be said this reduces the overall memory foot print considerably, so much so I can now have three times as much "tile data" in the same space.

Code: Select all

// functions

void init_mbyte_buffer( void );
void init_mbyte_buffer( void )
{
    register ubyte index;

    for( index = 0; index < 255; index +=1 )
    {
        mbyte_buffer[ index ] = mbyte( index );
    }
}
so now "mbyte_buffer[]" filled with mirrored byte values so to get a mirrored byte value it is as simple as taking the value of the one you have and using this as an index.

the following function writes 8 bytes ( a cell ) to "*dest" which is a zx spectrum DISPLAY FILE pointer.

Code: Select all

void write_mirror_cell( void *dest, ubyte *src );
void write_mirror_cell( void *dest, ubyte *src )
{
    register ubyte index = 0;

    for( index =0; index < 8; index += 1 )
    {
        *dest = mbyte( src[ index ]);
         dest += 256;
    }
}
to use this there is a small dependency and index to help speed things up

Code: Select all

// system static defines

#define Screen_File 16384;
With this I decided to break the "Display File" in to CELL's - 24 (0-23) x 32 ( 0 -31 ) respectively. Those that know the ZX Spectrum Display File will lso know that it is in fact segmented ( non linear indexing ) in its vector alignment, distributed over 3 fields.

So instead of having to pre-calculate an X/Y Cell coordinate for the Displayfile I am using 24 bytes to represent the starting "Y" coordinate of the Cell in the Display File. With this I can simply add the "X" coordinate to get to the CELL I want to manuipulate. One could use a 192 byte buffer to represent all the byte lines of the "Y" axis - but thats for another day

Code: Select all

// Global Vectores
uword ScreenBitmap_24[ 24 ];


Thats the retainer

Code: Select all

void init_ScreenBitmap_24( void );                  // initializes an array of Screen File pointer address in the Y dimension                                                    
void init_ScreenBitmap_24( void )                   // Array size is 24 bytes, reduces add and offset opps by half
{
     register uword index = 0, row = 0;              // this implementation divides up the screen into 32 cells wide ( 0 - 31 ) and 24 Cells long ( 0 - 23 )
                                                     // As the spectrum doesn't have a linear bitmap to the screen a small translation of cell location
                                                     // based on the tirciary of the screen with a Y lookup table
     while ( index < 24 )
     {
          ScreenBitmap_24[ index ] = Screen_File;

          if( index > 7 )                            // if index is in the 2nd or 3rd bank of the screen file
          {
              if( index > 15 )
              {
                  ScreenBitmap_24[ index ] = ScreenBitmap_24[ index ] + 3584;       // the third bank offset of the screen file  (less 256)
              }
              else
              {
                  ScreenBitmap_24[ index ] = ScreenBitmap_24[ index ] + 1792;       // the second bank offset of the screen file (less 256)
              }
          }
          ScreenBitmap_24[ index ] = ScreenBitmap_24[ index ] + ( index * 32 );     // the Y cell location of the Screen File Bank
          index = index +1;
     };
}
Indeed its neither pretty or elegant, but its does get the job done.

Now one more function so we can derive the X/Y coordinates of the DISPLAY FILE

Code: Select all

void *get_screen_cell_XY( ubyte x, ubyte y );        // returns the Screen File address of the cell 
void *get_screen_cell_XY( ubyte x, ubyte y )         // on the screen
{
     return (void *) ScreenBitmap_24[ y ] + x;
}
yeah... i know I am lazy...

lets tie this together with an example

Code: Select all

uword main( void )
{
    void *scr_ptr;
    ubyte cdata[8];
    
    // a simple left leaning triangle
    cdata[ 0 ] = 128;
    cdata[ 1 ] = 192;
    cdata[ 2 ] = 224;
    cdata[ 3 ] = 240;
    cdata[ 4 ] = 248;
    cdata[ 5 ] = 252; 
    cdata[ 6 ] = 254;
    cdata[ 7 ] = 255;
    
    // now lets put this to the screen
    scr_ptr = get_screen_cell_XY( 5, 5 );
    asm_Write_Screen_Cell( scr_ptr, &cdata[0]); // This is a fast ( and very simple assembly routine to draw a cell
 
time for a little inline Z80 assembly

Code: Select all

void asm_Write_Screen_Cell( void *scr_ptr, unsigned char *cell_data ); // Optimized [ 278 Ts ]
void asm_Write_Screen_Cell( void *scr_ptr, unsigned char *cell_data )
{
      #asm
	    ld hl,2	     ; // [ 10 Ts ] jump the return address of the stack
            add hl,sp    ; // [ 11 Ts ] by incrementing the stack ptr by 2

	    ld c,(hl)	 ; // [  7 Ts ] load the address of the *byte ptr from the stack to bc
	    inc hl       ; // [  6 Ts ] "" *Remember Least Significant Byte First
	    ld b,(hl)	 ; // [  7 Ts ] ""
	    ld a,(bc)	 ; // [  7 Ts ] store the value of BC in a "cell_data"

        ld hl,4      ; // [ 10 Ts ] add a four byte jump to the stack
        add hl,sp    ; // [ 11 Ts ] getting to scr_ptr

        ld e,(hl)	 ; // [  7 Ts ] get the address
        inc hl		 ; // [  6 Ts ] of the
        ld d,(hl)    ; // [  7 Ts ] "*scr_ptr"

        ex de,hl     ; // [  4 Ts ] exchange registers

// [  93 Ts ] Initialization

        ; // rolled out for speed

        ld (hl), a   ; // [  7 Ts ] put the byte in the screen file
// [  24 Ts ]
        inc h        ; // [  4 Ts ] add 256 to get to the next row in the cell
        inc bc       ; // [  6 Ts ] add 1 to the address of the byte
        ld a,(bc)    ; // [  7 Ts ] put the value of the pointer of BC into A
        ld (hl), a   ; // [  7 Ts ] put the byte in the screen file
// [  24 Ts ] 
        inc h        ; // add 256 to get to the next row in the cell
        inc bc       ; // add 1 to the address of the byte
        ld a,(bc)    ; // put the value of the pointer of BC into A
        ld (hl), a   ; // put the byte in the screen file
// [  24 Ts ]
        inc h        ; // add 256 to get to the next row in the cell
        inc bc       ; // add 1 to the address of the byte
        ld a,(bc)    ; // put the value of the pointer of BC into A
        ld (hl), a   ; // put the byte in the screen file
// [  24 Ts ]
        inc h        ; // add 256 to get to the next row in the cell
        inc bc       ; // add 1 to the address of the byte
        ld a,(bc)    ; // put the value of the pointer of BC into A
        ld (hl), a   ; // put the byte in the screen file
// [  24 Ts ]
        inc h        ; // add 256 to get to the next row in the cell
        inc bc       ; // add 1 to the address of the byte
        ld a,(bc)    ; // put the value of the pointer of BC into A
        ld (hl), a   ; // put the byte in the screen file
// [  24 Ts ]
        inc h        ; // add 256 to get to the next row in the cell
        inc bc       ; // add 1 to the address of the byte
        ld a,(bc)    ; // put the value of the pointer of BC into A
        ld (hl), a   ; // put the byte in the screen file
// [  24 Ts ]
        inc h        ; // add 256 to get to the next row in the cell
        inc bc       ; // add 1 to the address of the byte
        ld a,(bc)    ; // put the value of the pointer of BC into A
        ld (hl), a   ; // put the byte in the screen file

        ret          ; // We are done here
                       // [  278 Ts ] Execution Time
      #endasm
}
The above just puts 8 bytes of data into the DISPLAY FILE, starting from *scr_ptr, and incrementing it by 256 bytes to get to the next line down in the CELL.

back to the C code

Code: Select all

    scr_ptr = get_screen_cell_XY( 6, 6);          // get the DISPLAY FILE address for CELL X/Y
    write_mirror_cell( scr_ptr, &cdata[0] );   
}
the "write_cell_mirror()" function is very similar to the assembly routine, however, it uses the value of the "*src" byte as an index when writing it the Display File "*dest"

Code: Select all

void write_cell_mirror( ubyte *mbuffer, void *dest, ubyte *src );
void write_cell_mirror( ubyte *mbuffer, void *dest, ubyte *src )
{
    ubyte index=0;

// l0
    *dest = mbuffer[ src[ index ] ];

// update for Display File;

// l1
    dest = dest + 256;
    index += 1;
    *dest = mbuffer[ src[ index ] ];
// l2
    dest = dest + 256;
    index += 1;
    *dest = mbuffer[ src[ index ] ];
// l3
    dest = dest + 256;
    index += 1;
    *dest = mbuffer[ src[ index ] ];
// l4
    dest = dest + 256;
    index += 1;
    *dest = mbuffer[ src[ index ] ];
// l5
    dest = dest + 256;
    index += 1;
    *dest = mbuffer[ src[ index ] ];
// l6
    dest = dest + 256;
    index += 1;
    *dest = mbuffer[ src[ index ] ];
// l7
    dest = dest + 256;
    index += 1;
    *dest = mbuffer[ src[ index ] ];
}
This all seems to work well and indeed performs as expected... however when I use this code mentioned above

Code: Select all

void write_mirror_cell( void *dest, ubyte *src );
void write_mirror_cell( void *dest, ubyte *src )
{
    register ubyte index = 0;

    for( index =0; index < 8; index += 1 )
    {
        *dest = mbyte( src[ index ]);
         dest += 256;
    }
}
this code essentially does the byte conversion / shifting on the fly not needing a 256 byte ( index ) buffer

Code: Select all

    scr_ptr = get_screen_cell_XY( 8, 6);
    write_mirror_cell( scr_ptr, &cdata[0] );
    
    return 0;
}
So I have tested both methods here, there is a caveat however, over 50 cells to the Display File *directly* and there is very little difference in speed between the two solutions.

The caveat being is that my project uses a buffer to "put the images into" and then "draws" this to the screen using the ASM function seen above.

So here are some lingering questions -

Is there a way I can compare the compiled output of ZCC, i.e. to output to assembly?
Is there a faster way to do this, perhaps a z80 assembly version of the byte mirror shifting
is this post way too long?

Thanks for reading

Z.
stefano
Well known member
Posts: 2309
Joined: Mon Jul 16, 2007 7:39 pm

Re: Flipping Bytes & Bits!

Post by stefano »

If you can limit your code to blind computation, then we have "ticks", it is part of the z88dk kit.
It is a modified CP/M emulator benchmarking the code for you.

Optimizing the code for speed usually involves assembly. I suppose that most of the optimization attempts will be something like "choosing the other way round" everytime ? You could try the Timmy's approach and compare the timing.
Timmy
Well known member
Posts: 423
Joined: Sat Mar 10, 2012 4:18 pm

Re: Flipping Bytes & Bits!

Post by Timmy »

If you are working in C instead of assembly, there are also interesting routines in spectrum.h, like

Code: Select all

uchar *zx_cyx2saddr(uchar row, uchar col)
, instead of writing your own get_screen_cell_XY(). Then again, you learn a lot when you trying writing them yourself.

More info: https://github.com/z88dk/z88dk/wiki/LIB ... nipulators

I personally will write assembly for these routines. Maybe I can try to write one soon.
stefano
Well known member
Posts: 2309
Joined: Mon Jul 16, 2007 7:39 pm

Re: Flipping Bytes & Bits!

Post by stefano »

I'm reviving this old thread to inform you that I've pulled a function to flip vertically the picture in a monochrome sprite, it's 8080 compatible.
I also updated the library to get a sprite plotting variant for the 8080.

The horizontal flipping is way more tricky, and surely slow but I'm trying
Jens
Member
Posts: 32
Joined: Sun Jul 07, 2024 11:02 am

Re: Flipping Bytes & Bits!

Post by Jens »

This is probably the best collection of "Bit Twiddling Hacks": https://graphics.stanford.edu/~seander/ ... rseObvious
This is in C, but should give you an idea for assembler. Though I agree with others in this thread, on many 8 or 16 bit platforms a lookup table is the best solution if you can spare the memory. (On more modern platforms the cache loads may invalidate any savings from the table lookup.)
stefano
Well known member
Posts: 2309
Joined: Mon Jul 16, 2007 7:39 pm

Re: Flipping Bytes & Bits!

Post by stefano »

Wow, it is very exaustive !
My implementation is still being debugged but works fine already on big pictures (I'm fixing the "single byte" conditions).
The nice feature I'm getting is a horizontal flip algorithm working on any picture size, with an even or odd number of horizontal bytes or pixels.

What I noticed in assembler is that, taking apart the tricks like pre-computed tables and loop unrolling, the actual loop optimization matters even more: if you run out of spare registers or need slow instructions, then a different approach which in theory should be slower could provide smaller and faster code.
Also the apparent size paradox must be considered, those accumulator related instructions fly.. you can put many in sequence and probably you'll be faster than a single operation with 16bit registers.
stefano
Well known member
Posts: 2309
Joined: Mon Jul 16, 2007 7:39 pm

Re: Flipping Bytes & Bits!

Post by stefano »

By the way, I'm putting the new sprite modification functions here:

https://github.com/z88dk/z88dk/tree/mas ... gfx/common
stefano
Well known member
Posts: 2309
Joined: Mon Jul 16, 2007 7:39 pm

Re: Flipping Bytes & Bits!

Post by stefano »

Post Reply