Digital Cabin


Binary resource inclusion like it's 1989

by dweller - 2024-07-30

#programming #dev #c


So C23 is adding the #embed preprocessor directive, which can be used to embed resources into your binary. It looks something like this:

const char image[] =
{
    #embed "image.tga"
};

It also has some niceties, such as the if_empty parameter, which lets you embed some default data when the file is empty. A missing file, however, is still an error, but you can check for one beforehand with __has_embed. Check out cppreference.com for more information, or grab the latest C23 draft as of this writing.

In any case, at the time of writing, no compiler supports it yet:

$ gcc -std=c2x -Wall -Wextra -pedantic test.c
test.c:5:6: error: invalid preprocessing directive #embed
    5 |     #embed "image.tga"
      |      ^~~~~
test.c:3:12: error: zero or negative size array 'image'
    3 | const char image[] =
      |            ^~~~~

Besides, an alternative would be useful for people like me who are either stuck with, or intentionally use, older standards like C99 or even C89 (C89 is what I use most of the time).

While I was chatting with a friend about adding an embed-like feature to his programming language, he said:

you mean #include? :)

Thus the idea was born. #include does indeed just paste a file’s contents into your source code. Alas, the compiler then tries to parse the raw binary as C source and isn’t happy:

const char image[] =
{
    #include "image.tga"
};
$ cc -std=c89 -Wall -Wextra -pedantic test2.c
...
image.tga:13:14: error: stray '\377' in program
   13 |        <U+0000><U+0000><U+0000><U+0016><U+000C><d9><ff><U+000F><U+000B><e3><ff><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0004>Z<f7><ff><U+0002>m<fa><ff><U+0000><U+0000><U+0000><U+0000><U+000B><9d><fc><ff><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000>?
      |                                                                            ^~~~

image.tga:13:15: warning: null character(s) ignored
   13 |    <U+0000><U+0000><U+0000><U+0016><U+000C><d9><ff><U+000F><U+000B><e3><ff><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0004>Z<f7><ff><U+0002>m<fa><ff><U+0000><U+0000><U+0000><U+0000><U+000B><9d><fc><ff><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000>?
      |                                                                            ^~~~~~~~

image.tga:13:23: error: stray '\4' in program
   13 |    <e3><ff><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0004>Z<f7><ff><U+0002>m<fa><ff><U+0000><U+0000><U+0000><U+0000><U+000B><9d><fc><ff><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000>?
      |                                                                            ^~~~~~~~

image.tga:13:25: error: stray '\367' in program
   13 |   <U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0004>Z<f7><ff><U+0002>m<fa><ff><U+0000><U+0000><U+0000><U+0000><U+000B><9d><fc><ff><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000>?
      |                                                                            ^~~~
...

Well, let’s make the compiler happy. All we need is a little preprocessing before the C preprocessor runs. Prepreprocessing, if you will.

My first idea was simple and worked out of the box, with one small caveat: read a binary file, print each byte as a hex escape sequence, wrap the whole thing in double quotes, and finish with a semicolon. This is easy to do with printf(3)’s %x conversion specifier (see bin2c.c below for the final version). The result is then included like this:

const char image[] =
#include "image.tga"
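The first, string-literal version boils down to a loop like the one below. This is a minimal sketch, not the post’s original code: the helper name to_string_literal is hypothetical, and it writes into a memory buffer instead of calling printf(3) on a file so that the result is easy to check. The same "\\x%02x" format works with fprintf.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not the post's code): write n bytes from src into
 * dst as one C string literal followed by ";\n", e.g. "\x47\x00\xff";
 * dst must hold at least 4 * n + 5 bytes: two quotes, four characters
 * per byte, the semicolon, the newline and the terminating NUL.
 * Returns the length of the text written (4 * n + 4). */
size_t to_string_literal(char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    char *p = dst;

    *p++ = '"';
    for(i = 0; i < n; i++)
    {
        /* Every byte gets its own \xNN escape, so an escape is always
         * followed by another backslash or by the closing quote. That
         * keeps the greedy hex-escape rule from swallowing a digit that
         * belongs to the next character. */
        p += sprintf(p, "\\x%02x", src[i]);
    }
    *p++ = '"';
    *p++ = ';';
    *p++ = '\n';
    *p = '\0';

    return (size_t)(p - dst);
}
```

One side effect of this format: a string literal carries an implicit terminating NUL, so an array declared as const char image[] = "..."; ends up one byte longer than the file, and sizeof(image) - 1 is the actual payload size.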

This works, but as I mentioned above, it has one itty-bitty problem:

... warning: string length '21000' is greater than the length '509' ISO C90 compilers are required to support [-Woverlength-strings]

You live, you learn. Apparently ISO C89/C90 compilers are only required to handle string literals up to 509 characters long (C99 raised that minimum to 4095). GCC 13.2.0 seems to handle longer ones just fine, but I wanted to stay within the spec. So instead, I simply output a char literal for each byte. Sadly, this makes the generated file larger: every byte now takes seven characters ('\xNN',) instead of four (\xNN).
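To see the difference concretely, here is a made-up four-byte input written out by hand in both formats, pasted inline instead of #included. Besides sidestepping the 509-character limit, the char-literal form sizes the array to exactly the number of bytes, while the string form adds the implicit NUL. The array and function names are hypothetical.

```c
#include <stddef.h>

/* The bytes 0x00 0x02 0xff 0x10 in the char-literal format bin2c emits.
 * The trailing comma is allowed in C89 initializer lists. */
static const unsigned char as_chars[] =
{
    '\x00','\x02','\xff','\x10',
};

/* The same bytes in the earlier string-literal format. The array size is
 * taken from the literal, which includes an implicit terminating NUL. */
static const unsigned char as_string[] = "\x00\x02\xff\x10";

/* Size of each array as computed by sizeof. */
size_t payload_chars(void)  { return sizeof(as_chars); }  /* 4 */
size_t payload_string(void) { return sizeof(as_string); } /* 5: extra NUL */
```

One edge case to keep in mind: for an empty input, bin2c.c emits only a newline, which leaves an empty initializer { }. C89 does not allow that (empty braces only became standard in C23), so the compiler will reject the array.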

bin2c.c:

#define _XOPEN_SOURCE 500 /* exposes snprintf() in glibc headers under -std=c89 */

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>


int main(int argc, char** argv)
{
    int   rc  = 0;
    int   ret = EXIT_SUCCESS;
    FILE* in  = stdin;
    FILE* out = stdout;
    const char* name  = "stdin";
    char buffer[4096] = {0};
    size_t got = 0;

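    /* With a file argument, write "<file>.h" next to it;
     * otherwise filter stdin to stdout. */
    if(argc == 2)
    {
        name = argv[1];
        in = fopen(name, "rb"); /* binary mode: no newline translation */
        if(!in)
        {
            perror("fopen");
            exit(EXIT_FAILURE);
        }

        /* Build the output name, refusing to silently truncate it. */
        rc = snprintf(buffer, sizeof(buffer), "%s.h", name);
        if(rc < 0 || (size_t)rc >= sizeof(buffer))
        {
            fprintf(stderr, "error: cannot build output file name\n");
            fclose(in);
            exit(EXIT_FAILURE);
        }

        out = fopen(buffer, "w");
        if(!out)
        {
            perror("fopen");
            fclose(in);
            exit(EXIT_FAILURE);
        }

    }
    else if(argc > 2) fprintf(stderr, "warning: ignoring excess parameters\n");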

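    /* Copy the input in chunks, emitting one char literal per byte.
     * fread() only returns a short count on end-of-file or error. */
    for(;;)
    {
        size_t i;

        got = fread(buffer, 1, sizeof(buffer), in);
        rc  = ferror(in);
        if(rc)
        {
            perror("fread");
            ret = EXIT_FAILURE;
            goto end;
        }

        for(i = 0; i < got; i++) fprintf(out, "'\\x%02x',", (unsigned char)buffer[i]);

        rc = feof(in);
        if(rc) break;
    }

    fprintf(out, "\n");

end:
    fclose(in);
    /* fclose() flushes buffered output, so it can report late write errors. */
    if(fclose(out) != 0)
    {
        perror("fclose");
        ret = EXIT_FAILURE;
    }
    return ret;
}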

As you can see, this simple program creates a header file named after the input file with .h appended (image.tga becomes image.tga.h). Without a file argument, it reads standard input and writes to standard output, so you can chain it with pipes in scripts. I also wrote it to depend only on the standard C library so anyone can use it. You are free to steal this.

With this, we can finally #include files in our source code to embed them in the binary. To demonstrate, I wrote a simple program with an embedded TGA file that it prints to standard output, using ANSI escape sequences for color. For the color output to work, your terminal has to support 24-bit True Color sequences.
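For reference, a 24-bit True Color foreground is selected with the sequence ESC[38;2;R;G;Bm and attributes are reset with ESC[0m. The sketch below formats a single pixel that way. It is a hypothetical helper, not the post’s cli_draw_tex(), and drawing each pixel as two '#' characters is an assumption based on the black-and-white output.

```c
#include <stdio.h>

/* Worst case: "\x1b[38;2;255;255;255m##\x1b[0m" is 25 chars, plus NUL. */
#define PIXEL_BUF_MIN 26

/* Hypothetical helper: format one pixel as two '#' characters in the
 * given 24-bit foreground color. ESC[38;2;R;G;Bm selects the color and
 * ESC[0m resets all attributes afterwards.
 * dst must hold at least PIXEL_BUF_MIN bytes. Returns the string length. */
int format_truecolor_pixel(char *dst,
                           unsigned char r, unsigned char g, unsigned char b)
{
    return sprintf(dst, "\x1b[38;2;%u;%u;%um##\x1b[0m",
                   (unsigned)r, (unsigned)g, (unsigned)b);
}
```

A drawing loop would then fputs() each pixel’s string to stdout and print a newline at the end of every row.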

example.c:

#include <stdio.h>

#include "common.h"
#include "tga.c"
#include "cli.c"


const u8 image[] =
{
    #include "image.tga.h"
};

int main(void)
{
    texture tex = {0};
    tga2tex_from_mem(&tex, image, sizeof(image));
    cli_draw_tex(&tex, true); /* true - Black&White, false - True Color */

    return 0;
}

Here’s an example in black and white.

$ ls
bin2c.c cli.c  example.c  tga.c  common.h  image.tga
$ cc -std=c89 -Wall -Wextra -pedantic bin2c.c -o bin2c
$ ./bin2c image.tga
$ cc -std=c89 -Wall -Wextra -pedantic example.c -o example
$ ./example

                        ####
        ####################
      ##########        ####
    ##########            ##
  ##########      ##        ##
  ##########        ##      ##
  ########                    ##
    ############################
    ####  ##                ##
    ##  ####  ####          ##
    ####  ##  ####    ####  ##
    ##  ####          ####  ##
          ##          ####  ##

$

And here’s a screenshot in True Color: [Image: the same output as above, but with the ‘#’ symbols drawn in color.]

Success! You can easily add bin2c to your Makefile or any other build script and have it generate headers that you can #include straight into your source. Ez pz, no need for C23! ;)

P.S. Interested in the rest of the owl? You can find it in my git repository. It’s pretty barebones and doesn’t handle all TGA files properly: only uncompressed (non-RLE) images with ARGB channels in that order. But what did you expect from a mere example?
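tga2tex_from_mem() lives in tga.c, which isn’t reproduced in this post. To give a rough idea of what "uncompressed ARGB only" means in practice, here is a hypothetical header check, not the post’s loader. It relies on the standard TGA layout: an 18-byte header with the image type at offset 2 (2 for uncompressed true color, 10 for its RLE variant), little-endian width and height at offsets 12 and 14, bits per pixel at 16 and the image descriptor at 17. The names tga_info, tga_parse_header and tga_pixel are made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical result of parsing a TGA header (names are illustrative). */
typedef struct
{
    unsigned width;
    unsigned height;
    unsigned bits_per_pixel;
    int      top_to_bottom; /* descriptor bit 5: first row is the top row */
    size_t   pixel_offset;  /* start of pixel data: 18-byte header + ID field */
} tga_info;

/* TGA stores multi-byte header fields little-endian. */
static unsigned read_u16le(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Accepts only uncompressed (type 2), non-color-mapped, 32-bit images.
 * Returns 0 on success, -1 for short or unsupported data. */
int tga_parse_header(tga_info *info, const unsigned char *data, size_t size)
{
    if(size < 18) return -1;
    if(data[1] != 0) return -1; /* color map present: not handled */
    if(data[2] != 2) return -1; /* 10 would be RLE true color: not handled */

    info->width          = read_u16le(data + 12);
    info->height         = read_u16le(data + 14);
    info->bits_per_pixel = data[16];
    info->top_to_bottom  = (data[17] >> 5) & 1;
    info->pixel_offset   = 18 + (size_t)data[0]; /* data[0] = ID length */

    if(info->bits_per_pixel != 32) return -1;
    if(info->pixel_offset > size) return -1;

    /* The whole pixel array (4 bytes per pixel) must fit in the buffer. */
    if((size - info->pixel_offset) / 4 < (size_t)info->width * info->height)
        return -1;

    return 0;
}

/* Channels of the i-th pixel in file order. Each 32-bit pixel is stored
 * as B, G, R, A bytes, i.e. an ARGB value written as a little-endian word. */
void tga_pixel(const tga_info *info, const unsigned char *data, size_t i,
               unsigned char *r, unsigned char *g, unsigned char *b,
               unsigned char *a)
{
    const unsigned char *p = data + info->pixel_offset + 4 * i;
    *b = p[0];
    *g = p[1];
    *r = p[2];
    *a = p[3];
}
```

Note the descriptor bit: when bit 5 is clear, the first row in the file is the bottom row of the image, so a loader has to flip the rows before drawing.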

¯\_(ツ)_/¯

