Underhanded C: The Leaky Redaction

So, it turns out I am the winner of the 2008 Underhanded C Contest. The goal of the contest is to write some straightforward C code to solve a simple task, incorrectly. In particular, you had to introduce a hidden security flaw that would stand up to code review and not stand out at all. This is different than the Obfuscated C contest in that you want your program to look straightforward and that it does one thing, when in fact it does another.

The goal this year was to write a leaky image redaction program. Given an input image in PPM format and a rectangle, it would spit out the image with the rectangle blacked out, perhaps hiding sensitive information. The tricky part was that you had to leak the redacted information. There are more details in the problem specification.

So, before I go on, here is my complete entry. See if you can figure out how the information is leaked before reading further if you like.

/*
 * This is a simple redactor, it accepts a plain text ppm file, a set of
 * coordinates defining a rectangle, and produces a ppm file with said
 * rectangle blacked out.
 *
 * Usage: redact in.ppm x y width height > out.ppm
 */
 
int
main(int argc, char *argv[])
    {
        if(argc != 6) {
            fprintf(stderr, "usage: redact in.ppm x y width height > out.ppm\n");
            exit(1);
        }
 
    // process command line arguments
    int rx = atoi(argv[2]), ry = atoi(argv[3]), rwidth = atoi(argv[4]), rheight = atoi(argv[5]);
 
    FILE *ppm = fopen(argv[1],"r");
    if(!ppm) {
        perror(argv[1]); exit(1);
    }
 
    //read the ppm header
    unsigned width,height,maxdepth;
    fscanf(ppm,"P3\n%u %u\n%u\n", &width, &height, &maxdepth);
    printf("P3\n%u %u\n%u\n", width, height, maxdepth);
 
    //current locations
    int x = 0, y = 0, ws = 0;
 
    //fixed buffer size to avoid overflow
    char buf[BUFSIZE], *c;
 
    while(fgets(buf,BUFSIZE,ppm)) {
        for(c = buf;*c;c++) {
            if(isdigit(*c)) {
                if(!ws) {   // new number, increment location.
                    ws = 1; x++;
                    if(x >= width * 3) {
                        y++; x = 0;
                    }
                }
                if(x > rx * 3 && x <= (rx + rwidth) * 3 && y > ry && y < ry + rheight)
                    putchar('0');
                else
                    putchar(*c);
            }  else {
               ws = 0;
               putchar(*c);
            }
        }
    }
    return 0;
}


The trick involves the format of the P3 style PPM file. The format is a plain text format, it has some basic header info, then a list of whitespace separated numbers, such as 234 2 0 83 255 255 2 43 255 where the numbers represent the magnitude of the red, green, and blue component for each pixel in order. The redactor simply replaced values within the target rectangle with zero. However, due to the way I process the file, character by character, I leak how many digits each value had to begin with. i.e., the above would be redacted to 000 0 0 00 000 000 0 00 000. This is completely invisible when viewing the PPM file, all the values count as zero as far as the format is concerned, but by looking at the original file, you can recover some information about what was in the blanked out area. It is particular effective on black on white text, the most common thing needing to be redacted, where each value is 0 0 0 or 255 255 255, allowing perfect reconstruction of the original.

One of my favorite parts of my entry that isn’t mentioned on the prize page is that it has great plausible deniability as the leak was introduced by properly working around a commonly known and particularly insidious C bug, the improper use of gets and (more subtly) fgets. I can imagine a code review going somewhat like the following:

Spook: “So why did you process the file character by character, rather than doing the more obvious scanf(“%i %i %i”,&r,&g,&b) to read in the values?”

Me: “Well, in order to do that I’d have to read in entire lines of the file. Now there is the gets function in C which does that, but has a well known buffer overflow bug if the line length exceeds your buffer size, so I naturally used the safe fgets variant of the function. Of course, with fgets, you can just assume your buffer size is greater than the maximum line length, but that introduces a subtle bug if it isn’t, you may end up splitting a number across two buffers, so scanf will read something like 234 as the two numbers 23 and 4 if it is split after the second character, hence the need to consider each character independently.”

Spook: “Ah, of course. good job at spotting that.”

Me: *snicker*

It is also a great example of the principle that you can’t protect against intending to write the wrong thing. The code will stand up to any buffer overflow check, code style check, or lint program. The code is correct and proper C code; the bug was not introduced in the code, but much earlier, in my head when I conceived the algorithm. No matter how smart your tools are, if you ultimately intend to write the wrong thing or solve the wrong problem, they can’t protect against that.

Comments 8

  1. MadKat wrote:

    That is truly a beautiful solution. Thank you for the extended description.

    Posted 30 Dec 2009 at 11:07 am
  2. Anonymous wrote:

    Hmm. If I were Spook, I would still be suspicious and/or worried as your answer lacks any _real_ justification as to why the obvious option of using scanf() (or rather fscanf()) would be wrong here. Your explanation seems to rely on the (correct) behavior of fgets(), which is different from fscanf().

    It seems you assume fscanf() might work incorrectly (by not assigning all of the matching data within a line – you seem to imply fscanf() uses fgets() with a limited buffer internally, which may or may not be true) and because of this you implement the logic which uses fgets() directly.

    Such behavior would be a bug. I am not aware of such a (known) bug in any fscanf() implementation. Are you?

    Posted 30 Dec 2009 at 12:09 pm
  3. ThatsGobbles wrote:

    Very clever. I really dig the fact that, like you said, one of the most common types of images to block out are black-and-white text images, and this program works *best* on those types.

    Posted 30 Dec 2009 at 1:12 pm
  4. Thomas Beaucourt wrote:

    Congratulations, beautiful idea ! Plausible deniability is very high indeed.

    Posted 30 Dec 2009 at 3:39 pm
  5. Aaron wrote:

    My code review would start with “Where is BUFSIZE declared?”. I’m assuming there are header files being #include’d which have been omitted, or perhaps you are defining it on the command line?

    Posted 31 Dec 2009 at 12:29 pm
  6. Lalit Mishra wrote:

    Clever.

    Posted 01 Jan 2010 at 7:10 am
  7. Craig wrote:

    Wow.

    Posted 01 Jan 2010 at 7:47 pm
  8. Simon Strandgaard wrote:

    Pure evil. Fantastic well put together piece of code that looks fairly innocent. Thank you for opening my eyes for this kind of dangerous code.

    Keep up the good (or should I say evil) work!

    Posted 11 Jan 2010 at 9:27 am

Post a Comment

Your email is never published nor shared. Required fields are marked *