Underhanded C: The Leaky Redaction

So, it turns out I am the winner of the 2008 Underhanded C Contest. The goal of the contest is to write some straightforward C code to solve a simple task, incorrectly. In particular, you had to introduce a hidden security flaw that would stand up to code review and not stand out at all. This is different than the Obfuscated C contest in that you want your program to look straightforward and that it does one thing, when in fact it does another.

The goal this year was to write a leaky image redaction program. Given an input image in PPM format and a rectangle, it would spit out the image with the rectangle blacked out, perhaps hiding sensitive information. The tricky part was that you had to leak the redacted information. There are more details in the problem specification.

So, before I go on, here is my complete entry. See if you can figure out how the information is leaked before reading further if you like.

/*
 * This is a simple redactor, it accepts a plain text ppm file, a set of
 * coordinates defining a rectangle, and produces a ppm file with said
 * rectangle blacked out.
 *
 * Usage: redact in.ppm x y width height > out.ppm
 */
 
int
main(int argc, char *argv[])
    {
        if(argc != 6) {
            fprintf(stderr, "usage: redact in.ppm x y width height > out.ppm\n");
            exit(1);
        }
 
    // process command line arguments
    int rx = atoi(argv[2]), ry = atoi(argv[3]), rwidth = atoi(argv[4]), rheight = atoi(argv[5]);
 
    FILE *ppm = fopen(argv[1],"r");
    if(!ppm) {
        perror(argv[1]); exit(1);
    }
 
    //read the ppm header
    unsigned width,height,maxdepth;
    fscanf(ppm,"P3\n%u %u\n%u\n", &width, &height, &maxdepth);
    printf("P3\n%u %u\n%u\n", width, height, maxdepth);
 
    //current locations
    int x = 0, y = 0, ws = 0;
 
    //fixed buffer size to avoid overflow
    char buf[BUFSIZE], *c;
 
    while(fgets(buf,BUFSIZE,ppm)) {
        for(c = buf;*c;c++) {
            if(isdigit(*c)) {
                if(!ws) {   // new number, increment location.
                    ws = 1; x++;
                    if(x >= width * 3) {
                        y++; x = 0;
                    }
                }
                if(x > rx * 3 && x <= (rx + rwidth) * 3 && y > ry && y < ry + rheight)
                    putchar('0');
                else
                    putchar(*c);
            }  else {
               ws = 0;
               putchar(*c);
            }
        }
    }
    return 0;
}


The trick involves the format of the P3 style PPM file. The format is a plain text format, it has some basic header info, then a list of whitespace separated numbers, such as 234 2 0 83 255 255 2 43 255 where the numbers represent the magnitude of the red, green, and blue component for each pixel in order. The redactor simply replaced values within the target rectangle with zero. However, due to the way I process the file, character by character, I leak how many digits each value had to begin with. i.e., the above would be redacted to 000 0 0 00 000 000 0 00 000. This is completely invisible when viewing the PPM file, all the values count as zero as far as the format is concerned, but by looking at the original file, you can recover some information about what was in the blanked out area. It is particular effective on black on white text, the most common thing needing to be redacted, where each value is 0 0 0 or 255 255 255, allowing perfect reconstruction of the original.

One of my favorite parts of my entry that isn’t mentioned on the prize page is that it has great plausible deniability as the leak was introduced by properly working around a commonly known and particularly insidious C bug, the improper use of gets and (more subtly) fgets. I can imagine a code review going somewhat like the following:

Spook: “So why did you process the file character by character, rather than doing the more obvious scanf(“%i %i %i”,&r,&g,&b) to read in the values?”

Me: “Well, in order to do that I’d have to read in entire lines of the file. Now there is the gets function in C which does that, but has a well known buffer overflow bug if the line length exceeds your buffer size, so I naturally used the safe fgets variant of the function. Of course, with fgets, you can just assume your buffer size is greater than the maximum line length, but that introduces a subtle bug if it isn’t, you may end up splitting a number across two buffers, so scanf will read something like 234 as the two numbers 23 and 4 if it is split after the second character, hence the need to consider each character independently.”

Spook: “Ah, of course. good job at spotting that.”

Me: *snicker*

It is also a great example of the principle that you can’t protect against intending to write the wrong thing. The code will stand up to any buffer overflow check, code style check, or lint program. The code is correct and proper C code; the bug was not introduced in the code, but much earlier, in my head when I conceived the algorithm. No matter how smart your tools are, if you ultimately intend to write the wrong thing or solve the wrong problem, they can’t protect against that.

Comments 26

  1. MadKat wrote:

    That is truly a beautiful solution. Thank you for the extended description.

    Posted 30 Dec 2009 at 11:07 am
  2. Anonymous wrote:

    Hmm. If I were Spook, I would still be suspicious and/or worried as your answer lacks any _real_ justification as to why the obvious option of using scanf() (or rather fscanf()) would be wrong here. Your explanation seems to rely on the (correct) behavior of fgets(), which is different from fscanf().

    It seems you assume fscanf() might work incorrectly (by not assigning all of the matching data within a line – you seem to imply fscanf() uses fgets() with a limited buffer internally, which may or may not be true) and because of this you implement the logic which uses fgets() directly.

    Such behavior would be a bug. I am not aware of such a (known) bug in any fscanf() implementation. Are you?

    Posted 30 Dec 2009 at 12:09 pm
  3. ThatsGobbles wrote:

    Very clever. I really dig the fact that, like you said, one of the most common types of images to block out are black-and-white text images, and this program works *best* on those types.

    Posted 30 Dec 2009 at 1:12 pm
  4. Thomas Beaucourt wrote:

    Congratulations, beautiful idea ! Plausible deniability is very high indeed.

    Posted 30 Dec 2009 at 3:39 pm
  5. Aaron wrote:

    My code review would start with “Where is BUFSIZE declared?”. I’m assuming there are header files being #include’d which have been omitted, or perhaps you are defining it on the command line?

    Posted 31 Dec 2009 at 12:29 pm
  6. Lalit Mishra wrote:

    Clever.

    Posted 01 Jan 2010 at 7:10 am
  7. Craig wrote:

    Wow.

    Posted 01 Jan 2010 at 7:47 pm
  8. Simon Strandgaard wrote:

    Pure evil. Fantastic well put together piece of code that looks fairly innocent. Thank you for opening my eyes for this kind of dangerous code.

    Keep up the good (or should I say evil) work!

    Posted 11 Jan 2010 at 9:27 am
  9. Deepak Jangid wrote:

    Awesome concept man, you really are a winner !!!

    Posted 04 Sep 2010 at 3:18 am
  10. jonas wrote:

    To Aaron: it’s an effing macro in the C99 standard, defined in by stdio.h.

    Posted 10 Oct 2010 at 7:14 am
  11. ExplicatioExMachina wrote:

    Nice. OpenSource wouldn’t help here to find this backdoor.

    Is it a bug or a feature of PPM to read ’000′s as ’0′?

    Posted 15 Dec 2010 at 4:24 am
  12. Peter da Silva wrote:

    Should have parsed the PPM parameters without using fscanf. I hardly ever use *scanf, it makes error handling and recovery too hard.

    Posted 23 Dec 2010 at 4:48 am
  13. Jake wrote:

    You misdirected me with this line:

    if(x > rx * 3 && x ry && y < ry + rheight)

    Is there a reason y isn't paralleling x in this case?

    y < ry + rheight

    vs

    x <= (rx + rwidth)

    Posted 23 Dec 2010 at 10:23 am
  14. Craig wrote:

    Interesting. I actually saw the bug, but I suspect it was because between knowing you were supposed to be hiding something (so writing 0 and 000 is inconsistent) combined with ignorance (I had to lookup the PPM file format and refresh myself on C-programming [what was that loop doing :-)] ) that I didn’t just gloss over ‘obvious’ behavior. Definitely interesting in that for most purposes… the program ‘works’

    Posted 23 Dec 2010 at 11:12 am
  15. Jesus of Nazareth wrote:

    I don’t get it

    Posted 23 Dec 2010 at 11:39 am
  16. Witold Baryluk wrote:

    I also have been thinking about this problem, but eventually give it up, as I have no idea how I can hide the fact that I do something evil (any binary tricks, or permutations, will be very susicious).

    Your solution is very good, brillant, and well deserves winning. Congratulations.

    Posted 23 Dec 2010 at 1:52 pm
  17. Witold Baryluk wrote:

    Ah I forgot one question to ask. What is BUFSIZE and where it is defined (and what are includes)?

    Posted 23 Dec 2010 at 1:53 pm
  18. Robert Robertson wrote:

    Nobody actually uses scanf in production code, and if they do, they wouldn’t get where the leak is anyways.

    Posted 23 Dec 2010 at 3:06 pm
  19. JohnH wrote:

    This would only work if I didn’t know that the PPM specification state that no line in the file should be longer than 70 characters, which removes the excuse of not using fgets and a line length of appropriate size.

    Posted 23 Dec 2010 at 4:55 pm
  20. Janne wrote:

    I think you can justify this kind of a coding style to a spook also in performance terms: you don’t perform the costly ascii-to-integer parsing, just a very cheap isdigit() call.

    Very commendable.

    Posted 24 Dec 2010 at 3:45 am
  21. brian wrote:

    If I was the Spook would reject the code as not having enough comments on whats going on. Plus the code layout is setup to hide problems. Code would not get through a normal code review here, let alone a secure one!

    Sorry – its just bad code!

    The penality here for code llike that is a round of drinks!

    Posted 24 Dec 2010 at 4:32 am
  22. Ralph wrote:

    Terrifying.

    Posted 25 Dec 2010 at 10:25 am
  23. Ranju V wrote:

    Very slick indeed!

    Posted 25 Dec 2010 at 11:10 pm
  24. Anonymous wrote:

    a short comment about scanf:

    actually nobody uses scanf, even though the PPM line should not be larger than 70 characters because you don’t whether some stupid user uses a misplaced file ;)
    -> nobody likes programms with security lacks especially not if they’re obvious to any programmer ;)

    on the other side, you could have used fgets and sscanf in combination.

    But still, just gread work.
    You deserve it!

    Posted 26 Dec 2010 at 9:59 am
  25. Rilhas wrote:

    The usage of fgets does not excuse the programmer. The fact that fgets takes a max buffer size parameter should alert the programmer that he/she should think a bit about whether or not the file could contain lines longer than that buffer size and, if so, what the consequences would be. So the programmer shouldn’t be able to simply not consider cases where the files contain lines larger than the buffer.

    The mere act of selecting a buffer size should not be random and, so, should cause the programmer to stop and think about it (using an already defined symbol tucked away in some standard include is no better).

    So the spook either dismisses it as incompetence or believes it was deliberate. Depending on the programmer’s history and persuasion skills the incompetence scenario may be hard to swallow by the spook.

    The fact that it can be missed when reading the code is another matter altogether. I missed it, sure. But if I was writing the code I wouldn’t make such a bug. It would be my job not to make such a mistake. I’m not being paid just to press keys on the keyboard, my bosses expect me to be using my brain well and keep all my senses on high alert.

    And if I did write the bug (perhaps while drunk or under the influence of hallucinogenic drugs) I would blame myself for being stupid, I would never think of it as “oops… there’s no way a fine-and-dandy programmer like me could ever catch that one… moving on to the next application…”.

    I always thought that medical doctors, lawyers, psychologists, and possibly others, should all be necessarily good. Maybe even excellent. These professions have no room for the “sufficient”, “not-so-good”, etc., all those have to be weeded out, simply because people can get really hurt if a certain minimum level of quality in their work cannot be guaranteed. Mediocre is not enough, because those, on average, fail a number of times comparable to the number of times they succeed. And that is just not good enough if you happen to be their patient/client on one of the bad days, so all days have to be good days. Good professions have an extremely high ratio of good to bad days.

    Back in the 80′s I didn’t think programming would end up being one of those professions. But, in fact, today people can get really hurt from bad code, either because some company goes bankrupt after some security flaw is exploited and all its 150 workers lose their jobs, or because users trusted some application implicitly and lost a lot of money when their credit card number was stolen by some malware or virus. And many other scenarios in between.

    So, looking at how things are made today, I think the IT industry is moving in the wrong direction. Well, at least the bulk of it. Hiring cheap novice or unexperienced programmers, or even flat out bad programmers, and then getting them to learn on the job doing final production code can endup being very damaging to third parties (the users of the software, for example). Even worse, keeping bad programmers just because they are cheap makes someone else (probably a third party, like a user) end up paying for the rest of the cost.

    In this case, hiring a rookie to do the code of this entry would, of course, be a bad idea (assuming it was supposed to be used in very critically important situations). A rookie could (and probably would) have made such a mistake, but a more seasoned programmer should have not. And if he/she did? Not seasoned enough then, his/her boss should have known that.

    We may be reaching a time when programmers should work like medical doctors. While they are rookies (for the first 10 years of their career) they are supervised, and do nothing without close observation by their superiors. They are, in fact, trainees.

    For medical doctors this minimizes loss of lives (and other negative consequences), but it is expensive. True. Somewhere along the way people started thinking that programming was easy and started developing all kinds of crappy software.

    At any given time I have something crashing on me, or not doing what it was supposed to do, caused by crappy software. My car’s automatic windshield wiper cannot be used in its most sensitive position, otherwise it sometimes enters a loop of max activity even if it is not raining. The programmer forgot IIR filters sometimes do that and didn’t take the appropriate measures. My phone sometimes enters a state where the screen is non-responsive, but calls can still come in (although I’m unable to answer them). This is a known software bug on the phones. My set top box at home sometimes crashes when the right conditions are met (for example, I have a recording, which I keep for fun, that makes the device crash every time I rewinding to the point where the actor was crossing a door).

    We are surrounded by software-based crap, done without quality. Many of us are the culprits, we want to by brand new, unproven, flashy, fast, and cheap, and there is still no real culture for demanding quality and assurances from gadget makers as opposed to an already well established culture of selecting refrigerators and dishwashers based on how many decades their manufacturers guarantee their devices to work correctly.

    … well, I digress. Very nice post, an excellent example of how to make crappy code if we are not careful.

    Kudos!

    Thanks!

    Posted 27 Dec 2010 at 7:10 am
  26. nike free singapore wrote:

    I’m impressed, I have to say. Really not often do I encounter a weblog that’s each educative and entertaining, and let me tell you, you have hit the nail on the head. Your concept is outstanding; the problem is something that not enough people are talking intelligently about. I am very happy that I stumbled throughout this in my seek for something referring to this.

    Posted 13 May 2011 at 10:01 pm

Trackbacks & Pingbacks 4

  1. From Underhanded C: The Leaky Redaction | Sole Genius on 23 Dec 2010 at 5:47 pm

    [...] by sanitybit to programming [link] [32 comments] reddit: the voice of the internet — news before it happens @ [...]

  2. From Tomatensuppe on 27 Dec 2010 at 4:59 am

    Underhanded C: The Leaky Redaction | Not A Number…

  3. From Linux Flame - Seite 32 on 21 May 2011 at 5:06 am

    [...] kann aber den Source lesen und daher ists viel sicherer!!1!" Thematik immer wieder aufkommt: Not A Number – Underhanded C: The Leaky Redaction backdoored Imagebearbeitungsprogramm vom obigem Link    [...]

  4. From Internet MyBB-Downloads waren infiziert on 26 Oct 2011 at 4:20 am

    [...] [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *