newtype in C, a touch of strong typing using compound literals.

The ISO C99 standard is a great thing. In addition to desperately needed features like a dedicated bool type, and codifying a lot of universally implemented extensions to the language, it added some more subtle things such as compound literals. A compound literal lets you write a value of a struct or union type directly in an expression, as an initialized literal. This puts declared types more on par with built-in ones such as numbers, characters, and strings. Here I will present just about the simplest, yet quite useful, application of this.

Many modern languages, such as Haskell, let you declare a type that shares its representation with an existing type but is not interchangeable with it. Haskell calls this a newtype, and I will borrow that terminology here. A newtype is fully equivalent to an existing type at run time and in the generated code, but is nevertheless distinct to the type system at compile time. Newtypes are quite useful for enforcing abstraction in APIs and catching a wide variety of bugs without incurring any run-time penalty. Depending on the compiler, they may even help optimization. Imagine you represent open files as an index into a table, much as the Unix API does. Naturally you would represent that index as an int, and you might have something like this, declaring fd_t as a handy synonym to show when you are working with file descriptors.

typedef int fd_t;
/* write an int out to a file */
void put_int(fd_t fd, int c);

Now, what happens if someone forgets the order of the arguments to put_int? Since fd_t is merely a synonym for int, the compiler has no idea you did anything wrong and happily writes garbage to a random file. Not what we wanted at all. If fd_t were a newtype rather than a typedef synonym, the program would be rejected, because fd_t and int would be distinct types.

This brings us to the following bit of code, which you can place in a header file, newtype.h. Using compound literals, it lets you declare newtypes that can be used almost anywhere you can use built-in types.

#ifndef NEWTYPE_H
#define NEWTYPE_H
/* This can be used for type safety, to avoid accidental conversion of values from one type to another,
 * and may also let the compiler's alias analysis distinguish otherwise identical types.
 *
 * NEWTYPE(new_type,old_type); declares new_type as a distinct type with the same representation as the already existing old_type
 * TO_NT(new_type,val)  converts a value to its newtype representation
 * FROM_NT(new_val)  opens up a newtyped value to get at its internal representation
 */
 
#define NEWTYPE(nty,oty) typedef struct { oty v; } nty
#define FROM_NT(ntv)       ((ntv).v)
#define TO_NT(nty,val)     ((nty){ .v = (val) })
 
#endif

Now we can modify the above example. Instead of

typedef int fd_t;

we use

NEWTYPE(fd_t,int);

Another example would be the traditional Unix lseek routine. It is generally declared as something like

#define SEEK_SET 0
#define SEEK_CUR 1
#define SEEK_END 2
long lseek(int fd,long offset, int whence);

Now, whence is supposed to be one of the SEEK_* constants, fd is supposed to be an open file descriptor, and offset is supposed to be an offset into the file. However, all three are plain integer types that convert implicitly into one another, and on many architectures int and long are even the same size. This means that if you mix up any of them, the compiler will happily go along. In addition, you can pass bogus values for whence, like 5 or 6, and nothing will complain. Using newtypes, you might declare the API like so.

NEWTYPE(fd_t,int);
NEWTYPE(whence_t,int);
#define SEEK_SET TO_NT(whence_t,0)
#define SEEK_CUR TO_NT(whence_t,1)
#define SEEK_END TO_NT(whence_t,2)
long lseek(fd_t fd,long offset, whence_t whence);

Now, not only are you protected from mixing up any of the arguments, you are also protected from bogus values being passed as whence. As long as callers stick to the SEEK_* constants rather than calling TO_NT themselves, the compiler checks for valid values, so you can often elide the run-time check.

Although this is just the simplest use of compound literals, it is already proving to be quite useful. When combined with other C99 features such as variable length arrays, you can do clever things like non-conservative garbage collection in a clean way, or just make your code that much easier to read by not having to declare temporary structures everywhere.

Comments 4

  1. Peter wrote:

    John, I like this trick but it does not provide as much security as Haskell’s newtype – I cannot hide the constructor, so to speak. Still, I guess there’s no point in being too paranoid…

    Posted 25 Jul 2010 at 4:43 pm
  2. Patrick wrote:

    I mostly agree. I’ve thought this would be fantastic for a while, but I don’t like doing with the preprocessor in this manner because the syntax of it is so obviously forced and unnatural. Something nicer, IMHO, would be an optional flag to your CC of choice to treat typedef in the fashion of newtype. Still, this is good stuff.

    Posted 26 Feb 2011 at 9:07 pm
  3. jazz wrote:

    C in general is designed by default like that. AND I SEE IT ALL THE TIME! CHARS for boolean values…etc. HOW do you think they hacked the xbox? NO string line checks is HOW. In PASCAL(even FPC) if its not defined, its defined by default elsewhere.

    Posted 15 Mar 2013 at 11:30 pm
  4. David Andreoletti wrote:

    “Although this is just the simplest use of compound literals, it is already proving to be quite useful. When combined with other C99 features such as variable length arrarys you can do clever things like non-conservative garbage collection in a clean way, or just make your code that much easier to read by not having to declare temporary structures everywhere.”

    Would you mind providing an example for “non-conservative garbage collection” and preventing temporary structures everywhere ?

    Interesting post 🙂

    Posted 02 Apr 2013 at 2:51 am