Jacob Sparre Andersen

Typed file descriptors - something to add to Unix?

Wouldn't it be nice if we didn't have to resort to the file(1) command when we want to know the content type of a file? Wouldn't it be nice if our pipes could tell which kind of data is supposed to flow through them? I think it would be, so let's give it a try:

The most widely accepted standard for identifying file types is MIME (RFC 2045), so it makes sense to use it for the external identification, even if luminaries like Linus Torvalds don't like it.

Ownership management on Unix systems is also a widely accepted way of doing things: each file's status holds a numeric user ID, and a database such as /etc/passwd maps that number to and from a user name. It makes sense to use something similar for the internal management of file types, and for the mapping to and from the external identification system.

These two things put together form the basis of my proposal. Every file on a system shall have an integer identifying its content type as part of the file status. The same should hold for pipes and other open files. The operating system shall provide access to a database that maps between internal type IDs and external MIME type strings.

Although I usually program in Bash and Ada, I accept that C is the primary implementation language for Unix systems, so I use it in the examples below.

Setting and getting type IDs

When we create a file, open a pipe, create a socket, etc., part of the process should be to tell the operating system which data type the file, pipe, or socket is intended to store or transmit. We can manage this by extending the prototypes of some existing functions:

int open(const char *pathname, int flags, mode_t mode, tid_t tid);
int creat(const char *pathname, mode_t mode, tid_t tid);
int pipe(int filedes[2], tid_t tid);
int socket(int domain, int type, int protocol, tid_t tid);

In addition to this, we may want to be able to set the content type of an existing file:

int settid(const char *pathname, tid_t tid);

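One way of letting programs read the content type of a file is to extend the struct stat filled in by stat(2) with a field tid_t tid;. Additionally, we may want a function that directly returns the type ID of an open file descriptor. It is called fgettid() here, by analogy with fstat(2), since C has no overloading and the name gettid() is used for the MIME lookup below:

tid_t fgettid(int fd);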
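Until a kernel offers such calls, the idea can be tried out in user space. The following is a minimal sketch, not the proposed kernel interface: it wraps the real pipe(2) and close(2) and keeps the type ID in a per-process table indexed by descriptor number. The names demo_tid_t, typed_pipe(), fd_type(), typed_close(), and the table size are all hypothetical, chosen to avoid clashing with existing system names. Treating 0 as "no type" is an assumption borrowed from the MIME lookup, which returns 0 for an unknown type.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for the proposed tid_t; its width is an assumption. */
typedef unsigned int demo_tid_t;

#define MAX_TRACKED_FDS 1024   /* assumed table size, for illustration only */
#define TID_UNKNOWN 0u         /* assumption: 0 means "no type recorded" */

/* Per-process table of descriptor types; zero-initialised, so all unknown. */
static demo_tid_t fd_types[MAX_TRACKED_FDS];

static int is_tracked(int fd)
{
    return fd >= 0 && fd < MAX_TRACKED_FDS;
}

/* User-space imitation of the proposed pipe(filedes, tid):
 * creates a real pipe and records the same type ID for both ends. */
int typed_pipe(int filedes[2], demo_tid_t tid)
{
    assert(filedes != NULL);
    if (pipe(filedes) != 0)
        return -1;
    if (!is_tracked(filedes[0]) || !is_tracked(filedes[1])) {
        /* Descriptor does not fit in the demo table: undo and fail. */
        close(filedes[0]);
        close(filedes[1]);
        errno = EMFILE;
        return -1;
    }
    fd_types[filedes[0]] = tid;
    fd_types[filedes[1]] = tid;
    return 0;
}

/* User-space imitation of the per-descriptor type query. */
demo_tid_t fd_type(int fd)
{
    return is_tracked(fd) ? fd_types[fd] : TID_UNKNOWN;
}

/* Forget the recorded type before closing, so a reused descriptor
 * number does not inherit a stale type ID. */
int typed_close(int fd)
{
    if (is_tracked(fd))
        fd_types[fd] = TID_UNKNOWN;
    return close(fd);
}
```

Because the table lives in one process, the type is lost across exec and is invisible to the program at the other end of the pipe. That limitation is exactly why the proposal places the type ID in the kernel's file status.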


To map between the internal representation of data types and MIME types, we need to add some functions to the operating system. They could be:

char * getmime(tid_t tid);

The getmime() function returns a pointer to a statically allocated string containing the MIME type string that matches the type ID tid. If the type ID tid isn't found in the type database, NULL is returned.

tid_t gettid(const char *mime);

The gettid() function returns a type ID that matches the MIME type string mime. If the MIME type string mime isn't found in the type database, 0 (zero) is returned.

The typical implementation of these functions could be to look up the data in the file /etc/types, which would be a colon (:) separated text file with type IDs in the first column and MIME type strings in the second column.
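A minimal sketch of such a lookup, assuming lines of the form 1:text/plain. The database path is passed as a parameter so the code can be tried on a scratch file instead of /etc/types. The names demo_tid_t, lookup_mime(), lookup_tid(), and the 256-byte line limit are hypothetical. Lines without a colon or without a positive decimal ID are skipped, and MIME strings are compared without regard to case, since RFC 2045 declares type and subtype names case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for the proposed tid_t; its width is an assumption. */
typedef unsigned int demo_tid_t;

/* Assumed maximum line length; longer lines are not handled by this sketch. */
#define TYPES_LINE_MAX 256

/* Split one "ID:mime/type" line in place.
 * Returns 1 and fills *tid and *mime on success, 0 for malformed lines. */
static int parse_types_line(char *line, demo_tid_t *tid, char **mime)
{
    char *end;
    unsigned long value;
    char *colon = strchr(line, ':');

    if (colon == NULL || !isdigit((unsigned char)line[0]))
        return 0;
    *colon = '\0';
    value = strtoul(line, &end, 10);
    if (*end != '\0' || value == 0 || value > UINT_MAX)
        return 0;                              /* 0 is reserved for "not found" */
    *tid = (demo_tid_t)value;
    *mime = colon + 1;
    (*mime)[strcspn(*mime, "\r\n")] = '\0';    /* drop the line ending */
    return **mime != '\0';
}

/* Sketch of getmime(): returns a pointer to a static buffer, or NULL. */
const char *lookup_mime(const char *path, demo_tid_t tid)
{
    static char result[TYPES_LINE_MAX];
    char line[TYPES_LINE_MAX];
    const char *found = NULL;
    FILE *db;

    assert(path != NULL);
    db = fopen(path, "r");
    if (db == NULL)
        return NULL;
    while (found == NULL && fgets(line, sizeof line, db) != NULL) {
        demo_tid_t id;
        char *mime;
        if (parse_types_line(line, &id, &mime) && id == tid) {
            strcpy(result, mime);   /* fits: mime lies inside an equally sized buffer */
            found = result;
        }
    }
    fclose(db);
    return found;
}

/* Sketch of the MIME-to-ID lookup: returns 0 when the type is unknown. */
demo_tid_t lookup_tid(const char *path, const char *mime)
{
    char line[TYPES_LINE_MAX];
    demo_tid_t found = 0;
    FILE *db;

    assert(path != NULL && mime != NULL);
    db = fopen(path, "r");
    if (db == NULL)
        return 0;
    while (found == 0 && fgets(line, sizeof line, db) != NULL) {
        demo_tid_t id;
        char *candidate;
        if (parse_types_line(line, &id, &candidate) &&
            strcasecmp(candidate, mime) == 0)
            found = id;
    }
    fclose(db);
    return found;
}
```

Scanning the file on every call mirrors how simple getpwuid(3) implementations read /etc/passwd. A real implementation would probably cache the table, but that is beyond this sketch.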


What reasons are there not to add this feature to Unix? Should we add something like it, but in a different way? How? Why? Or should I just get started coding and testing?

