Handling a File by its Descriptor in C

The available system calls to create or open, read, write, and delete a file in C all make use of a file descriptor. So let’s discover how the operating system handles references to open files and how to manipulate files in our programs.

What is a File Descriptor?

In Unix-like systems, a file descriptor (fd for short) is a small non-negative integer used as a reference to an open file in a process. A process, as we’ve seen in a previous article about processes, is a currently running program.

However, from the operating system’s point of view, a file is not only a text file as we might think of it as a user. A file can also be a directory or even another type of input/output resource such as a keyboard, a screen, a pipe or a network socket.

By default, each process systematically inherits three open file descriptors:

File Descriptor | Name | <unistd.h> | <stdio.h>
0 | Standard Input | STDIN_FILENO | stdin
1 | Standard Output | STDOUT_FILENO | stdout
2 | Standard Error | STDERR_FILENO | stderr

But why use file descriptors as identifiers? An integer is much simpler to process for a computer than a long path string to a file. What’s more, the reference to a file must contain much more than its location: it must also include its permissions, access mode, size, etc… And wouldn’t it be wasteful to keep several entire references to the same file if several processes opened it?

So how does the operating system maintain all of the information about every open file?

The System’s Representation of Open Files

To represent open files, the system uses three data structures:

  • A table of file descriptors per process. Each process has its own table containing a series of indexes, each one referring to an entry in the open file table.
  • An open file table, shared between all processes. Each entry in this table contains, among other things, the access mode, an offset describing the current location within the file, and a pointer to the corresponding entry in the inode table. This table also keeps count of the number of references there are to this entry in all of the file descriptor tables of all processes. When a process closes the file, this reference count is decremented and if it gets to 0, the entry is deleted from the table.
  • An inode (index node) table which is also shared between all processes. Each entry in the inode table describes the file in detail: its type, its size, its permissions, its owner, where its data is stored on the disk, etc. The file’s name and path are not part of the inode: they live in the directory entries that point to it.
Diagram of open file representation in a Unix-like system. Each process has its own file descriptor table that points to a common file table. This file table contains references to open files and, in turn, points to the inode table, which contains the information of each open file.
Representation of open files in Unix type systems diagram. Made with draw.io on lacontrevoie.fr.

This diagram shows the organization of references to files in use by three processes.

Two processes can of course have the same file open: process A can access file B through its descriptor 4, and the same is true for process B and its descriptor 3. This fact can help facilitate inter-process communication.

A process can also have two or more references to the same file, as with process C. This can happen when we open the same file twice. Later, we will see why such a thing could come in handy.

Opening or Creating a File in C

To manipulate a file in C, we must first inform the operating system of our intentions with the open function of the <fcntl.h> library. This system call allows us to open an existing file, or create the file if it doesn’t already exist. We must at least specify the path towards the file we’d like to open, as well as the way in which we want to access it:

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

So there are two versions of this open system call. Their parameters are:

  • pathname: the path towards the file in the form of a string of characters,
  • flags: an integer representing the flags indicating the access mode, which we will take a closer look at below,
  • mode: an integer representing the permissions to give to the file upon creation. This is an “optional” parameter that will be ignored if we don’t ask to create the file if it doesn’t already exist.

The open function returns the newly created file descriptor if the file was successfully added to the inode and open file tables. This file descriptor will typically be larger than 2, since fds 0, 1, and 2 are reserved for the standard input, output and error respectively. However, if there is any issue, for example if we ask to open a file that doesn’t exist, or a file we don’t have permissions for, open will return -1 and set errno to indicate the error.

The Open Access Mode

The flags parameter in the open system call allows us to choose the file’s access mode. The <fcntl.h> library contains a large selection of symbolic constants (see the full list in the manual page for open). Among them, the most common ones are:

Symbolic Constant | Description
O_RDONLY | Read only
O_WRONLY | Write only
O_RDWR | Read and write
O_TRUNC | Truncated mode. If the file exists and the access mode allows writing to it (O_WRONLY or O_RDWR), truncates its contents to 0 size when it is opened, with the effect of overwriting the file.
O_APPEND | Append mode. The file offset is set to the end of the file at each write, with the effect of adding text to the end of the file instead of overwriting it.
O_CREAT | Creates the file if it does not already exist. Its permissions must then be specified in open’s mode parameter.

For example, we can open a file in read only mode this way:

open("path/to/file", O_RDONLY);

But we can also combine flags with the bitwise | (OR) operator. For example, if we wanted to open a file in truncated write mode, we could do:

open("path/to/file", O_WRONLY | O_TRUNC);

Let’s note that indicating an access mode with O_RDONLY, O_WRONLY or O_RDWR is mandatory. We could not, for example, simply specify O_CREAT without any other indication of an access mode.

Creating a File with Open’s O_CREAT Option

As we’ve previously seen, open allows us to create a file if the specified file does not already exist. For that, we need to indicate the O_CREAT symbolic constant in its flags parameter. In this case, we must describe in the following optional parameter, mode, which permissions to give the new file.

Symbolic Constant | Octal | Description
S_IRWXU | 700 | The owner (user) has read, write and execute rights
S_IRUSR | 400 | The owner has read rights
S_IWUSR | 200 | The owner has write rights
S_IXUSR | 100 | The owner has execute rights
S_IRWXG | 070 | The group has read, write and execute rights
S_IRGRP | 040 | The group has read rights
S_IWGRP | 020 | The group has write rights
S_IXGRP | 010 | The group has execute rights
S_IRWXO | 007 | Other users have read, write and execute rights
S_IROTH | 004 | Others have read rights
S_IWOTH | 002 | Others have write rights
S_IXOTH | 001 | Others have execute rights

We might notice that the numbers encoded in these symbolic constants are exactly the same as those we may use with the chmod command to change file permissions.

Of course, we can also combine these symbolic constants with the same bitwise OR operator as before. For example, to create a file in append write only mode with read and write permissions for the owner but only the read permission for the group:

open("path/to/file", O_WRONLY | O_APPEND | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP);

An alternative to shorten this permissions specification is to directly use their added numeric values, preceded by a 0 so that the C compiler reads the number as octal rather than decimal. The following is exactly equivalent to the previous example:

open("path/to/file", O_WRONLY | O_APPEND | O_CREAT, 0640);

Closing a File Descriptor in C

When we are done manipulating a file, we must of course release its file descriptor with the close function of the <unistd.h> library. Its prototype could not be simpler:

int close(int fd);

We supply it with a file descriptor and the system releases it. If no other descriptor, in this process or any other, still refers to the same open file, the system also deletes the corresponding entry from its open file table. Upon success, the close function returns 0, but on failure, it returns -1 and sets errno to indicate the error.

However, the close function only closes the file descriptor, it does not delete the file itself! That is unlink’s prerogative.

If we wish to completely delete a file on the hard drive with a C program, we can use the unlink system call from the <unistd.h> library. Its prototype is:

int unlink(const char *pathname);

All we have to indicate here is the path towards the file. The name is removed right away, and the file’s contents are deleted once no other name refers to it and all processes using it have closed their descriptors referencing it. This system call returns 0 on success or -1 on failure.

Of course, closing and deleting a file right after opening it serves no purpose: we probably want to do something with that opened file. Why not write to it?

Writing to a File Descriptor in C

Once a file descriptor is opened with an access mode permitting writing, we will be able to write to the file it references thanks to the write system call from the <unistd.h> library. Its prototype is:

ssize_t write(int fd, const void *buf, size_t count);

And its parameters are as follows:

  • fd: the file descriptor to write to,
  • buf: a pointer to a memory zone to write to the file, typically containing a string of characters,
  • count: the number of bytes (i.e. characters) to write, typically the length of the string of characters specified in the previous parameter.

On success, the write function returns the number of bytes it has written, which can be lower than count, for example if the disk fills up or if a signal interrupts the call. Upon failure, it returns -1 and sets errno to indicate the error it encountered.

Let’s write a little program to test the behavior of the open, write and close functions:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int	main(void)
{
	int	fd;

//	Open a file descriptor for the "test.txt" file, in
//	truncated write only mode, and create the file if it doesn't
//	already exist. Read and write permissions for the owner, read
//	permissions for the group.
	fd = open("test.txt", O_WRONLY | O_TRUNC | O_CREAT, 0640);
//	Stop everything if the file could not be opened
	if (fd == -1)
		return (1);
//	Print the fd of the new file
	printf("fd = %d\n", fd);
//	Write into the file descriptor
	write(fd, "Hello World!\n", 13);
//	Close the file
	close(fd);
	return (0);
}

Output of a test program showing how to write in a file in C with the write system call.

In this output, we can see that the test.txt file was indeed created since it did not exist before we ran the program. And it does contain the sentence we wrote into it with write!

If we want to append text to the end of a file rather than overwriting it, we need to remember that that is not the responsibility of the write function. The access mode specified in the open function (O_TRUNC or O_APPEND) controls where and how write adds to the file.

Reading from a File Descriptor in C

The read function, from the <unistd.h> library, loads the contents of a file into memory, in part or in full, using its file descriptor. Here is its prototype:

ssize_t read(int fd, void *buf, size_t count);

Its parameters are as follows:

  • fd: the file descriptor to read from,
  • buf: a pointer towards a memory area where we can temporarily store the read characters,
  • count: a size in bytes to read, in other words, the number of characters to read. This size will often correlate with the size of the memory area indicated in the previous parameter.

Then, the read function returns the number of characters that it has read, or -1 in case of error. When the read function reaches the end of the file, it will naturally return 0.

So the read function stops reading when it reaches the number of characters we indicated, or the end of the file ( EOF).

Let’s try to open a file in read only mode and read 100 characters at a time until the end of the file:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 100

int	main(void)
{
	char	buf[BUFFER_SIZE + 1];	// stores the characters read
	int	fd;		// file descriptor to read
	int	nb_read;	// stores read's return value
	int	count;		// counts the number of reads

//	Open the cat.txt file in read only mode
	fd = open("cat.txt", O_RDONLY);
	if (fd == -1)
		return (1);
//	Initialize the count variables
	nb_read = -1;
	count = 0;
//	Loop as long as read does not return 0 (which would mean that
//	there is nothing more to read in the file)
	while (nb_read != 0)
	{
		// Read 100 characters with read from the
		// opened file descriptor
		nb_read = read(fd, buf, BUFFER_SIZE);
		// Stop everything if read encounters an error
		if (nb_read == -1)
		{
			printf("Read error!\n");
			return (1);
		}
		// Read does not add the terminating \0
		// We can use the number of read characters as the index
		// of the last character in the string
		buf[nb_read] = '\0';
		// Print the buffer contents after read
		printf("\e[36m%d : [\e[0m%s\e[36m]\e[0m\n", count, buf);
		count++;
	}
//	Close the opened file descriptor
	close(fd);
	return (0);
}

The cat.txt file will contain an extract from Wikipedia:

The cat (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae and is commonly referred to as the domestic cat or house cat to distinguish it from the wild members of the family. A cat can either be a house cat, a farm cat, or a feral cat; the latter ranges freely and avoids human contact. Domestic cats are valued by humans for companionship and their ability to kill rodents. About 60 cat breeds are recognized by various cat registries.

Once executed, this code outputs:

Output of a test program in C that reads from a file descriptor with the read function.

The read function seems to have an integrated bookmark! At each call, read resumes where it left off last time. In truth, the read function does not remember its last position in the file itself: it only increments the file descriptor’s offset.

A File Descriptor’s Offset

As we may have noticed in the diagram at the beginning of this article, the references in the common table of open files contain an offset. The offset represents the current number of bytes from the beginning of the file, which gives us the current position in the file. This is what the read function increments at the end of its execution.

So when we open a file, the offset is typically 0, which places us at the beginning of the file. When we read, let’s say 12 characters, the offset is set to 12. The next time we access the file descriptor to read or even write, we will start from the current offset from the beginning of the file, in this case, the 13th character.

Both the read and write system calls are affected by the file descriptor’s offset.

So how can we alter this offset? How do we return to the beginning of the file once we’ve read from it?

Resetting the File Descriptor Offset with a New File Descriptor

The simplest solution might be to open the same file again with the open system call. This creates a new entry in the system’s open file table, with an offset of 0 by default.

Let’s test this idea by modifying our previous program. We will open the same file twice to have two references to it, then we will read from the first file descriptor and then the second:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 100

void	read_and_print_100(int fd)
{
	char	buf[BUFFER_SIZE + 1];	// store the read characters
	int	nb_read;	// store read's return value

//	Read 100 characters from the file descriptor
	nb_read = read(fd, buf, BUFFER_SIZE);
//	Stop everything in case of error
	if (nb_read == -1)
	{
		printf("Read error!\n");
		return ;
	}
//	Add \0 to the end of the buffer
	buf[nb_read] = '\0';
//	Print the file descriptor and then the buffer contents
	printf("\e[36mfd %d : [\e[0m%s\e[36m]\e[0m\n", fd, buf);
	return ;
}

int	main(void)
{
	int fd1;	// file descriptor 1 (cat.txt)
	int fd2;	// file descriptor 2 (cat.txt)

//	Open the cat.txt file in read only mode
	fd1 = open("cat.txt", O_RDONLY);
//	Open the cat.txt file in read only mode again
	fd2 = open("cat.txt", O_RDONLY);
	if (fd1 == -1 || fd2 == -1)
		return (1);
//	Print the first 100 characters from descriptor 1
	read_and_print_100(fd1);
//	Print the next 100 characters from descriptor 1
	read_and_print_100(fd1);
//	Print the first 100 characters from descriptor 2
	read_and_print_100(fd2);
//	Close opened file descriptors
	close(fd1);
	close(fd2);
	return (0);
}

Output of a test program in C that shows that a file descriptor has an offset that controls the location inside the associated file. When the file is opened a second time, the two file descriptors do not share offsets.

As this result shows, we have two file descriptors, 3 and 4, which both refer to the cat.txt file. We read 100 characters twice from fd 3, which means its offset is now 200. Then, when we read from fd 4, which we haven’t touched until now, we get the first 100 characters in the file. This means that even though they refer to the same file, both file descriptors are distinct: increasing the offset of one does not affect the other.

Adjusting the File Descriptor Offset with Lseek

There is another option to set a file descriptor’s offset to the position we want. It’s the lseek function of the <unistd.h> library. It allows us a much finer control over our position in the file. Its prototype is:

off_t lseek(int fd, off_t offset, int whence);

Let’s take a closer look at its parameters:

  • fd: the file descriptor for which to change the offset,
  • offset: the number of bytes to move the offset,
  • whence: the position from which to apply the offset. Possible options are:
    • SEEK_SET: apply the offset relative to the beginning of the file (overwrites the offset with the one specified in the previous parameter),
    • SEEK_CUR: apply the offset relative to the current position in the file (add the offset specified in the previous parameter to the current offset),
    • SEEK_END: apply the offset relative to the end of the file (add the offset to the end of the file).

We must be careful with lseek since it allows us to set our offset past the end of the file! Moving the offset there does not change the file’s size by itself. But if we then write at that position, the file grows and the gap between the old end and the new data becomes a “hole”, a zone that reads as \0 bytes in the middle of the file. See the manual page for lseek for more information on this topic.

If lseek succeeds, it returns the new offset, meaning the total number of bytes from the beginning of the file. On failure, it returns -1 and sets errno to indicate the error code.

So, let’s open our file again to read 100 characters, then let’s experiment with lseek to change the offset between two reads:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 100

void	read_and_print_100(int fd)
{
	char	buf[BUFFER_SIZE + 1];	// store the read characters
	int	nb_read;	// store read's return value

//	Read 100 characters from the file descriptor
	nb_read = read(fd, buf, BUFFER_SIZE);
//	Stop everything in case of error
	if (nb_read == -1)
	{
		printf("Read error!\n");
		return ;
	}
//	Add \0 to the end of the buffer
	buf[nb_read] = '\0';
//	Print the file descriptor and then the buffer contents
	printf("\e[36mfd %d : [\e[0m%s\e[36m]\e[0m\n", fd, buf);
	return ;
}

int	main(void)
{
	int	fd;	// file descriptor to read

//	Open the cat.txt file in read only mode
	fd = open("cat.txt", O_RDONLY);
	if (fd == -1)
		return (1);
//	Print the first 100 characters from the file descriptor
	read_and_print_100(fd);
//	Reset offset to the beginning of the file
	lseek(fd, 0, SEEK_SET);
	read_and_print_100(fd);
//	Set the offset to 4 characters from the beginning of the file
	lseek(fd, 4, SEEK_SET);
	read_and_print_100(fd);
//	Set the offset to 1 character from current position
	lseek(fd, 1, SEEK_CUR);
	read_and_print_100(fd);
//	Set offset to be at the end of the file
	lseek(fd, 0, SEEK_END);
	read_and_print_100(fd);
//	Close opened file descriptor
	close(fd);
	return (0);
}

Output of a program in C showing the lseek function that changes a file descriptor offset.

Duplicating File Descriptors with dup/dup2

It can sometimes be useful to duplicate a file descriptor in order to save it as a backup or replace another one. This is the case, for example, in the context of a standard input or output redirection towards a file.

The dup and dup2 system calls from the <unistd.h> library enable us to duplicate a file descriptor. Here are their prototypes:

int dup(int oldfd);
int dup2(int oldfd, int newfd);

Both take the file descriptor we want to duplicate ( oldfd) as a parameter and return the new file descriptor, or -1 in case of error. The difference between the two is that dup automatically chooses the smallest unused number for the new file descriptor, whereas with dup2, we can specify which number we want ( newfd).

We have to keep in mind that dup2 is going to try to close the newfd if it is in use before transforming it into a copy of oldfd. However, if oldfd is not a valid file descriptor, the call will fail and newfd will not be closed. If oldfd and newfd are identical and valid, dup2 will just return the newfd without doing anything further.

The Interchangeability of Duplicated Descriptors

After a successful call to dup or dup2, the old and new file descriptors are interchangeable. They refer to the exact same open file and share its offset and other attributes. For example, if we read the first few characters of a file with the read function with one of the descriptors, the offset will be changed for both file descriptors, not only the one we read with.

Yet we previously saw that if we opened the same file twice, the two descriptors did not share their offsets in this way. So why does it work differently for duplicated file descriptors? The reason for this is best illustrated with a diagram:

Diagram showing the difference between opening a file twice and duplicating its file descriptor. In the first case, each file descriptor refers to different entries in the system's open files table. In the second, the two descriptors point to the same entry in the open files table.
The difference between opening a file twice and duplicating its file descriptor diagram. Made with draw.io on lacontrevoie.fr.

A file descriptor opened with open has its own entry in the system’s open files table, with its own offset. However, a duplicated file descriptor shares the open files table entry with its clone, which means they also share an offset.

Usage Example of dup/dup2

In the following example, we will open an alpha.txt file, which just contains the alphabet, and then duplicate the open file descriptor with dup. We will read from the file descriptor, and then from the file descriptor’s copy. After that, we will once again duplicate the file descriptor, this time on the standard input (fd 0) with dup2. Finally, we will create an infinite loop so we can have the time to go check the open file descriptors in our process.

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

int	main(void)
{
	int	fd;
	int	fd_copy;
	int	nb_read;
	char	buf[15];

//	Print this process' PID to be able to check open fds
	printf("My PID is %d\n", getpid());
//	Open the file and get its file descriptor
	fd = open("alpha.txt", O_RDONLY);
	if (fd == -1)
		return (1);
//	Duplicate the fd on the smallest unused fd
	fd_copy = dup(fd);
	if (fd_copy == -1)
		return (1);
//	Read the original fd
	nb_read = read(fd, buf, 10);
	if (nb_read == -1)
		return (1);
	buf[nb_read] = '\0';
	printf("fd %d contains : %s\n", fd, buf);
//	Read the duplicated fd
	nb_read = read(fd_copy, buf, 10);
	if (nb_read == -1)
		return (1);
	buf[nb_read] = '\0';
	printf("fd %d contains : %s\n", fd_copy, buf);
//	Duplicate fd on standard input
	dup2(fd, STDIN_FILENO);
//	Infinite loop to go check open fds in this
//	process with ls -la /proc/PID/fd
	while (1)
		;
}

The alpha.txt file contains:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

Output of a test program in C that shows file descriptor duplication with the dup and dup2 system calls.

Here we can see that reading from the first file descriptor also changed its copy’s offset. Before killing the process with ctrl-c, we can go open another terminal to see our process’s list of open file descriptors with this command:

$ ls -la /proc/PID/fd

List of file descriptors opened in a process that duplicates fds.

We can see that fd 3 and fd 4 are both indeed referring to our alpha.txt file, but that’s not all! The standard input (fd 0) was also redirected to our file thanks to dup2.


A little tip to share, a nagging question to ask, or a strange discovery to discuss about file and file descriptor handling in C? I’d love to read and respond to it all in the comments. Happy coding !

Sources and Further Reading

  • Linux Programmer’s Manual:
  • Bryant, R. E., O’Hallaron, D. R., 2016, Computer Systems: A Programmer’s Perspective, Chapter 10: System-Level I/O, pp. 925-949
  • Kernighan, B. W., Ritchie, D. M., 1988, The C Programming Language, Second Edition, Chapter 8 - The UNIX System Interface, pp. 169-175
  • Simard, E., Everything You Need to Know About inodes in Linux [linuxhandbook.com]
