Handling a File by its Descriptor in C
- Mia Combeau
- C
- October 26, 2022
The available system calls to create or open, read, write, and delete a file in C all make use of a file descriptor. So let’s discover how the operating system handles references to open files and how to manipulate files in our programs.
What is a File Descriptor?
In Unix-like systems, a file descriptor (fd for short) is a small non-negative integer used as a reference to an open file in a process. A process, as we’ve seen in a previous article about processes, is a currently running program.
However, from the operating system’s point of view, a file is not only a text file as we might think of it as a user. A file can also be a directory or even another type of input/output resource such as a keyboard, a screen, a pipe or a network socket.
By default, each process systematically inherits three open file descriptors:
File Descriptor | Name | <unistd.h> | <stdio.h> |
---|---|---|---|
0 | Standard Input | STDIN_FILENO | stdin |
1 | Standard Output | STDOUT_FILENO | stdout |
2 | Standard Error | STDERR_FILENO | stderr |
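As a quick check of the table above, the following sketch compares the descriptor behind each <stdio.h> stream with the matching <unistd.h> constant, using the POSIX fileno function. The helper name std_streams_match is made up for this illustration.

```c
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if each standard stream wraps the
// descriptor listed in the table, 0 otherwise.
int std_streams_match(void)
{
    return (fileno(stdin) == STDIN_FILENO
        && fileno(stdout) == STDOUT_FILENO
        && fileno(stderr) == STDERR_FILENO);
}
```

In a program whose standard streams have not been reassigned, this function should return 1.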
But why use file descriptors as identifiers? An integer is much simpler to process for a computer than a long path string to a file. What’s more, the reference to a file must contain much more than its location: it must also include its permissions, access mode, size, etc… And wouldn’t it be wasteful to keep several entire references to the same file if several processes opened it?
So how does the operating system maintain all of the information about every open file?
The System’s Representation of Open Files
To represent open files, the system uses three data structures:
- A table of file descriptors per process. Each process has its own table containing a series of indexes, each one referring to an entry in the open file table.
- An open file table, shared between all processes. Each entry in this table contains, among other things, the access mode, an offset describing the current location within the file, and a pointer to the corresponding entry in the inode table. This table also keeps count of the number of references to this entry across the file descriptor tables of all processes. When a process closes the file, this reference count is decremented and, if it gets to 0, the entry is deleted from the table.
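- An inode (index node) table which is also shared between all processes. Each entry in the inode table describes the file in detail: its type, its size, its permissions, its owner, where its data is stored on the disk, etc. The file's name and path are not part of the inode: they live in the directories that point to it.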
This diagram shows the organization of references to files in use by three processes.
Two processes can of course have the same file open: process A can access file B through its descriptor 4, and the same is true for process B and its descriptor 3. This fact can help facilitate inter-process communication.
A process can also have two or more references to the same file, as with process C. This can happen when we open the same file twice. Later, we will see why such a thing could come in handy.
Opening or Creating a File in C
To manipulate a file in C, we must first inform the operating system of our intentions with the open function of the <fcntl.h> library. This system call allows us to open an existing file, or create the file if it doesn’t already exist. We must at least specify the path towards the file we’d like to open, as well as the way in which we want to access it:
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
So there are two versions of this open system call. Their parameters are:
- pathname: the path towards the file in the form of a string of characters,
- flags: an integer representing the flags indicating the access mode, which we will take a closer look at below,
- mode: an integer representing the permissions to give to the file upon creation. This is an “optional” parameter: it is only used when we ask open to create the file if it doesn’t already exist, and it is ignored otherwise.
The open function returns the newly created file descriptor if the file was successfully added to the inode and open file tables. This file descriptor will typically be larger than 2, since fds 0, 1, and 2 are reserved for the standard input, output and error respectively. However, if there is any issue, for example if we ask to open a file that doesn’t exist, or a file we don’t have permissions for, open will return -1.
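When open returns -1, it also sets errno, so we can find out why it failed. Here is a minimal sketch of that check. The helper name open_readonly_or_report is hypothetical, and the message format is only one way of reporting the error.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: opens path read-only and returns the new file
// descriptor, or prints the reason for the failure and returns -1.
int open_readonly_or_report(const char *path)
{
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(errno));
    return (fd);
}
```

The caller still owns the returned descriptor and should close it once done with the file.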
The Open Access Mode
The flags parameter in the open system call allows us to choose the file’s access mode. The <fcntl.h> library contains a large selection of symbolic constants (see the full list in the manual page for open). Among them, the most common ones are:
Symbolic Constant | Description |
---|---|
O_RDONLY | Read only |
O_WRONLY | Write only |
O_RDWR | Read and write |
O_TRUNC | Truncated mode. If the file exists and the access mode allows writing to it (O_WRONLY or O_RDWR), truncates its contents to 0 size when it is opened, with the effect of overwriting the file. |
O_APPEND | Append mode. The file offset is set to the end of the file at each write, with the effect of adding text to the end of the file instead of overwriting it. |
O_CREAT | Creates the file if it does not already exist. Its permissions must then be specified in open’s mode parameter. |
For example, we can open a file in read only mode this way:
open("path/to/file", O_RDONLY);
But we can also combine flags with the bitwise | (OR) operator. For example, if we wanted to open a file in truncated write mode, we could do:
open("path/to/file", O_WRONLY | O_TRUNC);
Let’s note that indicating an access mode with O_RDONLY, O_WRONLY or O_RDWR is mandatory. We could not, for example, simply specify O_CREAT without any other indication of an access mode.
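Since the flags are combined into a single integer, the access mode can be recovered from an open descriptor by masking. The sketch below relies on fcntl with F_GETFL and on the O_ACCMODE mask, both declared in <fcntl.h>. The two helper names are made up for this illustration.

```c
#include <fcntl.h>

// Hypothetical helper: returns the access mode (O_RDONLY, O_WRONLY or
// O_RDWR) that fd was opened with, or -1 on error.
int access_mode_of(int fd)
{
    int flags;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return (-1);
    // O_ACCMODE masks out everything except the access mode bits
    return (flags & O_ACCMODE);
}

// Hypothetical helper: returns 1 if a status flag (O_APPEND for
// example) is set on fd, 0 if it is not, or -1 on error.
int has_status_flag(int fd, int flag)
{
    int flags;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return (-1);
    return ((flags & flag) != 0);
}
```

For a descriptor opened with O_WRONLY | O_APPEND, access_mode_of should return O_WRONLY and has_status_flag(fd, O_APPEND) should return 1.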
Creating a File with Open’s O_CREAT Option
As we’ve previously seen, open allows us to create a file if the specified file does not already exist. For that, we need to indicate the O_CREAT symbolic constant in its flags parameter. In this case, we must describe in the following optional parameter, mode, which permissions to give the new file.
Symbolic Constant | Octal | Description |
---|---|---|
S_IRWXU | 700 | The owner (user) has read, write and execute rights |
S_IRUSR | 400 | The owner has read rights |
S_IWUSR | 200 | The owner has write rights |
S_IXUSR | 100 | The owner has execute rights |
S_IRWXG | 070 | The group has read, write and execute rights |
S_IRGRP | 040 | The group has read rights |
S_IWGRP | 020 | The group has write rights |
S_IXGRP | 010 | The group has execute rights |
S_IRWXO | 007 | Other users have read, write and execute rights |
S_IROTH | 004 | Others have read rights |
S_IWOTH | 002 | Others have write rights |
S_IXOTH | 001 | Others have execute rights |
We might notice that the numbers encoded in these symbolic constants are exactly the same as those we may use with the chmod command to change file permissions.
Of course, we can also combine these symbolic constants with the same bitwise OR operator as before. For example, to create a file in append write only mode with read and write permissions for the owner but only the read permission for the group:
open("path/to/file", O_WRONLY | O_APPEND | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP);
An alternative to shorten this permissions specification is to directly use their added numeric values, preceded by a 0, which tells the C compiler to read the number as octal. The following is exactly equivalent to the previous example:
open("path/to/file", O_WRONLY | O_APPEND | O_CREAT, 0640);
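To see which permissions a new file really ends up with, we can ask the system with fstat from <sys/stat.h>. The following is only a sketch: the helper name create_and_get_mode is hypothetical. Keep in mind that the process's umask can clear some of the requested bits, so the result may be more restrictive than what we asked for.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: creates a new file at path with the requested
// permissions and returns the permission bits it actually received,
// or -1 on error. O_EXCL makes open fail if the file already exists,
// because mode is only applied when the file is created.
int create_and_get_mode(const char *path, mode_t mode)
{
    struct stat info;
    int         fd;

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    if (fd == -1)
        return (-1);
    if (fstat(fd, &info) == -1)
    {
        close(fd);
        return (-1);
    }
    close(fd);
    // Keep only the nine rwx permission bits
    return (info.st_mode & 0777);
}
```

With a typical umask of 022, asking for 0640 should yield 0640, while asking for 0666 should yield 0644.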
Closing a File Descriptor in C
When we are done manipulating a file, we must of course de-reference its file descriptor with the close function of the <unistd.h> library. Its prototype could not be simpler:
int close(int fd);
We supply it with a file descriptor and the system de-references it and, if no other process has that file opened, deletes it from its open file and inode tables. Upon success, the close function returns 0, but on failure, it returns -1 and sets errno to indicate the error.
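To observe that failure case, the sketch below closes the same descriptor twice on purpose. The second call fails because the number no longer refers to an open file. The helper name errno_of_second_close is made up, and this is a demonstration only: in a real program, closing a descriptor twice is a bug, since the number may already have been reused for another file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: opens path, closes the descriptor twice and
// returns the errno value set by the second close. Returns 0 if the
// second close unexpectedly succeeded, or -1 if the file could not
// be opened or closed the first time.
int errno_of_second_close(const char *path)
{
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return (-1);
    if (close(fd) == -1)
        return (-1);
    // fd no longer refers to an open file, so this call should fail
    if (close(fd) == -1)
        return (errno);
    return (0);
}
```

The value returned for an existing file should be EBADF, the error code for a bad file descriptor.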
However, the close function only closes the file descriptor. It does not delete the file itself! That is unlink’s prerogative.
Deleting a File in C with Unlink
If we wish to completely delete a file on the hard drive with a C program, we can use the unlink system call from the <unistd.h> library. Its prototype is:
int unlink(const char *pathname);
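All we have to indicate here is the path towards the file. The unlink call removes that name from the file system, and the file itself is deleted once it has no other name (hard link) left and all processes using it have closed their descriptors referencing it. This system call returns 0 on success or -1 on failure.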
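The delayed deletion can be observed with a small experiment: unlink a file while a descriptor is still open on it, then ask fstat how many names the file has left. This is a sketch under the assumption of a local Linux file system. The helper name links_after_unlink is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: creates path, unlinks it while the descriptor
// is still open, and returns the number of names (hard links) the
// file has left, or -1 on error.
int links_after_unlink(const char *path)
{
    struct stat info;
    int         fd;
    int         links;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return (-1);
    links = -1;
    // The name disappears right away, but the descriptor stays valid
    if (unlink(path) == 0 && fstat(fd, &info) == 0)
        links = (int)info.st_nlink;
    // Closing the last descriptor lets the system free the file
    close(fd);
    return (links);
}
```

After the call, the path no longer exists, even though the descriptor was still usable between unlink and close.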
Of course, closing and deleting a file right after opening it serves no purpose: we probably want to do something with that opened file. Why not write to it?
Writing to a File Descriptor in C
Once a file descriptor is opened with an access mode permitting writing, we will be able to write to the file it references thanks to the write system call from the <unistd.h> library. Its prototype is:
ssize_t write(int fd, const void *buf, size_t count);
And its parameters are as follows:
- fd: the file descriptor to write to,
- buf: a pointer to the memory area holding the data to write to the file, typically a string of characters,
- count: the number of bytes (i.e. characters) to write, typically the length of the string of characters specified in the previous parameter.
On success, the write function returns the number of bytes it has written. However, upon failure, it returns -1 and sets errno to indicate the error it encountered.
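The number of bytes written can be smaller than count, for example when writing to a pipe or when a signal interrupts the call. A common pattern is therefore to loop until everything is written. The following is a minimal sketch of that pattern. The helper name write_all is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: keeps calling write until all count bytes of
// buf are written. Returns 0 on success or -1 on error.
int write_all(int fd, const char *buf, size_t count)
{
    size_t  done;
    ssize_t written;

    done = 0;
    while (done < count)
    {
        written = write(fd, buf + done, count - done);
        if (written == -1)
        {
            // A signal interrupted the call before any byte
            // was written: try again
            if (errno == EINTR)
                continue ;
            return (-1);
        }
        done += (size_t)written;
    }
    return (0);
}
```

For small writes to a regular file, a single call to write is usually enough, which is why the short examples in this article do not loop.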
Let’s write a little program to test the behavior of the open, write and close functions:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd;

    // Open a file descriptor for the "test.txt" file, in
    // truncated write only mode, and create the file if it doesn't
    // already exist. Read and write permissions for the owner, read
    // permissions for the group.
    fd = open("test.txt", O_WRONLY | O_TRUNC | O_CREAT, 0640);
    // Stop everything if the file could not be opened
    if (fd == -1)
        return (1);
    // Print the fd of the new file
    printf("fd = %d\n", fd);
    // Write into the file descriptor
    write(fd, "Hello World!\n", 13);
    // Close the file
    close(fd);
    return (0);
}
In this output, we can see that the test.txt file was indeed created since it did not exist before we ran the program. And it does contain the sentence we wrote into it with write!
If we want to append text to the end of a file rather than overwriting it, we need to remember that this is not the responsibility of the write function. The access mode specified in the open function (O_TRUNC or O_APPEND) controls where and how write adds to the file.
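The difference between the two modes shows up in the file size after each write. Here is a small sketch that opens a file with one extra flag, writes some text and reports the resulting size. The helper name write_and_get_size is made up, and the size is obtained with lseek, which is covered later in this article.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: opens path write-only with one extra flag
// (O_TRUNC or O_APPEND), writes text and returns the file size
// afterwards, or -1 on error.
long write_and_get_size(const char *path, int extra_flag, const char *text)
{
    int   fd;
    off_t size;

    fd = open(path, O_WRONLY | O_CREAT | extra_flag, 0640);
    if (fd == -1)
        return (-1);
    if (write(fd, text, strlen(text)) == -1)
    {
        close(fd);
        return (-1);
    }
    // Seeking to the end returns the file size in bytes
    size = lseek(fd, 0, SEEK_END);
    close(fd);
    return ((long)size);
}
```

Calling it with O_TRUNC and "Hello" on a file gives a size of 5. A second call with O_APPEND and " World" gives 11, and a third call with O_TRUNC and "Hi" brings the size back down to 2.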
Reading from a File Descriptor in C
The read function, from the <unistd.h> library, loads the contents of a file into memory, in part or in full, using its file descriptor. Here is its prototype:
ssize_t read(int fd, void *buf, size_t count);
Its parameters are as follows:
- fd: the file descriptor to read from,
- buf: a pointer towards a memory area where we can temporarily store the read characters,
- count: a size in bytes to read, in other words, the number of characters to read. This size will often correlate with the size of the memory area indicated in the previous parameter.
Then, the read function returns the number of characters that it has read, or -1 in case of error. When the read function reaches the end of the file, it will naturally return 0.
So the read function stops reading when it reaches the number of characters we indicated, or the end of the file (EOF).
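These return values lend themselves to a compact loop: keep reading while read returns a positive number, then check whether the loop ended on 0 or on -1. The sketch below uses that loop to count the bytes in a file. The helper name count_bytes and the 64-byte chunk size are arbitrary choices for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: reads path in chunks of 64 bytes and returns
// the total number of bytes read, or -1 on error.
long count_bytes(const char *path)
{
    char    buf[64];
    ssize_t nb_read;
    long    total;
    int     fd;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return (-1);
    total = 0;
    // read returns 0 at the end of the file and -1 on error
    while ((nb_read = read(fd, buf, sizeof(buf))) > 0)
        total += nb_read;
    close(fd);
    if (nb_read == -1)
        return (-1);
    return (total);
}
```

Adding up the return values, rather than assuming each call fills the buffer, matters because read may return fewer bytes than requested even before the end of the file, notably on pipes and terminals.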
Let’s try to open a file in read only mode and read 100 characters at a time until the end of the file:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 100

int main(void)
{
    char buf[BUFFER_SIZE + 1]; // stores the characters read
    int  fd;                   // file descriptor to read
    int  nb_read;              // stores read's return value
    int  count;                // counts the number of reads

    // Open the cat.txt file in read only mode
    fd = open("cat.txt", O_RDONLY);
    if (fd == -1)
        return (1);
    // Initialize the count variables
    nb_read = -1;
    count = 0;
    // Loop as long as read does not return 0 (which would mean that
    // there is nothing more to read in the file)
    while (nb_read != 0)
    {
        // Read 100 characters with read from the
        // opened file descriptor
        nb_read = read(fd, buf, BUFFER_SIZE);
        // Stop everything if read encounters an error
        if (nb_read == -1)
        {
            printf("Read error!\n");
            close(fd);
            return (1);
        }
        // Read does not add the terminating \0
        // The number of characters read is the index right after
        // the last character, which is where the \0 belongs
        buf[nb_read] = '\0';
        // Print the buffer contents after read
        // (\033 is the portable spelling of the \e escape)
        printf("\033[36m%d : [\033[0m%s\033[36m]\033[0m\n", count, buf);
        count++;
    }
    // Close the opened file descriptor
    close(fd);
    return (0);
}
The cat.txt file will contain an extract from Wikipedia:
The cat (Felis catus) is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae and is commonly referred to as the domestic cat or house cat to distinguish it from the wild members of the family. A cat can either be a house cat, a farm cat, or a feral cat; the latter ranges freely and avoids human contact. Domestic cats are valued by humans for companionship and their ability to kill rodents. About 60 cat breeds are recognized by various cat registries.
Once executed, this code outputs:
The read function seems to have an integrated bookmark! At each call, read resumes where it left off last time. In truth, the read function does not remember its last position in the file itself: it only increments the file descriptor’s offset.
A File Descriptor’s Offset
As we may have noticed in the diagram at the beginning of this article, the references in the common table of open files contain an offset. The offset represents the current number of bytes from the beginning of the file, which gives us the current position in the file. This is what the read function increments at the end of its execution.
So when we open a file, the offset is typically 0, which places us at the beginning of the file. When we read, let’s say 12 characters, the offset is set to 12. The next time we access the file descriptor to read or even write, we will start from the current offset from the beginning of the file, in this case, the 13th character.
Both the read and write system calls are affected by the file descriptor’s offset.
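The offset can be inspected directly. The sketch below reads a few bytes and then asks for the current offset with lseek (presented further down), moving 0 bytes from the current position so that nothing changes. The helper name offset_after_reading is hypothetical, and the 64-byte limit is an arbitrary choice for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: opens path, reads n bytes (at most 64) and
// returns the descriptor's offset afterwards, or -1 on error.
long offset_after_reading(const char *path, size_t n)
{
    char  buf[64];
    int   fd;
    off_t offset;

    if (n > sizeof(buf))
        n = sizeof(buf);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return (-1);
    offset = -1;
    // Moving 0 bytes from the current position changes nothing and
    // returns the current offset
    if (read(fd, buf, n) != -1)
        offset = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return ((long)offset);
}
```

On a file of at least 12 bytes, offset_after_reading(path, 12) returns 12, matching the example in the text above.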
So how can we alter this offset? How do we return to the beginning of the file once we’ve read from it?
Resetting the File Descriptor Offset with a New File Descriptor
The simplest solution might be to open the same file again with the open system call. This creates a new entry in the system’s open file table, with an offset of 0 by default.
Let’s test this idea by modifying our previous program. We will open the same file twice to have two references to it, then we will read from the first file descriptor and then the second:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 100

void read_and_print_100(int fd)
{
    char buf[BUFFER_SIZE + 1]; // store the read characters
    int  nb_read;              // store read's return value

    // Read 100 characters from the file descriptor
    nb_read = read(fd, buf, BUFFER_SIZE);
    // Stop everything in case of error
    if (nb_read == -1)
    {
        printf("Read error!\n");
        return ;
    }
    // Add \0 to the end of the buffer
    buf[nb_read] = '\0';
    // Print the file descriptor and then the buffer contents
    // (\033 is the portable spelling of the \e escape)
    printf("\033[36mfd %d : [\033[0m%s\033[36m]\033[0m\n", fd, buf);
    return ;
}

int main(void)
{
    int fd1; // file descriptor 1 (cat.txt)
    int fd2; // file descriptor 2 (cat.txt)

    // Open the cat.txt file in read only mode
    fd1 = open("cat.txt", O_RDONLY);
    // Open the cat.txt file in read only mode again
    fd2 = open("cat.txt", O_RDONLY);
    if (fd1 == -1 || fd2 == -1)
        return (1);
    // Print the first 100 characters from descriptor 1
    read_and_print_100(fd1);
    // Print the next 100 characters from descriptor 1
    read_and_print_100(fd1);
    // Print the first 100 characters from descriptor 2
    read_and_print_100(fd2);
    // Close opened file descriptors
    close(fd1);
    close(fd2);
    return (0);
}
As this result shows, we have two file descriptors, 3 and 4, which both refer to the cat.txt file. We read 100 characters twice from fd 3, which means the offset is now 200. Then, when we read from fd 4, which we haven’t touched until now, we get the 100 first characters in the file. This means that even though they refer to the same file, both file descriptors are distinct: increasing the offset of one does not affect the other.
Adjusting the File Descriptor Offset with Lseek
There is another option to set a file descriptor’s offset to the position we want. It’s the lseek function of the <unistd.h> library. It allows us a much finer control over our position in the file. Its prototype is:
off_t lseek(int fd, off_t offset, int whence);
Let’s take a closer look at its parameters:
- fd: the file descriptor for which to change the offset,
- offset: the number of bytes to move the offset,
- whence: the position from which to apply the offset. Possible options are:
  - SEEK_SET: apply the offset relative to the beginning of the file (overwrites the offset with the one specified in the previous parameter),
  - SEEK_CUR: apply the offset relative to the current position in the file (adds the offset specified in the previous parameter to the current offset),
  - SEEK_END: apply the offset relative to the end of the file (adds the offset to the end of the file).
We must be careful with lseek since it allows us to set our offset past the end of the file! Doing so does not change the file’s size by itself. But if we then write at that position, the file grows and the gap in between becomes a “hole”, a zone that reads back as \0 bytes in the middle of the file. See the manual page for lseek for more information on this topic.
If lseek succeeds, it returns the new offset, meaning the total number of bytes from the beginning of the file. On failure, it returns -1 and sets errno to indicate the error code.
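The behavior around holes is easier to grasp with a small experiment. The sketch below writes one byte, jumps a few bytes past the end of the file, writes another byte, and then reads back the first byte of the gap. The helper name make_hole is hypothetical, and gap is expected to be at least 1.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes "A", jumps gap bytes past the end of
// the file, writes "B", then stores the first byte of the gap in
// *gap_byte. Returns the final file size, or -1 on error.
long make_hole(const char *path, off_t gap, char *gap_byte)
{
    int   fd;
    off_t size;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return (-1);
    size = -1;
    if (write(fd, "A", 1) == 1
        && lseek(fd, gap, SEEK_END) != -1
        && write(fd, "B", 1) == 1
        && lseek(fd, 1, SEEK_SET) != -1
        && read(fd, gap_byte, 1) == 1)
        size = lseek(fd, 0, SEEK_END);
    close(fd);
    return ((long)size);
}
```

With a gap of 10, the file ends up 12 bytes long (1 + 10 + 1) and the byte read from the gap is \0. The last lseek call also illustrates a handy idiom: seeking 0 bytes from SEEK_END returns the size of the file.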
So, let’s open our file again to read 100 characters, then let’s experiment with lseek to change the offset between two reads:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE 100

void read_and_print_100(int fd)
{
    char buf[BUFFER_SIZE + 1]; // store the read characters
    int  nb_read;              // store read's return value

    // Read 100 characters from the file descriptor
    nb_read = read(fd, buf, BUFFER_SIZE);
    // Stop everything in case of error
    if (nb_read == -1)
    {
        printf("Read error!\n");
        return ;
    }
    // Add \0 to the end of the buffer
    buf[nb_read] = '\0';
    // Print the file descriptor and then the buffer contents
    // (\033 is the portable spelling of the \e escape)
    printf("\033[36mfd %d : [\033[0m%s\033[36m]\033[0m\n", fd, buf);
    return ;
}

int main(void)
{
    int fd; // file descriptor to read

    // Open the cat.txt file in read only mode
    fd = open("cat.txt", O_RDONLY);
    if (fd == -1)
        return (1);
    // Print the first 100 characters from the file descriptor
    read_and_print_100(fd);
    // Reset offset to the beginning of the file
    lseek(fd, 0, SEEK_SET);
    read_and_print_100(fd);
    // Set the offset to 4 characters from the beginning of the file
    lseek(fd, 4, SEEK_SET);
    read_and_print_100(fd);
    // Set the offset to 1 character from current position
    lseek(fd, 1, SEEK_CUR);
    read_and_print_100(fd);
    // Set offset to be at the end of the file
    lseek(fd, 0, SEEK_END);
    read_and_print_100(fd);
    // Close opened file descriptor
    close(fd);
    return (0);
}
Duplicating File Descriptors with dup/dup2
It can sometimes be useful to duplicate a file descriptor in order to save it as a backup or replace another one. This is the case, for example, in the context of a standard input or output redirection towards a file.
The dup and dup2 system calls from the <unistd.h> library enable us to duplicate a file descriptor. Here are their prototypes:
int dup(int oldfd);
int dup2(int oldfd, int newfd);
Both take the file descriptor we want to duplicate (oldfd) as a parameter and return the new file descriptor, or -1 in case of error. The difference between the two is that dup automatically chooses the smallest unused number for the new file descriptor, whereas with dup2, we can specify which number we want (newfd).
We have to keep in mind that dup2 is going to try to close the newfd if it is in use before transforming it into a copy of oldfd. However, if oldfd is not a valid file descriptor, the call will fail and newfd will not be closed. If oldfd and newfd are identical and valid, dup2 will just return the newfd without doing anything further.
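The backup and redirection use cases mentioned earlier can be combined in one small sketch: save the standard output with dup, point fd 1 at a file with dup2, write something, then restore the saved copy. The helper name write_redirected is made up for this illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: temporarily redirects the standard output to
// path, writes msg to it, then restores the original standard output.
// Returns 0 on success or -1 on error.
int write_redirected(const char *path, const char *msg)
{
    int fd;
    int saved_stdout;
    int status;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    if (fd == -1)
        return (-1);
    // Keep a backup of the current standard output
    saved_stdout = dup(STDOUT_FILENO);
    if (saved_stdout == -1)
    {
        close(fd);
        return (-1);
    }
    status = -1;
    // Make fd 1 a copy of fd: writes to fd 1 now land in the file
    if (dup2(fd, STDOUT_FILENO) != -1)
    {
        if (write(STDOUT_FILENO, msg, strlen(msg)) != -1)
            status = 0;
        // Put the original standard output back on fd 1
        if (dup2(saved_stdout, STDOUT_FILENO) == -1)
            status = -1;
    }
    close(saved_stdout);
    close(fd);
    return (status);
}
```

The sketch uses write rather than printf on purpose: printf buffers its output, so it would need an fflush(stdout) before each swap of the descriptor to be sure the text ends up in the intended place.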
The Interchangeability of Duplicated Descriptors
After a successful call to dup or dup2, the old and new file descriptors are interchangeable. They refer to the exact same open file and share its offset and other attributes. For example, if we read the first few characters of a file with the read function with one of the descriptors, the offset will be changed for both file descriptors, not only the one we read with.
Yet we previously saw that if we opened the same file twice, the two descriptors did not share their offsets in this way. So why does it work differently for duplicated file descriptors? The reason for this is best illustrated with a diagram:
A file descriptor opened with open has its own entry in the system’s open files table, with its own offset. However, a duplicated file descriptor shares the open files table entry with its clone, which means they also share an offset.
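This difference can be measured by reading from one descriptor and then asking for the offset of a duplicate and of an independently opened descriptor. The following is only a sketch. The helper name compare_offsets is hypothetical, and it queries the offsets with lseek(fd, 0, SEEK_CUR).

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: reads 10 bytes from a descriptor of path, then
// reports the offset seen through a dup copy (*dup_offset) and
// through a second, independent open (*open_offset).
// Returns 0 on success or -1 on error.
int compare_offsets(const char *path, long *dup_offset, long *open_offset)
{
    char buf[10];
    int  fd;
    int  fd_dup;
    int  fd_open;
    int  status;

    fd = open(path, O_RDONLY);
    fd_dup = dup(fd);
    fd_open = open(path, O_RDONLY);
    status = -1;
    if (fd != -1 && fd_dup != -1 && fd_open != -1
        && read(fd, buf, sizeof(buf)) != -1)
    {
        *dup_offset = (long)lseek(fd_dup, 0, SEEK_CUR);
        *open_offset = (long)lseek(fd_open, 0, SEEK_CUR);
        status = 0;
    }
    if (fd != -1)
        close(fd);
    if (fd_dup != -1)
        close(fd_dup);
    if (fd_open != -1)
        close(fd_open);
    return (status);
}
```

On a file of at least 10 bytes, the duplicate reports an offset of 10 while the second open still reports 0.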
Usage Example of dup/dup2
In the following example, we will open an alpha.txt file, which just contains the alphabet, and then duplicate the open file descriptor with dup. We will read from the file descriptor, and then from the file descriptor’s copy. After that, we will once again duplicate the file descriptor, this time on the standard input (fd 0) with dup2. Finally, we will create an infinite loop so we can have the time to go check the open file descriptors in our process.
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    int  fd;
    int  fd_copy;
    int  nb_read;
    char buf[15];

    // Print this process' PID to be able to check open fds
    printf("My PID is %d\n", getpid());
    // Open the file and get its file descriptor
    fd = open("alpha.txt", O_RDONLY);
    if (fd == -1)
        return (1);
    // Duplicate the fd on the smallest unused fd
    // (dup returns -1 on error, not 0)
    fd_copy = dup(fd);
    if (fd_copy == -1)
        return (1);
    // Read the original fd
    nb_read = read(fd, buf, 10);
    if (nb_read == -1)
        return (1);
    buf[nb_read] = '\0';
    printf("fd %d contains : %s\n", fd, buf);
    // Read the duplicated fd
    nb_read = read(fd_copy, buf, 10);
    if (nb_read == -1)
        return (1);
    buf[nb_read] = '\0';
    printf("fd %d contains : %s\n", fd_copy, buf);
    // Duplicate fd on standard input
    if (dup2(fd, STDIN_FILENO) == -1)
        return (1);
    // Infinite loop to go check open fds in this
    // process with ls -la /proc/PID/fd
    while (1)
        ;
}
The alpha.txt file contains:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Here we can see that reading from the first file descriptor also changed its copy’s offset. Before killing the process with ctrl-c, we can go open another terminal to see our process’s list of open file descriptors with this command:
$ ls -la /proc/PID/fd
We can see that fd 3 and fd 4 are both indeed referring to our alpha.txt file, but that’s not all! The standard input (fd 0) was also redirected to our file thanks to dup2.
Sources and Further Reading
- Linux Programmer’s Manual:
- Bryant, R. E., O’Hallaron, D. R., 2016, Computer Systems: A Programmer’s Perspective, Chapter 10: System-Level I/O, pp. 925-949
- Kernighan, B. W., Ritchie, D. M., 1988, The C Programming Language, Second Edition, Chapter 8 - The UNIX System Interface, pp. 169-175
- Simard, E., Everything You Need to Know About inodes in Linux [linuxhandbook.com]