1. Overview
- An fd (file descriptor) is a small non-negative integer that identifies one open file within a process; it is an index, not a count of open files
- When a program starts, it normally already has 3 descriptors open (visible under /dev as /dev/stdin, /dev/stdout, /dev/stderr):
fd = 0 : stdin : keyboard
fd = 1 : stdout : screen
fd = 2 : stderr : screen
- If we open a new file, we get the lowest unused number, so usually fd = 3; the next file will have fd = 4, and so on
#include <stdio.h>     // printf(), perror()
#include <sys/types.h> // mode_t
#include <sys/stat.h>  // permission bits
#include <fcntl.h>     // open() and the O_XXX flags
#include <unistd.h>    // write(), close(), fsync()
#include <string.h>    // strlen()

int main()
{
    const char* msg = "Hello World\n";

    // O_CREAT  : create the file if it does not exist (needs the 3rd argument: permissions)
    // O_WRONLY : write-only
    // O_TRUNC  : empty the file if it already exists
    int fd = open("mylog", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd == -1)
    {
        perror("open");
        return 1;
    }
    printf("fd = %d\n", fd);

    if (write(fd, msg, strlen(msg)) == -1)
        perror("write");

    // sync this file's data from RAM to the hard drive
    // (sync() would flush every file on the system instead)
    fsync(fd);
    close(fd);
    return 0;
}
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$ gcc test.cpp -lstdc++ -o test
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$ ./test
fd = 3
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$ cat mylog
Hello World
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$
2. File table of process
- Every process has its own “file descriptor table”. Each cell is a pointer to a structure in kernel memory (RAM) that describes one open file.
- By default, when a process starts, the first 3 rows are already used for stdin, stdout, stderr
- When we open a new file, the system looks for the lowest-numbered empty cell (one that doesn’t point to any file), opens the file, stores the address of its structure in that cell, then returns the index of the cell to us.
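The lookup rule above can be observed directly. The following is a minimal sketch, assuming a single-threaded program; the function name demo_lowest_free_fd is made up for this illustration. It opens a file twice, closes the first descriptor, and opens again: the third open() should get the number that was just freed.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` twice, closes the first descriptor,
 * then opens it a third time. Stores the three descriptor values in
 * out[0..2]. Returns 0 on success, -1 if any open() fails. */
int demo_lowest_free_fd(const char *path, int out[3])
{
    int a = open(path, O_RDONLY);
    if (a == -1) return -1;

    int b = open(path, O_RDONLY);
    if (b == -1) { close(a); return -1; }

    close(a);                        /* cell 'a' is empty again */

    int c = open(path, O_RDONLY);    /* lowest empty cell is reused */
    if (c == -1) { close(b); return -1; }

    out[0] = a;
    out[1] = b;
    out[2] = c;
    close(b);
    close(c);
    return 0;
}
```

Calling demo_lowest_free_fd("/dev/null", fds) from a freshly started program would typically fill fds with 3, 4, 3, although the exact numbers depend on what the process already has open.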
What if 2 processes get the same fd? If 2 fds have the same value, do they point to the same file?
- Each process has its own table, so an fd only has meaning inside the process that owns it
- There is no relation between a file name and its index in the “file descriptor table”
- The index only reflects which cell was free when the file was opened, not which file it is
- If 2 processes both hold fd = 5, each one simply has something open in cell 5 of its own table. They may have opened the same file or different ones; the value 5 alone doesn’t tell us, and there is no conflict
- Likewise, if 2 processes open the same file (same name), they may get different fd values
3. Redirect the flow of stdin, stdout, stderr
- We could use | to redirect stdin: in a | b, the stdout of a becomes the stdin of b
- We could use > to redirect stdout: instead of printing to the screen, the program writes to a file
#include <stdio.h> // printf()

int main()
{
    printf("Hello World\n");
    return 0;
}
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$ gcc test.cpp -lstdc++ -o test
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$ ./test > log
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$ cat log
Hello World
btnguyen@DESKTOP-UAIA29B:/mnt/h/DEVELOPER/Linux/prog$
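The same redirection can be done from inside a program, which is roughly what a shell does before it starts ./test. This is a sketch under that assumption, not the shell's actual code; the function name print_to_file is hypothetical. It points descriptor 1 at a file with dup2(), prints, then restores the original stdout.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: behaves like `printf text > path` inside the
 * current process. Returns 0 on success, -1 on error. */
int print_to_file(const char *path, const char *text)
{
    fflush(stdout);                          /* empty the stdio buffer first */

    int saved = dup(STDOUT_FILENO);          /* remember the old stdout */
    if (saved == -1) return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { close(saved); return -1; }

    if (dup2(fd, STDOUT_FILENO) == -1) {     /* fd 1 now refers to the file */
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                               /* fd 1 keeps the file open */

    fputs(text, stdout);                     /* lands in the file */
    fflush(stdout);

    dup2(saved, STDOUT_FILENO);              /* put the old stdout back */
    close(saved);
    return 0;
}
```

The program being redirected does not need to know about any of this: it keeps writing to fd = 1, and only the cell behind that index has changed.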
4. Block devices
- Most storage devices (e.g. hard drives) are block devices
- A block device specifies a minimum size for each read or write (a block). For example, when we format a hard drive, we are usually asked for the block size (e.g. 512 bytes, 1 KB, 4 KB, …), which is tied to the physical characteristics of the device
How does the hard drive work ?
- The hard drive has a read/write head (the “reader”) and a rotating disk underneath
- When we need to read some data (e.g. 1 byte), the drive determines where the data is located, moves the head to the right track and waits for the rotating disk to bring that location under the head
- The disk doesn’t stop there; it keeps rotating, and the head may be moved to a new location to read data for other programs.
- Because the disk spins fast and data is laid out in fixed-size sectors, the drive can’t read exactly 1 byte, only a whole range of neighbouring bytes. That’s why it specifies a minimum read size (e.g. 512 bytes): every read transfers at least one full block.
What is the idea ?
- Reading from RAM is far faster than reading from the hard drive (typically by several orders of magnitude, since nothing has to move physically)
- The idea is: since moving the read head takes time, even when we read just 1 byte, the OS actually reads the whole 512-byte block around it (called a block or a sector), loads it into RAM and returns 1 byte to us. The other 511 bytes are kept in RAM to be reused next time (if needed)
- Next time, when we read another 1 byte or 10 bytes near the first read, the OS looks them up and returns them directly from RAM instead of moving the read head again and reading a new 512 bytes, which is time-consuming (the same applies to writes)
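The idea can be imitated in user space with a one-block cache. This is only an illustration of the principle, not how the kernel implements its cache; BLOCK_SIZE, struct block_cache, cache_init and cache_read_byte are names invented for this sketch. Every miss fetches a full 512-byte block, and nearby reads are then served from memory.

```c
#define _POSIX_C_SOURCE 200809L   /* for pread() */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512            /* assumed block size, as in the text */

/* A one-block cache: remembers the last block read from the file. */
struct block_cache {
    int fd;
    off_t block_no;               /* which block is cached, -1 = none */
    ssize_t valid;                /* bytes actually present in data[] */
    unsigned char data[BLOCK_SIZE];
    int device_reads;             /* how many times we really hit the file */
};

void cache_init(struct block_cache *c, int fd)
{
    c->fd = fd;
    c->block_no = -1;
    c->valid = 0;
    c->device_reads = 0;
}

/* Returns the byte at `offset` (0..255), or -1 on error or past end of file. */
int cache_read_byte(struct block_cache *c, off_t offset)
{
    off_t wanted = offset / BLOCK_SIZE;

    if (wanted != c->block_no) {              /* miss: fetch the whole block */
        ssize_t n = pread(c->fd, c->data, BLOCK_SIZE, wanted * BLOCK_SIZE);
        if (n < 0) return -1;
        c->block_no = wanted;
        c->valid = n;
        c->device_reads++;
    }

    off_t idx = offset % BLOCK_SIZE;
    if (idx >= c->valid) return -1;           /* nothing stored at this offset */
    return c->data[idx];
}
```

Reading offsets 0, 10 and 511 through this cache costs a single pread(); only offset 512 triggers a second one, which is the saving described in the bullets above.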
5. Asynchronous File I/O
- RAM acts as a transit place (a cache)
- The system uses RAM as a cache for file reads/writes, because reading/writing directly on the hard drive is time-consuming
- The cache of a file can be flushed passively (by the OS) or actively (by us)
Passive mechanism for hardware optimization
- When we open a file on the hard drive and write data, we can’t control whether the data goes straight to the hard drive or stays in RAM (the cache) for a while
- For example, if a block is 512 bytes but we write only 2 bytes, the OS will probably keep the modified block in RAM in case we write more of it, and write the whole block to the hard drive later
- Periodically, the OS syncs the cache in RAM to the hard drive; we don’t control when this passive mechanism runs
Active flush cache
- We could force the OS to sync from RAM to the hard drive by using the function sync() or fsync()
Practical examples
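- We have 2 programs: program A writes into a file, and program B periodically reads from that file to process it.
- Sometimes program A has already written the data, but it is not on the hard drive yet. On the same machine B normally still sees it, because B reads through the same cache in RAM; but if the machine crashes or loses power before the OS syncs, the data is lost and B will never see it. (If A writes with fprintf()/fwrite(), the data may also still sit in A's own buffer until fflush() is called.)
- So when B must be able to rely on the data, we need to use sync() or fsync() in program A.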
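For program A, the active flush could look like the sketch below. The function name append_durably is hypothetical; it appends one line to a file and returns only after fsync() has reported that this file's data was handed to the storage device.

```c
#define _POSIX_C_SOURCE 200809L   /* for fsync() */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper for "program A": appends `line` to `path` and
 * flushes that one file. Returns 0 on success, -1 on error. */
int append_durably(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) return -1;

    size_t len = strlen(line);
    size_t done = 0;
    while (done < len) {                       /* write() may be partial */
        ssize_t n = write(fd, line + done, len - done);
        if (n == -1) { close(fd); return -1; }
        done += (size_t)n;
    }

    if (fsync(fd) == -1) {                     /* flush this file only */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

fsync() is used here rather than sync() because only one file matters to program B; flushing every file on the system would make each call slower for no benefit.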
6. Useful functions
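Below are the synchronous functions, i.e. the blocking ones: each call returns only when the operation is done
int open(const char* pathname, int flags);
It allocates memory in the kernel for a struct file and looks up the file’s inode (with O_CREAT, it creates one if the file doesn’t exist). Every inode points to its cached data in RAM. On failure it returns -1.
int close(int fd);
It frees the row in the file table and the memory allocated for the struct file (once no other descriptor refers to it). For a regular file, the cached data may be synced from RAM to the hard drive at this point, but this is not guaranteed; use fsync() before close() when it matters
For a device file (in /dev), it usually writes directly to the hardware instead of going through RAM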
ssize_t read(int fd, void* buf, size_t count);
ssize_t write(int fd, const void* buf, size_t count);
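buf : the buffer, which we allocate ourselves. read() copies the data it has read into that memory; write() takes the data to be written from it
count : number of bytes to read or write
Both return the number of bytes actually transferred (which can be less than count), or -1 on error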
off_t lseek(int fd, off_t offset, int whence);
When we open a file and don’t want to read from the first byte but from, say, byte 100, we could use lseek() to move the “reader” there with:
offset : the position in bytes, relative to whence. e.g. offset = 100 with SEEK_SET skips the first 100 bytes
whence : the reference point for the calculation: SEEK_SET (start of file), SEEK_CUR (current position) or SEEK_END (end of file). For example, SEEK_END with offset = -100 moves the “reader” to 100 bytes before the end of the file
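As an illustration of offset and whence together, the sketch below reads the last n bytes of a file. The function name read_tail is made up for this example; it uses SEEK_END to find the file size and SEEK_SET to position the “reader”.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to `n` bytes from the tail of the file
 * into `buf`. Returns the number of bytes read, or -1 on error. */
ssize_t read_tail(const char *path, void *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    off_t size = lseek(fd, 0, SEEK_END);       /* offset of the end = file size */
    if (size == -1) { close(fd); return -1; }

    /* Start n bytes before the end, or at byte 0 for a shorter file. */
    off_t start = size > (off_t)n ? size - (off_t)n : 0;
    if (lseek(fd, start, SEEK_SET) == -1) { close(fd); return -1; }

    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}
```

lseek() only moves the position stored for the descriptor; no data is transferred until the following read().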
int fsync(int fd);
Actively force the OS to sync data from RAM into the hard drive for the current file only. It returns 0 on success, -1 on error
void sync(void);
Sync all the data of all the files of all the programs from RAM to the hard drive.
Therefore, if we write a config file of just a few hundred bytes and then call sync(), it could take a long time (even minutes on a busy system) to finish, which could hang the application. fsync() is the better fit in that case.
7. Asynchronous File I/O
- Read/Write functions block the program until they finish
For example, when the user clicks a save button, the program writes 20 MB of data to the hard drive, which takes 10 seconds. In the meantime, the GUI is blocked/hung.
- To resolve this, we could use asynchronous read/write, or create a new thread to read/write the file.
#include <aio.h>    // struct aiocb, aio_read(), aio_error(), aio_return()
#include <errno.h>  // EINPROGRESS
#include <fcntl.h>  // open(), O_RDONLY
#include <stdio.h>  // printf()
#include <stdlib.h> // calloc(), free()
#include <string.h> // memset()
#include <unistd.h> // close()

#define SIZE_TO_READ 100 // example value: number of bytes to request

int main()
{
    printf("Hello World\n");
    int fd = open("text_aoi.txt", O_RDONLY, 0);
    if(fd == -1)
    {
        printf("Unable to open file !\n");
        return 1;
    }
    // create a buffer
    char* buffer = (char*)calloc(SIZE_TO_READ, 1);
    // create a control block structure
    struct aiocb cb;
    // zero the control block, then describe the request
    memset(&cb, 0, sizeof(struct aiocb));
    cb.aio_nbytes = SIZE_TO_READ;
    cb.aio_fildes = fd;
    cb.aio_offset = 0;
    cb.aio_buf = buffer;
    // queue the read and continue immediately without waiting for the result
    // (glibc serves the request from a helper thread)
    if(aio_read(&cb) == -1)
    {
        printf("Unable to create the request!\n");
        free(buffer);
        close(fd);
        return 1;
    }
    // do_anything_we_want_without_waiting
    printf("Request enqueued!\n");
    // poll until the request has finished
    while(aio_error(&cb) == EINPROGRESS)
    {
        printf("Working...\n");
    }
    // success ?
    int numBytes = aio_return(&cb);
    if(numBytes != -1)
    {
        printf("Success!\n");
    }
    free(buffer);
    close(fd);
    return 0;
}
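Note that cb is a control block describing the request, not a callback. A callback is a pointer which points to a function
If we register one in cb.aio_sigevent (sigev_notify = SIGEV_THREAD, sigev_notify_function = our function), it’ll call our registered function when the read is done, so we don’t have to poll with aio_error()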
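The other option mentioned above, a separate thread, could be sketched with POSIX threads as follows. The names save_job, save_async and save_wait are invented for this example, and it assumes the program is built with thread support (e.g. gcc -pthread). The caller starts the save, stays responsive, and collects the result later.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Everything the worker thread needs; the caller owns the memory and
 * must keep it alive until save_wait() has returned. */
struct save_job {
    const char *path;
    const void *data;
    size_t      len;
    int         result;   /* 0 on success, -1 on error, set by the worker */
};

/* Thread entry point: performs the blocking write. */
static void *save_worker(void *arg)
{
    struct save_job *job = arg;
    job->result = -1;

    int fd = open(job->path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return NULL;

    const char *p = job->data;
    size_t left = job->len;
    while (left > 0) {                         /* write() may be partial */
        ssize_t n = write(fd, p, left);
        if (n == -1) { close(fd); return NULL; }
        p += n;
        left -= (size_t)n;
    }
    if (close(fd) == 0) job->result = 0;
    return NULL;
}

/* Starts the save and returns at once. 0 on success, -1 on failure. */
int save_async(struct save_job *job, pthread_t *tid)
{
    return pthread_create(tid, NULL, save_worker, job) == 0 ? 0 : -1;
}

/* Waits for the save to finish and returns the worker's result. */
int save_wait(struct save_job *job, pthread_t tid)
{
    if (pthread_join(tid, NULL) != 0) return -1;
    return job->result;
}
```

A GUI would call save_async() from the button handler and check the result later, instead of calling save_wait() right away, which would block again.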
8. open() vs fopen()
int open(const char *pathname, int flags, mode_t mode);
FILE *fopen(const char *path, const char *mode);
- fopen is a library function while open is a system call.
- fopen provides buffered IO which may be faster compared to open which is non-buffered.
- fopen is portable while open is not portable (open is environment specific).
- fopen does line ending translation if the file is not opened in binary mode, which can be very helpful if your program is ever ported to a non-Unix environment (though the world appears to be converging on LF-only (except IETF text-based networking protocols like SMTP and HTTP and such)).
- fopen returns a pointer to a FILE structure (FILE *) while open returns an integer that identifies the file.
- A FILE * gives you the ability to use fscanf and other stdio.h functions.
- Your code may someday need to be ported to some other platform that only supports ANSI C and does not support the open function.
Why fopen() is portable ?
- open() is a system call specific to Unix-based systems, and it returns a file descriptor. You can write to a file descriptor using write(), which is another system call
- fopen() is an ANSI C function call which returns a file pointer, and it is portable to other OSes. We can write to a file pointer using fprintf
How about fdopen(), fileno() ?
- As far as fdopen is concerned, if you aren't playing with file descriptors, you don't need that call.
- fdopen is what you would use if you first called open and then wanted a FILE *. There is no sense doing that if you have the choice to just call fopen instead
- fdopen converts an OS-level file descriptor to the higher-level FILE abstraction of the C language. It does not open anything itself: it wraps a descriptor that is already open and gives you a FILE pointer for it.
- In Unix, we can get a file pointer from the file descriptor using
fP = fdopen(fD, "a");
- In Unix, we can get a file descriptor from the file pointer using
fD = fileno (fP);
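Both conversions can be seen in one small sketch. The function name write_line_via_fdopen is hypothetical; it opens a file with open(), wraps the descriptor with fdopen(), checks that fileno() returns the same descriptor, and writes through stdio.

```c
#define _POSIX_C_SOURCE 200809L   /* for fdopen() and fileno() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes "value = <value>\n" to `path` using a FILE*
 * built on top of a descriptor. Returns 0 on success, -1 on error. */
int write_line_via_fdopen(const char *path, int value)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;

    /* The stdio mode must be compatible with the open() flags. */
    FILE *fp = fdopen(fd, "w");
    if (fp == NULL) { close(fd); return -1; }

    int same = (fileno(fp) == fd);             /* fileno() gives fd back */
    int ok = fprintf(fp, "value = %d\n", value) > 0;

    /* fclose() flushes the stdio buffer and closes fd as well,
     * so no separate close(fd) is needed. */
    if (fclose(fp) != 0) return -1;
    return (same && ok) ? 0 : -1;
}
```

After fdopen() succeeds, the FILE * owns the descriptor; closing both with fclose() and close() would close the descriptor twice.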
What is recommended ?
fopen and its family of functions (fwrite, fread, fprintf, fscanf, fgets …)
9. Advanced functions
9.1 Read and write file properties
int stat(const char* restrict pathname, struct stat* restrict buf);
int chmod(const char* pathname, mode_t mode);
int chown(const char* pathname, uid_t owner, gid_t group);
int link(const char* existingpath, const char* newpath);
9.2 Manipulate directories
int mkdir(const char* pathname, mode_t mode);
DIR* opendir(const char* pathname);
// read the next entry of a folder opened with opendir(); each call returns one entry (its name), or NULL at the end. Use stat() on the entry's path to get its size and modify time
struct dirent* readdir(DIR* dp);
int closedir(DIR* dp);
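Put together, the directory functions and stat() can list a folder. This is a minimal sketch; the function name list_directory and the 4096-byte path buffer are choices made for this example only.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: prints name, size and modification time of every
 * entry in `dirpath`. Returns the number of entries (excluding "." and
 * ".."), or -1 if the directory cannot be opened. */
int list_directory(const char *dirpath)
{
    DIR *dp = opendir(dirpath);
    if (dp == NULL) return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        /* readdir() returns the bare name, so build the full path. */
        char full[4096];
        snprintf(full, sizeof full, "%s/%s", dirpath, entry->d_name);

        struct stat st;
        if (stat(full, &st) == 0)
            printf("%-24s %10lld bytes  mtime=%lld\n", entry->d_name,
                   (long long)st.st_size, (long long)st.st_mtime);
        count++;
    }
    closedir(dp);
    return count;
}
```

The order in which readdir() returns entries is not sorted, so a caller that needs alphabetical output would have to collect the names and sort them first.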