
Advising the Linux Kernel on File I/O

In this fifth part to a seven-part series on Linux I/O file system calls, you'll learn how to give advice to the Linux kernel, and more. This article is excerpted from chapter four of the book Linux System Programming: Talking Directly to the Kernel and C Library, written by Robert Love (O'Reilly, 2007; ISBN: 0596009585). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

  1. Advising the Linux Kernel on File I/O
  2. Advice Is Cheap
  3. Synchronized, Synchronous, and Asynchronous Operations
  4. Asynchronous I/O
By: O'Reilly Media
December 24, 2008




Advice for Normal File I/O

In the previous subsection, we looked at providing advice on memory mappings. In this section, we will look at providing advice to the kernel on normal file I/O. Linux provides two interfaces for such advice-giving: posix_fadvise() and readahead().

The posix_fadvise() System Call

The first advice interface, as its name alludes, is standardized by POSIX 1003.1-2003:

  #include <fcntl.h>

  int posix_fadvise (int fd,
                     off_t offset,
                     off_t len,
                     int advice);

A call to posix_fadvise() provides the kernel with the hint advice on the file descriptor fd in the interval [offset, offset+len). If len is 0, the advice will apply to the range [offset, length of file]. Common usage is to specify 0 for len and offset, applying the advice to the entire file.

The available advice options are similar to those for madvise(). Exactly one of the following should be provided for advice:

POSIX_FADV_NORMAL
   The application has no specific advice to give on this
   range of the file. It should be treated as normal.

POSIX_FADV_RANDOM
   The application intends to access the data in the
   specified range in a random (nonsequential) order.

POSIX_FADV_SEQUENTIAL
   The application intends to access the data in the
   specified range sequentially, from lower to higher
   addresses.

POSIX_FADV_WILLNEED
   The application intends to access the data in the
   specified range in the near future.

POSIX_FADV_NOREUSE
   The application intends to access the data in the
   specified range in the near future, but only once.

POSIX_FADV_DONTNEED
   The application does not intend to access the pages
   in the specified range in the near future.

As with madvise(), the actual response to the given advice is implementation-specific; even different versions of the Linux kernel may react dissimilarly. The following are the current responses:

POSIX_FADV_NORMAL
   The kernel behaves as usual, performing a moderate
   amount of readahead.

POSIX_FADV_RANDOM
   The kernel disables readahead, reading only the
   minimal amount of data on each physical read
   operation.

POSIX_FADV_SEQUENTIAL
   The kernel performs aggressive readahead, doubling
   the size of the readahead window.

POSIX_FADV_WILLNEED
   The kernel initiates readahead to begin reading into
   memory the given pages.

POSIX_FADV_NOREUSE
   Currently, the behavior is the same as for
   POSIX_FADV_WILLNEED; future kernels may perform
   an additional optimization to exploit the "use once"
   behavior. This hint does not have an madvise()
   complement.

POSIX_FADV_DONTNEED
   The kernel evicts any cached data in the given range
   from the page cache. Note that this hint, unlike the
   others, is different in behavior from its madvise()
   counterpart.

As an example, the following snippet instructs the kernel that the entire file represented by the file descriptor fd will be accessed in a random, nonsequential manner:

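  int ret;

  /* error reporting below needs <stdio.h> and <string.h> */
  ret = posix_fadvise (fd, 0, 0, POSIX_FADV_RANDOM);
  if (ret != 0)   /* the error number is returned; errno is not set */
          fprintf (stderr, "posix_fadvise: %s\n", strerror (ret));

On success, posix_fadvise() returns 0. On failure, it returns the error number directly, rather than returning -1 and setting errno. The error number is one of the following values: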

EBADF
   The given file descriptor is invalid.

EINVAL
   The given advice is invalid, the given file descriptor
   refers to a pipe, or the specified advice cannot be
   applied to the given file. (Kernels since 2.6.16 report
   a pipe with ESPIPE instead.)

The readahead() System Call

The posix_fadvise() system call is new to the 2.6 Linux kernel. Before, the readahead() system call was available to provide behavior identical to the POSIX_FADV_WILLNEED hint. Unlike posix_fadvise(), readahead() is a Linux-specific interface:

  #define _GNU_SOURCE   /* required for the readahead() prototype */
  #include <fcntl.h>

  ssize_t readahead (int fd,
                     off64_t offset,
                     size_t count);

A call to readahead() populates the page cache with the region [offset, offset+count) from the file descriptor fd.

On success, readahead() returns 0. On failure, it returns -1, and errno is set to one of the following values:

EBADF
   The given file descriptor is invalid.

EINVAL
   The given file descriptor does not map to a file that
   supports readahead.
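The following is a minimal sketch, not taken from the book, of using readahead() to pull an entire file into the page cache. The helper name prefetch_file() is an assumption made for illustration, as is the use of fstat() to find the file's length. Note that _GNU_SOURCE has to be defined before any header is included for the readahead() prototype to be visible.

```c
#define _GNU_SOURCE     /* exposes readahead() in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: ask the kernel to read the whole file behind fd
 * into the page cache. Returns 0 on success; on failure, returns -1 and
 * leaves errno as set by fstat() or readahead().
 */
int prefetch_file (int fd)
{
        struct stat sb;

        if (fstat (fd, &sb) == -1)
                return -1;

        /* Populate the page cache with the region [0, st_size). */
        if (readahead (fd, 0, (size_t) sb.st_size) == -1)
                return -1;

        return 0;
}
```

A caller that gets -1 with errno set to EINVAL can simply continue without the prefetch, since the data will still be read on demand.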
