
Giving Advice on a Mapping

In this fourth part of a seven-part series on Linux I/O file system calls, you will learn how to use mmap(). It is excerpted from chapter four of the book Linux System Programming: Talking Directly to the Kernel and C Library, written by Robert Love (O'Reilly, 2007; ISBN: 0596009585). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

  1. Using mmap() for Advanced File I/O
  2. Resizing a Mapping
  3. Changing the Protection of a Mapping
  4. Synchronizing a File with a Mapping
  5. Giving Advice on a Mapping
By: O'Reilly Media
December 18, 2008



Linux provides a system call named madvise() to let processes give the kernel advice and hints on how they intend to use a mapping. The kernel can then optimize its behavior to take advantage of the mapping’s intended use. While the Linux kernel dynamically tunes its behavior, and generally provides optimal performance without explicit advice, providing such advice can ensure the desired caching and readahead behavior for some workloads.

A call to madvise() advises the kernel on how to behave with respect to the pages in the memory map starting at addr, and extending for len bytes:

  #include <sys/mman.h>

  int madvise (void *addr,
               size_t len,
               int advice);

If len is 0, the kernel will apply the advice to the entire mapping that starts at addr. The parameter advice delineates the advice, which can be one of:

MADV_NORMAL
   The application has no specific advice to give on this
   range of memory. It should be treated as normal.

MADV_RANDOM
   The application intends to access the pages in the
   specified range in a random (nonsequential) order.

MADV_SEQUENTIAL
   The application intends to access the pages in the
   specified range sequentially, from lower to higher
   addresses.

MADV_WILLNEED
   The application intends to access the pages in the
   specified range in the near future.

MADV_DONTNEED
   The application does not intend to access the pages
   in the specified range in the near future.

The actual behavior modifications that the kernel takes in response to this advice are implementation-specific: POSIX dictates only the meaning of the advice, not any potential consequences. The current 2.6 kernel behaves as follows in response to the advice values:

MADV_NORMAL
   The kernel behaves as usual, performing a moderate
   amount of readahead.

MADV_RANDOM
   The kernel disables readahead, reading only the
   minimal amount of data on each physical read
   operation.

MADV_SEQUENTIAL
   The kernel performs aggressive readahead.

MADV_WILLNEED
   The kernel initiates readahead, reading the given
   pages into memory.

MADV_DONTNEED
   The kernel frees any resources associated with the
   given pages, and discards any dirty and not-yet-
   synchronized pages. Subsequent accesses to the
   mapped data will cause the data to be paged in from
   the backing file.
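
As a rough illustration of the last two advice values, the sketch below maps a file read-only, asks the kernel to start reading it in with MADV_WILLNEED, and later releases the pages with MADV_DONTNEED before touching the data again. The helper name reload_first_byte() is hypothetical and not part of the original text; the sketch also assumes a glibc-style build where _DEFAULT_SOURCE exposes madvise().

```c
#define _DEFAULT_SOURCE         /* assumption: needed under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * reload_first_byte() is a hypothetical helper, not from the text.
 * Returns the first byte of the file (0..255) as read back after the
 * MADV_DONTNEED call, or -1 on error or if the file is empty.
 */
int reload_first_byte (const char *path)
{
        struct stat sb;
        unsigned char *p, before, after;
        size_t len;
        int fd;

        assert (path != NULL);

        fd = open (path, O_RDONLY);
        if (fd == -1) {
                perror ("open");
                return -1;
        }
        if (fstat (fd, &sb) == -1 || sb.st_size == 0) {
                close (fd);
                return -1;
        }
        len = (size_t) sb.st_size;

        p = mmap (NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        close (fd);             /* the mapping stays valid after close() */
        if (p == MAP_FAILED) {
                perror ("mmap");
                return -1;
        }

        /* Hint: the pages will be needed soon, so start reading them in. */
        if (madvise (p, len, MADV_WILLNEED) == -1)
                perror ("madvise (MADV_WILLNEED)");
        before = p[0];

        /* Hint: done with the data for now, the pages may be dropped. */
        if (madvise (p, len, MADV_DONTNEED) == -1)
                perror ("madvise (MADV_DONTNEED)");

        /* This access pages the data back in from the backing file. */
        after = p[0];

        if (munmap (p, len) == -1) {
                perror ("munmap");
                return -1;
        }
        return (before == after) ? after : -1;
}
```

Because the mapping is read-only, there are no dirty pages for MADV_DONTNEED to discard here; on a writable mapping, the data should be synchronized first if it must not be lost.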

Typical usage is:

  int ret;

  ret = madvise (addr, len, MADV_SEQUENTIAL);
  if (ret < 0)
          perror ("madvise");

This call instructs the kernel that the process intends to access the memory region [addr, addr+len) sequentially.
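
To put that call in context, here is a minimal sketch of a complete sequential scan over a mapped file. The helper sum_file_sequential() is a hypothetical name introduced for illustration, and the sketch assumes a glibc-style build where _DEFAULT_SOURCE exposes madvise(). A failed madvise() is reported but not treated as fatal, since the advice is only a hint.

```c
#define _DEFAULT_SOURCE         /* assumption: needed under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * sum_file_sequential() is a hypothetical helper, not from the text.
 * It maps the file at path read-only, advises the kernel that the mapping
 * will be read sequentially, and stores the sum of all bytes in *sum.
 * Returns 0 on success, -1 on failure.
 */
int sum_file_sequential (const char *path, unsigned long *sum)
{
        struct stat sb;
        unsigned char *p;
        size_t len, i;
        int fd;

        assert (path != NULL && sum != NULL);

        fd = open (path, O_RDONLY);
        if (fd == -1) {
                perror ("open");
                return -1;
        }
        if (fstat (fd, &sb) == -1) {
                perror ("fstat");
                close (fd);
                return -1;
        }

        *sum = 0;
        len = (size_t) sb.st_size;
        if (len == 0) {         /* mmap() rejects zero-length mappings */
                close (fd);
                return 0;
        }

        p = mmap (NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        close (fd);             /* the mapping stays valid after close() */
        if (p == MAP_FAILED) {
                perror ("mmap");
                return -1;
        }

        /* The advice is only a hint, so a failure here is not fatal. */
        if (madvise (p, len, MADV_SEQUENTIAL) == -1)
                perror ("madvise");

        /* Walk the mapping from lower to higher addresses. */
        for (i = 0; i < len; i++)
                *sum += p[i];

        if (munmap (p, len) == -1) {
                perror ("munmap");
                return -1;
        }
        return 0;
}
```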


When the Linux kernel reads files off the disk, it performs an optimization known as readahead. That is, when a request is made for a given chunk of a file, the kernel also reads the following chunk of the file. If a request is subsequently made for that chunk—as is the case when reading a file sequentially—the kernel can return the requested data immediately. Because disks have track buffers (basically, hard disks perform their own readahead internally), and because files are generally laid out sequentially on disk, this optimization is low-cost.

Some readahead is usually advantageous, but optimal results depend on the question of how much readahead to perform. A sequentially accessed file may benefit from a larger readahead window, while a randomly accessed file may find readahead to be worthless overhead.

As discussed in “Kernel Internals” in Chapter 2, the kernel dynamically tunes the size of the readahead window in response to the hit rate inside that window. More hits imply that a larger window would be advantageous; fewer hits suggest a smaller window. The madvise() system call allows applications to influence the window size right off the bat.
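
For the opposite case, a workload that jumps around a file, a sketch along the following lines would tell the kernel up front that readahead is unlikely to pay off. The helper probe_file_random() and its XOR result are hypothetical, chosen only to give the random reads something to compute; the _DEFAULT_SOURCE define is an assumption about a glibc-style build.

```c
#define _DEFAULT_SOURCE         /* assumption: needed under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * probe_file_random() is a hypothetical helper, not from the text.
 * It maps the file at path, marks the mapping MADV_RANDOM, and XORs
 * together the bytes found at the n offsets given. Offsets outside the
 * file are skipped. Returns the XOR (0..255), or -1 on failure.
 */
int probe_file_random (const char *path, const off_t *offsets, size_t n)
{
        struct stat sb;
        unsigned char *p;
        unsigned char acc = 0;
        size_t len, i;
        int fd;

        assert (path != NULL && (offsets != NULL || n == 0));

        fd = open (path, O_RDONLY);
        if (fd == -1) {
                perror ("open");
                return -1;
        }
        if (fstat (fd, &sb) == -1) {
                perror ("fstat");
                close (fd);
                return -1;
        }
        len = (size_t) sb.st_size;
        if (len == 0) {         /* nothing to probe in an empty file */
                close (fd);
                return 0;
        }

        p = mmap (NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        close (fd);
        if (p == MAP_FAILED) {
                perror ("mmap");
                return -1;
        }

        /* Accesses will be scattered, so ask for no readahead. */
        if (madvise (p, len, MADV_RANDOM) == -1)
                perror ("madvise");

        for (i = 0; i < n; i++)
                if (offsets[i] >= 0 && offsets[i] < sb.st_size)
                        acc ^= p[offsets[i]];

        if (munmap (p, len) == -1) {
                perror ("munmap");
                return -1;
        }
        return acc;
}
```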

On success, madvise() returns 0. On failure, it returns -1, and errno is set appropriately. The following are valid errors:

EAGAIN
   An internal kernel resource (probably memory) was
   unavailable. The process can try again.

EBADF
   The region exists, but does not map a file.

EINVAL
   The parameter len is negative, addr is not page-aligned,
   the advice parameter is invalid, or the
   pages were locked or shared with MADV_DONTNEED.

EIO
   An internal I/O error occurred with MADV_WILLNEED.

ENOMEM
   The given region is not a valid mapping in this
   process’ address space, or MADV_WILLNEED was
   given, but there is insufficient memory to page in the
   given regions.
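
One way to act on these error codes is sketched below. Both helpers are hypothetical names introduced for illustration: try_advice() wraps madvise() and returns the errno value on failure, and page_align_down() rounds an address down to the start of its page, which avoids the EINVAL that an unaligned addr produces. The _DEFAULT_SOURCE define is an assumption about a glibc-style build.

```c
#define _DEFAULT_SOURCE         /* assumption: needed under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round addr down to the start of its page. */
void *page_align_down (void *addr)
{
        long page = sysconf (_SC_PAGESIZE);

        assert (page > 0);
        return (void *) ((uintptr_t) addr & ~((uintptr_t) page - 1));
}

/*
 * Hypothetical wrapper: returns 0 on success, or the errno value that
 * madvise() set on failure, after printing a short diagnostic.
 */
int try_advice (void *addr, size_t len, int advice)
{
        if (madvise (addr, len, advice) == -1) {
                int err = errno;        /* save before calling stdio */

                switch (err) {
                case EAGAIN:
                        fprintf (stderr, "madvise: resource unavailable, "
                                 "try again\n");
                        break;
                case EINVAL:
                        fprintf (stderr, "madvise: bad address, length, "
                                 "or advice\n");
                        break;
                default:
                        fprintf (stderr, "madvise: %s\n", strerror (err));
                        break;
                }
                return err;
        }
        return 0;
}
```

A caller holding a pointer into the middle of a mapping could pass page_align_down (ptr) as addr and extend len by the number of bytes the pointer moved back, so that the advised range still covers the original bytes.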

