BrainDump
  Home arrow BrainDump arrow Page 2 - Configuring and Optimizing Your I/O Scheduler
Dev Shed Forums  
Administration  
AJAX  
Apache  
BrainDump  
DHTML  
Flash  
Java  
JavaScript  
Multimedia  
MySQL  
Oracle  
Perl  
PHP  
Practices  
Python  
Reviews  
Security  
Smartphone Development  
Style-Sheets  
Web Services  
XML  
Zend  
Zope  
Mobile Linux  
App Generation ROI  
IBM® developerWorks  
Forums Sitemap  
E-Commerce Hosting  
Linux Web Hosting  
Managed Hosting  
Small Business Hosting  
VPS Hosting  
Weekly Newsletter

 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid  
Request Media Kit
Contact Us  
Site Map  
Privacy Policy  
Support  
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
BRAINDUMP

Configuring and Optimizing Your I/O Scheduler
By: O'Reilly Media
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: starstarstarstarstar / 4
    2009-01-08


    Table of Contents:
  • Configuring and Optimizing Your I/O Scheduler
  • Scheduling I/O in user space
  • Conclusion

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      error-file:tidyout.log Del.ici.ous error-file:tidyout.log Digg
      error-file:tidyout.log Blink error-file:tidyout.log Simpy
      error-file:tidyout.log Google error-file:tidyout.log Spurl
      error-file:tidyout.log Y! MyWeb error-file:tidyout.log Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article

     
     
    ADVERTISEMENT


    Configuring and Optimizing Your I/O Scheduler - Scheduling I/O in user space
    ( Page 2 of 3 )

    I/O-intensive applications that issue a large number of I/O requests and need to extract every ounce of performance can sort and merge their pending I/O requests, performing the same duties as the Linux I/O scheduler. *

    Why perform the same work twice, if you know the I/O scheduler will sort requests block-wise, minimizing seeks, and allowing the disk head to move in a smooth, linear fashion? Consider an application that submits a large number of unsorted I/O requests. These requests arrive in the I/O scheduler’s queue in a generally random order. The I/O scheduler does its job, sorting and merging the requests before sending them out to the disk—but the requests start hitting the disk while the application is still generating I/O and submitting requests. The I/O scheduler is able to sort only a small set of requests—say, a handful from this application, and whatever other requests are pending—at a time. Each batch of the application’s requests is neatly sorted, but the full queue, and any future requests are not part of the equation.

    Therefore, if an application is generating many requests—particularly if they are for data all over the disk—it can benefit from sorting the requests before submitting them, ensuring they reach the I/O scheduler in the desired order.

    A user-space application is not bestowed with access to the same information as the kernel, however. At the lowest levels inside the I/O scheduler, requests are already specified in terms of physical disk blocks. Sorting them is trivial. But, in user space, requests are specified in terms of files and offsets. User-space applications must probe for information, and make educated guesses about the layout of the filesystem.

    Given the goal of determining the most seek-friendly ordering given a list of I/O requests to specific files, user-space applications have a couple of options. They can sort based on:

    1. The full path
    2. The inode number
    3. The physical disk block of the file

    Each of these options involves a tradeoff. Let’s look at each briefly.

    Sorting by path. Sorting by the pathname is the easiest, yet least effective, way of approximating a block-wise sort. Due to the layout algorithms used by most filesystems, the files in each directory—and thus the directories sharing a parent directory—tend to be adjacent on disk. The probability that files in the same directory were created around the same time only amplifies this characteristic.

    Sorting by path, therefore, roughly approximates the physical locations of files on the disk. It is definitely true that two files in the same directory have a better chance of being located near each other than two files in radically different parts of the filesystem. The downside of this approach is that it fails to take into account fragmentation: the more fragmented the filesystem, the less useful is sorting by path. Even ignoring fragmentation, a path-wise sort only approximates the actual block-wise ordering. On the upside, a path-wise sort is at least somewhat applicable to all filesystems. No matter the approach to file layout, temporal locality suggests a path-wise sort will be at least mildly accurate. It is also an easy sort to perform.

    Sorting by inode. Inodes are Unix constructs that contain the metadata associated with individual files. While a file’s data may consume multiple physical disk blocks, each file has exactly one inode, which contains information such as the file’s size, permissions, owner, and so on. We will discuss inodes in depth in Chapter 7. For now, you need to know two facts: that every file has an inode associated with it, and that the inodes are assigned unique numbers.

    Sorting by inode is better than sorting by path, assuming that this relation:

      file i's inode number < file j's inode number

    implies, in general, that:

      physical blocks of file i < physical blocks of file j

    This is certainly true for Unix-style filesystems such as ext2 and ext3. Anything is possible for filesystems that do not employ actual inodes, but the inode number (whatever it may map to) is still a good first-order approximation.

    Obtaining the inode number is done via the stat() system call, also discussed in Chapter 7. Given the inode associated with the file involved in each I/O request, the requests can be sorted in ascending order by inode number.

    Here is a simple program that prints out the inode number of a given file:

      #include <stdio.h >
      #include <stdlib.h>
      #include <fcntl.h>
      #include <sys/types.h>
      #include <sys/stat.h>

      /*
      
    * get_inode - returns the inode of the file associated
      
    * with the given file descriptor, or -1 on failure
      
    */
      int get_inode (int fd)
      {
             
    struct stat buf;
              int ret;

              ret = fstat (fd, &buf);
              if (ret < 0) {
                     
    perror ("fstat");
                     
    return -1;
             
    }

              return buf.st_ino;
      }

      int main (int argc, char *argv[])
      {
             
    int fd, inode;

              if (argc < 2) {
                      
    fprintf (stderr, "usage: %s <file>\n", argv[0]);
                     
    return 1;
             
    }

              fd = open (argv[1], O_RDONLY);
             
    if (fd < 0) {
                     
    perror ("open");
                     
    return 1;
             
    }

              inode = get_inode (fd);
              printf ("%d\n", inode);

              return 0;
      }

    The get_inode() function is easily adaptable for use in your programs.

    Sorting by inode number has a few upsides: the inode number is easy to obtain, is easy to sort on, and is a good approximation of the physical file layout. The major downsides are that fragmentation degrades the approximation, that the approximation is just a guess, and that the approximation is less accurate for non-Unix filesystems. Nonetheless, this is the most commonly used method for scheduling I/O requests in user space.

    Sorting by physical block. The best approach to designing your own elevator algorithm, of course, is to sort by physical disk block. As discussed earlier, each file is broken up into logical blocks, which are the smallest allocation units of a filesystem. The size of a logical block is filesystem-dependent; each logical block maps to a single physical block. We can thus find the number of logical blocks in a file, determine what
    physi cal blocks they map to, and sort based on that.

    The kernel provides a method for obtaining the physical disk block from the logical block number of a file. This is done via the ioctl() system call, discussed in Chapter 7, with the FIBMAP command:

      ret = ioctl (fd, FIBMAP, &block) ;
      if (ret < 0)
             
    perror ("ioctl");

    Here, fd is the file descriptor of the file in question, and block is the logical block whose physical block we want to determine. On successful return, block is replaced with the physical block number. The logical blocks passed in are zero-indexed and file-relative. That is, if a file is made up of eight logical blocks, valid values are 0 through 7.

    Finding the logical-to-physical-block mapping is thus a two-step process. First, we must determine the number of blocks in a given file. This is done via the stat()
    sys tem call. Second, for each logical block, we must issue an ioctl() request to find the corresponding physical block.

    Here is a sample program to do just that for a file passed in on the command line:

      #include <stdio.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <sys/ioctl.h>
      #include <linux/fs.h>

      /*
      
    * get_block - for the file associated with the given fd, returns
      
    * the physical block mapping to logical_block
      
    */
      int get_block (int fd, int logical_block)
      {
             
    int ret;

              ret = ioctl (fd, FIBMAP, &logical_block);
             
    if (ret < 0) {
                     
    perror ("ioctl");
                     
    return -1;
             
    }

              return logical_block;
      }

      /*
      
    * get_nr_blocks - returns the number of logical blocks
      
    * consumed by the file associated with fd
       */
      int get_nr_blocks (int fd)
      {
             
    struct stat buf;
              int ret;

              ret = fstat (fd, &buf);
             
    if (ret < 0) {
                     
    perror ("fstat");
                     
    return -1;
             
    }

              return buf.st_blocks;
      }

      /*
      
    * print_blocks - for each logical block consumed by the file
      
    * associated with fd, prints to standard out the tuple
      
    * "(logical block, physical block)"
      
    */
      void print_blocks (int fd)
      {
             
    int nr_blocks, i;

              nr_blocks = get_nr_blocks (fd);
             
    if (nr_blocks < 0) {
                      fprintf (stderr, "get_nr_blocks failed!\n");
                      return;
             
    }

              if (nr_blocks == 0) {
                      printf ("no allocated blocks\n");
                      return;
             
    } else if (nr_blocks == 1)
                      printf ("1 block\n\n");
              else
                      printf ("%d blocks\n\n", nr_blocks);

              for (i = 0; i < nr_blocks; i++) {
                      int phys_block;

                      phys_block = get_block (fd, i);
                     
    if (phys_block < 0) {
                              fprintf (stderr, "get_block failed!\n");
                              return;
                     
    }
                      if (!phys_block)
                              continue;

                      printf ("(%u, %u) ", i, phys_block);
              }

              putchar ('\n');
      }

      int main (int argc, char *argv[])
      {
              int fd;

              if (argc < 2) {
                      fprintf (stderr, "usage: %s <file>\n", argv[0]);
                      return 1;
             
    }

              fd = open (argv[1], O_RDONLY);
              if (fd < 0) {
                     
    perror ("open");
                     
    return 1;
              }

              print_blocks (fd);

              return 0;
      }

    Because files tend to be contiguous, and it would be difficult (at best) to sort our I/O requests on a per-logical-block basis, it makes sense to sort based on the location of just the first logical block of a given file. Consequently, get_nr_blocks() is not needed, and our applications can sort based on the return value from:

      get_block (fd, 0);

    The downside of FIBMAP is that it requires the CAP_SYS_RAWIO capability—effectively, root privileges. Consequently, nonroot applications cannot make use of this approach. Further, while the FIBMAP command is standardized, its actual implemen tation is left up to the filesystems. While common systems such as ext2 and ext3 support it, a more esoteric beast may not. The ioctl() call will return EINVAL if FIBMAP is not supported.

    Among the pros of this approach, however, is that it returns the actual physical disk block at which a file resides, which is exactly what you want to sort on. Even if you sort all I/O to a single file based on the location of just one block (the kernel’s I/O scheduler sorts each individual request on a block-wise basis), this approach comes very close to the optimal ordering. The root requirement, however, is a bit of a nonstarter for many.



     
     
    >>> More BrainDump Articles          >>> More By O'Reilly Media
     

       

    BRAINDUMP ARTICLES

    - Demystifying SELinux on Kernel 2.6
    - Yahoo and Microsoft Create Ad Partnership
    - The Advantages of Obscure Open Source Browse...
    - Dell Announces CSI-style Digital Forensics S...
    - Milepost GCC Speeds Open-Source Development
    - Learn These 10 Programming Languages
    - Tomcat Capacity Planning
    - Internal and External Performance Tuning wit...
    - Tomcat Benchmark Procedure
    - Benchmarking Tomcat Performance
    - Tomcat Performance Tuning
    - Wubi: Windows-based Ubuntu Installer
    - Configuring and Optimizing Your I/O Scheduler
    - Linux I/O Schedulers
    - Advising the Linux Kernel on File I/O





    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 5 Hosted by Hostway
    Stay green...Green IT