Practices
  Home arrow Practices arrow Page 2 - More Techniques for Finding Things
Dev Shed Forums 
Administration  
AJAX  
Apache  
BrainDump  
DHTML  
Flash  
Java  
JavaScript  
Multimedia  
MySQL  
Oracle  
Perl  
PHP  
Practices  
Python  
Reviews  
Security  
Style-Sheets  
Web Services  
XML  
Zend  
Zope  
Forums Sitemap 
IBM® developerWorks 
Sun Developer Network 
E-Commerce Hosting 
Linux Web Hosting 
Managed Hosting 
Small Business Hosting 
Mobile Linux 
App Generation ROI 
VPS Hosting 
Weekly Newsletter

 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
PRACTICES

More Techniques for Finding Things
By: O'Reilly Media
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 5 stars5 stars5 stars5 stars5 stars / 3
    2008-07-17

    Table of Contents:
  • More Techniques for Finding Things
  • Binary Search
  • Binary Search Trade-offs
  • Escaping the Loop
  • Searching the Web

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    More Techniques for Finding Things - Binary Search


    (Page 2 of 5 )

    Nobody gets a Computer Science degree without studying a wide variety of search algorithms: trees, heaps, hashes, lists, and more. My favorite among all these is binary search. Let’s try it on the who-fetched-what-when problem and then look at what makes it beautiful.

    My first attempt at putting binary search to use was quite disappointing; while the data took 10 minutes less to load, it required almost 100 MB more memory than with the hash. Clearly, there are some surprising things about the Ruby array implementation. The search also ran several times slower (but still in the range of thousands per second), but this is not surprising at all because the algorithm is running in Ruby code rather than with the underlying hardcoded hash implementation.

    The problem is that in Ruby everything is an object, and arrays are fairly abstracted things with lots of built-in magic. So, let’s reimplement the program in Java, in which integers are just integers, and arrays come with very few extras.*

    Nothing could be simpler, conceptually, than binary search. You divide your search space in two and see whether you should be looking in the top or bottom half; then you repeat the exercise until done. Instructively, there are a great many ways to code this algorithm incorrectly, and several widely published versions contain bugs. The implementation mentioned in “On the Goodness of Binary Search,” and shown in Java in Example 4-6, is based on one I learned from Gaston Gonnet, the lead developer of the Maple language for symbolic mathematics and currently Professor of Computer Science at ETH in Zürich.

    EXAMPLE 4-6. Binary search

    1 package binary;
    2
    3  public class Finder {
    4    public static int find(String[] keys, String target) {
    5      int high = keys.length;
    6      int low = -1;
    7      while (high - low > 1) {
    8        int probe = (low + high) >>> 1;
    9        if (keys[probe].compareTo(target) > 0)
    10         high = probe;
    11       else
    12         low = probe;
    13     }
    14     if (low == -1 || keys[low].compareTo(target) != 0)
    15       return -1;
    16     else
    17       return low;
    18   }
    19 }

    Key aspects of this program are as follows:

    • In lines 5–6, note that thehighandlowbounds are set one off the ends of the array, so neither are initially valid indices. This eliminates all sorts of corner cases.
    • The loop that starts in line 7 runs until the high and low bounds are adjacent; there is no testing to see whether the target has been found. Think for a minute whether you agree with this choice; we’ll return to the question later.

      The loop has two invariants.lowis either –1 or points to something less than or equal to the target value.high is either one off the top of the array or points to something strictly greater than the target value.
    • Line 8 is particularly interesting. In an earlier version it read:

        probe = (high + low) / 2;

      but in June 2006, Java guru Josh Bloch showed how, in certain obscure circumstances, that code could lead to integer overflow (see http://googleresearch.blogspot.com/2006/06/ extra-extra-read-all-about-it-nearly.html). It is sobering indeed that, many decades into the lifetime of computer science, we are still finding bugs in our core algorithms. (The issue is also discussed by Alberto Savoia in Chapter7.)

      At this point, Rubyists will point out that modern dynamic languages such as Ruby and Python take care of integer overflow for you, and thus don’t have this bug. 
    • Because of the loop invariant, once I’m done with the loop, I just need to checklow(lines 14–17). If it’s not –1, either it points to something that matches the target, or the target isn’t there.

    The Java version took only six and a half minutes to load, and it ran successfully, using less than 1 GB of heap. Also, while it’s harder to measure CPU time in Java than in Ruby, there was no perceptible delay in running the same 2,000 searches.

    More Practices Articles
    More By O'Reilly Media


       · This article is an excerpt from the book "Beautiful Code: Leading Programmers...
     

    Buy this book now. This article is excerpted from chapter four of Beautiful Code: Leading Programmers Explain How They Think, written by Andy Oram and Greg Wilson (O'Reilly, 2007; ISBN: 0596510047). Check it out today at your favorite bookstore. Buy this book now.

       

    PRACTICES ARTICLES

    - More Techniques for Finding Things
    - Finding Things
    - Finishing the System`s Outlines
    - The System in So Many Words
    - Basic Data Types and Calculations
    - What`s the Address? Pointers
    - Design with ArgoUML
    - Pragmatic Guidelines: Diagrams That Work
    - Five-Step UML: OOAD for Short Attention Span...
    - Five-Step UML: OOAD for Short Attention Span...
    - Introducing UML: Object-Oriented Analysis an...
    - Class and Object Diagrams
    - Class Relationships
    - Classes
    - Basic Ideas

     
    Application Delivery: Everything You Wanted to Know, but Didn`t Know You Needed to Ask
    A comprehensive guide to examining the topics of Wide-area Data Services and app....

     
    Best Practices: Safe and Secure Hardware Asset Recovery
    Companies increasingly must meet EPA and local requirements for the disposal of ....

     
    Managing SSL Security in Multi-Server Environments
    Read this white paper to learn how to simplify management of your organization's....

     
    Open Source Security Myths
    Open Source Software (OSS) is computer software whose source code is available t....

     
    Power and Cooling Capacity Management for Data Centers
    This paper describes the principles for achieving power and cooling capacity man....

     




    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway
    Stay green...Green IT