Home arrow Practices arrow Page 2 - More Techniques for Finding Things

Binary Search - Practices

In this second part of a two-part series that provides an overview of search techniques for the developer, you'll learn more about the challenges and trade-offs of various approaches. It is excerpted from chapter four of Beautiful Code: Leading Programmers Explain How They Think, written by Andy Oram and Greg Wilson (O'Reilly, 2007; ISBN: 0596510047). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

TABLE OF CONTENTS:
  1. More Techniques for Finding Things
  2. Binary Search
  3. Binary Search Trade-offs
  4. Escaping the Loop
  5. Searching the Web
By: O'Reilly Media
Rating: starstarstarstarstar / 4
July 17, 2008

print this article
SEARCH DEV SHED

TOOLS YOU CAN USE

advertisement

Nobody gets a Computer Science degree without studying a wide variety of search algorithms: trees, heaps, hashes, lists, and more. My favorite among all these is binary search. Letís try it on the who-fetched-what-when problem and then look at what makes it beautiful.

My first attempt at putting binary search to use was quite disappointing; while the data took 10 minutes less to load, it required almost 100 MB more memory than with the hash. Clearly, there are some surprising things about the Ruby array implementation. The search also ran several times slower (but still in the range of thousands per second), but this is not surprising at all because the algorithm is running in Ruby code rather than with the underlying hardcoded hash implementation.

The problem is that in Ruby everything is an object, and arrays are fairly abstracted things with lots of built-in magic. So, letís reimplement the program in Java, in which integers are just integers, and arrays come with very few extras.*

Nothing could be simpler, conceptually, than binary search. You divide your search space in two and see whether you should be looking in the top or bottom half; then you repeat the exercise until done. Instructively, there are a great many ways to code this algorithm incorrectly, and several widely published versions contain bugs. The implementation mentioned in ďOn the Goodness of Binary Search,Ē and shown in Java in Example 4-6, is based on one I learned from Gaston Gonnet, the lead developer of the Maple language for symbolic mathematics and currently Professor of Computer Science at ETH in ZŁrich.

EXAMPLE 4-6. Binary search

1 package binary;
2
3  public class Finder {
4    public static int find(String[] keys, String target) {
5      int high = keys.length;
6      int low = -1;
7      while (high - low > 1) {
8        int probe = (low + high) >>> 1;
9        if (keys[probe].compareTo(target) > 0)
10         high = probe;
11       else
12         low = probe;
13     }
14     if (low == -1 || keys[low].compareTo(target) != 0)
15       return -1;
16     else
17       return low;
18   }
19 }

Key aspects of this program are as follows:

  • In lines 5Ė6, note that thehighandlowbounds are set one off the ends of the array, so neither are initially valid indices. This eliminates all sorts of corner cases.
  • The loop that starts in line 7 runs until the high and low bounds are adjacent; there is no testing to see whether the target has been found. Think for a minute whether you agree with this choice; weíll return to the question later.

    The loop has two invariants.lowis either Ė1 or points to something less than or equal to the target value.high is either one off the top of the array or points to something strictly greater than the target value.
  • Line 8 is particularly interesting. In an earlier version it read:

      probe = (high + low) / 2;

    but in June 2006, Java guru Josh Bloch showed how, in certain obscure circumstances, that code could lead to integer overflow (see http://googleresearch.blogspot.com/2006/06/ extra-extra-read-all-about-it-nearly.html). It is sobering indeed that, many decades into the lifetime of computer science, we are still finding bugs in our core algorithms. (The issue is also discussed by Alberto Savoia in Chapter7.)

    At this point, Rubyists will point out that modern dynamic languages such as Ruby and Python take care of integer overflow for you, and thus donít have this bug. 
  • Because of the loop invariant, once Iím done with the loop, I just need to checklow(lines 14Ė17). If itís not Ė1, either it points to something that matches the target, or the target isnít there.

The Java version took only six and a half minutes to load, and it ran successfully, using less than 1 GB of heap. Also, while itís harder to measure CPU time in Java than in Ruby, there was no perceptible delay in running the same 2,000 searches.



 
 
>>> More Practices Articles          >>> More By O'Reilly Media
 

blog comments powered by Disqus
escort Bursa Bursa escort Antalya eskort
   

PRACTICES ARTICLES

- Calculating Development Project Costs
- More Techniques for Finding Things
- Finding Things
- Finishing the System`s Outlines
- The System in So Many Words
- Basic Data Types and Calculations
- What`s the Address? Pointers
- Design with ArgoUML
- Pragmatic Guidelines: Diagrams That Work
- Five-Step UML: OOAD for Short Attention Span...
- Five-Step UML: OOAD for Short Attention Span...
- Introducing UML: Object-Oriented Analysis an...
- Class and Object Diagrams
- Class Relationships
- Classes

Developer Shed Affiliates

 


Dev Shed Tutorial Topics: