Nobody gets a Computer Science degree without studying a wide variety of search algorithms: trees, heaps, hashes, lists, and more. My favorite among all these is binary search. Let’s try it on the who-fetched-what-when problem and then look at what makes it beautiful. My first attempt at putting binary search to use was quite disappointing; while the data took 10 minutes less to load, it required almost 100 MB more memory than with the hash. Clearly, there are some surprising things about the Ruby array implementation. The search also ran several times slower (but still in the range of thousands per second), but this is not surprising at all because the algorithm is running in Ruby code rather than with the underlying hardcoded hash implementation. The problem is that in Ruby everything is an object, and arrays are fairly abstracted things with lots of built-in magic. So, let’s reimplement the program in Java, in which integers are just integers, and arrays come with very few extras.* Nothing could be simpler, conceptually, than binary search. You divide your search space in two and see whether you should be looking in the top or bottom half; then you repeat the exercise until done. Instructively, there are a great many ways to code this algorithm incorrectly, and several widely published versions contain bugs. The implementation mentioned in “On the Goodness of Binary Search,” and shown in Java in Example 4-6, is based on one I learned from Gaston Gonnet, the lead developer of the Maple language for symbolic mathematics and currently Professor of Computer Science at ETH in Zürich. EXAMPLE 4-6. Binary search 1 package binary; Key aspects of this program are as follows:
The Java version took only six and a half minutes to load, and it ran successfully, using less than 1 GB of heap. Also, while it’s harder to measure CPU time in Java than in Ruby, there was no perceptible delay in running the same 2,000 searches.
blog comments powered by Disqus |