Practices
  Home arrow Practices arrow Page 5 - Finding Things
Dev Shed Forums 
Administration  
AJAX  
Apache  
BrainDump  
DHTML  
Flash  
Java  
JavaScript  
Multimedia  
MySQL  
Oracle  
Perl  
PHP  
Practices  
Python  
Reviews  
Security  
Style-Sheets  
Web Services  
XML  
Zend  
Zope  
Forums Sitemap 
IBM® developerWorks 
Sun Developer Network 
E-Commerce Hosting 
Linux Web Hosting 
Managed Hosting 
Small Business Hosting 
Mobile Linux 
App Generation ROI 
VPS Hosting 
Weekly Newsletter

 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
PRACTICES

Finding Things
By: O'Reilly Media
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 5 stars5 stars5 stars5 stars5 stars / 1
    2008-07-10

    Table of Contents:
  • Finding Things
  • Regular Expressions
  • Putting Regular Expressions to Work
  • Content-Addressable Storage
  • Time to Optimize?

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    Finding Things - Time to Optimize?


    (Page 5 of 5 )

     

    I was wondering whether my sample run was really unreasonably slow, so I pulled together a very similar program in Perl, a language that is less beautiful than Ruby but is extremely fast. Sure enough, the Perl version took half the time. So, should we try to optimize?

    We need to think about time again. Yes, we might be able to make this run faster, and thus reduce the program execution time and the time a user spends waiting for it, but to do this we’d have to burn some of the programmer’s time, and thus the time the user waits for the programmer to get the program written. In most cases, my instinct would be that 13.54 seconds to process a week’s data is OK, so I’d declare victory. But let’s suppose we’re starting to get gripes from people who use the program, and we’d like to make it run faster.

    Glancing over Example 4-4, we can see that the program falls into two distinct parts. First, it reads all the lines and tabulates the fetches; then it sorts them to find the top 10.

    There’s an obvious optimization opportunity here: why bother sorting all the fetch tallies when all we really want to do is pick the top 10? It’s easy enough to write a little code to run through the array once and pick the 10 highest elements.

    Would that help? I found out by instrumenting the program to find out how much time it spent doing its two tasks. The answer was (averaging over a few runs) 7.36 seconds in the first part and 0.07 in the second. Which is to say, “No, it wouldn’t help.”

    Might it be worthwhile to try to optimize the first part? Probably not; all it does is match regular expressions, and store and retrieve data using a Hash, and these are among the most heavily optimized parts of Ruby.

    So, getting fancy in replacing that sort would probably waste the time of the programmer and the customer waiting for the code, without saving any noticeable amount of computer or waiting-user time. Also, experience would teach that you’re not apt to go much faster than Perl does for this kind of task, so the amount of speedup you’re going to get is pretty well bounded.

    We’ve just finished writing a program that does something useful and turns out to be all about search. But we haven’t come anywhere near actually writing any search algorithms. So, let’s do that.


    SOME HISTORY OF TALLYING

    In the spirit of credit where credit is due, the notion of getting real work done by scanning lines of textual input using regular expressions and using a content-addressable store to build up results was first popularized in the awk programming language, whose name reflects the surnames of its inventors Aho, Weinberger, and Kernighan.

    This work, of course, was based on the then-radical Unix philosophy—due mostly to Ritchie and Thompson—that data should generally be stored in files in lines of text, and to some extent validated the philosophy.

    Larry Wall took the ideas behind awk and, as the author of Perl, turned them into a high-performance, industrial-strength, general-purpose tool that doesn’t get the credit it deserves. It served as the glue that has held together the world’s Unix systems, and subsequently large parts of the first-generation Web.


    Please check back next week for the continuation of this article.
    DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.

       · This article is an excerpt from the book "Beautiful Code: Leading Programmers...
     

    Buy this book now. This article is excerpted from chapter four of Beautiful Code: Leading Programmers Explain How They Think, written by Andy Oram and Greg Wilson (O'Reilly, 2007; ISBN: 0596510047). Check it out today at your favorite bookstore. Buy this book now.

       

    PRACTICES ARTICLES

    - More Techniques for Finding Things
    - Finding Things
    - Finishing the System`s Outlines
    - The System in So Many Words
    - Basic Data Types and Calculations
    - What`s the Address? Pointers
    - Design with ArgoUML
    - Pragmatic Guidelines: Diagrams That Work
    - Five-Step UML: OOAD for Short Attention Span...
    - Five-Step UML: OOAD for Short Attention Span...
    - Introducing UML: Object-Oriented Analysis an...
    - Class and Object Diagrams
    - Class Relationships
    - Classes
    - Basic Ideas

     
    Application Delivery: Everything You Wanted to Know, but Didn`t Know You Needed to Ask
    A comprehensive guide to examining the topics of Wide-area Data Services and app....

     
    Best Practices: Safe and Secure Hardware Asset Recovery
    Companies increasingly must meet EPA and local requirements for the disposal of ....

     
    Managing SSL Security in Multi-Server Environments
    Read this white paper to learn how to simplify management of your organization's....

     
    Open Source Security Myths
    Open Source Software (OSS) is computer software whose source code is available t....

     
    Power and Cooling Capacity Management for Data Centers
    This paper describes the principles for achieving power and cooling capacity man....

     




    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 4 hosted by Hostway
    Stay green...Green IT