
Regular Expressions - Practices

Search, whether it's searching the web or the contents of your computer, presents the developer with a major challenge. This article, the first of two parts, provides an overview of several search techniques and the trade-offs that go with them. It is excerpted from chapter four, by Tim Bray, of Beautiful Code: Leading Programmers Explain How They Think, edited by Andy Oram and Greg Wilson (O'Reilly, 2007; ISBN: 0596510047). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

TABLE OF CONTENTS:
  1. Finding Things
  2. Regular Expressions
  3. Putting Regular Expressions to Work
  4. Content-Addressable Storage
  5. Time to Optimize?
By: O'Reilly Media
July 10, 2008

Regular expressions are special languages designed specifically for matching patterns in text. If you learn how to use them well, you’ll save yourself immense amounts of time and irritation. I’ve never met a really accomplished programmer who wasn’t a master of regular expressions (often called regexps for short). Chapter 1, by Brian Kernighan, is dedicated to the beauty of regular expressions.

Because the filenames on my web site match such a strict, date-based pattern, a very straightforward regular expression can find the logfile lines I’m interested in. (Other sites’ logfiles might require a more elaborate one.) Here it is:

  "GET /ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+ "

A glance at this line instantly reveals one of the problems with regular expressions: they’re not the world’s most readable text. Some people might challenge their appearance in a book called Beautiful Code. Let’s put that issue aside for a moment and look at this particular expression. The only thing you need to know is that in this flavor of regular expression:

\d
   Means “match any digit, 0 through 9”

[^ .]
   Means “match any character that’s not a space or
   period”

+
   Means “match one or more instances of whatever
   came just before the +”
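
To see each construct in isolation, here is a minimal sketch in Python. The choice of Python's re module is ours; the text doesn't name the regex engine it uses. The sketch relies on re.fullmatch, which succeeds only when the pattern matches the whole string, and passes re.ASCII so that \d means only 0 through 9, as described above (without that flag, Python's \d also accepts other Unicode digits). The helper name whole_match and the test strings are hypothetical.

```python
import re

# (pattern, text) pairs that exercise each construct on its own.
CASES = [
    (r"\d", "7"),       # a digit
    (r"\d", "x"),       # a letter is not a digit
    (r"[^ .]", "G"),    # neither a space nor a period
    (r"[^ .]", " "),    # a space is excluded
    (r"[^ .]", "."),    # inside brackets "." is a literal period, so it is excluded
    (r"\d+", "2006"),   # one or more digits
    (r"\d+", ""),       # + needs at least one instance
]


def whole_match(pattern: str, text: str) -> bool:
    """True if pattern matches all of text, not just a prefix of it."""
    # re.ASCII restricts \d to 0-9; by default Python's \d also
    # accepts other Unicode decimal digits.
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


for pattern, text in CASES:
    print(f"{pattern!r:10} {text!r:8} -> {whole_match(pattern, text)}")
```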

That [^ .]+, then, means that the last slash has to be followed by a bunch of nonspace and nonperiod characters. There’s a space after the + sign, so the regular expression stops when that space is found.
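
Here is a sketch of that behavior in Python, under two assumptions of ours: the double quotes around the pattern are treated as delimiters that make its trailing space visible, so they are left out of the compiled pattern; and the final [^ .]+ is wrapped in parentheses, a capturing group the original doesn't have, so we can print exactly what it consumed. The request string and its date are made up, and article_name is a hypothetical helper.

```python
import re
from typing import Optional

# Pattern from the text without its surrounding double quotes, so it ends
# in a literal space. Assumption for illustration: the final [^ .]+ is
# wrapped in parentheses (a capturing group) so we can see what it consumed.
ONGOING = re.compile(
    r"GET /ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/([^ .]+) ", re.ASCII
)


def article_name(request: str) -> Optional[str]:
    """Return the run of non-space, non-period characters that follows
    the date in the path, or None if the request doesn't match."""
    m = ONGOING.search(request)
    return m.group(1) if m else None


# Hypothetical request; the date in the path is invented.
req = "GET /ongoing/When/200x/2006/10/08/Grief-Lessons HTTP/1.1"
print(article_name(req))  # → Grief-Lessons (the run stops at the space)
```

Because [^ .]+ can't consume a space, the match can't run on into " HTTP/1.1"; the literal space in the pattern then matches the space that ends the path.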

This regular expression won’t match a line where the filename contains a period. So it will match Grief-Lessons, the example I showed earlier from my logfile, but not IMG0038.jpg.
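
As a rough illustration of using the pattern as a filter, this Python sketch keeps only the lines the pattern matches. The two request fields are invented in the style of an Apache access log (their dates and the image's location are assumptions), the double quotes around the original pattern are again treated as delimiters, and interesting() is a hypothetical name. On the second line, the greedy [^ .]+ consumes IMG0038 and then finds a period where the pattern demands a space, so that line is rejected.

```python
import re
from typing import Iterable, Iterator

# Pattern from the text, with the surrounding double quotes treated as
# delimiters so that it ends in a literal space.
ONGOING = re.compile(
    r"GET /ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+ ", re.ASCII
)


def interesting(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines in which the pattern matches somewhere."""
    for line in lines:
        if ONGOING.search(line):
            yield line


# Hypothetical request fields in Apache access-log style; the dates and
# the image path are invented for illustration.
sample = [
    '"GET /ongoing/When/200x/2006/10/08/Grief-Lessons HTTP/1.1"',
    '"GET /ongoing/When/200x/2006/10/08/IMG0038.jpg HTTP/1.1"',
]

for line in interesting(sample):
    print(line)  # only the Grief-Lessons request is printed
```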