Regular expressions are special languages designed specifically for matching patterns in text. If you learn how to use them well, you’ll save yourself immense amounts of time and irritation. I’ve never met a really accomplished programmer who wasn’t a master of regular expressions (often called regexps for short). Chapter 1, by Brian Kernighan, is dedicated to the beauty of regular expressions.
Because the filenames on my web site match such a strict, date-based pattern, a very straightforward regular expression can find the logfile lines I’m interested in. Other sites’ logfiles might require a more elaborate one. Here it is:
A glance at this line instantly reveals one of the problems with regular expressions: they’re not the world’s most readable text. Some people might challenge their appearance in a book called Beautiful Code. Let’s put that issue aside for a moment and look at this particular expression. The only thing you need to know is that in this particular flavor of regular expression:
\d Means “match any digit, 0 through 9”
[^ .] Means “match any character that’s not a space or period”*
+ Means “match one or more instances of whatever came just before the +”
That [^ .]+, then, means that the last slash has to be followed by a bunch of nonspace and nonperiod characters. There’s a space after the + sign, so the regular expression stops when that space is found.
This regular expression won’t match a line where the filename contains a period. So it will match Grief-Lessons, the example I showed earlier from my logfile, but not IMG0038.jpg.
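To see those three pieces working together, here is a minimal sketch in Python. The pattern is a stand-in rather than the expression from my own site: the /blog/ prefix, the year/month/day layout, the sample log lines, and the name matches_article are all assumptions made up for illustration. Only the \d, [^ .], and + building blocks, and the trailing space, come from the discussion above.

```python
import re

# Hypothetical pattern: "/blog/" and the yyyy/mm/dd layout are assumptions
# for illustration. The last slash must be followed by one or more
# characters that are neither a space nor a period, and then by a space.
PATTERN = re.compile(r"GET /blog/\d\d\d\d/\d\d/\d\d/[^ .]+ ")


def matches_article(line):
    """Return True if the log line fetches a period-free filename."""
    return PATTERN.search(line) is not None


# Made-up sample log lines, not taken from a real logfile.
samples = [
    "GET /blog/2006/07/28/Grief-Lessons HTTP/1.1",
    "GET /blog/2006/07/28/IMG0038.jpg HTTP/1.1",
]

for line in samples:
    print(matches_article(line), line)
```

The first sample matches because Grief-Lessons contains no space or period before the space that ends it. The second fails because [^ .]+ stops at the period in IMG0038.jpg, and the character that follows is not the space the pattern demands.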