Completing Regular Expression Basics

In this conclusion to a four-part series on parsing and regular expression basics in Perl, we finish our study of regular expressions; you'll even learn how to create your own. This article is excerpted from chapter one of the book Pro Perl Parsing, written by Christopher M. Frenz (Apress; ISBN: 1590595041).

By: Apress Publishing
June 10, 2010

Regexp::Common::Comments

This module generates regular expressions that match comments inserted into computer code written in a variety of programming languages (currently 43). The syntax to call these regular expressions is as follows, where {comment} refers to the base comment-matching functionality and {LANGUAGE} provides the descriptor that indicates the particular programming language:

$RE{comment}{LANGUAGE}

For example, to match Perl and C++ comments, you can use the following (the C++ key is quoted because it is not a plain identifier):

$RE{comment}{Perl}
$RE{comment}{'C++'}
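
To see a comment pattern at work, the following is a minimal sketch that pulls the comments out of a short string of Perl code. It assumes Regexp::Common is installed from CPAN; perl_comments is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;
use Regexp::Common;   # exports the %RE hash of prebuilt patterns

# Hypothetical helper: return every Perl-style comment found in $source.
# This is a naive scan; a '#' inside a string literal is also reported.
sub perl_comments {
    my ($source) = @_;
    my @found = $source =~ /($RE{comment}{Perl})/g;
    return @found;
}

my $source = "my \$total = 0;  # running sum\nprint \$total;   # show it\n";
print for perl_comments($source);
```

Because the pattern is applied with the /g modifier in list context, each comment comes back as one list element.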

Regexp::Common::Delimited

This base module provides the functionality required to match delimited strings. The syntax is similar to that shown for the Text::Balanced module:

$RE{delimited}{-delim=>'"'}

In this case, the -delim flag specifies the delimiter that the regular expression will search for and is a required flag, since the module does not have a default delimiter.
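
As a small illustration, the sketch below extracts double-quoted substrings from a line of text. It assumes Regexp::Common is installed; quoted_strings is a hypothetical helper name used only for this example.

```perl
use strict;
use warnings;
use Regexp::Common;

# Hypothetical helper: return each double-quoted substring, quotes included.
sub quoted_strings {
    my ($text) = @_;
    my $quoted = $RE{delimited}{-delim => '"'};   # -delim is required
    my @found  = $text =~ /($quoted)/g;
    return @found;
}

print "$_\n" for quoted_strings('say "hello" and then "goodbye"');
```

Storing the pattern in a variable first keeps the quote character out of the match operator, which makes the regular expression easier to read.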


Note: Table 1-8 summarizes all the Regexp::Common flags.

Regexp::Common::List

The List module can match lists of data such as tab-separated lists, lists of numbers, lists of words, and so on. The type of list matched depends on the flags specified in the expression. Its syntax is as follows:

$RE{list}{-pat}{-sep}{-lastsep}

The -pat flag specifies the pattern that will correspond to each substring that is contained in the list. The pattern can be in the form of a regular expression such as \w+ or can be another hash value created by the Regexp::Common module. The -sep flag defines the separator that may be present between consecutive list elements, such as a tab or a space; if it is omitted, the separator defaults to a comma with optional surrounding whitespace. The -lastsep flag specifies a separator that may be present between the last two elements in the list. By default, this value is the same as that specified by -sep. As an example, if you wanted to search a document for lists that were specified in the Item A, Item B, ..., and Item N format, you could easily identify such listings using the following expression:

$RE{list}{-pat}{-sep=>', '}{-lastsep=>', and '}
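
The following sketch shows one way such an expression might be used to find a list inside a sentence. It assumes Regexp::Common is installed and, unlike the expression above, sets -pat to \w+ so that each element is a single word; find_word_list is a hypothetical helper name.

```perl
use strict;
use warnings;
use Regexp::Common;

# Hypothetical helper: return the first "A, B, and C" style list in $text,
# or undef if none is found. Elements are assumed to be single words (\w+).
sub find_word_list {
    my ($text) = @_;
    my $list = $RE{list}{-pat => '\w+'}{-sep => ', '}{-lastsep => ', and '};
    return $text =~ /($list)/ ? $1 : undef;
}

my $found = find_word_list('We bought apples, pears, and plums today.');
print "$found\n" if defined $found;
```

Restricting -pat to \w+ keeps the match from spilling into the surrounding sentence; a looser element pattern would need anchors or other context to bound the list.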

Regexp::Common::Net

The Net module generates hash values that contain patterns designed to match IPv4 and MAC addresses, and the first hash key specifies which type to match. The next hash key allows you to specify whether the address will be decimal (default), hexadecimal, or octal. You can also use the -sep flag to specify a separator, if required. The following is a sample:

$RE{net}{IPv4}{hex}

This module comes in handy if you want to track where the e-mails you have received originated. This information is found in most e-mail headers in a format similar to the following:

from [64.12.116.134] by web51102.mail.yahoo.com via HTTP;
Mon, 29 Nov 2004 23:33:11 -0800 (PST)

You can easily parse this header information to find the IPv4 address 64.12.116.134 by using the following expression:

$RE{net}{IPv4}
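
A minimal sketch of that parsing step follows, applied to the sample header shown earlier. It assumes Regexp::Common is installed; ipv4_addresses is a hypothetical helper name.

```perl
use strict;
use warnings;
use Regexp::Common;

# Hypothetical helper: return every dotted-decimal IPv4 address in $text.
sub ipv4_addresses {
    my ($text) = @_;
    my @found = $text =~ /($RE{net}{IPv4})/g;
    return @found;
}

my $header = "from [64.12.116.134] by web51102.mail.yahoo.com via HTTP;\n"
           . "Mon, 29 Nov 2004 23:33:11 -0800 (PST)\n";
print "$_\n" for ipv4_addresses($header);
```

The host name and the timestamp are ignored because neither contains four dot-separated decimal fields.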

Regexp::Common::Number

The Number module can match a variety of different number types, including integers, reals, hexadecimals, octals, binaries, and even Roman numerals. The base syntax is of the following form, and a number of flags can modify it:

$RE{num}{real}

For example, you can apply the -base flag to change the base of the number to something other than the default of base 10. The -radix flag specifies the pattern that will serve as the decimal point in case you desire something other than the default value (.). If you are dealing with significant figures, you may find the -places flag useful, since it can specify the number of places after the decimal point. As in previous modules, -sep specifies separators; however, in this module, you can also specify the appropriate number of digits that should be present between separators using the -group flag. The default value for this flag is 3, so if you specified a comma (,) as your separator, your expression would be able to recognize values such as 123,456,789. The -expon flag specifies the pattern that will be used to specify that an exponent is present. The default value for this property is [Ee].
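
To make a few of these flags concrete, the sketch below checks whole strings against three variations of the real-number pattern. It assumes Regexp::Common is installed; matches_fully is a hypothetical helper name, and the \A and \z anchors are added here because the generated patterns are not anchored on their own.

```perl
use strict;
use warnings;
use Regexp::Common;

# Hypothetical helper: true if $text, taken as a whole, matches $pattern.
sub matches_fully {
    my ($text, $pattern) = @_;
    return $text =~ /\A$pattern\z/ ? 1 : 0;
}

my $plain   = $RE{num}{real};                  # defaults: base 10, "." radix
my $grouped = $RE{num}{real}{-sep => ','};     # commas between 3-digit groups
my $money   = $RE{num}{real}{-places => 2};    # two digits after the radix

for my $case (['3.14', $plain], ['123,456,789', $grouped], ['19.99', $money]) {
    my ($text, $pattern) = @$case;
    printf "%-12s %s\n", $text,
        matches_fully($text, $pattern) ? 'match' : 'no match';
}
```

Each flag lookup returns a separate pattern, so the three variations can be kept side by side and reused.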



 
 