String Manipulation
By: Peyton McCullough
    2005-05-02

    Table of Contents:
  • String Manipulation
  • Splitting strings, making cases
  • Numbers and spaces
  • Regular Expressions


    String Manipulation - Regular Expressions


    (Page 4 of 4 )

    Regular expressions are a very powerful tool in any language. They allow patterns to be matched against strings. Actions such as replacement can be performed on the string if the regular expression pattern matches. Python's module for regular expressions is the re module. Open the Python interactive interpreter, and let's take a closer look at regular expressions and the re module:

    >>> import re

    Let's create a simple string we can use to play around with:

    >>> test = 'This is for testing regular expressions in Python.'

    I spoke of matching special patterns with regular expressions, but let's start with matching a simple string just to get used to them. There are two functions for matching patterns in strings in the re module: search and match. Let's take a look at search first. It works like this:

    >>> result = re.search ( 'This', test )

    We can extract the results using the group method:

    >>> result.group ( 0 )
    'This'

    You're probably wondering about the group method and why we pass zero to it. Patterns can be organized into groups, like this:

    >>> result = re.search ( '(Th)(is)', test )

    There are two groups surrounded by parentheses. We can extract them using the group method:

    >>> result.group ( 1 )
    'Th'
    >>> result.group ( 2 )
    'is'

    Passing zero to the method returns the entire match, which here spans both groups:

    >>> result.group ( 0 )
    'This'
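As a small sketch of how the group numbers relate (the variable names whole, parts and missing are mine, chosen only for illustration): group(0) is the whole match, and the groups method returns the numbered groups as a tuple. It also shows that search returns None when nothing matches, so it's worth checking the result before calling group on it.

```python
import re

test = 'This is for testing regular expressions in Python.'

result = re.search('(Th)(is)', test)
if result is not None:
    whole = result.group(0)   # the entire matched text
    parts = result.groups()   # every numbered group, as a tuple
    print(whole)              # This
    print(parts)              # ('Th', 'is')

# A pattern that does not occur in the string gives None, not a match object.
missing = re.search('Perl', test)
print(missing)                # None
```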

    The benefit of groups will become clearer once we work our way into actual patterns. First, though, let's take a look at the match function. It works similarly, but there is a crucial difference:

    >>> result = re.match ( 'This', test )
    >>> print ( result )
    <re.Match object; span=(0, 4), match='This'>
    >>> print ( result.group ( 0 ) )
    This
    >>> result = re.match ( 'regular', test )
    >>> print ( result )
    None

    Notice that None was returned, even though “regular” is in the string. The match function only matches patterns at the beginning of the string, while the search function examines the whole string. You might be wondering if it's possible, then, to make the match function match “regular,” since it's not at the beginning of the string. The answer is yes, and that brings us to patterns.
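To see the difference side by side, here is a short sketch that runs both functions over the same string. The names at_start and anywhere are just illustrative. The start method of a match object reports where the match began.

```python
import re

test = 'This is for testing regular expressions in Python.'

at_start = re.match('regular', test)    # only tries the beginning of the string
anywhere = re.search('regular', test)   # scans forward until it finds a match

print(at_start)             # None
print(anywhere.group(0))    # regular
print(anywhere.start())     # 20
```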

    The character “.” will match any character except a newline. We can get the match function to match “regular” by putting a period for every character before it. Let's split this up into two groups as well. One will contain the periods, and one will contain “regular”:

    >>> result = re.match ( '(....................)(regular)', test )
    >>> result.group ( 0 )
    'This is for testing regular'
    >>> result.group ( 1 )
    'This is for testing '
    >>> result.group ( 2 )
    'regular'

    Aha! We matched it! However, it's ridiculous to have to type in all those periods. The good news is that we don't have to do that. Take a look at this and remember that there are twenty characters before “regular”:

    >>> result = re.match ( '(.{20})(regular)', test )
    >>> result.group ( 0 )
    'This is for testing regular'
    >>> result.group ( 1 )
    'This is for testing '
    >>> result.group ( 2 )
    'regular'

    That's a lot easier. Now let's look at a few more patterns. Here's how you can use braces in a more advanced way:

    >>> result = re.match ( '(.{10,20})(regular)', test )
    >>> result.group ( 0 )
    'This is for testing regular'
    >>> result = re.match ( '(.{10,20})(testing)', test )
    >>> result.group ( 0 )
    'This is for testing'

    By entering two arguments, so to speak, you can match any number of characters in a range. In this case, that range is 10-20. Sometimes, however, this can cause undesired behavior. Take a look at this string:

    >>> anotherTest = 'a cat, a dog, a goat, a person'

    Let's match a range of characters:

    >>> result = re.match ( '(.{5,20})(,)', anotherTest )
    >>> result.group ( 1 )
    'a cat, a dog, a goat'

    What if we only want “a cat” though? This can be done by appending “?” after the closing brace:

    >>> result = re.match ( '(.{5,20}?)(,)', anotherTest )
    >>> result.group ( 1 )
    'a cat'

    Appending a question mark to a quantifier such as “{5,20}” makes it match as few characters as possible. A question mark that does that, though, is not to be confused with this pattern:

    >>> anotherTest = '012345'
    >>> result = re.match ( '01?', anotherTest )
    >>> result.group ( 0 )
    '01'
    >>> result = re.match ( '0123456?', anotherTest )
    >>> result.group ( 0 )
    '012345'

    As you can see with the example, the character before a question mark is optional. Next is the “*” pattern. It matches zero or more of whatever it follows, like this:

    >>> anotherTest = 'Just a silly string.'
    >>> result = re.match ( '(.*)(a)(.*)(string)', anotherTest )
    >>> result.group ( 0 )
    'Just a silly string'

    However, take a look at this:

    >>> anotherTest = 'Just a silly string. A very silly string.'
    >>> result = re.match ( '(.*)(a)(.*)(string)', anotherTest )
    >>> result.group ( 0 )
    'Just a silly string. A very silly string'

    What if, however, we want to only match the first sentence? If you've been following along closely, you'll know that “?” will, again, do the trick:

    >>> result = re.match ( '(.*?)(a)(.*?)(string)', anotherTest )
    >>> result.group ( 0 )
    'Just a silly string'
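Here is a stripped-down comparison of the two behaviors, using a simpler two-group pattern of my own rather than the one above. The greedy “.*” grabs as much as it can and stops at the last “string”, while the lazy “.*?” stops at the first one.

```python
import re

text = 'Just a silly string. A very silly string.'

greedy = re.match('(.*)(string)', text)    # ".*" takes as much as it can
lazy = re.match('(.*?)(string)', text)     # ".*?" takes as little as it can

print(greedy.group(0))   # Just a silly string. A very silly string
print(lazy.group(0))     # Just a silly string
```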

    As I mentioned earlier, though, “*” doesn't have to match anything:

    >>> anotherTest = '0101'
    >>> result = re.match ( '(.*?)(01)', anotherTest )
    >>> result.group ( 0 )
    '01'

    What if we want to skip past the first two characters? This is possible by using “+”, which is similar to “*”, except that it matches at least one character:

    >>> result = re.match ( '(.+?)(01)', anotherTest )
    >>> result.group ( 0 )
    '0101'
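The following sketch puts “*” and “+” next to each other. It assumes the string being tested is '0101' (the variable name digits is mine), and it looks at the first group to show how many characters each quantifier consumed.

```python
import re

digits = '0101'   # assumed test string for this illustration

star = re.match('(.*?)(01)', digits)   # "*" is allowed to match nothing at all
plus = re.match('(.+?)(01)', digits)   # "+" must consume at least one character

print(repr(star.group(1)))   # ''
print(repr(plus.group(1)))   # '01'

# The same difference shows up when the character is absent entirely.
empty_ok = re.match('x*', digits)   # matches the empty string at position 0
needs_one = re.match('x+', digits)  # no "x" to consume, so no match
print(repr(empty_ok.group(0)))      # ''
print(needs_one)                    # None
```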

    We can also match one character out of a set by listing the set in square brackets. For example, we can match only the first four letters of the alphabet:

    >>> anotherTest = 'a101'
    >>> result = re.match ( '[a-d]', anotherTest )
    >>> print ( result )
    <re.Match object; span=(0, 1), match='a'>
    >>> anotherTest = 'q101'
    >>> result = re.match ( '[a-d]', anotherTest )
    >>> print ( result )
    None
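A few variations on the square-bracket syntax, as a rough sketch with made-up test strings: a leading “^” inside the brackets negates the set, several ranges can sit in the same brackets, and a quantifier can follow the closing bracket.

```python
import re

in_range = re.match('[a-d]', 'c101')        # "c" is between "a" and "d"
out_of_range = re.match('[a-d]', 'q101')    # "q" is not
negated = re.match('[^a-d]', 'q101')        # "^" means anything but a-d
run = re.match('[a-dA-D]+', 'aBcDq')        # two ranges, one or more times

print(in_range.group(0))    # c
print(out_of_range)         # None
print(negated.group(0))     # q
print(run.group(0))         # aBcD
```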

    We can also match one of a few patterns using “|”:

    >>> testA = 'a'
    >>> testB = 'b'
    >>> result = re.match ( '(a|b)', testA )
    >>> print ( result )
    <re.Match object; span=(0, 1), match='a'>
    >>> result = re.match ( '(a|b)', testB )
    >>> print ( result )
    <re.Match object; span=(0, 1), match='b'>
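The alternatives don't have to be single characters. This sketch, with example strings of my own, matches one of several whole words, and it also shows why the parentheses matter: without them, “|” splits the entire pattern in two.

```python
import re

grouped = re.match('a (cat|dog|goat)', 'a dog, a cat')
no_pet = re.match('a (cat|dog|goat)', 'a person')

print(grouped.group(1))   # dog
print(no_pet)             # None

# Without parentheses this means "a cat" OR "dog", not "a " plus one of them.
ungrouped = re.match('a cat|dog', 'dog')
print(ungrouped.group(0))  # dog
```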

    Finally, there are a number of special sequences. “\A” matches at the start of a string. “\Z” matches at the end of a string. “\d” matches a digit. “\D” matches anything but a digit. “\s” matches whitespace. “\S” matches anything but whitespace.
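Here is a brief sketch of those sequences in use. The sample string and variable names are invented for illustration. The patterns are written as raw strings (the r prefix) so that Python leaves the backslashes alone and passes them straight to the re module.

```python
import re

entry = 'order 66 shipped'

number = re.search(r'\d+', entry).group(0)      # first run of digits
first_word = re.match(r'\S+', entry).group(0)   # leading run of non-whitespace
gap = re.search(r'\s', entry).start()           # index of the first whitespace

print(number)        # 66
print(first_word)    # order
print(gap)           # 5

# \A and \Z pin the pattern to the start and end of the string.
only_digits = re.match(r'\A\d+\Z', '2005')
not_only_digits = re.match(r'\A\d+\Z', '2005-05')
print(only_digits.group(0))   # 2005
print(not_only_digits)        # None
```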

    We can name our groups:

    >>> nameTest = 'hot sauce'
    >>> result = re.match ( '(?P<one>hot)', nameTest )
    >>> result.group ( 'one' )
    'hot'
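Named groups still keep their numbers, and all of the names can be pulled out at once with the groupdict method. The group names temp and food below are my own, chosen just for this sketch.

```python
import re

result = re.match('(?P<temp>hot) (?P<food>sauce)', 'hot sauce')

by_name = result.group('food')     # look the group up by its name
by_number = result.group(2)        # the same group, by position
named = result.groupdict()         # every named group in one dictionary

print(by_name)                     # sauce
print(by_number)                   # sauce
print(sorted(named.items()))       # [('food', 'sauce'), ('temp', 'hot')]
```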

    We can compile patterns to use them multiple times with the re module, too:

    >>> ourPattern = re.compile ( '(.*?)(the)' )
    >>> testString = 'This is the dog and the cat.'
    >>> result = ourPattern.match ( testString )
    >>> result.group ( 0 )
    'This is the'
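The payoff of compiling comes when the same pattern is applied many times. A minimal sketch, using a few sentences I made up, might loop over a list and reuse the one compiled object:

```python
import re

the_pattern = re.compile('(.*?)(the)')

sentences = [
    'This is the dog and the cat.',
    'Where is the goat?',
    'No match in this one.',
]

results = []
for sentence in sentences:
    found = the_pattern.match(sentence)   # same compiled pattern every time
    results.append(found.group(0) if found else None)

print(results)   # ['This is the', 'Where is the', None]
```

A compiled pattern also has a search method that works like re.search.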

    Of course, you can do more than match and extract substrings. You can replace things, too:

    >>> someString = 'I have a dream.'
    >>> re.sub ( 'dream', 'dog', someString )
    'I have a dog.'
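Two further things sub can do, sketched here with a longer sentence of my own: the count argument limits how many replacements are made, and the replacement text can refer back to a group from the pattern with “\1”.

```python
import re

some_string = 'I have a dream. You have a dream.'

all_replaced = re.sub('dream', 'dog', some_string)
first_only = re.sub('dream', 'dog', some_string, count=1)

# \1 in the replacement stands for whatever the first group matched.
past_tense = re.sub(r'(\w+) have', r'\1 had', some_string)

print(all_replaced)   # I have a dog. You have a dog.
print(first_only)     # I have a dog. You have a dream.
print(past_tense)     # I had a dream. You had a dream.
```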

    On a final note, you should not use regular expressions to match or replace simple strings. String methods such as find and replace do the same job with less overhead and are easier to read.
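For comparison, here is roughly what the simple cases look like with plain string operations and no re module at all. The variable names are illustrative.

```python
some_string = 'I have a dream.'

contains = 'dream' in some_string                # simple membership test
position = some_string.find('dream')             # index of the substring, or -1
starts = some_string.startswith('I have')        # check the beginning only
replaced = some_string.replace('dream', 'dog')   # literal replacement

print(contains)    # True
print(position)    # 9
print(starts)      # True
print(replaced)    # I have a dog.
```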

    Conclusion

    You now have a basic knowledge of string manipulation in Python. As I explained at the very beginning of the article, string manipulation is necessary to many applications, both large and small. It is used frequently, and a basic knowledge of it is critical.



    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway