String and List Python Object Types

Last week, we introduced you to the different Python object types, starting with numbers. This week, we’ll cover strings and begin our discussion of lists. This article, the second in a four-part series, is excerpted from chapter four of the book Learning Python, Third Edition, written by Mark Lutz (O’Reilly, 2008; ISBN: 0596513984). Copyright © 2008 O’Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O’Reilly Media.

Strings

Strings are used to record textual information as well as arbitrary collections of bytes. They are our first example of what we call a sequence in Python—that is, a positionally ordered collection of other objects. Sequences maintain a left-to-right order among the items they contain: their items are stored and fetched by their relative position. Strictly speaking, strings are sequences of one-character strings; other types of sequences include lists and tuples (covered later).

Sequence Operations

As sequences, strings support operations that assume a positional ordering among items. For example, if we have a four-character string, we can verify its length with the built-in len function and fetch its components with indexing expressions:

  >>> S = 'Spam'
  >>> len(S)             # Length
  4
  >>> S[0]               # The first item in S, indexing by zero-based position
  'S'
  >>> S[1]               # The second item from the left
  'p'

In Python, indexes are coded as offsets from the front, and so start from 0: the first item is at index 0, the second is at index 1, and so on. In Python, we can also index backward, from the end:

  >>> S[-1]            # The last item from the end in S
  'm'
  >>> S[-2]            # The second to last item from the end
  'a'

Formally, a negative index is simply added to the string’s size, so the following two operations are equivalent (though the first is easier to code and less easy to get wrong):

  >>> S[-1]          # The last item in S
  'm'
  >>> S[len(S)-1]    # Negative indexing, the hard way
  'm'

Notice that we can use an arbitrary expression in the square brackets, not just a hardcoded number literal: anywhere that Python expects a value, we can use a literal, a variable, or any expression. Python’s syntax is completely general this way.
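As a minimal sketch of both points, the snippet below indexes with a variable and with a computed expression. The names offset, last_by_negative, last_by_length, and middle are illustrative only and do not come from the book:

```python
# Sketch: a negative index names the same position as len(S) plus that index.
S = 'Spam'
offset = -1                            # hypothetical variable holding an index

last_by_negative = S[offset]           # index with a variable
last_by_length = S[len(S) + offset]    # the same position, computed by hand

# Any expression works inside the brackets, such as the middle offset.
middle = S[len(S) // 2]
```

Both spellings fetch the same character; the negative form simply leaves the arithmetic to Python.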

In addition to simple positional indexing, sequences also support a more general form of indexing known as slicing, which is a way to extract an entire section (slice) in a single step. For example:

  >>> S             # A 4-character string
  'Spam'
  >>> S[1:3]        # Slice of S from offsets 1 through 2 (not 3)
  'pa'

Probably the easiest way to think of slices is that they are a way to extract an entire column from a string in a single step. Their general form, X[I:J], means “give me everything in X from offset I up to but not including offset J.” The result is returned in a new object. The last operation above, for instance, gives us all the characters in string S from offsets 1 through 2 (that is, 3–1) as a new string. The effect is to slice or “parse out” the two characters in the middle.

In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations:

  >>> S[1:]           # Everything past the first (1:len(S))
  'pam'
  >>> S               # S itself hasn't changed
  'Spam'
  >>> S[0:3]          # Everything but the last
  'Spa'
  >>> S[:3]           # Same as S[0:3]
  'Spa'
  >>> S[:-1]          # Everything but the last again, but simpler (0:-1)
  'Spa'
  >>> S[:]            # All of S as a top-level copy (0:len(S))
  'Spam'

Note how negative offsets can be used to give bounds for slices, too, and how the last operation effectively copies the entire string. As you’ll learn later, there is no reason to copy a string, but this form can be useful for sequences like lists.
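To see why the full-slice copy matters for lists but not for strings, here is a small sketch. The list L and the names alias and top_copy are hypothetical, chosen only for this illustration:

```python
S = 'Spam'
same_tail = S[1:] == S[1:len(S)]   # right bound defaults to the length
same_head = S[:3] == S[0:3]        # left bound defaults to zero

L = [1, 2, 3]                      # hypothetical list, not from the text
alias = L                          # a second name for the same list object
top_copy = L[:]                    # a new list holding the same items

L[0] = 99                          # change the original list in place
# alias now shows the change; top_copy still holds the original items
```

Because a list can be changed in place, a second name for it sees every change, while the slice copy is insulated from them.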

Finally, as sequences, strings also support concatenation with the plus sign (joining two strings into a new string), and repetition (making a new string by repeating another):

  >>> S
  'Spam'
  >>> S + 'xyz'          # Concatenation
  'Spamxyz'
  >>> S                  # S is unchanged
  'Spam'
  >>> S * 8              # Repetition
  'SpamSpamSpamSpamSpamSpamSpamSpam'

Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that we’ll call polymorphism later in the book; in sum, the meaning of an operation depends on the objects being operated on. As you’ll see when we study dynamic typing, this polymorphism property accounts for much of the conciseness and flexibility of Python code. Because types aren’t constrained, a Python-coded operation can normally work on many different types of objects automatically, as long as they support a compatible interface (like the + operation here). This turns out to be a huge idea in Python; you’ll learn more about it later on our tour.
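One way to picture this is a single function that never mentions a type. The combine helper below is a hypothetical example written for this sketch, not something from the book:

```python
def combine(x, y):
    """Hypothetical helper: apply + to whatever objects are passed in."""
    return x + y

total = combine(1, 2)              # numbers: addition
text = combine('Spam', 'xyz')      # strings: concatenation
items = combine([1, 2], [3])       # lists: concatenation

# Mixing types that share no compatible + is rejected, not guessed at.
try:
    combine('Spam', 1)
except TypeError as exc:
    mismatch = str(exc)
```

The same function body serves all three calls because each pair of objects supplies its own meaning for +.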

Immutability

Notice that in the prior examples, we were not changing the original string with any of the operations we ran on it. Every string operation is defined to produce a new string as its result, because strings are immutable in Python—they cannot be changed in-place after they are created. For example, you can’t change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name. Because Python cleans up old objects as you go (as you’ll see later), this isn’t as inefficient as it may sound:

  >>> S
  'Spam'
  >>> S[0] = 'z'             # Immutable objects cannot be changed
  …error text omitted…
  TypeError: 'str' object does not support item assignment

  >>> S = 'z' + S[1:]        # But we can run expressions to make new objects
  >>> S
  'zpam'

Every object in Python is classified as immutable (unchangeable) or not. In terms of the core types, numbers, strings, and tuples are immutable; lists and dictionaries are not (they can be changed in-place freely). Among other things, immutability can be used to guarantee that an object remains constant throughout your program.
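The contrast between the two categories can be sketched in a script by catching the error instead of letting it end the session. The list L here is a hypothetical stand-in for the same text stored in a mutable object:

```python
S = 'Spam'
try:
    S[0] = 'z'                     # strings do not support item assignment
except TypeError as exc:
    error = str(exc)

S = 'z' + S[1:]                    # build a new string and rebind the name

L = ['S', 'p', 'a', 'm']           # hypothetical list version of the text
L[0] = 'z'                         # lists can be changed in place
```

The string case replaces the object that the name S refers to, whereas the list case changes the existing object itself.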

Type-Specific Methods

Every string operation we’ve studied so far is really a sequence operation—that is, these operations will work on other sequences in Python as well, including lists and tuples. In addition to generic sequence operations, though, strings also have operations all their own, available as methods (functions attached to the object, which are triggered with a call expression).

For example, the string find method is the basic substring search operation (it returns the offset of the passed-in substring, or -1 if it is not present), and the string replace method performs global searches and replacements:

  >>> S.find('pa')             # Find the offset of a substring
  1
  >>> S
  'Spam'
  >>> S.replace('pa', 'XYZ')   # Replace occurrences of a substring with another
  'SXYZm'
  >>> S
  'Spam'

Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results; because strings are immutable, we have to do it this way. String methods are the first line of text-processing tools in Python; other methods split a string into substrings on a delimiter (handy as a simple form of parsing), perform case conversions, test the content of the string (digits, letters, and so on), and strip whitespace characters off the ends of the string:

  >>> line = 'aaa,bbb,ccccc,dd'
  >>> line.split(',')              # Split on a delimiter into a list of substrings
  ['aaa', 'bbb', 'ccccc', 'dd']

  >>> S = 'spam'
  >>> S.upper()                    # Upper- and lowercase conversions
  'SPAM'

  >>> S.isalpha()                  # Content tests: isalpha, isdigit, etc.
  True

  >>> line = 'aaa,bbb,ccccc,dd\n'
  >>> line = line.rstrip()         # Remove whitespace characters on the right side
  >>> line
  'aaa,bbb,ccccc,dd'

One note here: although sequence operations are generic, methods are not; string method operations work only on strings, and nothing else. As a rule of thumb, Python’s toolset is layered: generic operations that span multiple types show up as built-in functions or expressions (e.g., len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()). Finding the tools you need among all these categories will become more natural as you use Python more, but the next section gives a few tips you can use right now.
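The layering can be checked directly. In this sketch the list L is a hypothetical object used only for comparison with the string:

```python
S = 'spam'
L = ['spam', 'eggs']               # hypothetical list for comparison

# Generic operations work on both sequence types.
lengths = (len(S), len(L))
firsts = (S[0], L[0])

# Type-specific methods exist only on the type that defines them.
shout = S.upper()
list_has_upper = hasattr(L, 'upper')
```

Calling L.upper() would raise an AttributeError, which is how Python reports that a method does not exist for a given type.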

Getting Help

The methods introduced in the prior section are a representative, but small, sample of what is available for string objects. In general, this book is not exhaustive in its look at object methods. For more details, you can always call the built-in dir function, which returns a list of all the attributes available in a given object. Because methods are function attributes, they will show up in this list:

  >>> dir(S)
  ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
  '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__',
  '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__',
  '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
  '__rmod__', '__rmul__', '__setattr__', '__str__', 'capitalize', 'center',
  'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index',
  'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper',
  'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex',
  'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
  'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

You probably won’t care about the names with underscores in this list until later in the book, when we study operator overloading in classes; they represent the implementation of the string object, and are available to support customization. In general, leading and trailing double underscores is the naming pattern Python uses for implementation details. The names without the underscores in this list are the callable methods on string objects.

The dir function simply gives the methods’ names. To ask what they do, you can pass them to the help function:

  >>> help(S.index)
  Help on built-in function index:

  index(...)
      S.index(sub [,start [,end]]) -> int

      Like S.find() but raise ValueError when the substring is not found.

help is one of a handful of interfaces to a system of code that ships with Python known as PyDoc—a tool for extracting documentation from objects. Later in the book, you’ll see that PyDoc can also render its reports in HTML format.

You can also ask for help on an entire string (e.g., help(S)), but you may get more help than you want to see (i.e., information about every string method). It’s generally better to ask about a specific method, as we did above.

For more details, you can also consult Python’s standard library reference manual, or commercially published reference books, but dir and help are the first line of documentation in Python.
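Because dir returns an ordinary list, its result can be filtered like any other sequence. The sketch below keeps only the names without leading double underscores; the variable names are illustrative, and the exact list will vary with the Python version in use:

```python
S = 'Spam'

# Keep only the names that do not start with a double underscore.
public_names = [name for name in dir(S) if not name.startswith('__')]

# The text that help displays is drawn from the method's docstring,
# which can also be read as ordinary data.
index_doc = S.index.__doc__
```

This trims the listing down to the callable string methods, which is usually the part worth browsing.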

Other Ways to Code Strings

So far, we’ve looked at the string object’s sequence operations and type-specific methods. Python also provides a variety of ways for us to code strings, which we’ll explore further later (with special characters represented as backslash escape sequences, for instance):

  >>> S = 'A\nB\tC'     # \n is end-of-line, \t is tab
  >>> len(S)            # Each stands for just one character
  5

  >>> ord('\n')         # \n is a byte with the binary value 10 in ASCII
  10

  >>> S = 'A\0B\0C'     # \0, the binary zero byte, does not terminate the string
  >>> len(S)
  5

Python allows strings to be enclosed in single or double quote characters (they mean the same thing). It also has a multiline string literal form enclosed in triple quotes (single or double)—when this form is used, all the lines are concatenated together, and end-of-line characters are added where line breaks appear. This is a minor syntactic convenience, but it’s useful for embedding things like HTML and XML code in a Python script:

  >>> msg = """
  aaaaaaaaaaaaa
  bbb'''bbbbbbbbbb""bbbbbbb'bbbb
  cccccccccccccc"""
  >>> msg
  '\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc'

Python also supports a “raw” string literal that turns off the backslash escape mechanism (they start with the letter r), as well as a Unicode string form that supports internationalization (they begin with the letter u and contain multibyte characters). Technically, Unicode string is a different data type than normal string, but it supports all the same string operations. We’ll meet all these special string forms in later chapters.
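A quick way to see what the raw form changes is to type the same characters with and without the r prefix. The path used here is a hypothetical example:

```python
# Sketch: identical characters typed with and without the r prefix.
cooked = 'C:\new\text.dat'         # \n and \t are read as escape sequences
raw = r'C:\new\text.dat'           # backslashes are kept literally

cooked_len = len(cooked)           # each escape counts as one character
raw_len = len(raw)                 # each backslash counts on its own
```

The raw string is two characters longer, because each backslash and the letter after it remain separate characters.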

Pattern Matching

One point worth noting before we move on is that none of the string object’s methods support pattern-based text processing. Text pattern matching is an advanced tool outside this book’s scope, but readers with backgrounds in other scripting languages may be interested to know that to do pattern matching in Python, we import a module called re. This module has analogous calls for searching, splitting, and replacement, but because we can use patterns to specify substrings, we can be much more general:

  >>> import re
  >>> match = re.match('Hello[ \t]*(.*)world', 'Hello  Python world')
  >>> match.group(1)
  'Python '

This example searches for a substring that begins with the word “Hello,” followed by zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched group, terminated by the word “world.” If such a substring is found, portions of the substring matched by parts of the pattern enclosed in parentheses are available as groups. The following pattern, for example, picks out three groups separated by slashes:

  >>> match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
  >>> match.groups()
  ('usr', 'home', 'lumberjack')

Pattern matching is a fairly advanced text-processing tool by itself, but there is also support in Python for even more advanced language processing, including natural language processing. I’ve already said enough about strings for this tutorial, though, so let’s move on to the next type.
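For readers curious about the searching, splitting, and replacement calls in the re module, here is a brief sketch using re.search, re.split, and re.sub. The input strings and patterns are hypothetical, picked only to show each call:

```python
import re

line = 'aaa, bbb;ccccc ,dd'        # hypothetical input with mixed delimiters

# Split on a comma or semicolon, allowing optional spaces around it.
fields = re.split(r'\s*[,;]\s*', line)

# Replace every run of digits with a placeholder.
masked = re.sub(r'\d+', '#', 'room 101, floor 12')

# Search anywhere in the text, not only at the front as re.match does.
found = re.search(r'/(.*)/(.*)/(.*)', 'path=/usr/home/lumberjack')
parts = found.groups()
```

These calls parallel the string methods split, replace, and find, except that the delimiter or target is a pattern rather than a fixed substring.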

Lists

The Python list object is the most general sequence provided by the language. Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size. They are also mutable—unlike strings, lists can be modified in-place by assignment to offsets as well as a variety of list method calls.

Sequence Operations

Because they are sequences, lists support all the sequence operations we discussed for strings; the only difference is that results are usually lists instead of strings. For instance, given a three-item list:

  >>> L = [123, 'spam', 1.23]          # A list of three different-type objects
  >>> len(L)                           # Number of items in the list
  3

we can index, slice, and so on, just as for strings:

  >>> L[0]                           # Indexing by position
  123

  >>> L[:-1]                         # Slicing a list returns a new list
  [123, 'spam']

  >>> L + [4, 5, 6]                  # Concatenation makes a new list too
  [123, 'spam', 1.23, 4, 5, 6]

  >>> L                              # We're not changing the original list
  [123, 'spam', 1.23]

Type-Specific Operations

Python’s lists are related to arrays in other languages, but they tend to be more powerful. For one thing, they have no fixed type constraint—the list we just looked at, for example, contains three objects of completely different types (an integer, a string, and a floating-point number). Further, lists have no fixed size. That is, they can grow and shrink on demand, in response to list-specific operations:

  >>> L.append('NI')                 # Growing: add object at end of list
  >>> L
  [123, 'spam', 1.23, 'NI']

  >>> L.pop(2)                       # Shrinking: delete an item in the middle
  1.23

  >>> L                              # "del L[2]" deletes from a list too
  [123, 'spam', 'NI']

Here, the list append method expands the list’s size and inserts an item at the end; the pop method (or an equivalent del statement) then removes an item at a given offset, causing the list to shrink. Other list methods insert items at an arbitrary position (insert), remove a given item by value (remove), and so on. Because lists are mutable, most list methods also change the list object in-place, instead of creating a new one:

  >>> M = ['bb', 'aa', 'cc']
  >>> M.sort()
  >>> M
  ['aa', 'bb', 'cc']

  >>> M.reverse()
  >>> M
  ['cc', 'bb', 'aa']

The list sort method here, for example, orders the list in ascending fashion by default, and reverse reverses it—in both cases, the methods modify the list directly.
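One consequence of in-place methods is that they hand back None rather than a new list. The sketch below contrasts the sort method with the built-in sorted function, which is brought in here purely for comparison and is not part of the text above:

```python
M = ['bb', 'aa', 'cc']

ordered = sorted(M)                # built-in: returns a new sorted list
unchanged = list(M)                # snapshot showing M was not touched

result = M.sort()                  # method: sorts M in place, returns None
M.reverse()                        # also changes M in place
```

Writing M = M.sort() is therefore a common slip: it would rebind M to None and lose the list.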

Bounds Checking

Although lists have no fixed size, Python still doesn’t allow us to reference items that are not present. Indexing off the end of a list is always a mistake, but so is assigning off the end:

  >>> L
  [123, 'spam', 'NI']

  >>> L[99]
  …error text omitted…
  IndexError: list index out of range

  >>> L[99] = 1
  …error text omitted…
  IndexError: list assignment index out of range

This is on purpose, as it’s usually an error to try to assign off the end of a list (and a particularly nasty one in the C language, which doesn’t do as much error checking as Python). Rather than silently growing the list in response, Python reports an error. To grow a list, we call list methods such as append instead.
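In a script, the same errors can be caught rather than left to stop the program. This is a minimal sketch; the errors list and the in_range guard are illustrative names only:

```python
L = [123, 'spam', 'NI']
errors = []

try:
    L[99]                          # fetch off the end
except IndexError as exc:
    errors.append(str(exc))

try:
    L[99] = 1                      # assign off the end
except IndexError as exc:
    errors.append(str(exc))

L.append(1)                        # the supported way to grow the list
in_range = 99 < len(L)             # a guard that avoids the exception
```

Both out-of-range operations raise IndexError and leave the list unchanged, so only the append call actually grows it.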

Nesting

One nice feature of Python’s core data types is that they support arbitrary nesting—we can nest them in any combination, and as deeply as we like (for example, we can have a list that contains a dictionary, which contains another list, and so on). One immediate application of this feature is to represent matrixes, or “multidimensional arrays” in Python. A list with nested lists will do the job for basic applications:

  >>> M = [[1, 2, 3],         # A 3 x 3 matrix, as nested lists
           [4, 5, 6],
           [7, 8, 9]]
  >>> M
  [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Here, we’ve coded a list that contains three other lists. The effect is to represent a 3 × 3 matrix of numbers. Such a structure can be accessed in a variety of ways:

  >>> M[1]                 # Get row 2
  [4, 5, 6]

  >>> M[1][2]              # Get row 2, then get item 3 within the row
  6

The first operation here fetches the entire second row, and the second grabs the third item within that row; stringing together index operations takes us deeper and deeper into our nested-object structure.
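The same chaining works when the nested objects are of different types, as in the list-containing-a-dictionary case mentioned earlier. The record structure below is hypothetical data made up for this sketch:

```python
# A list containing a dictionary, which contains another list.
record = ['spam', {'scores': [10, 20, 30]}]    # hypothetical data

# List index, then dictionary key, then list index again.
second_score = record[1]['scores'][1]

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
M[1][2] = 60                       # nested lists can still be changed in place
row = M[1]
```

Each pair of brackets applies to whatever object the previous step returned, so indexes and keys can be mixed freely along the way.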

Please check back next week for the continuation of this article.
