String and List Python Object Types
Last week, we introduced you to the different Python object types, starting with numbers. This week, we'll cover strings and begin our discussion of lists. This article, the second in a four-part series, is excerpted from chapter four of the book Learning Python, Third Edition, written by Mark Lutz (O'Reilly, 2008; ISBN: 0596513984). Copyright © 2008 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.
Strings
Strings are used to record textual information as well as arbitrary collections of bytes. They are our first example of what we call a sequence in Python—that is, a positionally ordered collection of other objects. Sequences maintain a left-to-right order among the items they contain: their items are stored and fetched by their relative position. Strictly speaking, strings are sequences of one-character strings; other types of sequences include lists and tuples (covered later).
Sequence Operations
As sequences, strings support operations that assume a positional ordering among items. For example, if we have a four-character string, we can verify its length with the built-in len function and fetch its components with indexing expressions:
>>> S = 'Spam'
>>> len(S) # Length
4
>>> S[0] # The first item in S, indexing by zero-based position
'S'
>>> S[1] # The second item from the left
'p'
In Python, indexes are coded as offsets from the front, and so start from 0: the first item is at index 0, the second is at index 1, and so on. We can also index backward, from the end:
>>> S[-1] # The last item from the end in S
'm'
>>> S[-2] # The second to last item from the end
'a'
Formally, a negative index is simply added to the string’s size, so the following two operations are equivalent (though the first is easier to code and less easy to get wrong):
>>> S[-1] # The last item in S
'm'
>>> S[len(S)-1] # Negative indexing, the hard way
'm'
Notice that we can use an arbitrary expression in the square brackets, not just a hardcoded number literal: anywhere that Python expects a value, we can use a literal, a variable, or any expression. Python’s syntax is completely general this way.
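The equivalence between negative indexes and offsets from the length can be checked directly. The following is a minimal sketch rather than part of the original text; the names middle, last_item, and middle_item are hypothetical and chosen only for illustration.

```python
S = 'Spam'

# A negative index -k refers to the same item as offset len(S) - k.
for k in range(1, len(S) + 1):
    assert S[-k] == S[len(S) - k]

# Any expression that yields an integer can serve as an index.
middle = len(S) // 2           # hypothetical name; evaluates to 2
last_item = S[len(S) - 1]      # same item as S[-1]
middle_item = S[middle]        # same item as S[2]
print(last_item, middle_item)  # m a
```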
In addition to simple positional indexing, sequences also support a more general form of indexing known as slicing, which is a way to extract an entire section (slice) in a single step. For example:
>>> S # A 4-character string
'Spam'
>>> S[1:3] # Slice of S from offsets 1 through 2 (not 3)
'pa'
Probably the easiest way to think of slices is that they are a way to extract an entire column from a string in a single step. Their general form, X[I:J], means “give me everything in X from offset I up to but not including offset J.” The result is returned in a new object. The last operation above, for instance, gives us all the characters in string S from offsets 1 through 2 (that is, 3–1) as a new string. The effect is to slice or “parse out” the two characters in the middle.
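One way to read X[I:J] is as shorthand for collecting the items at offsets I through J-1. The sketch below spells that out for the string used above; by_slice and by_index are hypothetical names introduced only for this illustration.

```python
S = 'Spam'
I, J = 1, 3

# S[I:J] collects the items at offsets I, I+1, ..., J-1.
by_slice = S[I:J]
by_index = ''.join(S[k] for k in range(I, J))
print(by_slice, by_index)      # pa pa

# For in-range, non-negative bounds, the slice holds J - I items.
print(len(by_slice) == J - I)  # True

# Slicing builds a new string; S itself is left untouched.
print(S)                       # Spam
```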
In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations:
>>> S[1:] # Everything past the first (1:len(S))
'pam'
>>> S # S itself hasn't changed
'Spam'
>>> S[0:3] # Everything but the last
'Spa'
>>> S[:3] # Same as S[0:3]
'Spa'
>>> S[:-1] # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:] # All of S as a top-level copy (0:len(S))
'Spam'
Note how negative offsets can be used to give bounds for slices, too, and how the last operation effectively copies the entire string. As you’ll learn later, there is no reason to copy a string, but this form can be useful for sequences like lists.
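To see why the full-slice form matters more for lists than for strings, consider this small sketch. It assumes only that lists can be changed in place (a topic covered later); the names L and C are hypothetical.

```python
L = ['s', 'p', 'a', 'm']

# Omitted bounds default to 0 and len(L).
assert L[:] == L[0:len(L)]
assert L[:-1] == L[0:len(L) - 1]

# L[:] builds a new list object holding the same items (a top-level copy).
C = L[:]
C[0] = 'S'        # changing the copy in place...
print(L)          # ['s', 'p', 'a', 'm']  ...leaves the original alone
print(C)          # ['S', 'p', 'a', 'm']
print(C is L)     # False
```

Because a string cannot be changed in place, a copy made this way could never diverge from the original, which is why the form is rarely needed for strings.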
Finally, as sequences, strings also support concatenation with the plus sign (joining two strings into a new string), and repetition (making a new string by repeating another):
>>> S
'Spam'
>>> S + 'xyz' # Concatenation
'Spamxyz'
>>> S # S is unchanged
'Spam'
>>> S * 8 # Repetition
'SpamSpamSpamSpamSpamSpamSpamSpam'
Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that we’ll call polymorphism later in the book; in sum, the meaning of an operation depends on the objects being operated on. As you’ll see when we study dynamic typing, this polymorphism property accounts for much of the conciseness and flexibility of Python code. Because types aren’t constrained, a Python-coded operation can normally work on many different types of objects automatically, as long as they support a compatible interface (like the + operation here). This turns out to be a huge idea in Python; you’ll learn more about it later on our tour.
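As a rough illustration of this idea, a single function that relies only on the + operation can serve several object types. The function name combine is hypothetical and not from the book; it is a sketch, not a recommended utility.

```python
def combine(x, y):
    # Hypothetical helper: it assumes only that x and y support +.
    return x + y

print(combine(2, 3))           # 5
print(combine('Spam', 'xyz'))  # Spamxyz
print(combine([1, 2], [3]))    # [1, 2, 3]

# Objects that do not support + with each other raise an error instead.
try:
    combine('Spam', 3)
except TypeError as exc:
    print('TypeError:', exc)
```

The function never checks types itself; whether the call succeeds depends entirely on what the objects passed in can do with +.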