A built-in string object (plain or Unicode) is a sequence of characters used to store and represent text-based information (plain strings are also sometimes used to store and represent arbitrary sequences of binary bytes). Strings in Python are immutable, meaning that when you perform an operation on strings, you always produce a new string object, rather than mutating an existing string. String objects provide many methods, as discussed in detail in "Methods of String Objects" on page 186.
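As a minimal sketch of what immutability means in practice, the snippet below calls a method on a string and then attempts an in-place change. The variable names (greeting, shouted, mutated) are illustrative assumptions, not names from the text.

```python
# Methods on a string return a new object; the original is left untouched.
greeting = 'hello'
shouted = greeting.upper()

# Item assignment is not supported on strings and raises TypeError.
try:
    greeting[0] = 'H'
    mutated = True
except TypeError:
    mutated = False

print('%s %s %s' % (greeting, shouted, mutated))
```

After this runs, greeting still refers to the original lowercase text, while shouted refers to a separate string object.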
A string literal can be quoted or triple-quoted. A quoted string is a sequence of zero or more characters enclosed in matching quotes, single (') or double ("). For example:
'This is a literal string'
"This is another string"
The two kinds of quotes function identically; having both lets you include one kind of quote inside a string delimited by the other kind without needing to escape it with the backslash character (\):
'I\'m a Python fanatic'    # a quote can be escaped
"I'm a Python fanatic"     # this way is more readable
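A short check can confirm that the two spellings above denote the same string value. This is only a sketch; the names escaped, plain, and nested are hypothetical.

```python
# Both literals produce the same 20-character string.
escaped = 'I\'m a Python fanatic'
plain = "I'm a Python fanatic"
same = (escaped == plain)

# Double quotes need no escaping inside a single-quoted literal.
nested = 'She said "hi"'

print(same)
print(nested)
```

The backslash in the first literal is part of the source text only; it does not appear in the resulting string object.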
All other things being equal, using single quotes to denote string literals is a more common Python style.
To have a string literal span multiple physical lines, you can use a backslash as the last character of a line to indicate that the next line is a continuation:
"A not very long string\
that spans two lines"      # comment not allowed on previous line
To make the string output on two lines, you can embed a newline in the string:
"A not very long string\n\
that prints on two lines"  # comment not allowed on previous line
A better approach is to use a triple-quoted string, which is enclosed by matching triplets of quote characters (''' or """):
"""An even bigger
string that spans
three lines"""             # comments not allowed on previous lines
In a triple-quoted string literal, line breaks in the literal are preserved as newline characters in the resulting string object.
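The three multi-line techniques can be compared side by side. The sketch below uses hypothetical variable names and adds a space before the continuation backslash in the first literal, because a backslash-newline pair contributes nothing to the string, not even a space.

```python
# Backslash-newline is dropped entirely, so the space must be written explicitly.
one_line = "A not very long string \
that spans two lines"

# An explicit \n escape keeps a line break in the resulting string.
two_lines = "A not very long string\n\
that prints on two lines"

# Physical line breaks inside triple quotes become newline characters.
triple = """An even bigger
string that spans
three lines"""

print(one_line)
print(two_lines.count('\n'))
print(triple.splitlines())
```

Only the last two strings contain newline characters: one in two_lines and two in triple.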
The only character that cannot be part of a triple-quoted string is an unescaped backslash, while a quoted string cannot contain unescaped backslashes, nor line ends, nor the quote character that encloses it. The backslash character starts an escape sequence, which lets you introduce any character in either kind of string. Python's string escape sequences are listed in Table 4-1.
Table 4-1. String escape sequences
Sequence      Meaning                   ASCII/ISO code
\<newline>    End of line is ignored    None
\DDD          Octal value DDD           As given
\xXX          Hexadecimal value XX      As given
\other        Any other character       0x5c + as given
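A few escape sequences can be checked directly. The sketch below is illustrative only (the variable names are assumptions) and sticks to recognized escapes: a newline, an octal value, a hexadecimal value, and an escaped backslash.

```python
newline = '\n'       # one character, code 0x0a
octal_a = '\101'     # octal 101 is decimal 65, the letter A
hex_a = '\x41'       # hexadecimal 41 is also decimal 65
backslash = '\\'     # two characters in the source, one in the string

print('%d %s %s %d' % (len(newline), octal_a, hex_a, len(backslash)))
```

Each escape sequence occupies several characters in the source text but yields a single character in the string object.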
A variant of a string literal is a raw string. The syntax is the same as for quoted or triple-quoted string literals, except that an r or R immediately precedes the leading quote. In raw strings, escape sequences are not interpreted as in Table 4-1, but are literally copied into the string, including backslashes and newline characters. Raw string syntax is handy for strings that include many backslashes, as in regular expressions (see "Pattern-String Syntax" on page 201). A raw string cannot end with an odd number of backslashes; the last one would be taken as escaping the terminating quote.
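The difference between raw and ordinary literals shows up in the length and content of the resulting strings. The following is a sketch with hypothetical names; the regular-expression pattern is used only as text here and is not compiled.

```python
normal = '\n'      # escape interpreted: one newline character
raw = r'\n'        # raw: a backslash followed by the letter n

# A pattern written raw, and the same text written with doubled backslashes.
pattern = r'\d+\.\d+'
equivalent = '\\d+\\.\\d+'

# A raw literal cannot end in a single backslash, so the trailing
# backslash is supplied here by a separate ordinary literal.
ends_with_backslash = r'C:\temp' + '\\'

print('%d %d' % (len(normal), len(raw)))
print(pattern == equivalent)
print(ends_with_backslash)
```

Raw syntax changes only how the literal is read from the source; the result is an ordinary string object.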
Multiple string literals of any kind (quoted, triple-quoted, raw, Unicode) can be adjacent, with optional whitespace in between. The compiler concatenates such adjacent string literals into a single string object. If any literal in the concatenation is Unicode, the whole result is Unicode. Writing a long string literal in this way lets you present it readably across multiple physical lines and gives you an opportunity to insert comments about parts of the string. For example:
marypop = ('supercalifragilistic'  # Open paren -> logical line continues
           'expialidocious')       # Indentation ignored in continuation
The string assigned to marypop is a single word of 34 characters.
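The length claim can be verified, and the same mechanism can mix raw and ordinary literals. In this sketch, mixed is a hypothetical name introduced for illustration.

```python
marypop = ('supercalifragilistic'   # 20 characters
           'expialidocious')        # 14 characters

# Adjacent literals of different kinds are concatenated too:
mixed = (r'\d+'    # raw piece: backslash, d, plus sign
         '\n')     # ordinary piece: a single newline character

print(len(marypop))
print(len(mixed))
```

Because the concatenation happens at compile time, no + operator or run-time join is involved.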