Data Types in Python

In this second of nine parts focusing on a quick overview of the Python language for experienced programmers, you’ll learn how Python handles data types such as strings, and more. This article is excerpted from chapter four of Python in a Nutshell, Second Edition, written by Alex Martelli (O’Reilly; ISBN: 0596100469). Copyright © 2007 O’Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O’Reilly Media.

Data Types

The operation of a Python program hinges on the data it handles. All data values in Python are objects, and each object, or value, has a type. An object’s type determines which operations the object supports, or, in other words, which operations you can perform on the data value. The type also determines the object’s attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object. I cover object attributes and items in detail in "Object attributes and items" on page 46.

The built-in type(obj) accepts any object as its argument and returns the type object that is the type of obj. The built-in function isinstance(obj, type) returns True if object obj has type type (or any subclass thereof); otherwise, it returns False.
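As a quick illustration of type and isinstance, the following sketch checks one object against several types. The class names Base and Derived are hypothetical, introduced only for this example.

```python
# Hypothetical classes, used only to illustrate subclass checks.
class Base(object):
    pass

class Derived(Base):
    pass

d = Derived()

exact_type = type(d)                 # the type object of d
is_derived = isinstance(d, Derived)  # True: d has type Derived
is_base = isinstance(d, Base)        # True: Derived is a subclass of Base
is_int = isinstance(d, int)          # False: unrelated type

print(exact_type is Derived, is_derived, is_base, is_int)
```

Note that type(d) names only the exact type, while isinstance also accepts any base class, which is usually what a type check should ask.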

Python has built-in types for fundamental data types such as numbers, strings, tuples, lists, and dictionaries, as covered in the following sections. You can also create user-defined types, known as classes, as discussed in "Classes and Instances" on page 82.

Numbers

The built-in number objects in Python support integers (plain and long), floating-point numbers, and complex numbers. In Python 2.4, the standard library also offers decimal floating-point numbers, covered in "The decimal Module" on page 372. All numbers in Python are immutable objects, meaning that when you perform any operation on a number object, you always produce a new number object. Operations on numbers, also known as arithmetic operations, are covered in "Numeric Operations" on page 52.

Note that numeric literals do not include a sign: a leading + or -, if present, is a separate operator, as discussed in "Arithmetic Operations" on page 52.
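One way to see that the sign is a separate operator is to ask the standard library's tokenize module how it splits the text -1. This is a minimal sketch, assuming a recent Python in which io.StringIO accepts an ordinary string literal; the helper name literal_tokens is hypothetical.

```python
import io
import tokenize

def literal_tokens(source):
    """Return (token type name, text) pairs for operators and numbers in source."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok[0] in (tokenize.OP, tokenize.NUMBER):
            result.append((tokenize.tok_name[tok[0]], tok[1]))
    return result

# The minus sign and the digit come back as two distinct tokens.
tokens = literal_tokens("-1\n")
print(tokens)
```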

Integer numbers

Integer literals can be decimal, octal, or hexadecimal. A decimal literal is represented by a sequence of digits in which the first digit is nonzero. To denote an octal literal, use 0 followed by a sequence of octal digits (0 to 7). To indicate a hexadecimal literal, use 0x followed by a sequence of hexadecimal digits (0 to 9 and A to F, in either upper- or lowercase). For example:

  1, 23, 3493         # Decimal integers
  01, 027, 06645      # Octal integers
  0x1, 0x17, 0xDA5    # Hexadecimal integers
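The three spellings on each position of the lines above denote the same values. The sketch below checks this for the last column. Because the leading-0 octal spelling belongs to the Python versions described here (later versions write octal with a 0o prefix instead), the sketch parses the digit strings with the built-in int(text, base) rather than relying on octal literal syntax.

```python
# Parse the same quantity from digit strings in three different bases.
decimal_value = int('3493', 10)
octal_value = int('6645', 8)     # the digits of the octal literal 06645
hex_value = int('DA5', 16)       # the digits of the hexadecimal literal 0xDA5

# A hexadecimal literal written directly in source code.
same_as_23 = 0x17

print(decimal_value, octal_value, hex_value, same_as_23)
```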

In practice, you don’t need to worry about the distinction between plain and long integers in modern Python, since operating on plain integers produces results that are long integers when needed (i.e., when the result would not fit within the range of plain integers). However, you may choose to terminate any kind of integer literal with a letter L (or l) to explicitly denote a long integer. For instance:

  1L, 23L, 99999333493L         # Long decimal integers
  01L, 027L, 01351033136165L    # Long octal integers
  0x1L, 0x17L, 0x17486CBC75L    # Long hexadecimal integers

Use uppercase L here, not lowercase l, which might look like the digit 1. The difference between long and plain integers is one of implementation. A long integer has no predefined size limit; it may be as large as memory allows. A plain integer takes up just a few bytes of memory and its minimum and maximum values are dictated by machine architecture. sys.maxint is the largest positive plain integer available, while -sys.maxint-1 is the most negative one. On 32-bit machines, sys.maxint is 2147483647.
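The automatic switch from plain to long integers can be sketched as follows. sys.maxint exists only in the Python versions described here, so the sketch falls back to sys.maxsize elsewhere; that fallback is an assumption made purely so the example runs on other versions.

```python
import sys

# Use sys.maxint where it exists; otherwise fall back to sys.maxsize
# (an assumption for versions that no longer define sys.maxint).
largest_plain = getattr(sys, 'maxint', sys.maxsize)

beyond = largest_plain + 1      # no overflow: the result is simply a bigger integer
huge = 2 ** 100                 # far outside any machine word

print(beyond > largest_plain, huge)
```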

Floating-point numbers

A floating-point literal is represented by a sequence of decimal digits that includes a decimal point (.), an exponent part (an e or E, optionally followed by + or -, followed by one or more digits), or both. The leading character of a floating-point literal cannot be e or E; it may be any digit or a period (.). For example:

  0., 0.0, .0, 1., 1.0, 1e0, 1.e0, 1.0e0

A Python floating-point value corresponds to a C double and shares its limits of range and precision, typically 53 bits of precision on modern platforms. (The Python releases covered here offer no way to find out the exact range and precision of floating-point values on your platform; sys.float_info, which reports them, was added in Python 2.6.)
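The following sketch confirms that the alternative spellings of a floating-point literal denote the same value, and shows the limited precision in action. The rounding check assumes a platform where a C double is an IEEE 754 binary64 value, which is the typical case.

```python
# Alternative spellings of zero and of one.
zeros = [0., 0.0, .0]
ones = [1., 1.0, 1e0, 1.e0, 1.0e0]

# Binary floating point cannot represent 0.1 exactly, so rounding shows up.
rounding_visible = (0.1 + 0.2 != 0.3)

print(zeros, ones, rounding_visible)
```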

Complex numbers

A complex number is made up of two floating-point values, one each for the real and imaginary parts. You can access the parts of a complex object z as read-only attributes z.real and z.imag. You can specify an imaginary literal as a floating-point or decimal literal followed by a j or J:

  0j, 0.j, 0.0j, .0j, 1j, 1.j, 1.0j, 1e0j, 1.e0j, 1.0e0j

The j at the end of the literal indicates the square root of -1, as commonly used in electrical engineering (some other disciplines use i for this purpose, but Python has chosen j). There are no other complex literals. To denote any constant complex number, add or subtract a floating-point (or integer) literal and an imaginary one. For example, to denote the complex number that equals one, use expressions like 1+0j or 1.0+0.0j.
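A short sketch of building a complex number from a real and an imaginary literal and reading its parts. The exception types caught below are an assumption chosen to cover different Python versions, which do not all raise the same exception for a read-only attribute.

```python
z = 3 + 4j                      # real literal plus imaginary literal

real_part = z.real              # 3.0
imag_part = z.imag              # 4.0
magnitude = abs(z)              # 5.0
j_squared = 1j * 1j             # j is the square root of -1

# The parts are read-only attributes; assigning to one raises an exception.
try:
    z.real = 10.0
    assignment_failed = False
except (AttributeError, TypeError):
    assignment_failed = True

print(real_part, imag_part, magnitude, j_squared, assignment_failed)
```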

Sequences

A sequence is an ordered container of items, indexed by nonnegative integers. Python provides built-in sequence types known as strings (plain and Unicode), tuples, and lists. Library and extension modules provide other sequence types, and you can write yet others yourself (as discussed in "Sequences" on page 109). You can manipulate sequences in a variety of ways, as discussed in "Sequence Operations" on page 53.

Iterables

A Python concept that generalizes the idea of "sequence" is that of iterables, covered in "The for Statement" on page 64 and "Iterators" on page 65. All sequences are iterable: whenever I say that you can use an iterable, you can, in particular, use a sequence (for example, a list).

Also, when I say that you can use an iterable, I mean, in general, a bounded iterable, which is an iterable that eventually stops yielding items. All sequences are bounded. Iterables, in general, can be unbounded, but if you try to use an unbounded iterable without special precautions, you could easily produce a program that never terminates, or one that exhausts all available memory.
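The difference between bounded and unbounded iterables can be sketched with the standard library's itertools module. itertools.count yields integers without end, so the example applies one possible precaution, itertools.islice, to take only a bounded prefix.

```python
import itertools

bounded = [10, 20, 30]              # a sequence: always bounded
unbounded = itertools.count(1)      # yields 1, 2, 3, ... without end

total = sum(bounded)                # safe: the loop inside sum() terminates

# The precaution: take only a bounded slice of the unbounded iterable.
first_five = list(itertools.islice(unbounded, 5))

print(total, first_five)
```

Calling sum(unbounded) or list(unbounded) directly would never finish, which is exactly the hazard described above.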

Strings

A built-in string object (plain or Unicode) is a sequence of characters used to store and represent text-based information (plain strings are also sometimes used to store and represent arbitrary sequences of binary bytes). Strings in Python are immutable, meaning that when you perform an operation on strings, you always produce a new string object, rather than mutating an existing string. String objects provide many methods, as discussed in detail in "Methods of String Objects" on page 186.
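String immutability can be observed directly, as in this minimal sketch: a method such as upper returns a new object, and an attempt to change a character in place fails.

```python
greeting = 'hello'
shouted = greeting.upper()      # a new string object; greeting is unchanged

try:
    greeting[0] = 'J'           # strings do not support item assignment
    mutated = True
except TypeError:
    mutated = False

print(greeting, shouted, mutated)
```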

A string literal can be quoted or triple-quoted. A quoted string is a sequence of zero or more characters enclosed in matching quotes, single (') or double ("). For example:

  'This is a literal string'
  "This is another string"

The two different kinds of quotes function identically; having both allows you to include one kind of quote inside of a string specified with the other kind without needing to escape them with the backslash character (\):

  'I\'m a Python fanatic'      # a quote can be escaped
  "I'm a Python fanatic"       # this way is more readable

All other things being equal, using single quotes to denote string literals is a more common Python style. To have a string literal span multiple physical lines, you can use a backslash as the last character of a line to indicate that the next line is a continuation:

  "A not very long string\
  that spans two lines"        # comment not allowed on previous line

To make the string output on two lines, you can embed a newline in the string:

  "A not very long string\n\
  that prints on two lines"    # comment not allowed on previous line

A better approach is to use a triple-quoted string, which is enclosed by matching triplets of quote characters (''' or """):

  """An even bigger
  string that spans
  three lines"""               # comments not allowed on previous lines

In a triple-quoted string literal, line breaks in the literal are preserved as newline characters in the resulting string object.
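To compare the multiline forms, the sketch below builds the same text with a triple-quoted literal and with embedded newline escapes, and also shows that a backslash continuation by itself adds no newline. The variable names are illustrative only.

```python
triple = """An even bigger
string that spans
three lines"""

escaped = "An even bigger\nstring that spans\nthree lines"

# A trailing backslash continues the literal without inserting a newline.
continued = "A not very long string \
that spans two lines"

line_count = len(triple.splitlines())

print(triple == escaped, line_count, '\n' in continued)
```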

The only character that cannot be part of a triple-quoted string is an unescaped backslash, while a quoted string cannot contain unescaped backslashes, nor line ends, nor the quote character that encloses it. The backslash character starts an escape sequence, which lets you introduce any character in either kind of string. Python’s string escape sequences are listed in Table 4-1.

Table 4-1. String escape sequences

Sequence     Meaning                  ASCII/ISO code
\<newline>   End of line is ignored   None
\\           Backslash                0x5c
\'           Single quote             0x27
\"           Double quote             0x22
\a           Bell                     0x07
\b           Backspace                0x08
\f           Form feed                0x0c
\n           Newline                  0x0a
\r           Carriage return          0x0d
\t           Tab                      0x09
\v           Vertical tab             0x0b
\DDD         Octal value DDD          As given
\xXX         Hexadecimal value XX     As given
\other       Any other character      0x5c + as given
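A few of these escape sequences can be checked against their character codes with the built-in ord, as in this short sketch.

```python
newline = '\n'
tab = '\t'
letter_octal = '\101'           # octal 101 is decimal 65, the letter A
letter_hex = '\x41'             # hexadecimal 41 is also decimal 65
backslash = '\\'                # two characters in source, one in the string

print(ord(newline), ord(tab), letter_octal, letter_hex, len(backslash))
```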

A variant of a string literal is a raw string. The syntax is the same as for quoted or triple-quoted string literals, except that an r or R immediately precedes the leading quote. In raw strings, escape sequences are not interpreted as in Table 4-1, but are literally copied into the string, including backslashes and newline characters. Raw string syntax is handy for strings that include many backslashes, as in regular expressions (see "Pattern-String Syntax" on page 201). A raw string cannot end with an odd number of backslashes; the last one would be taken as escaping the terminating quote.
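The effect of the r prefix can be sketched by comparing lengths, followed by a small regular expression written as a raw string. The pattern and sample text are made up for this example.

```python
import re

plain = '\n'                    # one character: a newline
raw = r'\n'                     # two characters: a backslash and the letter n

# In a regular expression, the raw form avoids doubling every backslash.
pattern = re.compile(r'\d+\.\d+')
match = pattern.search('pi is roughly 3.14')
found = match.group(0)

print(len(plain), len(raw), found)
```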

Unicode string literals have the same syntax as other string literals, with a u or U immediately before the leading quote. Unicode string literals can use \u followed by four hex digits to denote Unicode characters and can include the escape sequences listed in Table 4-1. Unicode literals can also include the escape sequence \N{name}, where name is a standard Unicode name, as listed at http://www.unicode.org/charts/. For example, \N{Copyright Sign} indicates a Unicode copyright sign character (©). Raw Unicode string literals start with ur, not ru. Note that raw strings are not a different type from ordinary strings: raw strings are just an alternative syntax for literals of the usual two string types, plain (a.k.a. byte strings) and Unicode.
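The two ways of naming a Unicode character can be compared as follows. This is a sketch that assumes an interpreter accepting the u prefix (the versions described here, and also Python 3.3 and later); it prints only the comparison and the code point to avoid depending on the terminal's encoding.

```python
by_code = u'\u00a9'                     # four hex digits
by_name = u'\N{COPYRIGHT SIGN}'         # standard Unicode name

code_point = ord(by_code)

print(by_code == by_name, hex(code_point))
```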

Multiple string literals of any kind (quoted, triple-quoted, raw, Unicode) can be adjacent, with optional whitespace in between. The compiler concatenates such adjacent string literals into a single string object. If any literal in the concatenation is Unicode, the whole result is Unicode. Writing a long string literal in this way lets you present it readably across multiple physical lines and gives you an opportunity to insert comments about parts of the string. For example:

  marypop = ('supercalifragilistic'   # Open paren -> logical line continues
             'expialidocious')        # Indentation ignored in continuation

The string assigned to marypop is a single word of 34 characters.
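The character count is easy to verify with a minimal sketch of the same concatenation:

```python
marypop = ('supercalifragilistic'   # first literal
           'expialidocious')        # second literal, joined by the compiler

length = len(marypop)

print(marypop, length)
```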

Tuples

A tuple is an immutable ordered sequence of items. The items of a tuple are arbitrary objects and may be of different types. To specify a tuple, use a series of expressions (the items of the tuple) separated by commas (,). You may optionally place a redundant comma after the last item. You may group tuple items within parentheses, but the parentheses are necessary only where the commas would otherwise have another meaning (e.g., in function calls), or to denote empty or nested tuples. A tuple with exactly two items is often known as a pair. To create a tuple of one item (often known as a singleton), add a comma to the end of the expression. To denote an empty tuple, use an empty pair of parentheses. Here are some tuples, all enclosed in the optional parentheses:

  (100, 200, 300)          # Tuple with three items
  (3.14,)                  # Tuple with one item
  ()                       # Empty tuple (parentheses NOT optional!)

You can also call the built-in type tuple to create a tuple. For example:

  tuple('wow')

This builds a tuple equal to:

  ('w', 'o', 'w')

tuple() without arguments creates and returns an empty tuple. When x is iterable, tuple(x) returns a tuple whose items are the same as the items in x.
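The rules for commas and parentheses, together with immutability, can be sketched as follows. The variable names are illustrative only.

```python
pair = 100, 200                 # parentheses are optional here
singleton = 3.14,               # the trailing comma makes this a tuple
empty = ()                      # parentheses required for the empty tuple
nested = (1, (2, 3))            # parentheses required for nesting
from_iterable = tuple([1, 2, 3])

try:
    pair[0] = 0                 # tuples do not support item assignment
    mutated = True
except TypeError:
    mutated = False

print(pair, singleton, empty, nested, from_iterable, mutated)
```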

Lists

A list is a mutable ordered sequence of items. The items of a list are arbitrary objects and may be of different types. To specify a list, use a series of expressions (the items of the list) separated by commas (,) and within brackets ([]). You may optionally place a redundant comma after the last item. To denote an empty list, use an empty pair of brackets. Here are some example lists:

  [42, 3.14, 'hello']      # List with three items
  [100]                    # List with one item
  []                       # Empty list

You can also call the built-in type list to create a list. For example:

  list('wow')

This builds a list equal to:

  ['w', 'o', 'w']

list() without arguments creates and returns an empty list. When x is iterable, list(x) creates and returns a new list whose items are the same as the items in x. You can also build lists with list comprehensions, as discussed in "List comprehensions" on page 67.
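Because list(x) builds a new list, changes to the result leave the original untouched. A minimal sketch, with illustrative names:

```python
original = [42, 3.14, 'hello']
copy = list(original)           # a new list with the same items

copy.append('extra')            # mutates copy in place
copy[0] = 0                     # item assignment is allowed on lists

print(original, copy)
```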

Sets

Python 2.4 introduces two built-in set types, set and frozenset, to represent arbitrarily ordered collections of unique items. These types are equivalent to classes Set and ImmutableSet found in standard library module sets, which also exists in Python 2.3. To ensure that your module uses the best available sets, in any release of Python from 2.3 onwards, place the following code at the start of your module:

  try:
    set
  except NameError:
    from sets import Set as set, ImmutableSet as frozenset

Items in a set may be of different types, but they must be hashable (see hash on page 162). Instances of type set are mutable, and therefore not hashable; instances of type frozenset are immutable and hashable. So you can’t have a set whose items are sets, but you can have a set (or frozenset) whose items are frozensets. Sets and frozensets are not ordered.

To create a set, call the built-in type set with no argument (this means an empty set) or one argument that is iterable (this means a set whose items are the items of the iterable).
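The following sketch creates sets both ways and demonstrates the hashability rule: a frozenset can be an item of a set, while a mutable set cannot. It assumes the built-in set and frozenset types are available.

```python
empty = set()                       # no argument: an empty set
letters = set('wow')                # duplicates collapse: two distinct letters

inner = frozenset([1, 2])
outer = set([inner])                # a frozenset is hashable, so this works

try:
    set([set([1, 2])])              # a mutable set is not hashable
    nesting_allowed = True
except TypeError:
    nesting_allowed = False

print(len(empty), sorted(letters), inner in outer, nesting_allowed)
```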

Please check back next week for the continuation of this article.
