
Character Sets - Python

If you're already an experienced programmer and you're interested in adding Python to your list of languages, this nine-part series gives you a good start. It is excerpted from chapter four of the book Python in a Nutshell, Second Edition, written by Alex Martelli (O'Reilly; ISBN: 0596100469). Copyright 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

  1. The Python Language
  2. Character Sets
  3. Keywords
  4. Statements
By: O'Reilly Media
September 11, 2008




Normally, a Python source file must consist entirely of characters from the ASCII set (character codes 0 to 127). However, you can tell Python that a particular source file uses a character set that is a superset of ASCII. Python then allows that file to contain characters outside the ASCII set, but only in comments and string literals. To do this, start the source file with a comment that follows this form exactly:

  # -*- coding: utf-8 -*-

Between the coding: and the -*-, write the name of a codec known to Python, such as utf-8 or iso-8859-1. Note that this coding directive comment is taken as such only if it is at the start of a source file (possibly after the "shebang line" covered in "Running Python Programs" on page 28), and that the only effect of a coding directive is to let you use non-ASCII characters in string literals and comments.
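As a rough illustration of where the directive is recognized, the sketch below asks the standard library's tokenize.detect_encoding which encoding it finds in a few in-memory byte strings. The helper declared_encoding and the three sample sources are hypothetical names made up for this example. It assumes a Python 3 interpreter, where a file with no recognized directive is reported as utf-8 rather than ASCII.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python's tokenizer detects for raw source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical source files held in memory; \xe9 is the ISO-8859-1 byte for "é".
at_top = b"# -*- coding: iso-8859-1 -*-\nname = 'caf\xe9'\n"
after_shebang = (
    b"#!/usr/bin/env python\n"
    b"# -*- coding: iso-8859-1 -*-\n"
    b"name = 'caf\xe9'\n"
)
too_late = b"x = 1\ny = 2\n# -*- coding: iso-8859-1 -*-\n"

print(declared_encoding(at_top))         # iso-8859-1
print(declared_encoding(after_shebang))  # iso-8859-1
print(declared_encoding(too_late))       # utf-8 (directive ignored, Python 3 default)
```

The third sample shows that the same comment placed after ordinary code is not treated as a coding directive.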


Python breaks each logical line into a sequence of elementary lexical components known as tokens. Each token corresponds to a substring of the logical line. The normal token types are identifiers, keywords, operators, delimiters, and literals, as covered in the following sections. You may freely use whitespace between tokens to separate them. Some whitespace separation is necessary between logically adjacent identifiers or keywords; otherwise, Python would parse them as a single, longer identifier. For example, printx is a single identifier; to write the keyword print followed by the identifier x, you need to insert some whitespace (e.g., print x).
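The printx example can be checked with the standard library's tokenize module. This is a minimal sketch, assuming Python 3; the helper token_strings and the set SIGNIFICANT are hypothetical names for this example. Note that tokenize reports keywords and identifiers alike as NAME tokens, so the sketch shows only how the line is split, not how each token is classified.

```python
import io
import tokenize

# Token types kept for display; layout tokens such as NEWLINE are dropped.
SIGNIFICANT = {tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING}


def token_strings(line):
    """Return the text of each significant token in one logical line."""
    readline = io.StringIO(line).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type in SIGNIFICANT
    ]


print(token_strings("printx\n"))         # ['printx']
print(token_strings("print x\n"))        # ['print', 'x']
print(token_strings("total=a+b\n"))      # ['total', '=', 'a', '+', 'b']
print(token_strings("total = a + b\n"))  # ['total', '=', 'a', '+', 'b']
```

The last two calls give the same tokens, which illustrates that whitespace around operators is optional, while the first two show where it is required.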


An identifier is a name used to identify a variable, function, class, module, or other object. An identifier starts with a letter (A to Z or a to z) or an underscore (_), followed by zero or more letters, underscores, and digits (0 to 9). Case is significant in Python: lowercase and uppercase letters are distinct. Python does not allow punctuation characters such as @, $, and % within identifiers.
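The rule above translates directly into a regular expression. The sketch below is only an illustration of that ASCII rule; the names ASCII_IDENTIFIER and is_ascii_identifier are hypothetical. It does not check for keywords, which match the pattern but are reserved, and Python 3 additionally accepts many non-ASCII letters that this pattern rejects.

```python
import re

# A letter or underscore, then zero or more letters, digits, or underscores.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_identifier(name):
    """Return True if name follows the ASCII identifier rule described above."""
    return ASCII_IDENTIFIER.fullmatch(name) is not None


for candidate in ["spam", "_x1", "Spam", "2x", "a$b", "a@b", ""]:
    print(repr(candidate), is_ascii_identifier(candidate))

# Case is significant: these names are valid but distinct.
print("Spam" == "spam")  # False
```

The first three candidates are accepted; the rest are rejected because they start with a digit, contain punctuation, or are empty.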

Normal Python style is to start class names with an uppercase letter and all other identifiers with a lowercase letter. Starting an identifier with a single leading underscore indicates by convention that the identifier is meant to be private. Starting an identifier with two leading underscores indicates a strongly private identifier; if the identifier also ends with two trailing underscores, the identifier is a language-defined special name. The identifier _ (a single underscore) is special in interactive interpreter sessions: the interpreter binds _ to the result of the last expression statement it has evaluated interactively, if any.
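A short sketch of these conventions follows. The class Account and its attributes are hypothetical, chosen only for illustration. The single underscore is purely a convention, while a name with two leading underscores used inside a class body is rewritten by the compiler with the class name prefixed (name mangling), which is one way the stronger privacy shows up in practice.

```python
class Account:                     # class name starts with an uppercase letter
    def __init__(self):            # __init__ is a language-defined special name
        self.owner = "pat"         # ordinary public attribute
        self._balance = 10         # single underscore: private by convention only
        self.__pin = 1234          # double underscore: stored under a mangled name


acct = Account()

# The instance dictionary shows the names actually stored.
print(sorted(vars(acct)))          # ['_Account__pin', '_balance', 'owner']

# The convention-only name is still reachable from outside the class.
print(acct._balance)               # 10

# The double-underscore name is not visible under its original spelling.
print(hasattr(acct, "__pin"))      # False
```

Mangling makes accidental access or overriding less likely, but it is not an access control mechanism: the attribute remains reachable as acct._Account__pin.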
