Data Structures in Python: Lists and Tuples

Different languages use different kinds of data structures to handle the way data is arranged in memory. This article looks at two of the more common data structures used by Python.

The arrangement of data within memory forms the basis of computing. The arrangement of data is known as data structure. Every language provides a variety of built-in data structures. Languages such as C provide basic data structure mechanisms, using which complex structures can be built, whereas languages such as Java use a full-blown object-oriented approach for almost all types of data structures.

Then there are languages like Python, which takes a middle approach. In this article I will be discussing the middle path used by Python, examining two of its built-in data structures: lists and tuples. The first section will focus on lists, while the second section will focus on tuples. In the final section of this article I will develop a real world application which makes use of lists and/or tuples. That’s the agenda for this discussion.

More about Lists and Tuples

Lists and tuples are two of Python’s basic built-in data structures. As with all other features of Python, these in-built data structures provide both flexibility and power. The answer to how flexible and powerful they are is what I am going to discuss now. First let’s look at the list data structure.

{mospagebreak title=List}

By definition a list is "An instance of an abstract data type (ADT), formalizing the concept of an ordered collection of entities." In other words, a list is a collection of objects. In contrast with an array, a list contains different objects. That’s how Python also views lists. Data types (that’s what Python calls built-in data structures too) such as list are known as compound data types in Python. The operations that can be performed on a list are:

  1. Creation
  2. Addition of elements
  3. Accessing and searching
  4. Deletion

Many of the library functions essentially create lists. These functions may be working on lists themselves during such tasks as accessing and searching. Here are the details:

Creation: A list is defined as a list of comma-separated values between square brackets thus:

a = ['spam', 'eggs', 100, 1234]

where the items of the List a are of different data types. This is the most common method of creating a list. The other technique is to use the range() function. The range() function provides a list of consecutive integers. The range() function takes two arguments. The list returned contains all the integers from the first to the second, including the first but not including the second. For example,

>>range(1, 5)

Gives

[1,2,3,4]

Since a list itself is a data type, a list can contain another list. For example,

[1,"a",[2,3]]

is a perfectly valid list.

Addition of elements: There are three ways to add elements to an existing List using member functions of the list. These are:

  1. Append, which adds a single element to the end of the list. For example, let li be a List, then

    >>> li
    ['a', 'b', 'mpilgrim', 'z', 'example']
    >>> li.append("new")
    >>> li
    ['a', 'b', 'mpilgrim', 'z', 'example', 'new']
  2. Insert is a method that inserts a single element into the list. The numeric argument is the index of the first element that gets shifted out of position. Also there can be two elements of the same value at two different positions.

    >>> li.insert(2, "new")
    >>> li
    ['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new']
  3. Extend concatenates the list passed as argument with the existing list.
    For example:

    >>> li.extend(["two", "elements"])
    >>> li
    ['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']

Accessing and Searching: For accessing and searching, one of the following techniques can be used:

  1. Slicing: A slice is a sub-list of a list. Slicing is done using the [n : m] which returns the part of the list from the nth character to the m-nth character, including the first but excluding the last. For example, if a_list is a list of letters of the alphabet from a to f, then

    >>> a_list = ['a', 'b', 'c', 'd', 'e', 'f']
    >>> a_list[1:3]
    ['b', 'c']
    >>> a_list[:4]
    ['a', 'b', 'c', 'd']
    >>> a_list[3:]
    ['d', 'e', 'f']
    >>> a_list[:]
    ['a', 'b', 'c', 'd', 'e', 'f']
  2. Indexing: An index of a given element can be found using the index() function of list. It returns the first occurrence of the element supplied as an argument. That means even if the element occurs twice, only the position of the first element would be returned. If the element is not found an exception is raised. For example:

    >>>li=['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']
    >>> li
    ['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']
    >>> li.index("example")
    5
    >>> li.index("new")
    2
    >>> li.index("c")
    Traceback (innermost last):
    File "<interactive input>", line 1, in ?
    ValueError: list.index(x): x not in list

Apart from these, a list can be accessed as one would access an array by specifying the index thus:

>>>li=['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']
>>>li[0]
‘a’

Deletion: To delete an element from a list, one can use del. It removes an element from the specified list. For example:

>>> a = ['one', 'two', 'three']
>>> del a[1]
>>> a
['one', 'three']

del also can have slice thus:
>>> a_list = ['a', 'b', 'c', 'd', 'e', 'f']
>>> del a_list[1:5]
>>> print a_list
['a', 'f']

That covers the operations on lists. The way the operations are implemented shows the flexibility of Python’s middle path approach. The flexibility doesn’t just stop here. Though strings are immutable and lists are mutable, strings and lists can be converted into each other. 

Two of the most useful functions in the string module involve lists of strings: split and join. The former returns a list composed of individual elements of string separated by the delimiter passed as an argument. The latter creates a string out of a supplied list. Take a look at the following examples:

>>> import string
>>> song = "The rain in Spain…"
>>> string.split(song)
['The', 'rain', 'in', 'Spain...']

Here the string is split on the basis of space as a delimiter, which is the default delimiter. The next example shows the reverse of the above functionality, a string from a list:

>>> lst = ['The', 'rain', 'in', 'Spain...']
>>> string.join(lst)
‘The rain in Spain…’

The joining is done on the basis of a space delimiter, which again is the default delimiter. We’re finished with lists for the moment. Now let’s look at the other compound data type, the tuple.

{mospagebreak title=Tuple}

According to its definition, "A tuple is a finite sequence (also known as an "ordered list") of objects, each of a specified type." In other words, a tuple is an ordered collection of objects of different types. In Python, a tuple is a list of comma-separated values but immutable, unlike a list. So the operations on a tuple are restricted to the ones that don’t affect its immutability. Accordingly, the following are the operations that can be performed on a tuple:

  1. Creation
  2. Accessing

Due to a tuple’s immutability, deletion and addition of new elements are not possible.

Creation: Creating or defining a tuple is as simple as giving a list of values within parenthesis. For example

>>> tup = (2, 4, 6, 8, 10)

is a tuple. If a tuple has only one element, it would be defined as follows:

>>> tup = (5,)

Accessing: Since lists and tuples are almost the same, the accessing mechanisms also work in a similar fashion. Elements of a tuple can be accessed in two ways:

  • Index based: The elements of a tuple can be accessed using their indices. The index starts at 0. For example, to access an element at index 0 of a tuple tup, the statement would be:

    >>> tup = (‘a’, ‘b’, ‘c’, ‘d’, ‘e’)
    >>> tup[0]
    ‘a’
  • Slicing: As with a list, slice operator can be used to access elements of a tuple. It selects a range of elements. For example the following statement selects elements between 1 and 3 (excluding 3):

    >>> tup[1:3]
    (‘b’, ‘c’)

That covers operations on a tuple. So the question arises, when do you use a list and when do you use a tuple? A list can be used in almost all cases. However, there are certain contexts where a tuple is more useful. These are:

  • When you need speed. Tuples are faster than lists. If you’re defining a constant set of values and all you’re ever going to do with it is iterate through it, use a tuple instead of a list.
  • When your data will not change. It makes your code safer if you "write-protect" data that does not need to be changed. Using a tuple instead of a list is like having an implied assert statement that this data is constant, and that special thought (and a specific function) is required to override that.

In all other contexts, lists can be used. That brings us to the next section, which is a real world example using a list.

{mospagebreak title=Lists and Tuples in the Real World}

The example application will implement a ring buffer or a bounded buffer using a list. In a bounded buffer, once the capacity is full, then the first element i.e. the oldest element is replaced with newest element. The implementation uses a class switch pattern. Here is the code:

class RingBuffer(object):
 
""" class that implements a not-yet-full buffer """

  def __init__(self, size_max):
   
self.max = size_max
   
self.data = [ ]

  class __Full(object):
   
""" class that implements a full buffer """
   
def append(self, x):
     
""" Append an element overwriting the oldest one. """
     
self.data[self.cur] = x
     
self.cur = (self.cur+1) % self.max

    def tolist(self):
     
""" return list of elements in correct order. """
     
return self.data[self.cur:] + self.data[:self.cur]

    def append(self, x):
      
""" append an element at the end of the buffer. """
     
self.data.append(x)
     
if len(self.data) == self.max:
       
self.cur = 0
       
# Permanently change self’s class from non-full to full
       
self.__class__ = __Full

    def tolist(self):
     
""" Return a list of elements from the oldest to the newest. """
      
return self.data

The class defines an empty list as a buffer. The nested class __Full implements the logic to overwrite the oldest element if the buffer is full. The tolist method returns the list in the correct order. Next the append method of the outer class i.e. RingBuffer appends a new item to the end of the list. It also checks whether the length of the buffer is equal to the maximum size. If it is equal, then the buffer’s state is changed to full. The tolist method returns the buffer. The following is a sample implementation that shows how to use the RingBuffer class:

if __name__ == ‘__main__’:
 
x = RingBuffer(5)
 
x.append(1); x.append(2); x.append(3); x.append(4)
 
print x.__class__, x.tolist( )
 
x.append(5)
 
print x.__class__, x.tolist( )
 
x.append(6)
 
print x.data, x.tolist( )
 
x.append(7); x.append(8); x.append(9); x.append(10)
 
print x.data, x.tolist( )

That brings us to the end of this discussion. List and tuple are the basic data structures found in Python. On the basis of these, more complex structures such as trees can be developed. How to implement them will be a topic for discussion in the future. Till then…

Google+ Comments

Google+ Comments