In the last article, I talked about the need to work at the bit level in Perl and in other languages. In order to understand bits, we first took a look at how binary numbers are represented in Perl, and then we took a look at two bitwise operators, AND and OR. In looking at the operators, we looked at their most common uses in programming. We also began to look at an algorithm for finding prime numbers, but we stopped upon seeing that there is no easy implementation, at least not without a knowledge of bits.

Up until now, we've just been storing plain binary numbers as scalars and then manipulating them with various bit operators. This is fine for many operations, but there are some serious disadvantages to it. First, we're limited to a certain number of bits (32 bits of 64 bits, depending on your machine). If we go over this, we'll get an error. Second, they are a bit difficult to work with. Extracting the value of a particular bit or set of bits requires some ugly code.

Now is actually a good time to return to the Sieve of Eratosthenes algorithm that we examined at the beginning of the previous article. Recall that in order to implement the algorithm, we needed a way to eliminate non-prime numbers from a list. We looked at two ways to do this, but both were horribly inefficient. The second method, however, was heading in the right direction.

In the second method, we had a large array whose indexes were the actual numbers (we would start at 2, discarding 0 and 1, since the latter two are not prime). If the value at a particular index was 1, then we knew the number represented by the index was prime. Otherwise, it was not prime.

This is inefficient because the numbers 0 and 1 are still going to be treated as integers, and that requires quite a bit of memory. However, you know that only one bit is required to represent either value. So, why not just use one bit? Well, because we can't just create a normal array of “bits.” We can, however, create a bit vector.

A bit vector is simply a long sequence of bits which may be divided up into individual units of data, or elements. These elements are sets of bits that may have varying sizes, depending on what exactly we want to store. In our situation, we only need to store one of two values, so each unit only needs to have one bit.

So, if we're storing prime numbers from 0 to 10, we'd have a bit vector that looks like this internally:

00110101000

This indicates that the numbers 2, 3, 5 and 7 are prime, and that the numbers 0, 1, 4, 6, 8, 9 and 10 are not prime.