More Perl Bits

In the last article, I talked about the need to work at the bit level in Perl and in other languages. In order to understand bits, we first took a look at how binary numbers are represented in Perl, and then we took a look at two bitwise operators, AND and OR. In looking at the operators, we looked at their most common uses in programming. We also began to look at an algorithm for finding prime numbers, but we stopped upon seeing that there is no easy implementation, at least not without a knowledge of bits.

In this article, we’ll finish learning about bits. We’ll take a look at a few more operators and an important Perl function for working with bits. Finally, we’ll implement the algorithm we looked at. 

You may want to read the previous article if you haven’t (or review it if you have) before you dive into this one.

The bitwise XOR and NOT operators 

Maybe you recall that, from the last article, I said that the bitwise OR operator would set a resulting bit to 1 if one or both of the corresponding bits in the operands were 1. The key thing to notice here are that if both of the bits are 1, then the result is 1. The OR operator may be more appropriately termed an inclusive OR.

However, there is another operator worth mentioning, the exclusive OR, more commonly known as XOR. The bitwise XOR operator is represented by a carat symbol (^). The bitwise XOR operator will set a resulting bit to 1 if and only if one of the operands features a 1 as the corresponding bit. 

For example, say that we have these two operands:

 

1010

0000

 

If we apply the bitwise XOR, then we get this result:

 

1010

 

However, say we have these two operands:

 

1010

1111

 

If we apply the bitwise XOR to these operands, we get this result:

 

0000

 

That’s because, for the first and third bits, both operands feature a 1. The bitwise XOR operator requires that only one of the operands have a 1. 

Remember how the bitwise AND operator can be used with a bit mask to check the value of a particular bit or set of bits? The bitwise XOR operator can also be used with a bit mask, but for a different purpose. Using the bitwise XOR operator and the appropriate bit mask, it is possible to toggle a specific bit or set of bits. For example, consider this number:

 

1100

 

If we want to toggle the second bit, we could use the XOR operator with the appropriate bit mask (0100) to get this result:

 

1000

 

Or, if we wanted to toggle the third bit, we could use the XOR operator with a different bit mask (0010):

 

1110

 

Again, this is interesting and very useful behavior. 

In order to see how this looks in Perl, let’s recreate what we just did, but in Perl this time. These Perl lines apply the operations we just looked at, and then print the results:

 

my $bits = 0b1100;

 

printf "%bn", $bits ^ 0b0100;

printf "%bn", $bits ^ 0b0010;

 

Finally, there is the bitwise NOT operator (also called the bitwise negation operator, among other names), represented in Perl by a tilde symbol (~). If we oversimplify it, it’s very easy to understand. The bitwise NOT operator, at an oversimplified level, “flips” each bit in an operand (it only takes one operand), turning each 1 into a 0 and each 0 into a 1. 

In reality, however, the NOT operator is a bit more complex. The NOT operator returns a result based on your machine’s integer size. So, on my machine, if I apply the operator on this operand:

 

0111

 

Then I get this result:

 

1111111111111111111111111111111111111111111111111111111111111000

 

Looking at the operand, you would expect the bitwise NOT operator to return 1000, and it does—but the result comes after a string of 1s. On my 64-bit machine, integers are 64 bits, so the four bits we want come after a string of sixty 1s. 

Thankfully, it’s easy to fix this using something you already know. In the above case, we want to extract the last four bits. Does this sort of thing sound familiar? We can use the bitwise AND operator and a bit mask to trim off all of the extra bits:

 

my $bits = ~0b1111;

$bits = $bits & 0b1111;

 

printf "%bn", $bits;

 

As you can see, it’s not very difficult to get the behavior you want. 

{mospagebreak title=Bit shift operators} 

Let’s now take a look at the bit shift operators. A bit shift operator simply shifts bits around, as its name implies. Perl offers the left bit shift operator, represented by two leftward-pointing angle brackets (<<), and the right bit shift operator, represented by two rightward-pointing angle brackets (>>). 

Suppose I have a binary number that looks like this:

 

11110000

 

If I shift everything to the right four bits, I get this:

 

1111

 

Here’s what this looks like in Perl:

 

printf "%bn", 0b11110000 >> 4;

 

Then, if I shift everything back to the left four bits, I get this:

 

11110000

 

As you can see, zeros are brought in to fill the new space. 

Here’s the operation in Perl:

 

printf "%bn", 0b1111 << 4;

 

If, instead, I had shifted things to the right four more bits, I would get 0. The bits would simply “fall off the edge.” Once again, this is what this would look like in Perl:

 

printf "%bn", 0b1111 >> 4;

 

{mospagebreak title=Bit vectors} 

Up until now, we’ve just been storing plain binary numbers as scalars and then manipulating them with various bit operators. This is fine for many operations, but there are some serious disadvantages to it. First, we’re limited to a certain number of bits (32 bits of 64 bits, depending on your machine). If we go over this, we’ll get an error. Second, they are a bit difficult to work with. Extracting the value of a particular bit or set of bits requires some ugly code. 

Now is actually a good time to return to the Sieve of Eratosthenes algorithm that we examined at the beginning of the previous article. Recall that in order to implement the algorithm, we needed a way to eliminate non-prime numbers from a list. We looked at two ways to do this, but both were horribly inefficient. The second method, however, was heading in the right direction. 

In the second method, we had a large array whose indexes were the actual numbers (we would start at 2, discarding 0 and 1, since the latter two are not prime). If the value at a particular index was 1, then we knew the number represented by the index was prime. Otherwise, it was not prime. 

This is inefficient because the numbers 0 and 1 are still going to be treated as integers, and that requires quite a bit of memory. However, you know that only one bit is required to represent either value. So, why not just use one bit? Well, because we can’t just create a normal array of “bits.” We can, however, create a bit vector. 

A bit vector is simply a long sequence of bits which may be divided up into individual units of data, or elements. These elements are sets of bits that may have varying sizes, depending on what exactly we want to store. In our situation, we only need to store one of two values, so each unit only needs to have one bit. 

So, if we’re storing prime numbers from 0 to 10, we’d have a bit vector that looks like this internally:

 

00110101000

 

This indicates that the numbers 2, 3, 5 and 7 are prime, and that the numbers 0, 1, 4, 6, 8, 9 and 10 are not prime. 

{mospagebreak title=The vec function} 

So, how do we work with bit vectors? Fortunately, Perl provides an easy way to do this using the vec function. The vec function, in its most basic usage, returns the bits of a particular element in the bit vector. The function takes three arguments. The first is a string, which is the actual bit vector. You see, using a string as a bit vector allows us to store an arbitrary number of bits, unlike before when we used integers to store things.

The second argument is the index of the element we want to retrieve, and the third argument is the width, in bits, of each element. Note that the width must be a power of two, so you’ll need to waste some bits if your data doesn’t fit nicely into a proper width. 

For example, consider the bit vector we looked at earlier, that represents the numbers 0 through 10 and indicates whether or not each number is prime:

 

00110101000

 

(Actually, the bit vector wouldn’t quite look this this! But don’t worry—that’s another topic, and we can get by without discussing it). 

If we want to check if the number 2 is prime, we would do so like this:

 

if (vec($numbers, 2, 1)) {

 print "2 is prime.n";

}

 

Above, we retrieve the second element (third in the sequence, since we start counting at zero, just like with arrays) of one bit in width. Since it is 1, we know that 2 is prime. 

The vec function can also be used to assign a value to a particular bit or bits. For example, say we wanted to explicitly mark the number 2 as prime. We would do so like this:

 

vec($numbers, 2, 1) = 0b1;

 

Again, note that you don’t have to work with one-bit elements. It’s just that this width is relevant to our example. You could just as easily work with two-bit elements:

 

vec($numbers, 0, 2) = 0b10;

 

Just make sure that the width is a power of two, and you’ll be fine. 

{mospagebreak title=Finding prime numbers} 

We now have everything we need to implement the algorithm, so let’s get to it. The first thing we need to do is create a bit vector that contains a bit for all of the values up to a maximum. Let’s find the first one million prime numbers:

 

use constant MAX => 1_000_000;

 

In order to create the bit vector, we can use the vec function and an empty string. The vec function, if you pass it an index outside of the string, will simply fill all of the missing elements with zeros. So, we can easily create a vector with one million elements like this:

 

my $sieve = '';

vec($sieve, MAX, 1) = 0;

 

However, all of the elements are zero, which means that all of them are initially marked as not prime. We want all of them to start as prime, and then eliminate non primes from this list. So, let’s flip all of the bits:

 

$sieve = ~$sieve;

 

(We could assume that 0 means prime and save ourselves an operation, but that’s probably a bit less intuitive). 

Now we have a bit vector to use, and we can begin to apply the algorithm. The algorithm is simple. We just need to find the first unexamined prime number (2 to start with) and then eliminate its multiples. We then repeat this until we have reached the square root of the maximum number (which would be 1,000). Here’s the code:

 

for (my $i = 2; $i <= sqrt(MAX); $i++) {

 next unless (vec($sieve, $i, 1));

 for (my $x = $i * 2; $x <= MAX; $x += $i) {

 vec($sieve, $x, 1) = 0;

 }

}

 

As you can see, we loop over the numbers 2 through 1,000. If the number is known to be not prime (that is, if the bit value is 0), then we move on to the next number. Otherwise, we mark its multiples as not prime by setting their corresponding bits to 0. 

That’s it! We’re done! We can now print the results:

 

for (2..MAX) {

 print "$_n" if (vec $sieve, $_, 1);

}

 

Here’s the full code:

 

#!/usr/bin/perl

 

use strict;

use warnings;

 

use constant MAX => 1_000_000;

 

my $sieve = '';

vec($sieve, MAX, 1) = 0;

$sieve = ~$sieve;

 

for (my $i = 2; $i <= sqrt(MAX); $i++) {

 next unless (vec($sieve, $i, 1));

 for (my $x = $i * 2; $x <= MAX; $x += $i) {

 vec($sieve, $x, 1) = 0;

 }

}

 

for (2..MAX) {

 print "$_n" if (vec $sieve, $_, 1);

}

 

There is, of course, a lot more to using bits that cannot possibly be explained in the space provided by two articles. Even the things I have explained here can be elaborated on significantly, and some of the things I’ve left out are actually pretty important. But even so, you should now have a basic knowledge of bits, along with, hopefully, an appreciation of them. Good luck!

[gp-comments width="770" linklove="off" ]

chat