You can easily visit the Apache Cassandra website to download this latest version of the open source database. You’ll find a video, wiki, instructions for how to get started, and much more. You’ll also be joining a large number of well-known companies that need to handle lots of data, such as Netflix, Twitter, and Reddit. “The largest known Cassandra cluster has over 300 TB of data in over 400 machines,” the Cassandra project page notes.
So what can you do with the latest version of Cassandra? New capabilities include clustering across virtual nodes, inter-node communication, atomic batches, and request tracing. You can check out the full list of changes, but be prepared for lots of small type. It documents everything.
Don’t be fooled by the 1.2 into thinking that this release is a minor change. Andrew Brust at ZDNet notes that this version of Cassandra also offers “better management around disk failures” and a whole variety of performance improvements, “including to memory usage and column indexes.” For instance, those virtual nodes help speed recovery when a physical node fails. If you’re running your system on inexpensive hardware with a tendency to fail, this could save your mission-critical data in a big way.
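To see why virtual nodes can speed recovery, here is a toy sketch of a token ring. It is not Cassandra's partitioner or replication code: the node names, the `token` helper, and the figure of 256 tokens per node are all assumptions made for illustration. It counts how many distinct peers sit next to a failed node's ranges on the ring, and so could share the work of rebuilding them.

```python
import hashlib


def token(label: str) -> int:
    """Hash a label to a position on the ring (illustrative, not Cassandra's partitioner)."""
    return int(hashlib.md5(label.encode()).hexdigest(), 16)


def build_ring(nodes, tokens_per_node):
    """Return a sorted list of (token, owner) pairs, one entry per token."""
    return sorted(
        (token(f"{node}-{i}"), node)
        for node in nodes
        for i in range(tokens_per_node)
    )


def takeover_peers(ring, failed):
    """Collect the distinct nodes that follow each of the failed node's tokens."""
    peers = set()
    size = len(ring)
    for idx, (_, owner) in enumerate(ring):
        if owner != failed:
            continue
        # Walk clockwise to the next token owned by a different node.
        j = (idx + 1) % size
        while ring[j][1] == failed:
            j = (j + 1) % size
        peers.add(ring[j][1])
    return peers


nodes = [f"node{i}" for i in range(1, 7)]  # hypothetical six-node cluster
single = takeover_peers(build_ring(nodes, 1), "node1")
many = takeover_peers(build_ring(nodes, 256), "node1")
print(len(single), "peer with one token per node")
print(len(many), "peers with 256 tokens per node")
```

With one token per node, a single neighbor borders the failed node's range; with many small ranges scattered around the ring, several peers do, so the rebuilding load can be spread out in this simplified model.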
And what, you may be wondering, is an atomic batch? It is a group of updates that is applied as a single, indivisible unit. This matters when several parts of a database are updated together: if an operation’s “coordinator” node fails partway through an update, the database can be left in a “partially updated” state. That’s unacceptable for mission-critical data. The atomic batches in Cassandra 1.2 prevent partial updates; either all of the updates succeed, or they all fail. This safeguard brings Cassandra closer to the all-or-nothing guarantee that the major relational databases provide.
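One common way to avoid a partially updated state is to record the whole batch in a log before applying any of it, then replay the log after a crash. The sketch below is a toy, in-memory model of that idea only; `ToyStore`, `CoordinatorCrash`, and the `crash_after` parameter are hypothetical names invented here, and nothing in it reflects Cassandra's actual implementation or API.

```python
class CoordinatorCrash(Exception):
    """Simulates the coordinator failing partway through a batch."""


class ToyStore:
    """In-memory stand-in for a database with a batch log (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.batchlog = {}  # batch id -> list of (key, value) still pending
        self._next_id = 0

    def apply_batch(self, updates, crash_after=None):
        """Log the full batch first, then apply each update in order."""
        batch_id = self._next_id
        self._next_id += 1
        self.batchlog[batch_id] = list(updates)
        for n, (key, value) in enumerate(updates):
            if crash_after is not None and n == crash_after:
                # The log entry survives the simulated crash.
                raise CoordinatorCrash(f"failed before update {n}")
            self.data[key] = value
        # Every update landed, so the log entry is no longer needed.
        del self.batchlog[batch_id]

    def replay_batchlog(self):
        """Re-apply every logged batch so no batch stays half-applied."""
        for batch_id in list(self.batchlog):
            for key, value in self.batchlog[batch_id]:
                self.data[key] = value
            del self.batchlog[batch_id]


store = ToyStore()
try:
    store.apply_batch([("a", 1), ("b", 2)], crash_after=1)
except CoordinatorCrash:
    pass  # at this point only "a" has been written
store.replay_batchlog()  # recovery finishes the batch
print(store.data)
```

In this model the batch is completed on replay rather than rolled back; either way, the store never stays in the half-written state once recovery has run.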
With cloud computing becoming increasingly important, it’s also worth noting that Cassandra clusters can be deployed as cloud databases.
Jonathan Ellis, vice president of Apache Cassandra, was clearly pleased to announce the new release. In a statement quoted by eWeek, he observed that “By improving support for dense clusters – powering multiple terabytes per node – as well as simplifying application modeling, and improving data cell storage/design/representation, systems are able to effortlessly scale petabytes of data.”
Cassandra 1.2 is a NoSQL database, and its current improvements signal that such systems are ready to move beyond their original market of start-up companies. As Brust observes, “manageability and database consistency are being addressed in a very studious fashion. Such a focus on reliability and atomic operations indicates a noteworthy maturity in the NoSQL market.”