VMware and the Apache Hadoop community will work together to create extensions for the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects. These extensions will make the projects “virtualization-aware” so that they scale more elastically and deliver better performance in virtual environments.
Open source solutions for so-called “big data processing” have been gaining momentum among enterprises, according to Forbes. Reporting the findings of a survey of 651 companies, Forbes noted that 64 percent plan to use a private cloud option, and that the majority of those companies are opting for open source private cloud solutions. Among these, 29 percent plan to use a combination of open source and VMware options, while another 30 percent plan to use VMware-only private cloud options.
VMware is hardly the only company packaging Apache Hadoop distributions, of course. Other commercial vendors distributing this open source big data solution include Cloudera, IBM, EMC Greenplum, MapR, and Hortonworks. But VMware has been pushing hard into this territory lately.
For example, VMware partnered with Hortonworks earlier this week on a High Availability architecture for version one of Hortonworks Data Platform, an Apache Hadoop distribution. This is Hortonworks’s first product since it spun off from Yahoo about a year ago. Additionally, VMware bought Cetas last month. Cetas is a startup specializing in virtualization scaling, a skill set that should help VMware as it looks to compete with the likes of Amazon’s cloud services.
Indeed, Dave Rosenberg, writing for CNET, notes that all of these moves fit well into VMware’s overall software strategy. On Serengeti, he says that “it sounds a lot like how Amazon Web Services (AWS) Elastic Map Reduce (EMR) works — where the underlying infrastructure scales across virtual instances — in this case in enterprise-oriented packaging.”
Because Serengeti is an open source project, it is likely to widen the field of potential vendors and distributors rather than close off the market. Rosenberg notes that “The release of Serengeti could also usher in a new world of hosted Hadoop providers — assuming the open-source angle takes hold and the software supports other hypervisors beyond VMware.”