Hadoop – Whose to Choose (Part 1)


By David Teplow


Big Data is the new normal in data centers today – the inevitable result of the fact that so much of what we buy and what we do is now digitally recorded, and so many of the products we use are leaving their own “digital footprint” (known as the “Internet of Things / IoT”). The cornerstone technology of the Big Data era is Hadoop, which is now a common and compelling component of the modern data architecture. The question these days is not so much whether to embrace Hadoop but rather which distribution to choose. The three most popular and viable distributions come from Cloudera, Hortonworks and MapR Technologies. Their respective products are CDH (Cloudera Distribution of Apache Hadoop), HDP (Hortonworks Data Platform) and MapR. This series of posts will look at the differences between CDH, HDP and MapR. This first post will focus on The Companies behind them; the second on their respective Management/Administration Tools; the third will tackle the important differences between their primary SQL-on-Hadoop Offerings; and the fourth and final post will take a look at some recent and relevant Performance Benchmarks.

Part 1 – The Three Contenders: Cloudera, Hortonworks and MapR

The table below shows some key facts and figures related to each company.

Which Hadoop - table 1

Of the three companies, only Hortonworks is traded publicly (as of 12/12/14; NASDAQ: HDP). So valuations, revenues and other financial measures are harder to ascertain for Cloudera and MapR.

Cloudera’s valuation is based on a $740M investment by Intel in March 2014 for an 18% stake in the company. Hortonworks’ valuation is based on its stock price of $27 on 12/31/14, which happens to be equivalent to their valuation in July 2014, when HP invested $50M for just under 5% of the company. When Hortonworks went public in December 2014, they raised $110M on top of the $248M they had raised privately before their Initial Public Offering (IPO), which totals the $358M in the table above. That’s one third of what Cloudera has raised but twice what MapR has. I haven’t found any information that would allow me to determine a valuation for MapR, though Google Capital and others made an $80M investment in MapR (for an undisclosed equity stake) in June 2014.
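The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch, not the method used to build the table: the helper name `implied_valuation` is hypothetical, deal terms are ignored, and because HP's stake was "just under 5%", dividing by exactly 5% gives a lower bound on the Hortonworks valuation rather than the actual figure.

```python
def implied_valuation(investment_musd: float, stake: float) -> float:
    """Company valuation ($M) implied by paying `investment_musd` for a fractional `stake`."""
    if not 0 < stake <= 1:
        raise ValueError("stake must be a fraction in (0, 1]")
    return investment_musd / stake


# Intel: $740M for an 18% stake in Cloudera.
cloudera_valuation = implied_valuation(740, 0.18)

# HP: $50M for just under 5% of Hortonworks. Using exactly 5% (an assumption)
# yields a floor, since a smaller stake implies a higher valuation.
hortonworks_floor = implied_valuation(50, 0.05)

# Hortonworks funding: $110M raised at IPO plus $248M raised privately.
hortonworks_raised = 110 + 248

print(f"Cloudera implied valuation: ${cloudera_valuation:,.0f}M")
print(f"Hortonworks implied valuation: at least ${hortonworks_floor:,.0f}M")
print(f"Hortonworks total raised: ${hortonworks_raised}M")
```

The Cloudera figure works out to roughly $4.1B, and the Hortonworks floor to about $1B.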

Cloudera announced that for the twelve months ending 1/31/15 “Preliminary unaudited total revenue surpassed $100 million.” Hortonworks’ $46M revenue is for the year ending 12/31/14. I haven’t seen revenue figures for MapR for 2014 or any recent 12-month period, so the figure above is Wikibon’s estimate of 2013 revenue. My best guess is that MapR’s 2014 revenue was in the $40M to $45M range. Price/Earnings or P/E is a common financial measure for comparing companies, but since none of the three companies has yet earned a profit, I’ve used Price/Sales in the table above. For comparison, Oracle typically trades at around 5.1X Sales; Red Hat at around 7.8X Sales. So 41X for Cloudera and 24X for Hortonworks, while not quite off the scale, are exceedingly high.
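As a rough illustration, the Price/Sales multiples can be reproduced from figures quoted in this post. The sketch below makes two assumptions: Cloudera's valuation is taken as the one implied by Intel's $740M for 18%, and its revenue is taken as exactly $100M (the announcement said only that revenue "surpassed" that amount). For Hortonworks it works backwards, from the 24X multiple and $46M revenue to the valuation they imply. The function name `price_to_sales` is hypothetical.

```python
def price_to_sales(valuation_musd: float, revenue_musd: float) -> float:
    """Price/Sales multiple: company valuation divided by trailing revenue (both in $M)."""
    if revenue_musd <= 0:
        raise ValueError("revenue must be positive")
    return valuation_musd / revenue_musd


# Cloudera: valuation implied by Intel's $740M for an 18% stake,
# divided by an assumed $100M of revenue.
cloudera_valuation = 740 / 0.18
cloudera_ps = price_to_sales(cloudera_valuation, 100)

# Hortonworks: back out the valuation implied by a 24X multiple on $46M revenue.
hortonworks_implied_valuation = 24 * 46

print(f"Cloudera Price/Sales: {cloudera_ps:.0f}X")
print(f"Hortonworks implied valuation: ${hortonworks_implied_valuation:,}M")
```

The Cloudera multiple comes out at about 41X, matching the table, and the Hortonworks multiple implies a valuation of roughly $1.1B.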

The last two rows in the table above show how many employees each company has on the Project Management Committee and as additional Committers on the Apache Hadoop project. This shows their level of involvement in and commitment to Hadoop being sustained and enhanced by the open source community. From the first four rows of Table 1, it is clear that Cloudera has the lead in terms of being first to market, raising the most money, having the highest valuation, and selling the most software. Hortonworks, on the other hand, is the leading proponent of Hadoop as a vibrant and innovative open source project. This is true not only for the core Hadoop project and its most essential sub-projects like Hadoop Common, Hadoop MapReduce and Hadoop YARN (the Project Lead for each of these is employed by Hortonworks[1]), but also for most related projects like Ambari, Hive, Pig, etc.

My next post will explore the different Management/Administration Tools offered by Cloudera, Hortonworks and MapR.

[1] Facebook employs the Project Lead for Hadoop HDFS.

