Microsoft may love Linux now, but can Hadoop vendors ever kiss and make up?
IT vendors commonly co-operate to standardise platforms these days, but Cloudera and MapR say a firm 'no' to Hortonworks' ODP collaboration
Necessity creates some strange bedfellows. Who just a few years back could have imagined Microsoft climbing into the sack with Linux? After all, wasn't it Steve Ballmer who in 2001 described Linux as "a cancer"? And didn't a certain Mr Gates once proclaim that there was no room on the planet for two platforms, stating: "World domination fast - it's either us or Linus"?
And yet here we are in 2015 being told that Microsoft loves not only Linux, but Hadoop too (a "Tux" mascot was being handed out from the Microsoft stand at the recent Hadoop Summit in Brussels).
It's all very confusing.
Then there's the OpenStack Foundation, a collaboration that includes big beasts such as HP, IBM and Dell, who in a previous era would have been roaming the plains and knocking chunks out of each other at every opportunity.
Another example is Cloud Foundry, started by VMware and now led by EMC stablemate Pivotal, which counts Fujitsu, SAP - and HP and IBM again - among its contributors.
Pivotal's senior vice president of strategy and corporate development, Leo Spiegel, said that the driver for what has inevitably become known as "co-opertition" is the need for a stable base on top of which the collaborating companies can innovate. Companies like Microsoft have no choice but to accept the reality of open source software like Linux and Hadoop as the quasi-monopoly Microsoft once enjoyed as provider of the default computing platform fades.
"Enterprises are demanding that companies support open source. And it's almost becoming table stakes if you're a software vendor," said Spiegel.
It is also in the industry's interest to avoid the "forking" problems that afflicted technologies such as Unix in the past, when each vendor created its own proprietary version, none of which was compatible with the others. These days, instead of trying to make a land grab for the platform, it is often more efficient for companies to co-operate on what they have in common and compete on what sets them apart.
"It's remarkable that companies like EMC, Teradata and HP can all work together. You have seen this big shift over the past couple of years," Spiegel said.
This change of stance is driven by the rise of cloud, he explained: "The thing about the cloud is that if we're not careful we'll end up with the exact same thing as we found in the mainframe. If you want to have a hybrid environment where you have technologies on premise and in the cloud it means that all these companies have to co-operate together."
So that's the co-operation side of co-opertition, but as well as competing on top of a common platform, inevitably there is another form of competition too.
"For better or for worse Amazon, which is great technology, is a proprietary stack," Spiegel said.
"We're not ganging up on Amazon but enterprises want the equivalent of open source and they want to be able to go multi-vendor."
Amazon is big enough to stand up for itself, of course, but it does underline the point that for every group of collaborators there are inevitably going to be those left out in the cold.
Open data or closed shop?
One accusation levelled at another of Spiegel's co-opertition projects, the Open Data Platform (ODP), is that it has been deliberately designed to freeze out the direct competitors of another of ODP's founding members, Hortonworks.
ODP creates a "standard" Hadoop kernel, currently Apache Hadoop 2.6, comprising the file system HDFS, the data processing framework MapReduce, the resource manager Yarn, and Ambari for provisioning, monitoring and managing Hadoop. ODP members will ensure that all these elements are upgraded at the same time, in a controlled way, which they say will enable software vendors to certify against a single version of core Hadoop, rather than the many versions on the market now.
HDFS, Yarn, MapReduce and Ambari are all Apache Software Foundation projects with no proprietary elements, as is the Hortonworks Hadoop distribution. By contrast, competitors MapR and Cloudera include a proprietary file system and a bespoke Hadoop management package, respectively, in their distributions.
Pointing out that ODP was originally a Pivotal initiative, not a Hortonworks one, Hortonworks president Herb Cunitz said that both rivals had been asked to join the foundation, but had refused. ODP is a reaction to market demands for standardisation, he claimed.
"Every one of the companies in ODP would embrace Cloudera or MapR joining. The reality is the industry is trying to standardise the common kernel. Pivotal wants to compete on top of the kernel not at the kernel level, the same with IBM," he told Computing.
"They want to back an open-source Apache product very specifically because no one vendor controls it. We are a major contributor to it but it is controlled by Apache. IBM and Pivotal are not dependent on us."
But could MapR and Cloudera really be expected to join such a project? After all, MapR's own distributed file system is one of the unique selling points of its Hadoop distribution, while Cloudera Manager competes directly with Ambari.
"MapR would have to embrace HDFS," conceded Cunitz. "But it's not the key differentiator, just one of the areas where they've differentiated themselves.
"If the industry is starting to say we'd like to standardise the common kernel so [independent software vendors] can certify once, deploy anywhere, then MapR and Cloudera should join that," he went on.
But those two distributors see things very differently.
"By the time we were invited to join [ODP], almost all the Platinum-level sponsorships had already been sold," Alex Gutow, product marketing manager at Cloudera, told Computing. "So we wouldn't have been in a position to really influence anything."
The decision to include Ambari also made joining ODP problematic, she said, going on to state that ODP, pitched as a fix for platform forking, is a solution looking for a problem.
"It doesn't solve any problems we see. The fundamental Hadoop is already standardised by the Apache Foundation. We haven't seen this as an issue with our customers and partners. We have more than 1,400 partners certified on Cloudera and they're not going away. It's fine to have a standard platform but if not everyone is playing then it doesn't solve anything anyway."
She also had views on Pivotal's motives for starting ODP.
"It seems like Pivotal hasn't had much success in the Hadoop market so this is a way to increase their presence there."
Meanwhile, MapR CEO John Schroeder wrote in a blog post:
"Around 75 per cent of Hadoop implementations run on MapR and Cloudera. MapR and Cloudera have both chosen not to participate. The ODP without MapR and Cloudera is a bit like one of the Big Three auto-makers pushing for a standards initiative without the involvement of the other two."
Partners' views
Thomas Fion, big data architect at integration software vendor Talend, presumably the type of company that ODP is designed to appeal to, said his company would continue to support all Hadoop platforms.
"We plan on just continuing our strategy of being agnostic. ODP was confusing in the beginning but now that Herb Cunitz has clarified the message it made a lot more sense, but I'm still not quite sure where it's going to be applicable," Fion said.
But while it may not solve any pressing problems now, Mike Merritt-Holmes, CEO of services company Big Data Partnership, said ODP - or something like it - is a necessary step.
"When you have a number of different companies integrating or OEM-ing distributions on the Hadoop ecosystem you need a consistent version so that when updates happen they happen in a consistent manner," he said.
"It's not a game-changer from a use case perspective, but it will help with maintaining a cluster."
Merritt-Holmes does not believe the intention is to freeze out the other Hadoop distributions, but agrees that this could be a consequence.
"That could be a knock-on effect," he said.
So while the rest of the IT world might be "co-opertating" with blithe abandon, there's still not enough room in the Hadoop bed for three, it would seem.