Will the Hadoop big data skills gap soon be a thing of the past?

Are the actions being taken by Hadoop distributors and customers enough to ensure that we won't be hearing about a dearth of Hadoop talent in 2020?

Despite the future of Hadoop looking rosy for the three main distributors – Hortonworks, Cloudera and MapR – there is a common issue with new technologies in any field, one that all three vendors have to contend with: ensuring that there is a workforce that can actually use those tools.

Of course, it is in the three distributors' interests to look through rose-tinted goggles and shrug off suggestions that their technology is hard to use, or that too few people are sufficiently skilled to get value from it yet.

Hortonworks president Herb Cunitz told Computing that "skills are not the issue; people can do it and can learn to do it". He referred to Forrester analyst Mike Gualtieri, who in a recent report on Hadoop's prospects for 2015 claimed that "Hadoop is not that hard to understand".

"Digging in to a new open source platform and learning the APIs is nothing new to enterprise Java application developers... the shortage of Hadoop skills will quickly disappear as enterprises turn to their existing application development teams to implement projects such as filling data lakes and developing MapReduce jobs using Java," Gualtieri said in the report.

Computing's own research in 2014 found that the skills gap for Hadoop was one of the biggest in the big data spectrum - with 21 per cent of respondents either using the software or considering using it, but only eight per cent of organisations having the in-house skills required to fully exploit it.

But while early adopters of Hadoop may have employees who are now more familiar with the framework, other organisations that are just starting out down the Hadoop route are struggling to find the required skills.

With more and more companies deploying Hadoop software, it is inevitable that there will be an arms race for talent. Last year, the number of jobs posted for Apache Hadoop had risen 43 per cent year-on-year, according to Dice.com, while recruiters Hays Information Technology saw a noticeable increase in demand for Hadoop skills within financial services and management consultancy.

According to HR analytics firm Wanted Analytics, nearly 200 companies in the UK are currently on the hunt for employees with Hadoop skills.

The 20 companies that are looking to fill the highest number of jobs requiring Hadoop skills include the likes of financial heavyweights Barclaycard, Deloitte, EY, PricewaterhouseCoopers and Goldman Sachs. Technology firms such as Oracle, Accenture, Rackspace, Facebook and HP are also among the top 20.

So how hard is it for firms to hire employees with the right skills?

Adam Fletcher, engineering director at games developer and Hortonworks customer Mediatonic, said that it was particularly hard for his company because analytics wasn't something that had been widely used within the gaming industry. He said that there did appear to be a shortage of people with the right skills when the firm attempted to recruit new talent.

Rakesh Rao, lead/advisory software development engineer at Cloudera customer Quaero, agreed that it is hard to find people with the right skills, but added that the talent pool is growing.

"It depends on the specific use case that you're intending to work with, but if they've come from a data warehousing background that's a plus," he said.

Quaero, a data management company, built its data management platform on top of Hadoop with a team of just two people.

"At the time, two people were enough, but right now we're starting to recruit for developer and Hadoop administrator positions. Initially we could manage with the internal training that myself and Nitin organised, because if you're using Hadoop for interfaces like Hive you don't need someone who is of a higher technical level - anyone with SQL knowledge would be able to use Hive," said Rao.

However, he added that companies looking for "top-notch development skills", such as developing complex MapReduce jobs, may require somebody with extensive knowledge of Hadoop as well as a distributed computing mindset.
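The distinction Rao draws between Hive-level SQL skills and MapReduce-level development skills is easier to see with a concrete example. The sketch below mimics the map, shuffle and reduce phases of a classic Hadoop word-count job in plain Java; it has no Hadoop dependency, and the class and method names are invented purely for illustration. The "distributed computing mindset" he describes is about reasoning in these phases, even though a real job would run them across a cluster.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch only: the map/shuffle/reduce pattern behind a
// Hadoop word-count job, written in plain Java with no Hadoop
// dependency. Names are hypothetical, not Hadoop API calls.
public class WordCountSketch {

    // "Map" phase: split each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // "Shuffle" phase: group values by key, as the framework would
    // between the mapper and reducer nodes.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    // "Reduce" phase: sum the counts gathered for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new HashMap<>();
        grouped.forEach((word, counts) ->
                out.put(word, counts.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data big skills", "big gap");
        List<Map.Entry<String, Integer>> pairs = lines.stream()
                .flatMap(l -> map(l).stream())
                .collect(Collectors.toList());
        Map<String, Integer> counts = reduce(shuffle(pairs));
        System.out.println(counts.get("big")); // 3
        System.out.println(counts.get("gap")); // 1
    }
}
```

The equivalent Hive query would be a one-line GROUP BY, which is Rao's point: SQL knowledge covers the Hive case, while writing and tuning the phases above by hand is where deeper Hadoop expertise comes in.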

Quaero's senior software engineer, Nitin Kak, explained that there are also regional differences when hiring Hadoop experts. He said that in India the quality of candidates isn't really up to the mark, but that in the US, there is a burgeoning talent pool.

"With new technologies like Hadoop, you can't really go for someone who has top-notch experience, but you still have to go with someone who knows how the framework works, and would have at least done a bit of work on that and slowly get them up to speed with internal training," he suggested.

And training is something that all three distributors are keen to offer their customers. Most recently, MapR announced that it was offering free Hadoop training for developers, analysts and administrators.

Scott Russmann, director of software development at MapR customer Solutionary, is a huge fan of the hands-on workshops offered by MapR.

"[The workshops] are absolutely the best, because we get more people through the training. We generally take the class off-site so it is really focused. When you compare that to sending two or three individuals to northern California [to train], it is more cost-effective to bring the trainer to us," he said.

Russmann explained that Solutionary already had good professionals with a data processing mindset, and that the company wanted these same employees to think more about the computing side of things.

"For example, with [analytics search engine] Elasticsearch, we started with source code and started to learn how Elasticsearch was working and interacting with the environment. It was a longer, more intensive way of learning but it pays off when you start building it out to scale," he said.

The company is now looking at college graduates as potential recruits, as well as peers it encounters through the Hadoop User Group. Indeed, Quaero's Kak believes that the number of college students studying subjects that could lead to jobs in Hadoop has grown substantially in the past few years.

But despite an apparent growing talent pool, Russmann admitted that his company has had to offer bigger salaries in order to attract talent, and has also had to go all-out to retain staff.

"Having the experience of those technologies on their CVs has meant that the company has had to up its game to retain them," he said.

Cloudera, MapR, Hortonworks: What's easiest to learn? What's easiest to recruit for?

According to Fletcher, the skills factor played a part in why Mediatonic chose Hortonworks.

"Hortonworks allowed us to use the technology we already had, so rather than starting from scratch we could re-train staff who had experience with big data," he said.

But Quaero's Rao and Solutionary's Russmann believe all three solutions require about the same amount of training to master.

"On a broader level, the three vendors pretty much work in the same way - albeit MapR is slightly different. For someone who has worked using one of the solutions, I don't think it's that hard to move to another one," said Rao.

"I don't think it would influence the reasoning behind going with a particular distribution," he added.

But does one firm have the edge in terms of the amount of experts available?

"I would probably say Cloudera, having had an earlier start, but Hortonworks is catching up," said Rao.

With the three distributors offering training programmes, more students learning relevant skills at universities and colleges, and the open source community and commercial vendors building better tools to make Hadoop easier to use, the Hadoop skills gap may be plugged faster than previous IT skills shortages.

Forrester's Gualtieri believes that more experts in the field will mean that CIOs should not have to hire high-priced Hadoop consultants to get projects done.

"Hadoop projects will get done faster because the enterprise's very own application developers and operations professionals know the data, the integration points, the applications, and the business challenges. Additional skills for more complicated applications, such as predictive analytics running inside Hadoop, can be built when needed over time," he said.

Readers interested in developments in the big data space should check out Computing's summit on big data and analytics on 26 March 2015; qualified end users can attend for free.