Has the data scientist always been around?
Is data science really a new role or just a trendy Silicon Valley badge for something that was already there? Sooraj Shah investigates
The most talked-about role in IT is the data scientist, and at Computing we've asked what skills it takes to be the perfect data scientist, what the role actually entails, whether the lure of big money could attract the wrong type of candidate, and what effect self-service analytics will have on existing data scientists.
But one thing we keep hearing is that data scientists are rare. At Computing's Big Data Summit in 2014, HP's Dan Wood suggested data scientists were like Yetis - not because of their hairstyles but because of their rarity. More recently, Red Hat CIO Lee Congdon told Computing that the right candidates were hard to find because universities and colleges had failed to adapt to meet the needs of the enterprise.
And yet, could the data scientist role, like many other roles and buzzwords in IT, just be a new description for something that already existed?
Stephen Brobst, CTO of Teradata, believes that this is indeed the case in some industries.
"In financial services - either within market research or risk - they typically had those people, but the role is becoming more important now, and therefore [there is a lot of] hype, because the data is more available," he told Computing at Teradata Universe EMEA in Amsterdam.
Bill Franks, chief analytics officer at Teradata, believes that the role of a data scientist is determined by the tools that a big data specialist uses.
"So a data scientist might be using R or Python to code in against Hadoop, and a classic analyst like me might be using SAS and SQL on a large relational component," he said.
However, he believes that the role has been misrepresented, because the choice of tools has little to do with the task at hand.
"The fact that you're using a different programming language and toolset doesn't have anything to do with the underlying skillsets - it is like saying athleticism is different if I decided to play football or lacrosse - at the end of the day they both require a lot of the same athleticism," he said.
And Franks suggested that this meant there was a bigger pool of talent to draw from than has been reported.
"When I've emailed people in the past about analytics positions, I wanted to know that they could code in some language to do analytics, so if you bring me someone who has been doing SAS for 15 years, and is really good with SAS, then although they would have a learning curve to be taught R, I wouldn't have any problem hiring them because I know they could do it," he said.
"If you know how to speak a language, to know another language is just a matter of translating it, you don't have to learn to speak again, you just have to translate it, and it is the same with programming languages," he said.
Franks added that the reverse also holds: he would hire people who know R for a role that requires SAS.
Donal Gahan, head of management information at Vodafone Ireland, told Computing that the data scientist role has been around in the industry for a while, but said that the telecoms company's shift in strategy meant it had to formally hire someone to take up those responsibilities.
"What we're doing now is moving the team more into analytics and making [the data scientist job] a formal role within the team because we want to focus on doing the heavy analytics on the information as opposed to just reporting and BI; there is a lot more value if you have some strong people within the team that understand the data, the systems and that can derive value out of it," he said.
Responding to Red Hat CIO Congdon's view that the education system in the US has been slow to adjust to the rapid increase in demand from business for data science skills, Franks said he believes that many of the potential candidates for roles are just about to graduate.
"There are now dozens of degree programmes, masters and PhDs that are focused on business analytics, so there is a lot of demand but people are starting to graduate and over the next few years these analytics grads will fill those gaps," he said.
And in the meantime, he believes candidates with skills in SAS or other analytics tools can be trained to be data scientists.
"Before, he or she would have been called an analyst, a statistician, or a data miner - a number of things that analysts have been called, and I've been called all of those things," he said.
But Franks conceded that there are differences between a data scientist and a business analyst.
"A business analyst can code a little SQL, use some visualisation, use a classic BI tool, but it's very simple queries and standard report focused, and that's different to what we're talking about with data scientists, analysts and statisticians - people that can get really deep into the code. It's the deeper skillset as opposed to the general one that we're talking about," he said.
Brobst, however, suggested that data scientists need to understand experimental design, statistics and technology - but they don't necessarily have to be programmers. He said the key personality trait for data scientists is curiosity.
"Two-year-olds always ask 'why, why, why' and data scientists are the same - this is the personality trait I want to see in them, those who want to know why generally make good data scientists," he said.