The data scientist - a role dissected

Computing research investigates whether a data scientist is really any different from a data analyst, data engineer or BI specialist

The data scientist has been the object of fascination, awe and even envy ever since the big data boom began five years ago or so. Often compared with yetis and unicorns on account of their rarity, they are nevertheless the must-have accessory for any organisation that wants to be taken seriously as data-centric and forward-looking, marching boldly into the age of digital.

This is because the ability to make sense of the huge quantities of data accumulated by organisations and flowing in from the outside has become a key commercial differentiator.

"The really good data scientists have three main things: they have stats, they have computer science and they have business acumen," one interviewee told us during our annual big data research programme. "It's very hard to find those people but they do exist. They're usually working at big companies like Netflix or Amazon or Google."

Another respondent mentioned that those with the required skills are also hoovered up by the banks as quantitative financial analysts.

Because data scientists are so rare, the role has been subject to "definition creep". As an illustration, when Computing asked organisations whether or not they employ data scientists, the most popular answer was: "Yes, but we call them data analysts".

This raises an obvious question: is a data scientist really any different from a data analyst - or indeed a data engineer, a business analyst, or any number of similar roles?

There is certainly some overlap between these and similar positions, and for smaller and less data-centric firms they may be roughly equivalent: a data analyst is a BI specialist is a data scientist. All can be extremely valuable in teasing useful information and insight from raw data.

For large data-centric firms, though, there are distinct differences. Apart from commanding a much higher salary, a data scientist will generally be qualified to Masters or PhD level and will have more advanced statistical and modelling skills, as well as domain expertise in their field. And while data analysts and BI professionals are focused more on historical data, data scientists are likely to be combining this with real-time streaming and external data, and building machine-learning algorithms to form a picture of what is likely to happen. This is where they bring real value.

Bill Franks, chief analytics officer at Teradata, said the differences between a data scientist and a business analyst are about depth of focus.

"A business analyst can code a little SQL, use some visualisation, use a classic BI tool, but it's very simple queries and standard report focused, and that's different to what we're talking about with data scientists, analysts and statisticians - people that can get really deep into the code," he told Computing. "It's the deeper skillset as opposed to the general one that we're talking about."

As with the related roles mentioned above, data scientists spend some time cleaning and analysing data, but unlike a data analyst or statistician they also create proprietary algorithms and predictive models to develop tools and products that address business needs. Machine learning frequently plays a major part in this aspect of the job.

A data scientist will also approach data in a more exploratory fashion, experimenting to uncover correlations and trends for further analysis, or to come up with interesting new findings that no-one has thought of before. Indeed, aside from the technical, mathematical and analytical skills, one of the key attributes that employers are looking for is an innate curiosity, a willingness to pursue the truth. However, if they lack experience or their domain knowledge is a little sketchy, they must be steered in the right direction by management, otherwise there is a risk that their supposed "Eureka moment" may turn out to be nothing of the sort.

"A data scientist may not recognise the significance of the results. He may see a big spike and go, 'Oh, we've found something fantastic here'. But when he shows it to a structural engineer he'll say, 'Any old idiot knows that'. It's got to be a combination of the domain expert and the data people working together..." said a chief architect in the construction industry.

The right environment

The balance between experimentation and direction is a crucial one, and the subject of some debate. One data scientist said recently that in his opinion companies need to provide more direction.

"It's not a playground. It is not academic," Gianmario Spacagna of Barclays said. "The company wants to make money and you have to solve a problem".

He added that many data scientists find it hard to turn their initial explorations into relevant projects and applications. "This is where data scientists struggle a lot. Companies will stop hiring data scientists, I promise you, when they realise that the majority of them do not bring value."

Meanwhile Nick Clarke, head of analytics at software and consultancy firm Tessella, told Computing that part of the problem is structural.

"Companies need to adopt new evolved structures which reflect a culture where data scientists are in direct contact with the business functions, the IT department, and the section of the business with which they are tasked with providing insights," he said.

When respondents were asked about the optimal environment for data scientists to thrive, the most frequently chosen answers all related to the relationship between the data scientist and business decision makers. As Clarke suggested, data scientists need an organisational structure that allows them to bridge IT and business departments; they require senior backing, so that they are listened to at the right level; and they need clear guidance as to the sort of questions that need answering.

Followers and leaders

When it comes to the management of data scientists, there was a notable difference in expectations among respondents working for data-centric organisations that have embraced big data analytics (identified as "leading" during the research process).

Among the organisations that said they were employing data scientists, 46 per cent said data scientists should fully understand their business, while the same proportion expected them to be managed by the business.

However, for "leading" organisations the proportion expecting data scientists to have the business acumen themselves rose to 56 per cent. Interestingly, the proportion saying the role is mainly exploratory also rose.

Because many skills fall under the umbrella of data science, it is hard for one individual to cover them all. Rather than a single data scientist role, larger organisations tend to deploy a number of people in a "data science team" that will include business analysts, coders, DevOps, modelling specialists, data engineers, statisticians and communicators.

"The data scientist is a in creative team, but you don't take initiatives on your own...You need checkpoints to ensure it is what the business wants, but within the checkpoints you have no constraints: just do whatever you want otherwise you don't have data science..." said one data scientist, encapsulating the sometimes conflicting needs for teamwork, direction and the space to experiment.

Computing's exclusive Big Data Review 2016, which summarises the results of an extensive research programme among more than 400 UK organisations, is free to download.