Robust and high-quality data is the key to unlocking the power of GenAI

Because you can’t do analytics without trust in data

Image: Databricks Data + AI World Tour London case studies

The Databricks Data + AI World Tour pulled into ExCel London last week, with customer use cases at its heart.

Co-founder and SVP Field Engineering Arsalan Tavakoli led the keynote audience through some technical updates and several customer use cases. It was here that the theme of the day began to reveal itself, in terms of the inextricable relationship between data and AI. Tavakoli said:

“Ten years ago AI wasn’t as cool as it is now so we had to yell the data part and whisper the AI part. Now everybody’s screaming the AI part and we have to remind them about data.

“In order to get any value out of these models you have to move them into production. And 85% of folks (maybe more) have not been able to get these workloads into production. That’s a big problem.”

What’s stopping companies moving their many POCs into production? Much of it boils down to data. Cybersecurity and data privacy concerns are also paramount, and the fear is a perfectly rational one: Tavakoli acknowledged that GenAI and LLMs have made attack vectors more sophisticated. Companies must secure their data estate before moving what may well be numerous POCs into production.

Of course, the typically fragmented nature of data estates and infrastructure more broadly, with small pockets of use cases dotted here and there, makes management a complex burden.

ASDA data transformation

Dr Simon Jury, VP for Data Analytics at ASDA, explained how the sale of ASDA to a combination of private equity and local entrepreneurs several years ago meant a transformation away from the systems put in place by previous owner Walmart.

“This is a business with about 10,000 stores and 15 million customers a week. The 100,000 data feeds that we’re used to had to be replaced with another 100,000 data feeds so at some level that’s a ten to the power of ten combinations that have to be worked out. At the same time, we want the BI layer to be as smooth as possible so that our commercial colleagues almost don’t notice that the change is happening.”

Image: Asda is focusing on data fundamentals. Credit: Shutterstock

The transformation is still ongoing, but Nia Chandler, Senior Director Data Visualisation and KPIs, reflected on some of her learnings.

“I can see an enterprise data solution now and that’s really exciting but once we get through the transformation it’s all about having the right processes in place to maximise the Unity Catalog and people with the right skills to unlock the power that we’ve now got.”

And it could be quite some power. Simon Jury wants to transform ASDA into a leader in data-driven retailing. What we put in our shopping baskets reveals a lot about us and our lives.

“Having all that data in one place, clean and connected to pricing data sets and supply chain data sets unlocks a world of opportunity for optimising customer experiences.”

But for now?

“Things like Genie rooms open the possibility for English language querying of complex datasets and the data team are no longer the middleman. Non-coders can have their own chain of thought analysis with the data.

“We’re also using a piece of GenAI to understand how users are interacting with the training manuals on new systems. We’re also using it to help us sift through customer feedback, extracting sentiment analysis from the tens of thousands of pieces of written feedback we receive weekly.”

Nonetheless, Jury was reluctant to run away with the possibilities of GenAI.

“I want to focus on the basics of having a robust, solid high-quality dataset that unlocks all of the exciting stuff.”

SEGA and Rolls-Royce

It is not just people like Simon Jury at ASDA who are excited about the prospect of data democratisation – the idea that business users can interrogate datasets themselves for business insight. Another high-profile user of this functionality is SEGA.

Felix Baker, Head of Data Services for SEGA Europe, explained:

“Anybody, in the middle of a meeting, can find an answer to questions like, how did game X sell in China in the last quarter compared to the same quarter last year? You get an answer immediately and this is driving decision making forward no end.”

SEGA is also using GenAI for player sentiment analysis.

“We validate these findings against the underlying game data in Delta Lake and send the feedback to developers so they can tweak the game with frequent game patches.

“The continuous sentiment monitoring has increased player retention in certain titles by up to 40%.”

Another case study in the form of Rolls-Royce proved a useful illustration of the fragmented nature of so many data landscapes, with data and insight locked away in silos.

Lois Clifton, Product Manager, Applied Intelligence, elaborated:

“We have everything from legacy mainframe systems, uber modern cloud systems and data lakes that we run our engine health monitoring services on, monolithic enterprise systems and everything else in between. They’re not designed to be interoperable. Businesses build their own solutions, bringing data in and storing it in isolation. That leads to lots of versions of the truth across the organisation, varying data quality and ultimately mistrust in data.”

The Databricks Lakehouse architecture has helped to move Rolls-Royce into a place where all the structured and unstructured data, some of which was decades old, is on a single platform where data analytics and AI workloads can be run with confidence.

So far, use cases have been centred on engineering, and the company is now in a position to accelerate this work so that expensive scientific and computer vision software is no longer required, reducing development costs and speeding up the process.

When asked for her advice on this kind of transformation, Clifton returned to the prevailing theme of the day.

“You’ve got to remember that all this emerging technology and AI is built on a data foundation. It’s really difficult to scale if you work on a project-by-project basis. Treating data and the underlying platforms as building blocks of an innovation ecosystem where we can share these assets across an organisation and build superior value propositions off it is the only way forward.

“The organisational structure and the governance that wraps around that is massive and it’s imperative to make that work.”