Cloud isn't as efficient as we'd all like to think
Cloud hyperscalers and the wider datacentre industry aren't being transparent about the extent of idle resource.
As businesses seek to enhance their ESG credentials, they're increasingly asking questions of their cloud service providers about the sustainability of their services. At first glance, those CSPs seem very keen to broadcast their sustainability credentials, particularly their efficiency.
Amazon, Microsoft and Google all acknowledge within their ESG reporting the vast quantities of electricity they consume, but also state that economies of scale mean that their datacentres consume energy more efficiently than any standard small-scale datacentre ever could. Whilst this is true to a certain extent, the bigger picture is much more complicated.
Understanding why begins with an examination of exactly what constitutes efficiency.
PUE is not a useful comparative metric
Cloud hyperscalers like to encourage readers of their ESG reports to consider datacentre Power Usage Effectiveness (PUE) as a measure of their efficiency because judged on this metric, they are more efficient than the average enterprise datacentre. PUE is given as a ratio, with the energy consumed by a datacentre as a whole divided by the amount of energy required to run the IT equipment. The closer to a PUE of 1.0 you are, the more efficient your datacentre.
That's the theory.
Most hyperscalers aim for a PUE of 1.1-1.2, whereas most enterprise datacentres come in at around 1.9-2.0, which pushes the global average up to 1.58.
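To put those ratios in context, PUE is simply total facility energy divided by the energy drawn by the IT equipment, so the figure maps directly onto overhead. A minimal back-of-envelope sketch, using only the figures quoted above, purely for illustration:

```python
# Illustrative only: how a PUE figure translates into facility overhead.
# PUE = total facility energy / IT equipment energy, so overhead = PUE - 1.

def overhead_per_it_kwh(pue: float) -> float:
    """kWh spent on cooling, power distribution etc. per kWh of IT load."""
    return pue - 1.0

for label, pue in [("hyperscaler target", 1.1),
                   ("global average", 1.58),
                   ("typical enterprise", 2.0)]:
    print(f"{label:<20} PUE {pue:.2f} -> {overhead_per_it_kwh(pue):.2f} kWh "
          f"overhead per kWh of IT load")

# Note what PUE does not capture: if the IT load itself is doing no useful
# work, a 'good' PUE of 1.1 still means 1.1 kWh consumed for every wasted kWh.
```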
When Computing first began to research the sustainability of cloud vendors in late 2021, we used PUE as a comparative metric, measuring vendors against the global average, mainly because it was one of the few comparative metrics available. In this year's updated research, which will be published on Computing over the next few weeks, we still use PUE because it remains one of the few metrics available, but we have reduced its weighting.
Why?
PUE essentially measures how much energy a facility consumes over and above what its IT equipment draws, and in practice most of that overhead is cooling. What it really measures, then, is how efficient a datacentre is at cooling. That's still worth measuring, hence its retention as a metric, but PUE was designed to track a datacentre's efficiency against itself at various points in time, not to compare datacentres with each other.
What is far more important is the efficiency of the hardware that requires the cooling. If your servers are inefficient you're wasting energy, carbon and probably water. Metrics to measure the efficiency of the hardware in cloud datacentres are not currently provided by any of the big three cloud providers in their ESG reporting.
New doesn't necessarily mean more efficient
If PUE is not a useful measure of datacentre efficiency, what is? More importantly, is anyone actually measuring it?
Richard Kenny is sustainability & research director at Techbuyer.
Since its inception in 2005, Techbuyer has grown into a global organisation boasting partnerships with leading hardware manufacturers, enabling enterprises to both stretch tech budgets and polish their circular economy credentials.
Because of its pivotal role in enabling a more circular economy in datacentre hardware, Techbuyer has won multiple rounds of funding from industry, government and international bodies to conduct research into the environmental impact of ICT, product life extension and the circular economy. One such project, funded by Innovate UK and conducted in 2019 in collaboration with the University of East London, set out to investigate whether refurbished hardware could rival the performance of a brand new piece of kit. The results were published in the IEEE Journal of Sustainable Computing.
This research makes fascinating reading for anyone interested in the efficiency of the cloud services we're all buying into, because it overturns one of the key assumptions the entire datacentre industry was built on: that the latest chips and hardware will always be more efficient than those that preceded them, an assumption often treated as a corollary of Moore's Law.
Kenny's research found that servers that were five or six years old at the time could significantly outperform brand new ones, if they were configured correctly. In case you were wondering, the final part of that sentence is doing some very heavy lifting.
Measure server configuration for efficiency
This research culminated in the creation of Interact, where Kenny is MD in addition to his role at Techbuyer.
Part of a wider sustainability consulting portfolio, Interact is a machine learning tool that measures the performance of a server estate in terms of the gap between energy consumption and compute output. The tool calculates the configuration needed to deliver the same level of compute with lower energy consumption, and therefore lower carbon emissions and cost.
"We've created the world's largest dataset of IT benchmarks. We then feed that into a machine learning tool and all of that data means we're highly accurate on predicting how effective a system is, how energy efficient it is, and how much work it can do on a per-configuration basis."
The benchmarking began with the Innovate UK project, and Kenny has since assembled an impressive team.
"All day every day we benchmark servers in an environmental chamber that was built as part of one of our scientists doctorates. It's the only one in the world and it allows us to identify harmonics, pressure, temperature etc. across any U or rack size. We benchmark in the most advanced facility in the world and we do it all day, every day.
"The single most important thing about servers and data centres is configuration. You can take a server and make it go from doing 10,000 transactions per watt to 20,000 transactions per watt in the same space. You can literally make the exact same hardware twice as efficient by looking at the configuration."
If your hardware is twice as efficient by this measure, you are halving its environmental impact.
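The arithmetic behind that claim is simple enough to sketch with the illustrative figures Kenny quotes above:

```python
# Kenny's illustrative figures: the same server, before and after reconfiguration.
# Treat transactions-per-watt as useful work delivered per unit of energy drawn.
eff_before = 10_000   # transactions per watt, original configuration
eff_after = 20_000    # transactions per watt, optimised configuration

workload = 1_000_000_000  # transactions to serve (arbitrary illustrative figure)

# For a fixed workload, energy consumed scales inversely with efficiency
energy_before = workload / eff_before
energy_after = workload / eff_after

print(f"Energy, and hence scope 2 emissions, after reconfiguration: "
      f"{energy_after / energy_before:.0%} of the original")   # -> 50%
```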
"What our tool does is predict very, very accurately any make, model, configuration or generation of server and tell you how good it is, how much work you can do and how effective it is at doing that work," Kenny says.
"You can take your current estate, put it into our tool, and we can make recommendations to provide the exact same compute at the same utilisation but much more efficiently."
Based on its work with more than 300 datacentres so far, Interact says it can produce savings of 65-75% of the average datacentre's electricity usage, which translates into more than 3.5 million kilos of scope 2 CO2e per datacentre over the course of five years.
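Those headline figures can be sanity-checked with some rough arithmetic. The grid carbon intensity below is our own assumption, not a figure supplied by Interact:

```python
# Rough sanity check of the claimed saving. The grid carbon intensity is an
# assumption (roughly a recent UK/EU grid mix), not a figure from Interact.
co2e_saved_kg = 3_500_000        # claimed scope 2 saving per datacentre
years = 5
grid_intensity = 0.4             # kg CO2e per kWh (assumption)

kwh_saved = co2e_saved_kg / grid_intensity
avg_power_cut_kw = kwh_saved / (years * 365 * 24)

print(f"Implied energy saving: {kwh_saved / 1e6:.2f} GWh over {years} years")
print(f"Equivalent to cutting average draw by roughly {avg_power_cut_kw:.0f} kW")
```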
Of course, this can also generate some highly significant cost savings, particularly given that sky-high energy costs are going to be with us for some time yet.
The research on which Interact was built also calls into question plenty of received wisdom on cloud efficiency. Kenny makes some eye-opening assertions on utilisation.
"The average utilisation is about 25% globally," he says. "I've seen a data centre where utilisation was 4%. Zombie servers in their thousands sat there doing nothing but running updates, using more energy than the most performant ones on the planet to do nothing."
The thought of datacentres full of racks of barely utilised servers throwing out megatons of CO2 into our overheating climate is profoundly troubling. That's before you consider the cost of resources such as scarce minerals consumed in manufacturing hardware that is idle.
If cloud customers, or indeed any company using third-party datacentres, colocation and the like, want to truly understand the nature of their scope 2 and 3 emissions, and the wider environmental impact of their operations, they need to start asking more specific and far-reaching questions of their service providers, instead of relying on marketing-heavy ESG reporting.
The cloud and datacentre industry needs to be prepared to answer them.