The sustainability dilemma: Cloud guru Sean Varley on the AI boom

From chatbots to product recommendations, AI tools are a natural presence in our everyday lives. But what is fueling AI’s astonishing trajectory, and is it even sustainable?

Sean Varley is Chief Evangelist & Vice President of Business Development at Silicon Valley-based semiconductor firm Ampere Computing. He took the stage at Rakuten Technology Conference 2023 in Tokyo to talk about the cloud computing complexities behind the most cutting-edge AI tech.

Ampere Computing Chief Evangelist Sean Varley joined the Rakuten Technology Conference to make a case for efficient cloud-native coding.

“For those of us that are in the background building silicon for this industry, this is a very, very important time,” he told the conference. “The time is really now for us to make changes for a more sustainable future in cloud computing.”

We need to talk about data center emissions

Founded in 2017, Ampere builds and supplies processors optimized for cloud computing. The data centers where this cloud computing takes place form the backbone of the internet. As our reliance on the internet has grown, so has its environmental impact.

“By some estimates, four-and-a-half percent of global carbon emissions are actually contributed by data centers,” Varley reported. “This is comparable to other industries that get much more notoriety, like aviation or shipping.”

Data centers are hot. Much of the energy they consume is directed to managing this heat.

Inefficient data center operations are costly not only to the environment but also to companies’ bottom lines. Varley referenced a data point raised by Rakuten CEO Mickey Mikitani during an earlier session: Japan’s electricity costs have soared in the last year.

Ampere's Sean Varley took the time to answer hard-hitting questions in Tokyo.

“Worldwide, we’re seeing the cost of electricity go up astronomically, especially in some hard-hit geos like the European Union. So this is becoming a big problem.”

Some regions are even imposing moratoriums on data center construction: “We’re starting to see data center space become a premium commodity that is very tightly controlled.”

The solution that Varley sees is simple: “Our actual proposal to the industry, coming from an Ampere perspective, is to just consume less power. Then it’s easier to draw that heat away.”

Moore’s Law isn’t dead, but Dennard’s scaling may be

Computing has long followed Moore’s Law – an observation first made by engineer and Intel co-founder Gordon Moore in 1965, and revised in 1975, that the number of transistors on a chip would double roughly every two years.

Chip manufacturers have largely fulfilled this prophecy by shrinking transistors toward the limits of physics. For decades, shrinking also let them boost the frequency at which chips perform calculations without increasing power density – a relationship called Dennard scaling, named after electrical engineer Robert Dennard.

Ampere is a major proponent of scale-out, cloud-native processing using low-power cores based on the Arm architecture – a more power-efficient alternative, common in mobile devices, to the more established x86 architecture used in most PCs. Varley argues that while Moore’s Law isn’t dead, the age of frequency scaling may be coming to an end as the industry approaches the limits implied by Heisenberg’s uncertainty principle – which could place a physical limit on how small manufacturers can shrink transistors.

Varley referenced a famous chart by Hennessy and Patterson, predicting the flattening of Moore's Law as Dennard scaling tapers off.

“We’re witnessing it happening now. There is still shrinkage in silicon processes, we all know that, but Dennard scaling actually tailed off in the early 2000s.”

Boosting frequency no longer produces more computing power for the same energy cost. “It’s not linear, it’s actually an exponential curve.”

So is Moore’s Law dead? Varley doesn’t think so.

“It’s not dead. And that’s one of the inspirational parts of this,” he told the audience. “Semiconductor technology overall is still shrinking… but you’re better off utilizing that space putting more cores in at a consistent frequency at a consistent power.”
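The trade-off Varley describes can be illustrated with the textbook approximation for dynamic power in CMOS chips, P ≈ C·V²·f. The sketch below is a simplification rather than an Ampere model. It assumes that supply voltage must rise roughly in proportion to clock frequency, which makes power grow with about the cube of frequency (steep, though strictly polynomial rather than exponential), and that the workload splits perfectly across cores. The function names and constants are hypothetical.

```python
def dynamic_power(freq_ghz, cores=1, capacitance=1.0, volts_per_ghz=0.4):
    """Approximate dynamic power (arbitrary units) using P = C * V^2 * f per core.

    Assumption: supply voltage scales linearly with clock frequency.
    """
    voltage = volts_per_ghz * freq_ghz
    return cores * capacitance * voltage ** 2 * freq_ghz


def throughput(freq_ghz, cores=1):
    """Idealized throughput: proportional to frequency times core count."""
    return cores * freq_ghz


# Two ways to double the throughput of a single 2 GHz core.
baseline = dynamic_power(2.0, cores=1)
scale_up = dynamic_power(4.0, cores=1)    # double the frequency
scale_out = dynamic_power(2.0, cores=2)   # double the core count

print(f"scale-up power:  {scale_up / baseline:.1f}x baseline")
print(f"scale-out power: {scale_out / baseline:.1f}x baseline")
```

Under these assumptions both options deliver the same idealized throughput, but doubling the frequency costs about eight times the baseline power, while doubling the core count costs about twice the baseline power.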

This is something that Ampere’s processors, built on the efficient Arm instruction set architecture (ISA), are designed to deliver. Arm is used today in most mobile devices, and by the likes of Apple in its line of computers.

“It’s different from the legacy x86 ecosystem in that it actually delivers performance through scaling out in cores,” he explained. “The result of it is to really actually build a processor that is twice the performance and half the power of the x86 ecosystem.”

AI needs cloud power – and plenty of it

So where does AI enter the picture?

“AI does not exist in a vacuum. AI exists where there’s data. AI exists where there are algorithms to compute it,” Varley explained. “It is part of an ecosystem. This ecosystem involves data centers, computing and efficiency.”

AI requires enormous computational power – not only to train new models (some of which run into the trillions of parameters), but also to operate those models, making what AI engineers call inferences: the individual predictions or decisions a model produces.

“AI has models. They’re trained first and then you use them to do inference,” Varley explained. “Just think about how many websites you go to that have chatbots, or your favorite e-commerce app that is making recommendations based on your shopping preferences – that’s AI at work.”

Whether it’s large language models like those behind ChatGPT or product recommendation engines, somewhere a server is churning out inferences to produce results. As the AI boom continues its formidable trajectory, Varley stressed the importance of designing for sustainability from the get-go.

For optimal efficiency, Varley stressed that engineers must design their applications to be cloud-native.

AI is one use case that is particularly well-suited to multi-core computing. AI primarily works by translating everything into numerical vectors – a multitude of cores allows for more of these vectors to be processed simultaneously.

“Old applications are single-threaded. They are not well scalable with cores. In other words, they may be single-threaded-constrained, so they don’t grow and move elastically with changing demands. This is inherently less portable, it’s less efficient, and it’s less elastic.”
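Why single-threaded code scales poorly with cores can be made concrete with Amdahl’s law, a standard result that Varley did not cite by name: if a fraction p of a program can run in parallel, n cores give a speedup of at most 1 / ((1 - p) + p / n). The sketch below applies that formula. The workload labels, parallel fractions and the 128-core count are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads on a hypothetical 128-core processor.
for label, fraction in [("mostly serial", 0.10), ("cloud-native", 0.99)]:
    print(f"{label}: {amdahl_speedup(fraction, 128):.1f}x")
```

With only a tenth of the work parallelizable, the extra cores barely help. With 99 percent parallelizable, the same hardware can deliver a speedup in the dozens, which is the gap between legacy and cloud-native designs that the quote points to.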

To take advantage of Ampere’s Arm-based efficiency and build applications that can handle a highly variable load, engineers must code their applications to be cloud-native.

“There are brand new startups, many of them coming online to actually be able to distribute the AI computational tasks across a large set of resources, whether those resources are general-purpose or specialized. And the awareness in code for where they’re provisioning something is going to become a key aspect of sustainability and bringing down costs,” Varley explained. “This is an entire frontier that is essentially in its infancy as far as power efficiency is concerned.”

Efficient AI through domain-specific architecture

“AI is a major consumer of compute cycles, whether those compute cycles are on a general-purpose processor or an accelerator. All of that consumes power.”

Varley’s company primarily produces general-purpose chips, which are not only well suited to handling large volumes of AI inferences but can also be adapted to practically any task that requires computing power.

But for optimal efficiency, many data centers incorporate accelerators such as GPUs (graphics processing units). Pairing general-purpose processors with this kind of specialized hardware is called domain-specific architecture, or heterogeneous computing.

“Utilizing accelerators is a good way to do that, and especially accelerators that are specialized to models and different types of computing needs.”

At its very core, reducing AI’s environmental and financial burden is all about the cost per inference.

Ampere’s AI Platform Alliance on a sustainability mission

“You’re seeing here a really disruptive change in the way that you can actually do AI inference processing. The whole objective of all of this is to bring down the cost per inference, or decision made, when you pass a token to a model,” Varley explained. “We’ve got to bring that cost down by orders of magnitude – to make it more sustainable, to make it more cost effective – because those two things go together.”
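The link between power draw and cost per inference comes down to simple arithmetic. The sketch below counts electricity only, ignoring hardware, cooling and staffing, and every figure in it (wattage, throughput, price per kWh) is a hypothetical placeholder rather than a number from the talk.

```python
def energy_per_inference_joules(server_watts, inferences_per_second):
    """Energy drawn per inference: power (J/s) divided by throughput (1/s)."""
    if inferences_per_second <= 0:
        raise ValueError("inferences_per_second must be positive")
    return server_watts / inferences_per_second


def electricity_cost_per_million(server_watts, inferences_per_second, price_per_kwh):
    """Electricity cost of one million inferences, in the currency of price_per_kwh."""
    joules = energy_per_inference_joules(server_watts, inferences_per_second) * 1_000_000
    kwh = joules / 3_600_000  # 1 kWh = 3.6 million joules
    return kwh * price_per_kwh


# Hypothetical servers: same throughput, different power draw.
for name, watts in [("server A", 800.0), ("server B", 400.0)]:
    cost = electricity_cost_per_million(watts, 2_000.0, price_per_kwh=0.30)
    print(f"{name}: {cost:.4f} per million inferences")
```

At equal throughput, halving the power draw halves the electricity cost per inference, which is why efficiency and cost move together in Varley’s argument.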

Varley and his team at Ampere recently launched an alliance to get the ball rolling on domain-specific computing: “We call it the AI Platform Alliance,” he revealed. “It’s really about AI efficiency and AI sustainability, and the combination of best-in-class accelerators with best-in-class power efficient, general-purpose processing.”

With demand for AI tech unlikely to subside anytime soon, Varley’s sustainability mission could be crucial for the future of the industry.

“The demand is actually growing astronomically, especially right now with the AI hype cycle that’s been going on in the industry,” he told the conference. “The demand for compute is inexorable. It’s always just continuing to build. And why? Because we are very innovative creatures. We keep coming up with new things to do, and AI is now in that wave of building more of what we can actually do.”

As the AI revolution unfolds, Varley and his alliance could be the ones ensuring that sustainability doesn’t fall by the wayside.

“I think that being able to allocate the right compute resources to solve those types of problems and get those sorts of insights is the next infrastructural software problem to solve.”

Cloud computing – the driving force behind the latest AI tech – was a hot topic at Rakuten Technology Conference 2023.