GPU Shortages Will Prop Up The Clouds In More Ways Than One
For the last two quarters at least, the generic infrastructure server market – the one running databases, application servers, various web layers, and print and file serving workloads the world over – has been in a recession. Companies are not buying as many servers as they might have otherwise, and on the clouds, IT shops have been looking at their monthly bill with a magnifying glass and a scalpel to cut it down to size.
And just as is the case with the server and storage markets at large, we think were it not for AI training and inference workloads, the public clouds would be in recession, too. Luckily for server makers and cloud builders alike, AI workloads – particularly those driven by large language models and recommendation engines – need incredible amounts of GPU compute and hefty CPU hosts to run both training and inference. And this, we think, is what is propping up cloud revenues just like it has been propping up server and storage revenues in the corporate datacenters of the world.
How could it be otherwise? The cloud is a reflection of the corporate datacenter.
And so, even as IT shops are optimizing their capacity on the clouds as well as in their datacenters, they are investing in AI systems, either directly or indirectly via the clouds, and we are seeing revenue growth for systems and for cloud instances despite the belt tightening outside of AI.
Amazon Web Services, the largest cloud builder in the world, is a case in point. In fact, it is the case in point, with roughly a third of the cloud market for the past six years and a remarkable ability to maintain that market share despite the increasing pressure from Microsoft Azure and Google Cloud on a global basis and myriad regional cloud suppliers that are playing to the niches and playing up national sovereignty issues to their advantage.
In the quarter ended in June, AWS posted sales of $22.14 billion, up 12.2 percent year on year, but because of an increasing number of chip projects and systems and application software development efforts, operating income at AWS fell to $5.37 billion.
Revenue growth at AWS has been decelerating since the beginning, of course, but it has been slowing down at a much faster rate since the third quarter of last year. Such slowdowns are natural and are also sort of built in, with AWS cutting prices on compute, storage, and networking over time to help boost demand and therefore future revenues. As one of the world’s largest retailers, Amazon has injected its “make it up in volume” attitude into AWS. But never to the point that AWS stops being a highly profitable IT supplier, and therefore able to support the retail and media empires that Amazon is also building.
On a call with Wall Street analysts going over the Q2 2023 results for Amazon, chief executive officer Andy Jassy, who ran the AWS division from its inception back in 2006 until Amazon co-founder Jeff Bezos decided to retire and hand him the reins of the whole company, said repeatedly that getting any IT supplier with an $88 billion annual run rate to grow at 12.2 percent was an accomplishment. Which is true enough.
In the June quarter, cloud watcher Synergy Research says Microsoft Azure and Google Cloud grew at slightly higher rates than the 18 percent growth rate for the cloud infrastructure market at large, which for IaaS, PaaS, and hosted private cloud services alone accounted for $64.8 billion in sales. AWS and others sell plenty of data and application services atop these cloud infrastructure services, so this is a bit of an apples to apple sauce comparison. But as you can see, Microsoft and Google are gaining share, ever so slowly, and it is not unreasonable to think about a market where Microsoft can catch up to AWS at some point in the future.
If current trends persist, Microsoft will catch up to AWS in terms of cloud infrastructure revenue share in about three years; it would take maybe a dozen years for Google to catch up to AWS. And that assumes that all that AWS does is grow a little less than the rate of the cloud infrastructure market at large. Which it most definitely did not do for raw infrastructure in Q2 2023 if you compare the AWS revenue data with what Synergy says about core cloud infrastructure.
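The catch-up arithmetic above can be sketched with compound growth: if the leader and the chaser each keep growing at a fixed rate, the chaser reaches parity when its compounded share equals the leader's. The shares and growth rates below are illustrative assumptions roughly in line with the figures in this article (AWS at about a third of the market growing at about 12 percent), not Synergy's actual numbers:

```python
import math

def years_to_parity(leader_share, leader_growth, chaser_share, chaser_growth):
    """Years t until chaser_share * (1+g_c)^t == leader_share * (1+g_l)^t."""
    return math.log(leader_share / chaser_share) / \
           math.log((1 + chaser_growth) / (1 + leader_growth))

# Assumed figures, for illustration only:
aws_share, aws_growth = 0.32, 0.12      # ~a third of the market, ~12% growth
azure_share, azure_growth = 0.22, 0.26  # assumption
gcp_share, gcp_growth = 0.11, 0.22      # assumption

t_msft = years_to_parity(aws_share, aws_growth, azure_share, azure_growth)
t_goog = years_to_parity(aws_share, aws_growth, gcp_share, gcp_growth)
print(f"Microsoft reaches parity in ~{t_msft:.1f} years")
print(f"Google reaches parity in ~{t_goog:.1f} years")
```

With those assumed inputs, the math lands at roughly three years for Microsoft and roughly a dozen for Google, consistent with the projection above; change the growth-rate assumptions and the horizon moves quickly.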
Don’t worry, AWS is making plenty of money higher up the stack, and it is going to print money on AI training and inference instances.
We have said from the beginning that AWS was going to be a platform and application provider, giving customers the ability to create their own software as well as use databases, datastores, and applications that AWS created for Amazon’s internal use and then turned around for resale. Our models suggest that slightly less than half of AWS revenues already come from this software:
AI has been propping up AWS compute sales and driving storage, networking, and software sales for many years now, but we think there is tremendous pressure on pricing for storage and networking as companies shift their spending toward GPU-heavy instances. They want those instances not only because they want to start figuring out how to integrate generative AI into their applications on the cloud, but also because there is no way in hell they can afford to build AI training systems – which cost on the order of $1 billion – or even get their hands on the modern GPUs that underpin AI training these days. And they are going to pay a very, very hefty premium to rent such GPU instances, as the pricing on the AWS P5 instances based on Nvidia’s “Hopper” GPU accelerators, which just became generally available, so aptly illustrates.
If you rented a single AWS p5.48xlarge instance – eight H100 GPUs plus a host with a pair of “Milan” Epyc 7003 CPUs and 2 TB of main memory – it would cost $1.13 million on a three-year reserved instance contract. You might need 2,000 to 3,000 of these nodes for three or four months to train a model with several hundreds of billions of parameters and a trillion or so tokens. And renting such a system, even on a reserved basis, is still crazy expensive compared to buying it.
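To see just how expensive, here is a back-of-the-envelope sketch using only the figures cited above – the $1.13 million three-year reserved price, a 2,000-node cluster, and a four-month training run. These are the article's numbers, not AWS price-list quotes, and the reserved contract commits you to the full three-year term whether or not the run finishes early:

```python
HOURS_PER_YEAR = 365 * 24

# Figures from the article, not from AWS price lists:
reserved_total = 1.13e6   # 3-year reserved cost of one p5.48xlarge, USD
term_years = 3
effective_hourly = reserved_total / (term_years * HOURS_PER_YEAR)

nodes = 2000              # low end of the cluster size cited above
train_months = 4          # upper end of the training window cited above
train_hours = train_months * 730  # ~730 hours per month

# Cost if you could pay only for the hours the training run uses...
usage_cost = nodes * train_hours * effective_hourly
# ...versus the full three-year commitment for the whole cluster.
commitment_cost = nodes * reserved_total

print(f"effective rate:  ${effective_hourly:,.2f} per node-hour")
print(f"usage only:      ${usage_cost / 1e6:,.0f} million")
print(f"full commitment: ${commitment_cost / 1e9:,.2f} billion")
```

The reserved rate works out to about $43 per node-hour, the four-month run alone to about a quarter of a billion dollars, and the full three-year commitment on 2,000 nodes to $2.26 billion – which is why the roughly $1 billion price tag for buying such a machine, cited above, starts to look reasonable by comparison.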
“Most companies tell us that they don’t want to consume that resource building themselves,” Jassy explained on the call with Wall Street going over the AWS Q2 2023 numbers. “Rather, they want access to those large language models and they want to customize them with their own data without leaking their proprietary data into the general model – have all the security, privacy and platform features in AWS work with this new enhanced model and then have it all wrapped in a managed service. This is what our service, Bedrock, does and offers customers all of these aforementioned capabilities with not just one large language model but with access to models from multiple leading large language model companies like Anthropic, Stability AI, AI21 Labs, Cohere and Amazon’s own developed large language models called Titan.”
You can bet that this won’t be cheap. But what else can governments and corporations working on a budget do? They will have to take models created by the cloud titans and adapt them, for a price, using iron on their clouds. And they will have to grin and bear it, or go ask their board of directors for $1 billion or more to buy their own machinery and pay for the expertise to make it actually do something.
And once again, you will be paying for the research and development the cloud titans have done and will do in the future, handing them AI at no cost to themselves and to their great benefit.
One more thing: Jassy & Co are sanguine that the cost optimization that has been going on among AWS customers since last year is abating. We think that this is true so long as the national economies don’t take a turn for the worse. At that point, IT shops will prioritize projects and cut back severely – just as they would have done in the past with physical server purchases. And in some cases, we think there will be a wave of datacenter repatriations when companies see how much cheaper it is to buy and run infrastructure for a long investment cycle in a co-lo than it is to rent it from a cloud. Particularly if GPU compute capacity continues to be scarce.
The saving grace for AWS might be its custom Trainium AI training and Inferentia AI inference processors, which when paired with its Graviton CPUs can potentially cut the costs of AI workloads. These chips are in their second generation now, and we are going to do some digging to see how they compare to Nvidia GPUs for AI workloads.