There's a shortage of GPUs as the demand for generative AI, which is often trained and run on GPUs, grows. Nvidia's best-performing chips are reportedly sold out until 2024. And the CEO of chipmaker TSMC was less optimistic recently, suggesting that the shortage of GPUs from Nvidia, as well as from Nvidia's rivals, could extend into 2025.
To lessen their reliance on GPUs, companies that can afford it (that is, the tech giants) are developing, and in some cases making available to customers, custom chips tailored for creating and running AI models. One such company is Amazon, which today at its annual re:Invent conference revealed its latest generation of chips for model training and inferencing (i.e. running trained models).
The first of the two, AWS Trainium2, is designed to deliver up to 4x better performance and 2x better energy efficiency than the first-generation Trainium, unveiled in December 2020, Amazon said. Set to run in EC2 Trn2 instances in clusters of 16 chips in the AWS cloud, Trainium2 can scale up to 100,000 chips in AWS' EC2 UltraCluster product.
A cluster of 100,000 Trainium2 chips delivers 65 exaflops of compute, Amazon says, which works out to 650 teraflops per chip. ("Exaflops" and "teraflops" measure how many computing operations per second a chip can perform.) There are likely complicating factors that make that back-of-the-napkin math not especially accurate. But if a single Trainium2 chip can indeed deliver ~650 teraflops of performance, that puts it well beyond the capacity of Google's custom AI training chips circa 2017.
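The per-chip figure is simple unit arithmetic on Amazon's claimed numbers; a quick sketch of the division:

```python
# Back-of-the-napkin math on Amazon's claimed figures.
# 1 exaflop = 10^18 FLOPS and 1 teraflop = 10^12 FLOPS,
# so 1 exaflop = 1,000,000 teraflops.
cluster_exaflops = 65    # claimed aggregate cluster compute
cluster_chips = 100_000  # claimed cluster size

per_chip_teraflops = cluster_exaflops * 1_000_000 / cluster_chips
print(per_chip_teraflops)  # 650.0 teraflops per chip
```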
Amazon says a cluster of 100,000 Trainium2 chips can train a 300-billion-parameter AI large language model in weeks versus months. ("Parameters" are the parts of a model learned from training data that essentially define the model's skill at a problem, such as generating text or code.) That's roughly 1.7 times the size of OpenAI's GPT-3, the predecessor of the text-generating GPT-4.
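The size comparison is another quick division, using GPT-3's publicly reported parameter count of 175 billion:

```python
# Comparing the 300B-parameter model Amazon cites with GPT-3 (175B parameters).
cited_model_params = 300e9  # from Amazon's training claim
gpt3_params = 175e9         # OpenAI's published GPT-3 size

ratio = cited_model_params / gpt3_params
print(round(ratio, 2))  # 1.71
```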
“Silicon underpins every customer workload, making it a critical area of innovation for AWS,” AWS compute and networking VP David Brown said in a press release. “(W)ith the surge of interest in generative AI, Trainium2 will help customers train their ML models faster, at a lower cost, and with better energy efficiency.”
Amazon didn't say when Trainium2 instances will become available to AWS customers, beyond "sometime next year." Rest assured we'll keep an eye out for more information.
The second chip Amazon announced this morning, the Arm-based Graviton4, is intended for inferencing. The fourth generation in Amazon's Graviton chip family (as implied by the "4" appended to "Graviton"), it's distinct from Amazon's other inferencing chip, Inferentia.
Amazon claims Graviton4 provides up to 30% better compute performance, 50% more cores and 75% more memory bandwidth than one previous-generation Graviton processor, Graviton3 (but not the more recent Graviton3E), running on Amazon EC2. In another upgrade over Graviton3, all of Graviton4's physical hardware interfaces are "encrypted," Amazon says, ostensibly better securing AI training workloads and data for customers with heightened encryption requirements. (We've asked Amazon what "encrypted" means, exactly, and we'll update this piece once we hear back.)
“The Graviton4 marks the fourth generation we’ve delivered in just five years and is the most powerful and energy-efficient chip we’ve ever built for a wide range of workloads,” Brown continued in a statement. “By focusing our chip designs on the real workloads that matter to customers, we can deliver the most advanced cloud infrastructure to them.”
Graviton4 will be available in Amazon EC2 R8g instances, which are in preview today with general availability planned in the coming months.