Google Cloud announces new A3 supercomputer VMs built to power LLMs


As we’ve seen LLMs and generative AI come screaming into our consciousness in recent months, it’s clear that these models take enormous amounts of compute power to train and run. Recognizing this, Google Cloud announced a new A3 supercomputer virtual machine today at Google I/O.

The A3 has been purpose-built to handle the considerable demands of these resource-hungry use cases.

“A3 GPU VMs were purpose-built to deliver the highest-performance training for today’s ML workloads, complete with modern CPU, improved host memory, next-generation NVIDIA GPUs and major network upgrades,” the company wrote in an announcement.

Specifically, the company is arming these machines with NVIDIA’s H100 GPUs and pairing them with a specialized data center to deliver immense computational power with high throughput and low latency, all at what it suggests is a more reasonable price point than customers would typically pay for such a package.

As for specs, each A3 VM is powered by 8 NVIDIA H100 GPUs, 4th Gen Intel Xeon Scalable processors, 2TB of host memory and 3.6 TB/s of bisectional bandwidth between the 8 GPUs via NVSwitch and NVLink 4.0, two NVIDIA technologies designed to maximize throughput between multiple GPUs.

These machines can provide up to 26 exaFlops of compute performance, which should help reduce the time and cost of training larger machine learning models. What’s more, the workloads on these VMs run on Google’s specialized Jupiter data center networking fabric, which the company describes as “26,000 highly interconnected GPUs.” The fabric enables “full-bandwidth reconfigurable optical links that can adjust the topology on demand.” The company says this approach should also help bring down the cost of running these workloads.
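To put those headline numbers in perspective, here is a rough back-of-envelope sketch in Python. It assumes, which the announcement does not state outright, that the 26 exaFlops figure refers to all 26,000 GPUs working together and that every VM carries 8 GPUs; the function names are made up for illustration.

```python
# Figures quoted in the article; how they relate to each other is an assumption.
TOTAL_EXAFLOPS = 26      # "up to 26 exaFlops"
TOTAL_GPUS = 26_000      # "26,000 highly interconnected GPUs"
GPUS_PER_VM = 8          # 8 NVIDIA H100 GPUs per A3 VM


def implied_petaflops_per_gpu(total_exaflops: float, total_gpus: int) -> float:
    """Average throughput per GPU, in petaFlops (1 exaFlop = 1,000 petaFlops)."""
    return total_exaflops * 1000 / total_gpus


def vm_count(total_gpus: int, gpus_per_vm: int) -> int:
    """Number of fully populated VMs that a given GPU count would fill."""
    return total_gpus // gpus_per_vm


if __name__ == "__main__":
    print(implied_petaflops_per_gpu(TOTAL_EXAFLOPS, TOTAL_GPUS))  # 1.0
    print(vm_count(TOTAL_GPUS, GPUS_PER_VM))                      # 3250
```

Under those assumptions, the quoted figures work out to roughly 1 petaFlop per GPU and about 3,250 eight-GPU VMs at full scale. Treat both as ballpark estimates rather than published specifications.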

The idea is to give customers an enormous amount of power designed to train more demanding workloads, whether that involves complex machine learning models or LLMs running generative AI applications, and to do it in a more cost-effective way.

Google will be offering the A3 in a couple of ways: customers can run it themselves or, if they prefer, use it as a managed service where Google handles most of the heavy lifting for them. The do-it-yourself approach involves running the A3 VMs on Google Kubernetes Engine (GKE) and Google Compute Engine (GCE), while the managed service runs the A3 VMs on Vertex AI, the company’s managed machine learning platform.

While the new A3 VMs are being announced today at Google I/O, for now they are available only by signing up for a preview waitlist.