Meta unveils Grand Teton, a next-generation platform for large-scale AI

Image: Meta

At the Open Compute Project (OCP) 2022 Summit, held October 18-20 in San Jose, California, Meta shared its latest innovations, among them Grand Teton, a next-generation AI hardware platform built around NVIDIA H100 GPUs based on the NVIDIA Hopper architecture.

AI models are becoming increasingly complex and demand ever more computing power, which in turn calls for high-performance infrastructure to support them.

Alexis Bjorlin, Meta’s vice president of engineering, said in a blog post about OCP:

“Today, some of the biggest challenges facing our industry on a large scale involve AI. How can we continue to facilitate and execute the models that power the experiences behind today’s innovative products and services? And what will it take to enable the AI behind the innovative products and services of the future? As we move to the next computing platform, the metaverse, the need for new open innovations to power AI becomes even clearer.”

Grand Teton, a next-generation platform for large-scale AI

The name Grand Teton refers to a peak in the Wyoming national park of the same name, a nod to the platform’s scale and performance.

Alexis Bjorlin points out:

“As AI models become more and more sophisticated, so will their associated workloads. Grand Teton has been designed with more compute capacity to better support memory bandwidth related workloads at Meta, such as our open source DLRMs. Grand Teton’s expanded operational compute power envelope also optimizes it for compute-related workloads, such as content understanding.”

Grand Teton uses NVIDIA H100 Tensor Core GPUs based on the NVIDIA Hopper architecture, which includes a Transformer Engine to accelerate work on foundation models, addressing a wide range of applications including natural language processing, healthcare, and robotics.

The NVIDIA H100 is designed for performance but also for energy efficiency. According to NVIDIA, H100-accelerated servers, when connected with NVIDIA networking across thousands of servers in hyperscale data centers, can be up to 300 times more energy efficient than CPU-only servers.

Ian Buck, vice president of hyperscale and high-performance computing at NVIDIA, said:

“NVIDIA Hopper GPUs are designed to meet the world’s tough challenges, delivering accelerated computing with greater energy efficiency and improved performance, while adding scale and reducing costs. With Meta sharing the Grand Teton platform powered by H100, system builders around the world will soon have access to an open design for hyperscale data center computing infrastructure to power AI across industries.”

The new platform brings several performance gains over Zion, its predecessor: host-to-GPU bandwidth is quadrupled, while compute and data network bandwidth, as well as the power envelope, are doubled.

The Zion platform consists of three enclosures connected by external cabling: a main processor node, a switch synchronization system, and a GPU system. Grand Teton integrates all three into a single chassis for better overall performance, signal integrity, and thermal behavior.

This high level of integration greatly simplifies system deployment, allowing Grand Teton to be installed and provisioned across Meta’s fleet much faster and with greater reliability, according to Alexis Bjorlin.

Translated from Meta dévoile Grand Teton, une plateforme de nouvelle génération pour l’IA à grande échelle