January 18, 2025

Nvidia H200: Nvidia unveils new chip for training generative AI models: All the details

Nvidia has unveiled its latest chip for powering supercomputers that train generative AI models. Based on the Nvidia Hopper architecture, the platform features the Nvidia H200 Tensor Core GPU with advanced memory to handle huge amounts of data for generative AI and high-performance computing (HPC) workloads.
The Nvidia H200, according to the company’s press release, is the first GPU to offer HBM3e, a faster, larger memory intended to accelerate generative AI and large language models while advancing scientific computing for HPC workloads. With HBM3e, the Nvidia H200 delivers 141GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4x more bandwidth compared with its predecessor, the Nvidia A100, the company noted.
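
The comparison with the A100 can be checked with some quick arithmetic. The sketch below is only illustrative: the H200 figures come from the article, while the A100 figures (80GB and roughly 2.0 terabytes per second for the 80GB model) are an assumption taken from Nvidia's published A100 specifications, not from this article. The variable and function names are made up for the example.

```python
# H200 figures as quoted in the article.
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBPS = 4.8

# Assumption: A100 80GB figures from Nvidia's public spec sheet,
# not stated in the article. Bandwidth is rounded from about 2,039 GB/s.
A100_MEMORY_GB = 80
A100_BANDWIDTH_TBPS = 2.0


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


capacity_ratio = ratio(H200_MEMORY_GB, A100_MEMORY_GB)
bandwidth_ratio = ratio(H200_BANDWIDTH_TBPS, A100_BANDWIDTH_TBPS)

# Time to read the full 141GB once at peak bandwidth (1 TB = 1000 GB),
# expressed in milliseconds.
full_sweep_ms = H200_MEMORY_GB / (H200_BANDWIDTH_TBPS * 1000) * 1000

print(f"capacity:  {capacity_ratio:.2f}x")
print(f"bandwidth: {bandwidth_ratio:.1f}x")
print(f"full memory sweep: {full_sweep_ms:.1f} ms")
```

Under these assumptions the capacity ratio works out to about 1.76x ("nearly double") and the bandwidth ratio to 2.4x, which matches the company's claim. The sweep time is a peak-rate figure; sustained bandwidth in real workloads is typically lower.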

When will the chip be made available?

The Nvidia H200 will be available from global system manufacturers and cloud service providers starting in the second quarter of 2024. Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will be among the first cloud service providers to deploy H200-based instances in 2024.
Powered by Nvidia NVLink and NVSwitch high-speed interconnects, HGX H200 provides the highest performance on various application workloads, including LLM training and inference for the largest models beyond 175 billion parameters. An eight-way HGX H200 provides over 32 petaflops of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory for the highest performance in generative AI and HPC applications, said Nvidia.
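
The aggregate figures follow from the per-GPU numbers. As a rough sketch, the snippet below multiplies out the eight-way configuration and estimates how much memory the weights of a 175-billion-parameter model would occupy. The bytes-per-parameter values (1 byte for FP8, 2 bytes for FP16) are standard for those formats, but the estimate is an assumption-laden simplification: it counts weights only and ignores activations, optimizer state and caches, which add substantially to real memory use, especially in training. The helper names are hypothetical.

```python
GPUS_PER_HGX = 8          # "eight-way HGX H200"
H200_MEMORY_GB = 141      # per-GPU memory quoted in the article
HGX_FP8_PETAFLOPS = 32    # article says "over 32", so treat as a lower bound
PARAMS = 175e9            # 175 billion parameters


def aggregate_memory_gb(gpus: int, per_gpu_gb: float) -> float:
    """Total high-bandwidth memory across all GPUs in the system."""
    return gpus * per_gpu_gb


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for model weights alone (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


total_gb = aggregate_memory_gb(GPUS_PER_HGX, H200_MEMORY_GB)
per_gpu_petaflops = HGX_FP8_PETAFLOPS / GPUS_PER_HGX

print(f"aggregate memory: {total_gb} GB (about {total_gb / 1000:.1f} TB)")
print(f"FP8 compute per GPU: at least {per_gpu_petaflops:.0f} petaflops")
for name, nbytes in (("FP8", 1), ("FP16", 2)):
    needed = weights_gb(PARAMS, nbytes)
    print(f"{name} weights: {needed:.0f} GB, {needed / total_gb:.0%} of aggregate memory")
```

Eight GPUs at 141GB each give 1,128GB, which rounds to the 1.1TB Nvidia cites, and the weights of a 175-billion-parameter model fit within that total at either precision under these simplified assumptions.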
Nvidia’s graphics processing units (GPUs) are playing an increasingly important role in the development and deployment of generative AI models. GPUs are designed to handle the massive parallel computations required for training and running these models, making them well-suited for tasks such as image generation and natural language processing, among others. Because their parallel processing architecture allows them to perform many calculations simultaneously, the company’s GPUs can accelerate the training and running of generative AI models by several orders of magnitude.


