Imagine a system that can explore multiple approaches to complex problems, drawing on its understanding of vast amounts of data, from scientific datasets to source code to business documents, and reasoning through the possibilities in real time. This lightning-fast reasoning isn’t waiting on the horizon. It’s happening today in our customers’ AI production environments. The scale of the AI systems that our customers are building today—across drug discovery, enterprise search, software development, and more—is truly remarkable. And there’s much more ahead.
To accelerate innovation across emerging generative AI developments such as reasoning models and agentic AI systems, we’re excited to announce general availability of P6e-GB200 UltraServers, accelerated by NVIDIA Grace Blackwell Superchips. P6e-GB200 UltraServers are designed for training and deploying the largest, most sophisticated AI models. Earlier this year, we launched P6-B200 instances, accelerated by NVIDIA Blackwell GPUs, for diverse AI and high-performance computing workloads.
In this post, we share how these powerful compute solutions build on everything we’ve learned about delivering secure, reliable GPU infrastructure at a massive scale, so that customers can confidently push the boundaries of AI.
Meeting the expanding compute demands of AI workloads
P6e-GB200 UltraServers represent our most powerful GPU offering to date, featuring up to 72 NVIDIA Blackwell GPUs interconnected using fifth-generation NVIDIA NVLink—all functioning as a single compute unit. Each UltraServer delivers a massive 360 petaflops of dense FP8 compute and 13.4 TB of total high-bandwidth GPU memory (HBM3e)—over 20 times the compute and over 11 times the memory in a single NVLink domain compared to P5en instances. P6e-GB200 UltraServers also support up to 28.8 Tbps of aggregate fourth-generation Elastic Fabric Adapter (EFAv4) networking bandwidth.
P6-B200 instances are a versatile option for a broad range of AI use cases. Each instance provides 8 NVIDIA Blackwell GPUs interconnected using NVLink, 1.4 TB of high-bandwidth GPU memory, up to 3.2 Tbps of EFAv4 networking, and fifth-generation Intel Xeon Scalable processors. P6-B200 instances offer up to 2.25 times the GPU TFLOPs, 1.27 times the GPU memory size, and 1.6 times the GPU memory bandwidth compared to P5en instances.
How do you choose between P6e-GB200 and P6-B200? This choice comes down to your specific workload requirements and architectural needs (a quick programmatic comparison follows this list):
- P6e-GB200 UltraServers are ideal for the most compute and memory intensive AI workloads, such as training and deploying frontier models at the trillion-parameter scale. Their NVIDIA GB200 NVL72 architecture really shines at this scale. Imagine all 72 GPUs working as one, with a unified memory space and coordinated workload distribution. This architecture enables more efficient distributed training by reducing communication overhead between GPU nodes. For inference workloads, the ability to fully contain trillion-parameter models within a single NVLink domain means faster, more consistent response times at scale. When combined with optimization techniques such as disaggregated serving with NVIDIA Dynamo, the large domain size of GB200 NVL72 architecture unlocks significant inference efficiencies for various model architectures such as mixture of experts models. GB200 NVL72 is particularly powerful when you need to handle extra-large context windows or run high-concurrency applications in real time.
- P6-B200 instances support a broad range of AI workloads and are an ideal option for medium- to large-scale training and inference workloads. If you want to port existing GPU workloads, P6-B200 instances offer a familiar 8-GPU configuration that minimizes code changes and simplifies migration from current-generation instances. Additionally, although NVIDIA's AI software stack is optimized for both Arm and x86, if your workloads are built specifically for x86 environments, P6-B200 instances, with their Intel Xeon processors, are the ideal choice.
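If you'd like to compare options from code, here is a minimal sketch using boto3 to inspect the GPU and networking details of an instance type. The instance type string is an assumption for illustration; confirm the exact type names available in your Region and account.

```python
import boto3

# Minimal sketch: inspect GPU and networking details for a Blackwell-based
# instance type. The type name below is an assumption for illustration.
ec2 = boto3.client("ec2")

resp = ec2.describe_instance_types(
    InstanceTypes=["p6-b200.48xlarge"],  # assumed P6-B200 type name
)
for itype in resp["InstanceTypes"]:
    print(itype["InstanceType"])
    for gpu in itype["GpuInfo"]["Gpus"]:
        mem_gib = gpu["MemoryInfo"]["SizeInMiB"] / 1024
        print(f"  {gpu['Count']}x {gpu['Manufacturer']} {gpu['Name']}, {mem_gib:.0f} GiB each")
    print(f"  EFA supported: {itype['NetworkInfo'].get('EfaSupported')}")
```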
Innovation built on AWS core strengths
Bringing NVIDIA Blackwell to AWS isn't about a single breakthrough—it's about continuous innovation across multiple layers of infrastructure. By building on years of learning and innovation across compute, networking, operations, and managed services, we've brought NVIDIA Blackwell's full capabilities to customers with the reliability and performance they expect from AWS.
Robust instance security and stability
When customers tell me why they choose to run their GPU workloads on AWS, one crucial point comes up consistently: they highly value our focus on instance security and stability in the cloud. The specialized hardware, software, and firmware of the AWS Nitro System are designed to enforce restrictions so that nobody, including anyone in AWS, can access your sensitive AI workloads and data. Beyond security, the Nitro System fundamentally changes how we maintain and optimize infrastructure. The Nitro System, which handles networking, storage, and other I/O functions, makes it possible to deploy firmware updates, bug fixes, and optimizations while the host remains operational. This ability to update without system downtime, which we call live update, is crucial in today's AI landscape, where any interruption significantly impacts production timelines. P6e-GB200 and P6-B200 both feature the sixth generation of the Nitro System, but these security and stability benefits aren't new—our innovative Nitro architecture has been protecting and optimizing Amazon Elastic Compute Cloud (Amazon EC2) workloads since 2017.
Reliable performance at massive scale
In AI infrastructure, the challenge isn't just reaching massive scale—it's delivering consistent performance and reliability at that scale. We've deployed P6e-GB200 UltraServers in third-generation EC2 UltraClusters, which create a single fabric that can encompass our largest data centers. Third-generation UltraClusters cut power consumption by up to 40% and reduce cabling requirements by more than 80%—not only improving efficiency, but also significantly reducing potential points of failure.
To deliver consistent performance at this massive scale, we use Elastic Fabric Adapter (EFA) with its Scalable Reliable Datagram protocol, which intelligently routes traffic across multiple network paths to maintain smooth operation even during congestion or failures. We’ve continuously improved EFA’s performance across four generations. P6e-GB200 and P6-B200 instances with EFAv4 show up to 18% faster collective communications in distributed training compared to P5en instances that use EFAv3.
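To benefit from EFA in a training job, collectives typically run over NCCL with the aws-ofi-nccl plugin. The following is a hedged sketch, assuming a PyTorch job on an EFA-enabled instance with the plugin installed; the exact variables and values depend on your AMI and driver stack, so treat these as a starting point rather than a definitive configuration.

```python
import os

# Common environment settings for routing NCCL collectives over EFA via the
# aws-ofi-nccl plugin. Set these before NCCL initializes.
os.environ.setdefault("FI_PROVIDER", "efa")           # select the Libfabric EFA provider
os.environ.setdefault("FI_EFA_USE_DEVICE_RDMA", "1")  # enable GPUDirect RDMA where supported
os.environ.setdefault("NCCL_DEBUG", "INFO")           # log which transport NCCL selects

# Import after the environment is set so NCCL picks up the EFA provider.
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # collectives now flow over EFA/SRD
```

With NCCL_DEBUG set to INFO, the startup logs show whether NCCL actually selected the EFA transport, which is a quick way to verify the configuration before a long training run.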
Infrastructure efficiency
Whereas P6-B200 instances use our proven air-cooling infrastructure, P6e-GB200 UltraServers are liquid cooled, which enables higher compute density in large NVLink domain architectures and delivers higher system performance. Their novel mechanical cooling solutions provide configurable liquid-to-chip cooling in both new and existing data centers, so we can support liquid-cooled accelerators and air-cooled network and storage infrastructure in the same facility. With this flexible cooling design, we can deliver maximum performance and efficiency at the lowest cost.
Getting started with NVIDIA Blackwell on AWS
We’ve made it simple to get started with P6e-GB200 UltraServers and P6-B200 instances through multiple deployment paths, so you can quickly begin using Blackwell GPUs while maintaining the operational model that works best for your organization.
Amazon SageMaker HyperPod
If you’re accelerating your AI development and want to spend less time managing infrastructure and cluster operations, that’s exactly where Amazon SageMaker HyperPod excels. It provides managed, resilient infrastructure that automatically handles provisioning and management of large GPU clusters. We keep enhancing SageMaker HyperPod, adding innovations like flexible training plans to help you gain predictable training timelines and run training workloads within your budget requirements.
SageMaker HyperPod will support both P6e-GB200 UltraServers and P6-B200 instances, with optimizations to maximize performance by keeping workloads within the same NVLink domain. We’re also building in a comprehensive, multi-layered recovery system: SageMaker HyperPod will automatically replace faulty instances with preconfigured spares in the same NVLink domain. Built-in dashboards will give you visibility into everything from GPU utilization and memory usage to workload metrics and UltraServer health status.
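As a rough sketch of what provisioning could look like, here is the existing SageMaker CreateCluster API called through boto3. The ml instance type name, S3 URI, and role ARN below are placeholders and assumptions, not confirmed values from this post; P6 support may also require newer SDK versions and Region availability.

```python
import boto3

# Hedged sketch: create a SageMaker HyperPod cluster with one GPU instance
# group. All names, ARNs, and the instance type string are placeholders.
sm = boto3.client("sagemaker")

sm.create_cluster(
    ClusterName="blackwell-training",
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p6-b200.48xlarge",  # assumed ML instance type name
            "InstanceCount": 2,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle/",  # placeholder bucket
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",  # placeholder
        }
    ],
)
```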
Amazon EKS
For large-scale AI workloads, if you prefer to manage your infrastructure using Kubernetes, Amazon Elastic Kubernetes Service (Amazon EKS) is often the control plane of choice. We continue to drive innovations in Amazon EKS with capabilities like Amazon EKS Hybrid Nodes, which enable you to manage both on-premises and EC2 GPUs in a single cluster—delivering flexibility for AI workloads.
Amazon EKS will support both P6e-GB200 UltraServers and P6-B200 instances with automated provisioning and lifecycle management through managed node groups. For P6e-GB200 UltraServers, we’re building in topology awareness that understands the GB200 NVL72 architecture, automatically labeling nodes with their UltraServer ID and network topology information to enable optimal workload placement. You will be able to span node groups across multiple UltraServers or dedicate them to individual UltraServers, giving you flexibility in organizing your training infrastructure. Amazon EKS monitors GPU and accelerator errors and relays them to the Kubernetes control plane for optional remediation.
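For a sense of how this fits the existing Amazon EKS API, the following is a minimal boto3 sketch of creating a managed node group of GPU nodes. The cluster name, subnets, node role ARN, and instance type string are placeholders and assumptions; the UltraServer-aware topology labels described above would be applied by EKS itself, not set here.

```python
import boto3

# Hedged sketch: provision a managed node group for GPU workers. All
# identifiers below are placeholders for illustration.
eks = boto3.client("eks")

eks.create_nodegroup(
    clusterName="ai-training",
    nodegroupName="p6-b200-workers",
    instanceTypes=["p6-b200.48xlarge"],  # assumed EC2 instance type name
    scalingConfig={"minSize": 0, "desiredSize": 2, "maxSize": 8},
    subnets=["subnet-0123456789abcdef0"],  # placeholder subnet
    nodeRole="arn:aws:iam::123456789012:role/EKSNodeRole",  # placeholder role
    amiType="AL2023_x86_64_NVIDIA",  # NVIDIA GPU-enabled AMI family
)
```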
NVIDIA DGX Cloud on AWS
P6e-GB200 UltraServers will also be available through NVIDIA DGX Cloud. DGX Cloud is a unified AI platform optimized at every layer with multi-node AI training and inference capabilities and NVIDIA’s complete AI software stack. You benefit from NVIDIA’s latest optimizations, benchmarking recipes, and technical expertise to improve efficiency and performance. It offers flexible term lengths along with comprehensive NVIDIA expert support and services to help you accelerate your AI initiatives.
This launch announcement is an important milestone, and it’s just the beginning. As AI capabilities evolve rapidly, you need infrastructure built not just for today’s demands but for all the possibilities that lie ahead. With innovations across compute, networking, operations, and managed services, P6e-GB200 UltraServers and P6-B200 instances are ready to enable these possibilities. We can’t wait to see what you will build with them.
About the author
David Brown is the Vice President of AWS Compute and Machine Learning (ML) Services. In this role he is responsible for building all AWS Compute and ML services, including Amazon EC2, Amazon Container Services, AWS Lambda, Amazon Bedrock, and Amazon SageMaker. These services are used by all AWS customers and also underpin most of Amazon's internal applications. He also leads newer solutions, such as AWS Outposts, that bring AWS services into customers' private data centers.
David joined AWS in 2007 as a Software Development Engineer based in Cape Town, South Africa, where he worked on the early development of Amazon EC2. In 2012, he relocated to Seattle and continued to work in the broader Amazon EC2 organization. Over the last 11 years, he has taken on larger leadership roles as more of the AWS compute and ML products have become part of his organization.
Prior to joining Amazon, David worked as a Software Developer at a financial industry startup. He holds a Computer Science & Economics degree from Nelson Mandela University in Port Elizabeth, South Africa.