How Is Nvidia Planning Its GPU and AI Systems Through 2028?

Published by admin5 on January 17, 2026

Nvidia’s roadmap combines GPUs, CPUs, DPUs, and scale-up and scale-out networks, with an emphasis on system-level performance and memory capacity for AI inference and training workloads. For instance, the Blackwell B300 GPU, launching in 2025, offers 50% more HBM3E memory and FP4 performance than its predecessor, the B200. Subsequent GPU generations, Vera-Rubin and Feynman, will further expand compute density and bandwidth, ensuring AI workloads scale efficiently for enterprises and hyperscalers.

What Are the Key Features of the Blackwell B300 GPU?

The Blackwell B300, also called Blackwell Ultra, is Nvidia’s next-generation GPU for high-capacity AI inference. Key upgrades include:

  • Memory: 288 GB HBM3E, 12-high stacks

  • FP4 performance: 15 Petaflops

  • Rack configuration: GB300 NVL72 with 72 GPUs per rack

  • Network: NVLink 5 + NVSwitch 4 shared memory

This GPU enables higher memory and compute throughput, allowing large reasoning models to run efficiently across distributed AI clusters.
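
As a quick sanity check, the per-GPU figures above can be rolled up to rack level. The short Python sketch below uses only numbers quoted in this article (288 GB and 15 Petaflops per B300, 72 GPUs per GB300 NVL72 rack, and the B200’s 192 GB and 10 Petaflops cited in the FAQ); it is back-of-the-envelope arithmetic, not an official specification.

    # Back-of-the-envelope rack math from the figures quoted in this article.
    GPUS_PER_RACK = 72          # GB300 NVL72
    HBM_PER_GPU_GB = 288        # B300 (Blackwell Ultra), HBM3E
    FP4_PER_GPU_PF = 15         # Petaflops of FP4 per B300

    rack_hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000   # ~20.7 TB of HBM3E per rack
    rack_fp4_ef = GPUS_PER_RACK * FP4_PER_GPU_PF / 1000   # ~1.08 exaflops FP4 per rack

    # Generational step quoted in the FAQ: the B200 had 192 GB and 10 Petaflops FP4.
    mem_gain = 288 / 192   # 1.5x -> the "50% more memory" claim
    fp4_gain = 15 / 10     # 1.5x -> the "50% more FP4 performance" claim

    print(f"GB300 NVL72: ~{rack_hbm_tb:.1f} TB HBM3E, ~{rack_fp4_ef:.2f} EF FP4")
    print(f"B300 vs B200: {mem_gain:.1f}x memory, {fp4_gain:.1f}x FP4")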

How Will the Vera CPU and Rubin GPU Improve AI Performance?

The Vera CV100 CPU features 88 custom Arm cores with simultaneous multithreading, doubling the thread count to 176, and offers over 1 TB of main memory. The Rubin R100 GPU will carry 288 GB of HBM4 memory per socket, a 62.5% increase in memory bandwidth over the B300’s HBM3E. Combined, the Vera-Rubin NVL144 system will deliver 3.6 exaflops of FP4 inference and 1.2 exaflops of FP8 training, more than triple the performance of today’s GB300 NVL72 systems.
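
That "more than triple" claim is easy to cross-check from the article’s own figures, as this minimal sketch shows:

    # Cross-check of the "more than triple" claim, using this article's figures.
    gb300_nvl72_fp4_ef = 72 * 15 / 1000      # ~1.08 EF FP4 (72 B300 GPUs x 15 PF)
    vera_rubin_nvl144_fp4_ef = 3.6           # EF FP4, as quoted above

    speedup = vera_rubin_nvl144_fp4_ef / gb300_nvl72_fp4_ef
    print(f"NVL144 vs NVL72: {speedup:.1f}x FP4 inference")   # ~3.3x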

Which Innovations Are Included in the Rubin Ultra and Feynman GPU Generations?

The Rubin Ultra GPU, arriving in 2027, raises density to four GPU chiplets per SXM8 socket, reaching 100 Petaflops of FP4 performance and 1 TB of HBM4E memory per socket. In 2028, the Feynman generation doubles performance again, featuring 3.2 Tb/sec ConnectX-10 NICs, 204.8 Tb/sec Spectrum-7 Ethernet switches, and NVSwitch 8 at 7.2 TB/sec. These systems maximize compute density while maintaining energy efficiency, with Nvidia’s Kyber liquid-cooled racks handling thermal management for high-density GPU clusters.
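
If the NVL576 name counts GPU chiplets, as the NVL72 and NVL144 names do, a rack works out to 144 SXM8 sockets of four chiplets each. The sketch below derives the implied rack totals from the per-socket figures above; treat it as an assumption-laden estimate, not a published specification.

    # Implied Rubin Ultra VR300 NVL576 rack totals (assumption: 576 counts chiplets).
    CHIPLETS_PER_SOCKET = 4
    sockets = 576 // CHIPLETS_PER_SOCKET      # 144 SXM8 sockets per rack
    rack_fp4_ef = sockets * 100 / 1000        # 100 PF FP4 per socket -> ~14.4 EF
    rack_hbm4e_tb = sockets * 1               # 1 TB HBM4E per socket -> ~144 TB

    print(f"NVL576 (implied): {sockets} sockets, ~{rack_fp4_ef:.1f} EF FP4, "
          f"~{rack_hbm4e_tb} TB HBM4E")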

Why Are Network Upgrades Crucial for Nvidia’s AI Systems?

Nvidia is scaling network bandwidth in tandem with compute power. NVLink ports, NVSwitch, and ConnectX NICs ensure high-speed communication between GPUs and CPUs. For example, NVLink 7 ports paired with NVSwitch 6 provide 3.6 TB/sec between GPU and CPU, supporting large-scale reasoning models and distributed training. Ethernet adoption allows hyperscalers to standardize on familiar infrastructure while achieving high throughput for AI workloads.
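
To put 3.6 TB/sec in perspective, the sketch below estimates how long a single such link would take to stream the weights of a hypothetical one-trillion-parameter model; the model size and FP8 encoding (one byte per parameter) are illustrative assumptions, not figures from Nvidia.

    # Illustrative only: time to stream hypothetical model weights over one link.
    LINK_BW_TBPS = 3.6            # NVLink/NVSwitch GPU-CPU bandwidth quoted above
    params = 1e12                 # hypothetical 1-trillion-parameter model
    bytes_per_param = 1           # assuming FP8 weights

    model_tb = params * bytes_per_param / 1e12     # 1.0 TB of weights
    seconds = model_tb / LINK_BW_TBPS              # ~0.28 s per full pass
    print(f"~{model_tb:.1f} TB of weights, ~{seconds:.2f} s over a 3.6 TB/s link")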

How Does WECENT Integrate Nvidia Hardware Solutions?

WECENT, as a certified Nvidia distributor, offers enterprise clients access to the full GPU ecosystem, from RTX consumer cards to Tesla data center GPUs and advanced NVL rack systems. By leveraging WECENT’s experience in server deployment, clients can optimize GPU density, power efficiency, and cooling solutions, ensuring AI systems achieve maximum performance while remaining cost-effective. WECENT provides consultation, installation, and ongoing support for mission-critical AI infrastructure.

WECENT Expert Views

“Nvidia’s roadmap is more than a product timeline; it is a blueprint for future AI workloads. The introduction of Blackwell Ultra, Vera-Rubin, and Feynman GPUs addresses both memory and computational bottlenecks, allowing enterprises to scale AI inference and training seamlessly. At WECENT, we emphasize aligning clients’ deployment strategies with these hardware advancements to ensure long-term performance and ROI.”

When Will the Next Nvidia GPU Systems Be Available?

  • GB300 NVL72: Second half of 2025

  • Vera-Rubin NVL144: Second half of 2026

  • Rubin Ultra VR300 NVL576: Second half of 2027

  • Feynman Generation: 2028

These release windows allow enterprises to plan upgrades in line with AI workload growth and system lifecycle management.

Conclusion

Nvidia’s roadmap through 2028 demonstrates strategic growth in GPU performance, memory capacity, and system bandwidth. Enterprises and hyperscalers benefit from predictable, scalable upgrades that address the computing demands of AI reasoning, training, and inference. Partnering with WECENT ensures access to authorized hardware, expert deployment support, and optimized performance for high-density AI workloads. Businesses can now plan AI infrastructure confidently, avoiding bottlenecks while maximizing efficiency.

Frequently Asked Questions

Q: How does the Blackwell B300 differ from the B200?
A: The B300 offers 50% more memory and FP4 performance, with 288 GB HBM3E and 15 Petaflops, compared to the B200’s 192 GB and 10 Petaflops.

Q: What are the advantages of the Vera-Rubin NVL144 system?
A: It provides over 3X the inference performance of GB300 NVL72 systems, combining 88-core CPUs with 288 GB HBM4 GPUs and improved NVLink/NVSwitch connectivity.

Q: Can WECENT support liquid-cooled Nvidia racks?
A: Yes. WECENT offers consultation, installation, and maintenance of liquid-cooled systems like Kyber racks for high-density AI workloads.

Q: How does the Feynman generation improve AI compute density?
A: By doubling GPU and CPU performance, integrating high-speed NICs and Ethernet switches, and leveraging NVSwitch 8 for 7.2 TB/sec bandwidth.

Q: Are these systems suitable for hyperscale AI deployments?
A: Absolutely. Nvidia’s roadmap and WECENT’s support ensure scalable, high-performance systems tailored for hyperscale and enterprise AI applications.

What is Nvidia’s GPU and AI roadmap until 2028?
Nvidia plans annual GPU releases from 2025–2028, moving from two-year cycles to yearly updates. Key milestones include Blackwell Ultra (2025), Rubin with Vera CPU (2026), Rubin Ultra (2027), and Feynman (2028). The roadmap emphasizes higher compute density, HBM4/HBM4E memory, AI factory racks, and advanced networking to scale AI inference and agentic AI workloads efficiently.

What are the major GPUs planned from 2025 to 2028?
Blackwell Ultra (2025) uses 288 GB HBM3E and delivers 15 Petaflops of FP4. Rubin (2026) introduces HBM4 memory and the Vera CPU, Rubin Ultra (2027) features 1 TB of HBM4E across four chiplets per socket, and Feynman (2028) targets next-generation HBM with 3.2 Tb/sec ConnectX-10 NICs; each stage roughly doubles AI performance and bandwidth.

What is the “AI factory” concept Nvidia is implementing?
AI factories are rack-scale, liquid-cooled systems integrating GPUs, CPUs, and networking as a single machine. Platforms like NVL72 and NVL576 combine Blackwell or Rubin GPUs with Vera CPUs, high-bandwidth HBM memory, and Spectrum-X networking to optimize AI reasoning, multi-step tasks, and massive inference workloads.

How is Nvidia improving GPU memory and bandwidth?
Nvidia transitions from HBM3E in 2025 to HBM4/HBM4E by 2027, with next-gen HBM in 2028. These upgrades increase GPU-to-GPU bandwidth (up to 3.6 TB/s in Rubin systems), reduce latency, and support larger AI models, enabling agentic AI systems and high-speed data-intensive computation.

What role do custom Vera CPUs play in Nvidia’s roadmap?
Vera CPUs, introduced in 2026, replace Grace CPUs with 88 custom ARM cores. They manage data orchestration across GPUs, enhance multi-step AI reasoning, and optimize performance for agentic AI workloads in Nvidia’s rack-scale AI factory systems.

How does Nvidia plan to handle cooling and energy efficiency?
Nvidia uses “Kyber” liquid-cooled racks designed for 600+ kilowatt systems. These liquid-cooled AI factories manage thermal loads efficiently, allowing higher GPU density and power consumption while maintaining system stability and energy efficiency for AI-intensive workloads.
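
For rough capacity planning, a 600-kilowatt rack budget can be translated into a per-socket power envelope. Both the overhead share and the per-socket figure in the sketch below are hypothetical planning numbers, not Nvidia specifications; notably, a ~3.5 kW envelope lands close to the 144 sockets implied by the NVL576 naming.

    # Rough planning sketch: how a 600 kW Kyber-class rack budget divides up.
    RACK_BUDGET_KW = 600
    OVERHEAD_FRACTION = 0.15        # hypothetical share for CPUs, switches, fans
    per_socket_kw = 3.5             # hypothetical per-GPU-socket power envelope

    gpu_budget_kw = RACK_BUDGET_KW * (1 - OVERHEAD_FRACTION)
    max_sockets = int(gpu_budget_kw / per_socket_kw)
    print(f"~{gpu_budget_kw:.0f} kW for GPUs -> up to ~{max_sockets} sockets "
          f"at {per_socket_kw} kW each")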

What networking technologies support Nvidia’s AI systems?
Nvidia integrates Spectrum-6 and Spectrum-7 Ethernet switches, doubling per-switch bandwidth to 102.4 Tb/s in 2026 and 204.8 Tb/s by 2028. High-speed NVLink and ConnectX-10 NICs ensure low-latency, high-throughput GPU-to-GPU and CPU-to-GPU communication for AI inference and training.

What is the expected impact on AI performance and costs?
Nvidia’s roadmap targets up to 10x cost reduction per token for AI inference compared to Blackwell. Increased compute density, memory bandwidth, and AI factory integration support agentic AI and large-scale reasoning while delivering faster, more efficient AI processing for enterprise and hyperscale customers.
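
The cost claim is easiest to read as simple arithmetic. Starting from a hypothetical Blackwell-era price per million inference tokens (the $2.00 figure below is illustrative, not a quoted price), a 10x reduction looks like this:

    # Illustrative arithmetic for the "up to 10x cost reduction per token" claim.
    blackwell_usd_per_mtok = 2.00            # hypothetical starting price
    rubin_era_usd_per_mtok = blackwell_usd_per_mtok / 10

    tokens_per_dollar_before = 1e6 / blackwell_usd_per_mtok
    tokens_per_dollar_after = 1e6 / rubin_era_usd_per_mtok
    print(f"${rubin_era_usd_per_mtok:.2f}/M tokens; tokens per dollar: "
          f"{tokens_per_dollar_before:,.0f} -> {tokens_per_dollar_after:,.0f}")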
