AWS builds new cooling system for Nvidia AI GPUs

Tech in Asia·2025-07-10 13:02

Amazon Web Services (AWS) has developed new hardware called the In-Row Heat Exchanger (IRHX) to cool Nvidia’s GPUs used for AI tasks.

These GPUs consume substantial energy, prompting AWS engineers to build an in-house cooling system rather than adopt existing off-the-shelf liquid-cooling options for its data centers.

The IRHX addresses concerns about space efficiency and water use, according to Dave Brown, AWS’s VP of compute and machine learning services.

The IRHX can be installed in both existing and new AWS data centers.

Customers can access this cooling system through AWS P6e computing instances, which support Nvidia’s GB200 NVL72.

🔗 Source: CNBC

🧠 Food for thought

1️⃣ The evolution of GPU cooling reflects the growing thermal demands of AI

AWS’s development of the In-Row Heat Exchanger follows a decades-long progression in cooling technology necessitated by increasingly powerful processors.

In the 1990s, early graphics chips like the Rage Pro (1997) required no dedicated cooling whatsoever, while by 2003, the Radeon 9800 series introduced heatpipes and dual-slot coolers to manage rising thermal output 1.

The industry’s first experiments with liquid cooling appeared around 2005–2007 with products like the Radeon X1950 XTX, which featured an all-in-one water cooler, highlighting the limitations of air cooling 2.

Today’s AI-focused GPUs generate unprecedented heat, with modern systems reaching thermal design powers of up to 350W per GPU, multiple times higher than in previous generations and demanding entirely new cooling approaches 3.

This trend shows how thermal management has become a critical factor in AI infrastructure, pushing even the largest cloud providers to develop custom cooling solutions rather than rely on conventional approaches.

2️⃣ Custom infrastructure has become a competitive necessity in the AI cloud wars

Amazon’s decision to build proprietary cooling hardware continues its long-standing strategy of vertical integration that began with its first custom silicon initiatives around 2012 4.

AWS has progressively developed custom components including the Nitro System for virtualization, Graviton ARM-based processors, and networking equipment, all delivering significant efficiency improvements while reducing dependency on third-party suppliers 5.

This approach has enabled AWS to maintain industry-leading operating margins, with the most recent first quarter showing its widest margin since at least 2014, according to the original article.

The cooling system announcement comes amid intensifying competition specifically for AI workloads, with specialized providers like CoreWeave and Lambda Labs emerging as significant players by offering optimized GPU infrastructure at lower costs than traditional cloud providers 6.

Microsoft has pursued a similar custom hardware strategy with its Sidekick cooling systems for Maia AI chips, indicating that proprietary infrastructure development has become essential for competitive positioning in the AI cloud market.

3️⃣ Environmental concerns are reshaping data center infrastructure decisions

AWS’s rejection of alternative cooling approaches due to concerns about “increased water usage” reflects growing tension between AI computing demands and sustainability goals.

The company has publicly committed to water replenishment initiatives aimed at returning over 8 billion liters of water to communities annually, making water-intensive cooling solutions problematic for their sustainability metrics 7.

AWS has achieved a global Power Usage Effectiveness (PUE) of 1.15, a measure of data center efficiency in which lower numbers are better, but AI workloads threaten to reverse these efficiency gains because they require significantly more power per computation 7.
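The PUE figure above is simple to interpret: it is total facility power divided by the power delivered to IT equipment, so a PUE of 1.15 means 0.15 W of overhead (cooling, power distribution, lighting) for every watt of compute. A minimal sketch of the arithmetic, with the 1150 kW / 1000 kW split chosen purely as an illustrative example:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power.

    1.0 is the theoretical ideal (every watt goes to IT equipment);
    lower values indicate less cooling and distribution overhead.
    """
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Example: a facility drawing 1150 kW in total to power 1000 kW of
# IT load has a PUE of 1.15 -- matching AWS's reported global figure.
print(round(pue(1150.0, 1000.0), 2))  # 1.15
```

Framed this way, it is clear why a cooling system like the IRHX matters: any reduction in cooling power lowers the numerator directly.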

The custom cooling system allows AWS to maintain performance while balancing environmental concerns, particularly as the company has committed to powering operations with 100% renewable energy 7.

This intersection of technical requirements and environmental considerations demonstrates how sustainability has become a key factor in infrastructure design decisions, beyond just performance and cost optimization.

Read full article on Tech in Asia
