Google's Liquid Cooling at Hot Chips 2025

https://news.ycombinator.com/rss Hits: 23
Summary

Liquid cooling is a familiar concept to PC enthusiasts, and has a long history in enterprise compute as well. Recently, liquid cooling has taken a growing role in datacenters, amid rising power draw and correspondingly high heat output from the latest chips. Machine learning in particular has an insatiable appetite for power and cooling. Google notes that water has a thermal conductivity about 4000 times that of air, making it an attractive solution to deal with the cooling demands associated with the current AI boom. Their talk at Hot Chips 2025 focuses on datacenter-level cooling for their TPUs, which are machine learning accelerators.

Google’s foray into liquid cooled TPUs took form in 2018 after some experimentation and iteration. The company has continued to develop and advance their cooling designs since. Their current liquid cooling solution is designed for datacenter scale, with liquid cooling loops spanning racks rather than being contained within servers. Racks of six CDUs, or Coolant Distribution Units, perform a role analogous to the radiator+pump combo in an enthusiast water cooling loop. The CDUs use flexible hoses and quick disconnect couplings to ease maintenance and reduce tolerance requirements. A CDU rack can provide adequate cooling capacity with five CDUs active, allowing maintenance on one unit without downtime.

CDUs exchange heat between coolant liquid and the facility-level water supply. The two liquid supplies don’t mix, and the CDUs only move heat between the two pools of liquid. Coolant liquid from the CDUs passes through manifolds that distribute the coolant to TPU servers. TPU chips are hooked up in series in the loop, which naturally means some chips will get hotter liquid that has already passed other chips in the loop. Cooling capacity is budgeted based on the requirements of the last chip in each loop.

Google uses a split-flow cold plate, which they found to perform better than a traditional straight-through configuration.
To furt...
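
The reason the last chip in a series loop sets the cooling budget can be illustrated with a simple steady-state energy balance: each chip raises the coolant temperature by P / (flow × specific heat) before the liquid reaches the next chip. The sketch below is illustrative only. The function name, the chip powers, the flow rate, and the supply temperature are all hypothetical and do not come from Google's talk, and it assumes a water-like coolant with every watt of chip power ending up in the liquid.

```python
# Minimal sketch of coolant warming along a series loop.
# All numbers and names here are hypothetical, not figures from Google's talk.

# Approximate specific heat of liquid water; the real coolant may differ.
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0


def inlet_temperatures(supply_temp_c, chip_powers_w, flow_kg_per_s,
                       specific_heat=WATER_SPECIFIC_HEAT_J_PER_KG_K):
    """Return the coolant temperature (deg C) at each chip's inlet.

    Assumes steady state and that all chip power is absorbed by the coolant.
    """
    temps = []
    current = supply_temp_c
    for power in chip_powers_w:
        temps.append(current)
        # Energy balance: delta_T = P / (m_dot * c_p)
        current += power / (flow_kg_per_s * specific_heat)
    return temps


if __name__ == "__main__":
    # Four hypothetical chips in series, each dissipating 418.6 W,
    # with 0.1 kg/s of coolant supplied at 30 deg C.
    temps = inlet_temperatures(30.0, [418.6] * 4, 0.1)
    for index, temp in enumerate(temps):
        print(f"chip {index}: inlet {temp:.2f} C")
```

With these made-up inputs each chip warms the coolant by roughly 1 °C, so the fourth chip sees liquid about 3 °C warmer than the first. Sizing the flow rate and supply temperature for that last chip guarantees every upstream chip is covered too.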

First seen: 2025-08-25 18:14

Last seen: 2025-08-26 16:18