It’s AWS re:Invent 2024 this week, Amazon’s annual cloud computing extravaganza in Las Vegas, and as is custom, the company has so much to announce that it can’t fit everything into its five (!) keynotes. Ahead of the show’s official opening, AWS on Monday detailed a number of updates to its overall data center strategy that are worth paying attention to.
The most important of these is that AWS will soon start using liquid cooling for its AI servers and other machines, regardless of whether those are based on its homegrown Trainium chips or Nvidia’s accelerators. Specifically, AWS notes that its Trainium2 chips (which are still in preview) and “rack-scale AI supercomputing solutions like NVIDIA GB200 NVL72” will be cooled this way.
It’s worth highlighting that AWS stresses that these updated cooling systems can integrate both air and liquid cooling. After all, there are still plenty of other servers in the data centers, handling networking and storage, for example, that don’t require liquid cooling. “This flexible, multimodal cooling design allows AWS to provide maximum performance and efficiency at the lowest cost, whether running traditional workloads or AI models,” AWS explains.
The company also announced that it’s moving to more simplified electrical and mechanical designs for its servers and server racks.
“AWS’s latest data center design improvements include simplified electrical distribution and mechanical systems, which enable infrastructure availability of 99.9999%. The simplified systems also reduce the potential number of racks that can be impacted by electrical issues by 89%,” the company notes in its announcement. In part, AWS is doing this by reducing the number of times the electricity gets converted on its way from the electrical grid to the server.
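For a sense of scale, six nines of availability works out to roughly half a minute of downtime per year. A quick back-of-the-envelope check (my own arithmetic, not an AWS figure):

```python
# Downtime per year implied by an availability target.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap years

def downtime_seconds_per_year(availability: float) -> float:
    """Expected seconds of downtime per year for a given availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR

# 99.9999% availability -> about 31.5 seconds of downtime per year
print(f"{downtime_seconds_per_year(0.999999):.1f}")
```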
AWS didn’t provide many more details than that, but this likely means using DC power to run the servers and/or HVAC system and avoiding many of the AC-DC-AC conversion steps (with their inherent losses) that are otherwise necessary.
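To see why fewer conversion steps matter, note that each stage loses a few percent and those losses compound multiplicatively. The stage counts and per-stage efficiencies below are illustrative assumptions, not AWS figures:

```python
def chain_efficiency(stage_efficiencies):
    """Overall efficiency of a series of power conversion stages.

    Each stage passes through a fraction of its input power, so the
    end-to-end efficiency is the product of the per-stage efficiencies.
    """
    result = 1.0
    for eff in stage_efficiencies:
        result *= eff
    return result

# Hypothetical legacy path: AC->DC (UPS), DC->AC, AC->DC (server PSU), DC-DC
legacy = chain_efficiency([0.96, 0.96, 0.94, 0.95])
# Hypothetical simplified path with far fewer conversions
simplified = chain_efficiency([0.97, 0.96])

print(f"legacy: {legacy:.1%}, simplified: {simplified:.1%}")
```

With these made-up numbers, trimming two conversion stages recovers roughly ten percentage points of end-to-end efficiency, which is the kind of gain that motivates a DC distribution design.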
“AWS continues to relentlessly innovate its infrastructure to build the most performant, resilient, secure, and sustainable cloud for customers worldwide,” said Prasad Kalyanaraman, vice president of Infrastructure Services at AWS, in Monday’s announcement. “These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads. But what’s even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint.”
In total, AWS says, the new multimodal cooling system and upgraded power delivery system will let the company “support a 6x increase in rack power density over the next two years, and another 3x increase in the future.”
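AWS doesn’t give absolute numbers, but the compounding is easy to work out. Assuming a hypothetical baseline of 40 kW per rack (my assumption, not an AWS figure) and reading “6x increase” as a 6x multiple:

```python
BASELINE_KW = 40  # hypothetical current rack power draw; AWS doesn't say

two_year_target = BASELINE_KW * 6    # "6x increase over the next two years"
future_target = two_year_target * 3  # "another 3x increase in the future"

# 40 kW -> 240 kW -> 720 kW per rack under these assumptions
print(two_year_target, future_target)
```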
In this context, AWS also notes that it’s now using AI to predict the most efficient way to place racks in the data center to reduce the amount of unused or underutilized power. AWS will also roll out its own control system across the electrical and mechanical devices in its data centers, which will come with built-in telemetry services for real-time diagnostics and troubleshooting.
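AWS doesn’t describe its placement model, but the underlying problem resembles bin packing: assign racks of varying power draw to feeds so that as little capacity as possible is stranded. A minimal best-fit-decreasing sketch under that assumption (all names and numbers are hypothetical, not AWS’s method):

```python
def place_racks(rack_draws_kw, feed_capacity_kw, num_feeds):
    """Greedily assign racks to power feeds, largest draw first,
    putting each rack on the feed with the least remaining headroom
    that can still hold it (best-fit decreasing)."""
    remaining = [feed_capacity_kw] * num_feeds
    placement = {}
    for rack, draw in sorted(enumerate(rack_draws_kw), key=lambda x: -x[1]):
        candidates = [f for f in range(num_feeds) if remaining[f] >= draw]
        if not candidates:
            continue  # rack can't be placed; a real planner would re-plan
        best = min(candidates, key=lambda f: remaining[f])
        remaining[best] -= draw
        placement[rack] = best
    stranded_kw = sum(remaining)  # capacity left unused across all feeds
    return placement, stranded_kw

placement, stranded = place_racks(
    [30, 20, 20, 10, 10], feed_capacity_kw=50, num_feeds=2
)
print(placement, stranded)
```

A production system would also have to model redundancy, cooling limits, and failure domains, but even this toy version shows how placement order changes how much power is left stranded.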
“Data centers must evolve to meet AI’s transformative demands,” said Ian Buck, vice president of hyperscale and HPC at Nvidia. “By enabling advanced liquid cooling solutions, AI infrastructure can be efficiently cooled while minimizing energy use. Our work with AWS on their liquid cooling rack design will allow customers to run demanding AI workloads with exceptional performance and efficiency.”