The headquarters of Google is seen in Mountain View, California, United States, September 26, 2022.
Tayfun Coskun | Anadolu Agency | Getty Images
Google published details about one of its artificial intelligence supercomputers on Wednesday, saying it’s faster and more efficient than competing Nvidia systems, as power-hungry machine learning models continue to be the hottest part of the tech industry.
While Nvidia dominates the market for training and deploying AI models, with more than 90%, Google has been designing and deploying AI chips called Tensor Processing Units, or TPUs, since 2016.
Google is a major pioneer in AI, and its employees have developed some of the most important advances in the field over the past decade. But some believe it has fallen behind in bringing its inventions to market, and internally the company has been rushing to release products and prove that it hasn’t wasted its lead, a “code red” situation at the company, CNBC previously reported.
AI models and products such as Google’s Bard or OpenAI’s ChatGPT (powered by Nvidia’s A100 chips) require many computers and hundreds or thousands of chips working together to train models, with the computers running around the clock for weeks or months.
On Tuesday, Google said it built a system with more than 4,000 TPUs paired with custom components designed to run and train AI models. The system has been running since 2020 and was used over 50 days to train Google’s PaLM model, which competes with OpenAI’s GPT model.
Google’s TPU-based supercomputer, called TPU v4, is “1.2x to 1.7x faster and uses 1.3x to 1.9x less power than the Nvidia A100,” the Google researchers wrote.
“Performance, scalability and availability make TPU v4 supercomputers the workhorses of large language models,” the researchers continued.
However, Google’s TPU results were not compared to Nvidia’s latest AI chip, the H100, because it is newer and was made with more advanced manufacturing technology, Google researchers said.
An Nvidia spokesperson declined to comment. Results and rankings of an industry-wide AI chip test called MLPerf are expected to be released on Wednesday.
The substantial amount of computing power required for AI is expensive, and many industry players are focused on developing new chips, components such as optical connections, or software techniques that reduce the amount of computing power needed.
The power requirements of AI are also a boon for cloud providers such as Google, Microsoft and Amazon, which can rent out computer processing by the hour and provide credits or computing time to startups to build relationships. (Google’s cloud also sells time on Nvidia chips.) For example, Google said Midjourney, an AI image generator, was trained on its TPU chips.