Legendary Chinese AI Startup DeepSeek OpenAI Google, 23 Jan

The day after Christmas, a small Chinese AI startup called DeepSeek unveiled a new A.I. system that could match the capabilities of cutting-edge chatbots from companies like OpenAI and Google.

That alone would have been a milestone. But the team behind the system, called DeepSeek-V3, described an even bigger step. In a research paper explaining how they built the technology, DeepSeek’s engineers said they used only a fraction of the highly specialized computer chips that leading A.I. companies relied on to train their systems.

These chips are at the center of a tense technological competition between the United States and China. As the U.S. government works to maintain the country’s lead in the global A.I. race, it is trying to limit the number of powerful chips, like those made by Silicon Valley firm Nvidia, that can be sold to China and other rivals.

But the performance of the DeepSeek model raises questions about the unintended consequences of the American government’s trade restrictions. The controls have forced researchers in China to get creative with a wide range of tools that are freely available on the internet.

The DeepSeek chatbot answered questions, solved logic problems and wrote its own computer programs as capably as anything already on the market, according to the benchmark tests that American A.I. companies have been using.

And it was created on the cheap, challenging the prevailing idea that only the tech industry’s biggest companies, all of them based in the United States, could afford to make the most advanced A.I. systems. The Chinese engineers said they needed only about $6 million in raw computing power to build their new system. That is roughly one-tenth of what the tech giant Meta spent building its latest A.I. technology.

“The number of companies who have $6 million to spend is vastly greater than the number of companies who have $100 million or $1 billion to spend,” said Chris V. Nicholson, an investor with the venture capital firm Page One Ventures, who focuses on A.I. technologies.

Since OpenAI sparked the A.I. boom in 2022 with the release of ChatGPT, many experts and investors had concluded that no company could compete with the market leaders without spending hundreds of millions of dollars on specialized chips.

The world’s leading A.I. companies train their chatbots using supercomputers that use as many as 16,000 chips, if not more. DeepSeek’s engineers, on the other hand, said they needed only about 2,000 specialized computer chips from Nvidia.

The constraints on chips in China forced the DeepSeek engineers to “train it more efficiently so it could still be competitive,” said Jeffrey Ding, an assistant professor at George Washington University who specializes in emerging technology and international relations.

Earlier this month, the Biden administration issued new rules that aim to keep China from obtaining advanced A.I. chips through other countries. The rules build on multiple rounds of earlier restrictions that prevent Chinese companies from being able to buy or make cutting-edge computer chips. President Trump has not yet indicated whether he will keep the rules or rescind them.

The U.S. government has tried to keep advanced chips out of the hands of Chinese companies over concerns they could be used for military purposes. In response, some firms in China have stockpiled thousands of chips, while others sourced them from a thriving underground marketplace of smugglers.

DeepSeek is run by a quantitative stock trading firm called High Flyer. By 2021, it had channeled its profits into acquiring thousands of Nvidia chips, which it used to train its earlier models. The company, which did not respond to requests for comment, has become known in China for scooping up talent fresh from top universities with the promise of high salaries and the ability to follow the research questions that most pique their interest.

Zihan Wang, a computer engineer who worked on an earlier DeepSeek model, said the company also hires people without any computer science background to help the technology understand and be able to generate poetry and ace questions on the notoriously difficult Chinese college entrance examination.

DeepSeek does not make any products for consumers, leaving its engineers to focus entirely on research. That means that its technology is not hemmed in by the strictest aspect of China’s regulations on A.I., which require consumer-facing technology to comply with the government’s controls on information.

The leading American companies continue to advance the state of the art in A.I. In December, OpenAI unveiled a new “reasoning” system called o3 that exceeds the performance of existing technologies, though it is not yet widely available outside the company. But DeepSeek continues to show that it is not far behind. This month, it released an impressive reasoning model of its own.

(The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement of news content related to A.I. systems. OpenAI and Microsoft have denied those claims.)

A crucial part of this rapidly changing global market is an old idea: open source software. Like many other companies, DeepSeek has open sourced its latest A.I. system, meaning that it has shared the underlying code with other businesses and researchers. This allows others to build and distribute their own products using the same technologies.

While employees at big Chinese technology companies are limited to collaborating with colleagues, “if you work on open source, you work with talent around the world,” said Yineng Zhang, a lead software engineer at Baseten in San Francisco who works on the open source SGLang project. He helps other people and companies build products using DeepSeek’s system.

The open source ecosystem for A.I. gathered steam in 2023 when Meta freely shared an A.I. system called Llama. Many assumed that this community would flourish only if companies like Meta, tech giants with massive data centers filled with specialized chips, continued to open source their technologies. But DeepSeek and others have shown that they, too, can expand the powers of open source technologies.

Many executives and pundits have argued that the big U.S. companies should not open source their technologies because they could be used to spread disinformation or cause other serious harm. Some U.S. lawmakers have explored the possibility of preventing or throttling the practice.

But others argue that if regulators stifle the progress of open source technology in the United States, China will gain a significant edge. If the best open source technologies come from China, they argue, U.S. developers will build their systems atop those technologies. In the long run, that could put China at the heart of A.I. research and development.

“The center of gravity of the open source community has been moving to China,” said Ion Stoica, a professor of computer science at the University of California, Berkeley. “This could be a huge danger for the U.S.,” because it allows China to accelerate the development of new technologies.

Hours after his inauguration, President Trump rescinded a Biden administration executive order that threatened to curb open source technologies.

Dr. Stoica and his students recently built an A.I. system called Sky-T1 that rivals the performance of OpenAI’s latest system, called OpenAI o1, on certain benchmark tests. They needed only $450 in computing power.

They did this by building on top of two open source technologies released by the Chinese tech giant Alibaba.

Their $450 system is not as powerful as OpenAI’s technology or DeepSeek’s new system. And the techniques they used are unlikely to yield systems that exceed the performance of the leading technologies. But the project showed that even operations with minuscule resources can build competitive systems.

Reuven Cohen, a technology consultant in Toronto, has been using DeepSeek-V3 since late December. He says it is comparable to the latest systems from OpenAI, Google and the San Francisco start-up Anthropic — and much cheaper to use.

“DeepSeek is a way for me to save money,” he said. “This is the kind of technology that someone like me wants to use.”
