
The Trump administration directed the National Institute of Standards and Technology (NIST) to compare DeepSeek with leading U.S. artificial intelligence models, and the resulting evaluation found that the United States is outperforming China.
A report from NIST's Center for AI Standards and Innovation (CAISI) said that three of China's DeepSeek AI models fell behind four American AI models "across almost every benchmark," noting that the American models outpaced DeepSeek in performance, cost, security, and adoption.
“Thanks to President Trump’s AI Action Plan, the Department of Commerce and NIST’s Center for AI Standards and Innovation have released a groundbreaking evaluation of American vs. adversary AI,” said Secretary of Commerce Howard Lutnick in a statement.
“The report is clear that American AI dominates, with DeepSeek trailing far behind. This weakness isn’t just technical,” Lutnick continued. “It shows why relying on foreign AI is dangerous and shortsighted. By setting the standards, driving innovation, and keeping America secure, the Department of Commerce will ensure continued U.S. leadership in AI.”
However, the report carries disclaimers: CAISI emphasized that the study should not be taken as an endorsement of any system or developer, and that its results are only preliminary.
“The inclusion of specific models in this report does not imply recommendation or endorsement, nor does it assert their superiority over any other model on any specific task or in general,” said CAISI, adding that measurements used in the report “are known to have methodological limitations.”
DeepSeek, a Chinese AI company, sent shockwaves through the technology industry after it launched its DeepSeek-R1 model in January, delivering high performance as an open-source model with low training costs.
According to CAISI's preliminary snapshot, the most recent DeepSeek model falls behind the "best U.S. model" by 20% in its ability to solve tasks. It is also 12 times more likely than the evaluated U.S. models to follow malicious instructions intended to hijack the system, and it responded to 94% of overtly malicious requests when subjected to a jailbreaking technique.
DeepSeek models cost 35% more on average than a comparable U.S. reference model, CAISI said, and the Chinese model “echoed four times as many inaccurate and misleading” narratives as American models did.
xAI's Grok, which was not among the four American models evaluated, has been found to generate misinformation and hateful language. No studies have shown the same for the four American models that CAISI compared with DeepSeek.
CAISI also pointed to growing adoption of DeepSeek R1 since its release earlier this year, saying that downloads of DeepSeek models on model-sharing platforms "have increased nearly 1,000% since January 2025."
The study compared models across 19 benchmarks. The DeepSeek models were R1, R1-0528, and V3.1; the four American models were OpenAI's GPT-5, GPT-5-mini, and gpt-oss, and Anthropic's Opus 4.