OpenAI’s Reasoning Models Are “Hallucinating” And Creators Have No Idea Why: Report

by oqtey

OpenAI’s recently launched o3 and o4-mini AI models are prone to hallucinations more often than the company’s previous reasoning models, a report in TechCrunch has claimed. The ChatGPT creators launched the models, which are designed to pause and work through questions before responding, on Wednesday (Apr 16).

However, as per OpenAI’s internal tests, the two new models are hallucinating, or making things up, much more frequently than even non-reasoning models such as GPT-4o. The company does not know why this is happening.

In a technical report, OpenAI said “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models.

“Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” a former OpenAI employee was quoted as saying by the publication.

Experts claim that while hallucinations may help the models develop creative and interesting ideas, they could also make the models a tough sell for businesses in a market where accuracy is the paramount benchmark.

OpenAI has been betting heavily on the new models to beat the likes of Google, Meta, xAI, Anthropic, and DeepSeek in the cutthroat global AI race. As per the Sam Altman-led company, o3 achieves state-of-the-art performance on SWE-bench Verified, a test measuring coding abilities, scoring 69.1 per cent. Meanwhile, the o4-mini model achieves similar performance, scoring 68.1 per cent.

ChatGPT makes people lonely

Earlier this month, a joint study conducted by OpenAI and MIT Media Lab found that ChatGPT might be making its most frequent users lonelier. While feelings of loneliness and social isolation are often influenced by various factors, the study authors concluded that participants who trusted and “bonded” with ChatGPT more were likelier than others to be lonely and to rely on it more.

Though the technology is still in its nascent stage, researchers said the study may help start a conversation about its full impact on the mental health of users.
