OpenAI’s latest AI models are smarter, but they make things up more often. Here’s what we know


OpenAI launched its new o3 and o4-mini reasoning models on Wednesday with many new features. Some enthusiastic OpenAI employees even went so far as to claim that o3 is nearing Artificial General Intelligence (AGI) – a technical term with no fixed definition that usually refers to the stage at which AI reaches a level of intelligence near or equivalent to that of humans. However, as it turns out, a document published by OpenAI itself shows that its new AI models are not only prone to hallucination (making things up), but hallucinate more often than its previous reasoning and non-reasoning models. 

OpenAI first rolled out its reasoning models last year, claiming they mimic human-level thinking in order to solve more complex queries. However, with its latest and most powerful reasoning model yet, OpenAI says the model makes both more ‘accurate’ and more ‘inaccurate’ claims. 

In its technical report for o3 and o4-mini (first reported by TechCrunch), OpenAI says, “o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims. More research is needed to understand the cause of this result.”

On OpenAI’s own PersonQA benchmark, the company found that o3 hallucinated in its responses to around 33% of the questions, compared with a 16% hallucination rate for o1 and 14.8% for o3-mini.
