Published on May 4, 2026, 9:38 am CEST | By Jenny Patel

Harvard Study Finds OpenAI o1 Ahead Of Doctor Baselines


A Harvard Medical School and Beth Israel Deaconess Medical Center team tested OpenAI models against doctors in medical diagnosis, using real emergency room cases and other clinical tasks.


Good to Know

  • OpenAI o1 reached an exact or very close diagnosis in 67% of initial ER triage cases.
  • Two attending physicians scored 55% and 50% on the same triage test.
  • Researchers said hospitals still need real patient care trials before using AI for high-risk diagnosis.

Researchers Say Real ER Use Still Needs Testing

The strongest result came at the point where doctors usually have the least information. At initial ER triage, OpenAI o1 gave an exact or very close diagnosis in 67% of cases. One attending physician reached 55%, while another reached 50%.

Researchers did not frame the result as a green light for AI to run emergency rooms. Instead, the Science study called for an “urgent need for prospective trials to evaluate these technologies in real-world patient care settings.”

That warning matters because the test stayed inside text-based records. The team noted that “existing studies suggest that current foundation models are more limited in reasoning over nontext inputs.” In plain terms, charts, scans, images, physical exams, and bedside judgment still create harder problems for AI diagnosis tools.


The study used 76 patients from the Beth Israel emergency room. OpenAI o1 and 4o received the same electronic medical record details available at each diagnosis point. Harvard Medical School said researchers did not “pre-process the data at all,” so the models did not get cleaned-up summaries or extra help.

Two other attending physicians then graded the answers without knowing which diagnosis came from a human doctor and which came from AI.

The study said:

“At each diagnostic touchpoint, o1 either performed nominally better than or on par with the two attending physicians and 4o.”


It added that the gap was clearest early in care, where pressure runs high and information stays thin. The differences, the study said:

“were especially pronounced at the first diagnostic touchpoint (initial ER triage), where there is the least information available about the patient and the most urgency to make the correct decision.”

Arjun Manrai, who heads an AI lab at Harvard Medical School and helped lead the study, said:

“We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines.”

Still, accountability remains a hard problem. Adam Rodman, a Beth Israel doctor and one of the study's lead authors, told the Guardian that there is “no formal framework right now for accountability” around AI diagnoses. He also said patients still “want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions.”

Jenny Patel

Jenny Patel, a dedicated freelance writer, has been consumed by her love for gaming since her childhood days. Her go-to games growing up were Elder Scrolls V: Skyrim on PC and Halo 3 on XBOX. Jenny now enjoys the flexibility of working remotely, allowing her to explore the world while indulging in her gaming passion.

Tags: AI