European Press
OpenAI unveils HealthBench to evaluate LLMs’ safety in healthcare

15 May 2025
in Health


OpenAI has announced the launch of HealthBench, a benchmark to evaluate AI models in healthcare using real-world applicability and physician judgment. 
“The 5,000 conversations in HealthBench simulate interactions between AI models and individual users or clinicians. The task for a model is to provide the best possible response to the user’s last message,” the company said in a statement. 
OpenAI built the benchmark with 262 physicians in 60 countries, who are proficient in 49 languages and have training in 26 medical specialties. 
HealthBench includes 5,000 health conversations, each with a physician-created rubric to evaluate model responses. Together, the rubrics contain 48,562 unique criteria.
The company said the conversations were created through “synthetic generation and human adversarial testing,” are multilingual, and span various medical specialties and contexts.  
“Every model response is graded against a set of physician-written rubric criteria specific to that conversation,” the company said. 
“Each criterion outlines what an ideal response should include or avoid (e.g., a specific fact to include or unnecessarily technical jargon to avoid). Each criterion has a corresponding point value, weighted to match the physician’s judgment of that criterion’s importance.” 
A GPT-4.1-based grader determines whether a model’s response meets each rubric criterion. The response then receives an overall score based on the criteria it meets, which is compared to the maximum possible score.
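The scoring described above can be illustrated with a minimal sketch. The article does not spell out the exact formula, so the code below rests on assumptions: criteria describing things to avoid carry negative points, the maximum possible score is the sum of the positive point values, and the grader's verdicts are hard-coded rather than produced by a model. The names `Criterion` and `rubric_score`, as well as the example criteria, are hypothetical and not taken from HealthBench.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str  # what the response should include or avoid
    points: int       # physician-assigned weight; negative = behavior to avoid (assumption)
    met: bool         # grader's verdict, hard-coded here instead of calling a model


def rubric_score(criteria: list[Criterion]) -> float:
    """Return points earned divided by the maximum achievable points."""
    # Assumption: only positively weighted criteria count toward the maximum.
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    # A met negative criterion subtracts from the total earned.
    earned = sum(c.points for c in criteria if c.met)
    return earned / max_points


# Hypothetical rubric for a single conversation.
example = [
    Criterion("Advises contacting emergency services", 10, True),
    Criterion("Asks about current medications", 5, False),
    Criterion("Uses unnecessarily technical jargon", -3, True),
]

score = rubric_score(example)
print(f"{score:.2f}")
```

In this sketch the response earns 10 points, loses 3 for jargon, and misses a 5-point criterion, so it scores 7 out of a possible 15.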
HealthBench is split into seven themes: expertise-tailored communication, response depth, emergency referrals, health data tasks, global health, responding under uncertainty and context seeking.
“Evaluations like HealthBench are part of our ongoing efforts to understand model behavior in high-impact settings and help ensure progress is directed toward real-world benefit,” the company said. 
“Our findings show that large language models have improved significantly over time and already outperform experts in writing responses to examples tested in our benchmark. Yet even the most advanced systems still have substantial room for improvement, particularly in seeking necessary context for underspecified queries and worst-case reliability. We look forward to sharing results for future models.”
The tools are publicly available on GitHub. 
THE LARGER TREND
OpenAI’s CEO, Sam Altman, took part in President Donald Trump’s press conference earlier this year announcing the launch of Project Stargate. The $500 billion project would focus on developing the physical and virtual infrastructure to power AI development, including AI to improve health outcomes. 
The partners, who also included Oracle’s chief technology officer, Larry Ellison, and SoftBank’s CEO, Masayoshi Son, touted the project as a game changer for healthcare.
Altman said during the press conference that he is thrilled to be part of Stargate and anticipates that diseases will be cured at an unprecedented rate. 
Ellison added that a cancer vaccine is one of the “most exciting” things the group is working on, using the tools that Altman and Son are providing.
Earlier this month, the Financial Times reported that Project Stargate was considering international expansion, with its top country of choice being the UK. Germany and France are also attractive candidates. 
However, this week, Bloomberg reported that the project is facing delays due to the tariffs imposed by Trump and economic uncertainty. 
Banks and institutional investors are wary of investing in Stargate amid growing market volatility, especially because U.S. tariffs, particularly on chips, server racks and cooling systems, make data center build-out costs hard to predict.
Additionally, SoftBank, which pledged an immediate $100 billion to the project with the goal of reaching $500 billion within the next four years, has yet to develop a financing template or start discussions with potential backers, according to Bloomberg.
