European Press

OpenAI unveils HealthBench to evaluate LLMs’ safety in healthcare

15 May 2025
in Health


OpenAI has announced the launch of HealthBench, a benchmark that evaluates AI models in healthcare on the basis of real-world applicability and physician judgment.
“The 5,000 conversations in HealthBench simulate interactions between AI models and individual users or clinicians. The task for a model is to provide the best possible response to the user’s last message,” the company said in a statement. 
OpenAI built the benchmark with 262 physicians in 60 countries, who are proficient in 49 languages and have training in 26 medical specialties. 
HealthBench includes 5,000 health conversations, each with a physician-created rubric to evaluate model responses. Together, the rubrics contain 48,562 unique criteria.
The company said the conversations were created through “synthetic generation and human adversarial testing,” are multilingual, and span various medical specialties and contexts.  
“Every model response is graded against a set of physician-written rubric criteria specific to that conversation,” the company said. 
“Each criterion outlines what an ideal response should include or avoid (e.g., a specific fact to include or unnecessarily technical jargon to avoid). Each criterion has a corresponding point value, weighted to match the physician’s judgment of that criterion’s importance.” 
GPT-4.1 grades the model’s responses, determining whether each rubric criterion is met. An overall score, based on which criteria are met, is then shown to the user alongside the maximum possible score.
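As a rough illustration of the rubric scoring described above, the following Python sketch totals the points for the criteria a response meets and divides by the maximum possible score. It is not OpenAI's implementation. The names `Criterion` and `rubric_score`, the example criteria and the convention that "avoid" criteria carry negative points are all assumptions made for illustration, and the GPT-4.1 grading step is replaced by a hand-written list of met/not-met verdicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric criterion (hypothetical structure, not OpenAI's schema)."""
    description: str
    points: int  # assumed: positive = should include, negative = should avoid


def rubric_score(criteria, met):
    """Return points earned for met criteria divided by the maximum possible points.

    `met` holds one boolean per criterion. In HealthBench a grader model
    decides these verdicts; here they are supplied directly.
    """
    if len(criteria) != len(met):
        raise ValueError("need exactly one verdict per criterion")
    # Assumed: the maximum possible score counts only positive-point criteria.
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric has no positive-point criteria")
    earned = sum(c.points for c, is_met in zip(criteria, met) if is_met)
    return earned / max_points


# Made-up rubric for a single conversation.
example_criteria = [
    Criterion("Advises seeking emergency care", 10),
    Criterion("States a specific relevant fact", 5),
    Criterion("Uses unnecessarily technical jargon", -3),
]
# Hand-written verdicts standing in for the grader's output.
example_met = [True, False, True]

example_score = rubric_score(example_criteria, example_met)
print(f"{example_score:.3f}")  # 7 of 15 possible points -> 0.467
```

In this sketch, a response that meets an "avoid" criterion loses points, so weighting a criterion more heavily makes both hits and misses on it count for more.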
HealthBench is split into seven themes: expertise-tailored communication, response depth, emergency referrals, health data tasks, global health, responding under uncertainty and context seeking.
“Evaluations like HealthBench are part of our ongoing efforts to understand model behavior in high-impact settings and help ensure progress is directed toward real-world benefit,” the company said. 
“Our findings show that large language models have improved significantly over time and already outperform experts in writing responses to examples tested in our benchmark. Yet even the most advanced systems still have substantial room for improvement, particularly in seeking necessary context for underspecified queries and worst-case reliability. We look forward to sharing results for future models.”
The tools are publicly available on GitHub. 
THE LARGER TREND
OpenAI’s CEO, Sam Altman, was part of President Donald Trump’s press conference earlier this year announcing the launch of Project Stargate. This $500 billion project would focus on developing the physical and virtual infrastructure to power AI construction, including AI to improve health outcomes. 
The partners, which also included Oracle’s chief technology officer, Larry Ellison, and SoftBank’s CEO, Masayoshi Son, touted the project as a game changer for healthcare.
Altman said during the press conference that he is thrilled to be part of Stargate and anticipates that diseases will be cured at an unprecedented rate. 
Ellison added that a cancer vaccine is one of the “most exciting” things the group is working on, using the tools that Altman and Son are providing.
Earlier this month, the Financial Times reported that Project Stargate was considering international expansion, with its top country of choice being the UK. Germany and France are also attractive candidates. 
However, this week, Bloomberg reported that the project is facing delays due to the tariffs imposed by Trump and economic uncertainty. 
Amid economic uncertainty and growing market volatility, banks and institutional investors are wary of investing in Stargate, especially because U.S. tariffs, particularly on chips, server racks and cooling systems, make data center build-out costs hard to predict.
Additionally, SoftBank, which pledged an immediate $100 billion to the project with the goal of that figure reaching $500 billion within the next four years, has yet to develop a financing template or start discussions with potential backers, according to Bloomberg.

Copyright 2025 © EUROPEAN PRESS All rights on our posts reserved!
