MarkTechPost
6/18/2026
OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric

Short summary

OpenAI introduced LifeSciBench, a benchmark of 750 expert-authored tasks spanning seven biological domains, designed to evaluate AI models on authentic life-science research. The tasks were written by 173 PhD scientists and are graded against 19,020 rubric criteria that assess reasoning and decision-making rather than recall alone. The best model achieves a pass rate of only 36.1%, indicating substantial gaps in scientific reasoning capability.

  • OpenAI released LifeSciBench: 750 expert-written tasks evaluating AI on life-science research across seven biological domains
  • Built by 173 PhD scientists with 19,020 rubric criteria; grades reasoning and decisions rather than recall alone
  • Best model (GPT-Rosalind) achieves a pass rate of only 36.1%, revealing significant AI capability gaps on scientific tasks

Generated with AI, which can make mistakes.
