MarkTechPost
6/18/2026

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric
Short summary
OpenAI introduced LifeSciBench, a benchmark of 750 expert-authored tasks spanning seven biological domains, designed to evaluate AI models on authentic life-science research. The tasks were written by 173 PhD scientists and are graded against 19,020 rubric criteria that assess reasoning and decision-making, not just recall. The best model achieves a pass rate of only 36.1%, indicating substantial capability gaps in scientific reasoning.
- OpenAI released LifeSciBench: 750 expert-written tasks evaluating AI on life-science research across seven biological domains
- Built by 173 PhD scientists with 19,020 rubric criteria; grades reasoning and decisions rather than recall alone
- Best model (GPT-Rosalind) achieves only 36.1% pass rate, revealing significant AI capability gaps for scientific tasks



