InquiTree: Evaluating AI Agents in the Scientific Inquiry Loop with Paper-Derived Research Trees

Jun 8, 2026

Shaoyang Cui


InquiTree turns scientific papers into interactive Research Trees: logical directed acyclic graphs (DAGs) over subtopic proposal, study design, result interpretation, and belief updating. Agents are evaluated through repeated propose, observe, revise, and conclude cycles, which test whether they can choose the next scientific move, absorb feedback, detect anomalous results, and decide when to draw conclusions.
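To make the idea of an inquiry loop over a DAG concrete, here is a minimal Python sketch. It is not InquiTree's actual data model or API: the `Node` class, the `frontier` and `run_episode` functions, the node kinds, and the toy tree are all hypothetical names chosen for illustration. The sketch assumes that a node becomes available only after all of its prerequisite nodes have been explored, and that exploring a node reveals a piece of feedback.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in a hypothetical Research Tree (illustrative, not InquiTree's schema)."""
    node_id: str
    kind: str  # e.g. "subtopic", "design", "interpretation", "belief"
    parents: list = field(default_factory=list)  # prerequisite node ids (DAG edges)
    result: str = ""  # feedback revealed once the node is explored


def frontier(tree, done):
    """Return unexplored nodes whose prerequisites have all been explored."""
    return [
        n.node_id
        for n in tree.values()
        if n.node_id not in done and all(p in done for p in n.parents)
    ]


def run_episode(tree, choose, max_steps=10):
    """Run propose -> observe -> revise cycles until no move is left or the budget ends."""
    done, beliefs = [], {}
    for _ in range(max_steps):
        options = frontier(tree, set(done))
        if not options:
            break  # nothing left to explore: the agent concludes
        move = choose(options, beliefs)    # propose the next scientific move
        beliefs[move] = tree[move].result  # observe feedback and revise beliefs
        done.append(move)
    return done, beliefs


# Toy four-node chain; the contents are made up for the example.
tree = {
    "q1": Node("q1", "subtopic", [], "subtopic accepted"),
    "d1": Node("d1", "design", ["q1"], "design accepted"),
    "r1": Node("r1", "interpretation", ["d1"], "effect observed"),
    "b1": Node("b1", "belief", ["r1"], "hypothesis supported"),
}

# A trivial stand-in agent that always picks the alphabetically first option.
order, beliefs = run_episode(tree, lambda options, beliefs: sorted(options)[0])
```

In this sketch the agent is just a callable that picks from the currently available moves. A real evaluation would presumably replace it with a model call and score the choices against the paper-derived tree, but those details are not specified here.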

The benchmark derives inquiry environments from neuroscience papers and reports diagnostic stress tests around long-horizon interaction, Fake Result detection, and temporal generalization across newer papers. Its public IT-18 subset releases open-access paper-derived configurations and logs for evaluating AI agents in scientific inquiry loops.

Last updated on Jun 8, 2026