Task Ability Decomposition and Difficulty Quantification of Visual Tasks for AGI Evaluation
Oct 27, 2025
Shaoyang Cui
X. Y. He
J. H. Han
Z. L. Zhang
Y. J. Peng

Abstract
With the rapid development of multi-modal foundation models and the pursuit of artificial general intelligence (AGI), there is a growing need for corresponding evaluation systems. Systematic AGI evaluation requires tasks that encompass a wide range of ability dimensions and difficulty levels. However, although many benchmarks exist, the field still lacks a quantification system to assess ability decompositions or difficulty levels. Here, we took the visual domain as a starting point and proposed an explainable system for task ability decomposition and difficulty level quantification of vision (TADDL-V). Using large language models, TADDL-V decomposed the visual abilities required for a given task and leveraged statistical data to map between ability sets and task difficulty levels. The estimated ability masses align with human intuition, and TADDL-V’s task difficulty estimates are empirically validated against aggregated human comparisons of task difficulty. Furthermore, we proposed an AGI visual evaluation task set, AGI-V70, comprising 70 composite visual tasks that incorporate visual abilities across a broad spectrum of task difficulties. Together, TADDL-V serves as a prototype for ability decomposition and task difficulty level quantification, which are essential for future AGI evaluations.
Publication
Science China Technological Sciences (JCR Q1, in press)
This work contributes to AGI evaluation methodology by proposing an explainable framework for decomposing the abilities a visual task requires and quantifying the task's difficulty.
Key Contributions
- Novel Theoretical Framework: First exploration of task-ability space structure and its relationship to task difficulty
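- TADDL-V Framework: Explainable system that uses large language models to decompose the visual abilities a task requires and statistical data to map ability sets to task difficulty levels
- AGI-V70 Benchmark: Set of 70 composite visual tasks covering visual abilities across a broad spectrum of task difficulties
- Practical Impact: Prototype for ability decomposition and difficulty quantification that future AGI evaluations can build on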
Motivation
Using the visual domain as a starting point, this research addresses a critical gap in AGI evaluation by introducing a methodology to quantify the difficulty levels of composite tasks. This quantification is crucial for conducting a more comprehensive and fine-grained assessment of AGI systems.
To promote open science and collaborative advancement, the TADDL-V framework and the AGI-V70 benchmark are made freely available to the research community.
