Task Ability Decomposition and Difficulty Quantification of Visual Tasks for AGI Evaluation

Oct 27, 2025

Shaoyang Cui

X. Y. He

J. H. Han

Z. L. Zhang

Y. J. Peng

Abstract

With the rapid development of multi-modal foundation models and the pursuit of artificial general intelligence (AGI), there is a growing need for corresponding evaluation systems. Systematic AGI evaluation requires tasks that encompass a wide range of ability dimensions and difficulty levels. However, although many benchmarks exist, the field still lacks a quantification system to assess ability decompositions or difficulty levels. Here, we took the visual domain as a starting point and proposed an explainable system for task ability decomposition and difficulty level quantification of vision (TADDL-V). Using large language models, TADDL-V decomposed the visual abilities required for a given task and leveraged statistical data to map between ability sets and task difficulty levels. The estimated ability masses align with human intuition, and TADDL-V’s task difficulty estimates are empirically validated against aggregated human comparisons of task difficulty. Furthermore, we proposed an AGI visual evaluation task set, AGI-V70, comprising 70 composite visual tasks that incorporate visual abilities across a broad spectrum of task difficulties. Together, TADDL-V serves as a prototype for ability decomposition and task difficulty level quantification, which are essential for future AGI evaluations.

Type

Journal article

Publication

Science China Technological Sciences (JCR Q1, in press)

This work contributes to AGI evaluation methodology by proposing an explainable framework that decomposes the abilities a visual task requires and quantifies the task's difficulty level.

Key Contributions

Novel Theoretical Framework: First exploration of task-ability space structure and its relationship to task difficulty
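TADDL-V Framework: Uses large language models to decompose the visual abilities a task requires, then uses statistical data to map ability sets to task difficulty levels
AGI-V70 Benchmark: 70 composite visual tasks covering a range of visual abilities across a broad spectrum of task difficulties
Practical Impact: A freely available framework and benchmark for finer-grained AGI evaluation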

Motivation

Using the visual domain as a starting point, this research addresses a critical gap in AGI evaluation by introducing a methodology to quantify the difficulty levels of composite tasks. This quantification is crucial for conducting a more comprehensive and fine-grained assessment of AGI systems.

To promote open science and collaborative advancement, the TADDL-V framework and the AGI-V70 benchmark are made freely available to the research community.

Visual teaser

Last updated on Oct 27, 2025