This research introduces VNum, a comprehensive VideoQA benchmark containing 1,379 human-annotated video-question pairs designed to test multi-step numerical reasoning in Vision-Language Models (VLMs). Moving beyond simple counting, VNum spans diverse real-world environments and requires models to quantify objects, actions, and events through a unique three-level question hierarchy.
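As a concrete picture of what such a benchmark item might contain, the sketch below defines a hypothetical record type. The abstract does not publish VNum's schema, so every field name and the level convention here are illustrative assumptions, not the paper's actual format.

```python
from dataclasses import dataclass

@dataclass
class VNumItem:
    """Hypothetical VNum-style benchmark entry (schema assumed, not official)."""
    video_id: str   # identifier of the source video clip
    question: str   # numerical question posed to the VLM
    answer: int     # human-annotated ground-truth count
    target: str     # what is quantified: "object", "action", or "event"
    level: int      # assumed position in the three-level hierarchy (1 = easiest)

# Example entry with made-up content.
item = VNumItem(
    video_id="vid_0001",
    question="How many cars pass through the intersection?",
    answer=7,
    target="object",
    level=2,
)
print(item.target, item.level)
```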
This research introduces ClawTrap, a MITM-based red-teaming framework designed for the real-world security evaluation of autonomous web agents like OpenClaw. To bridge the gap between static sandbox testing and live network threats, ClawTrap provides a reproducible pipeline for rule-driven interception, transformation, and auditing at the network layer.
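The interception-transformation-auditing loop described above can be illustrated with a minimal self-contained sketch. ClawTrap's real rule format and network hooks are not specified in the abstract, so the simplified request model, rule fields, and the honeypot redirect example below are all assumptions for illustration only.

```python
def apply_rules(request, rules, audit_log):
    """Rule-driven interception sketch: match each rule against a request,
    apply its transform, and record every change for later auditing."""
    for rule in rules:
        if rule["match"](request):
            before = dict(request)
            request = rule["transform"](request)
            audit_log.append({
                "rule": rule["name"],
                "before": before,
                "after": dict(request),
            })
    return request

# Hypothetical rule: reroute an agent's API traffic to a honeypot host.
rules = [{
    "name": "redirect-to-honeypot",
    "match": lambda r: r["host"] == "api.example.com",
    "transform": lambda r: {**r, "host": "honeypot.local"},
}]

audit_log = []
req = {"host": "api.example.com", "path": "/v1/act", "method": "POST"}
out = apply_rules(req, rules, audit_log)
print(out["host"], len(audit_log))  # → honeypot.local 1
```

Keeping the before/after snapshots in the audit log is what makes such a run reproducible: the same rule set replayed against the same traffic yields the same transcript.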
LLM agents’ reliance on implicit Theory of Mind (ToM) during strategic decision-making remains debated. We investigate this using a Minecraft-inspired “trade-and-craft” game requiring goal inference and item exchange. By augmenting agents with explicit ToM scaffolding—where players report multi-order beliefs about opponents—we evaluate the alignment between inferred mental states and behavioral outcomes.
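One way to picture the explicit ToM scaffolding is as structured belief reports collected each turn and compared against ground-truth goals. The abstract does not give the actual report format or alignment metric, so the schema and the toy first-order alignment score below are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class BeliefReport:
    """Hypothetical multi-order belief report (format assumed)."""
    order: int    # 1 = "I believe my opponent wants X";
                  # 2 = "I believe my opponent believes I want Y"; ...
    holder: str   # agent making the report
    content: str  # natural-language belief statement

def first_order_alignment(reports, opponent_goal):
    """Toy metric: fraction of first-order reports that name the
    reporting agent's opponent's true goal."""
    first = [r for r in reports if r.order == 1]
    if not first:
        return 0.0
    hits = sum(opponent_goal[r.holder] in r.content for r in first)
    return hits / len(first)

# Made-up example: agent_A reports beliefs about agent_B.
reports = [
    BeliefReport(order=1, holder="agent_A",
                 content="agent_B wants an iron pickaxe"),
    BeliefReport(order=2, holder="agent_A",
                 content="agent_B thinks I want wood"),
]
opponent_goal = {"agent_A": "iron pickaxe"}  # agent_A's opponent truly wants this
print(first_order_alignment(reports, opponent_goal))  # → 1.0
```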