Ongoing Projects | Shaoyang Cui

VidNum1.4K - A Comprehensive Benchmark for Video-based Numerical Reasoning

Fri, 03 Apr 2026 00:00:00 +0000

This research introduces VNum, a comprehensive VideoQA benchmark containing 1,379 human-annotated video-question pairs designed to test multi-step numerical reasoning in Vision-Language Models (VLMs). Moving beyond simple counting, VNum spans diverse real-world environments to quantify objects, actions, and events through a unique three-level hierarchy.

ClawTrap - MITM-Based Red-Teaming for OpenClaw Security Evaluation

Thu, 02 Apr 2026 00:00:00 +0000

This research introduces ClawTrap, a MITM-based red-teaming framework designed for the real-world security evaluation of autonomous web agents like OpenClaw. To bridge the gap between static sandbox testing and live network threats, ClawTrap provides a reproducible pipeline for rule-driven interception, transformation, and auditing at the network layer.

TradeCraft - Exploring Theory of Mind in LLM Agents' Strategic Decision-Making and Communication

Mon, 01 Dec 2025 00:00:00 +0000

LLM agents’ reliance on implicit Theory of Mind (ToM) during strategic decision-making remains debated. We investigate this using a Minecraft-inspired “trade-and-craft” game requiring goal inference and item exchange. By augmenting agents with explicit ToM scaffolding, in which players report multi-order beliefs about their opponents, we evaluate how well the inferred mental states align with behavioral outcomes.