FAB - Factory of Abstract-style Benchmark

Nov 1, 2024

Developed the first fully automated, low-cost framework for generating abstract-style evaluation benchmarks across general-purpose domains. The framework enables scalable testing of large language models with structured abstraction errors, covering semantic, structural, and factual variants.

Repository: https://github.com/spidermonk7/FAB-Benchmark


Tags: AGI Evaluation · Benchmark Generation · Large Language Models · Python

Authors

Shaoyang Cui

Research Assistant
