Title |
---|
![]() ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Cheng Yang Chufan Shi Yaxin Liu Bo Shui Junjie Wang ...Yuxiang Zhang Gongye Liu Xiaomei Nie Deng Cai Yujiu Yang |
![]() The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Seungone Kim Juyoung Suk Ji Yong Cho Shayne Longpre Chaeeun Kim ...Sean Welleck Graham Neubig Moontae Lee Kyungjae Lee Minjoon Seo |
![]() Tur[k]ingBench: A Challenge Benchmark for Web Agents Kevin Xu Yeganeh Kordi Kate Sanders Yizhong Wang Adam Byerly Kate Sanders Adam Byerly Jingyu Zhang Benjamin Van Durme Daniel Khashabi |
![]() ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots Yu-Chung Hsiao Fedir Zubach Maria Wang Jindong Chen Victor Carbune Jason Lin Maria Wang Yun Zhu Jindong Chen |