Title |
---|
A Survey on Evaluation of Multimodal Large Language Models Jiaxing Huang Jingyi Zhang |
Tur[k]ingBench: A Challenge Benchmark for Web Agents Kevin Xu Yeganeh Kordi Kate Sanders Yizhong Wang Adam Byerly Jingyu Zhang Benjamin Van Durme Daniel Khashabi |
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments S. Srivastava Chengshu Li Michael Lingelbach Roberto Martín-Martín Fei Xia ...C. Karen Liu Silvio Savarese H. Gweon Jiajun Wu Li Fei-Fei |