Title | Authors |
---|---|
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Han Bao, Yue Huang, Yanbo Wang, Jiayi Ye, Xiangqi Wang, Xiuying Chen, Mohamed Elhoseiny, Xiangliang Zhang |
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA | Hanrong Ye, Haotian Zhang, Erik Daxberger, Lin Chen, Zongyu Lin, ..., Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang |
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images | Zhecan Wang, Junzhang Liu, Chia-Wei Tang, Hani Alomari, Anushka Sivakumar, ..., Haoxuan You, A. Ishmam, Kai-Wei Chang, Shih-Fu Chang, Chris Thomas |
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | Feipeng Ma, Yizhou Zhou, Hebei Li, Zilong He, Siying Wu, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun |
Visual Agents as Fast and Slow Thinkers | Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Dongfang Liu |