Title | Authors |
---|---|
Vision Language Models See What You Want but not What You See | Qingying Gao, Yijiang Li, Haiyun Lyu, Haoran Sun, Dezhi Luo, Hokin Deng |
Probing Mechanical Reasoning in Large Vision Language Models | Haoran Sun, Qingying Gao, Haiyun Lyu, Dezhi Luo, Yijiang Li, Hokin Deng |
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, ..., Weihan Wang, Yean Cheng, Xiaotao Gu, Yuxiao Dong, Jie Tang |
LVBench: An Extreme Long Video Understanding Benchmark | Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, ..., Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang |