Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.16784
Cited By
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles
22 May 2025
Jun Xie
Xiongjun Guan
Yingjian Zhu
Zhaoran Zhao
Xinming Wang
Feng Chen
Zhepeng Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles"
14 / 14 papers shown
Title
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
91
26
0
31 Dec 2024
HCQA @ Ego4D EgoSchema Challenge 2024
Haoyu Zhang
Yuquan Xie
Yisen Feng
Zaijing Li
Meng Liu
Liqiang Nie
59
4
0
22 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
55
37
0
17 Jun 2024
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang
Yuhui Zhang
Orr Zohar
Serena Yeung-Levy
VLM
142
94
0
15 Mar 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balavzević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
140
26
0
08 Feb 2024
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
121
87
0
28 Dec 2023
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
70
17
0
07 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
44
24
0
01 Dec 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
68
275
0
17 Aug 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
240
931
0
27 Apr 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
631
13,788
0
15 Mar 2023
Learning Video Representations from Large Language Models
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
56
172
0
08 Dec 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
350
1,056
0
13 Oct 2021
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
189
4,368
0
02 Feb 2019
1