2310.01852
Cited By
v7 (latest)
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
International Conference on Learning Representations (ICLR), 2023
3 October 2023
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
HongFa Wang
Yatian Pang
Wenhao Jiang
Junwu Zhang
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
HuggingFace (2 upvotes)
Github (810★)
Papers citing
"LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment"
50 / 122 papers shown
Title
LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding
Ziyi Wang
Haoran Wu
Yiming Rong
Deyang Jiang
Yixin Zhang
Yue Zhao
Shuang Xu
Bo Xu
VLM
189
3
0
09 Apr 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
Jianchao Tan
Jinahua Han
Lanqing Hong
Hang Xu
Xuelong Li
MLLM
VLM
1.1K
3
0
08 Apr 2025
Post-processing for Fair Regression via Explainable SVD
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Zhiqun Zuo
Ding Zhu
Mohammad Mahdi Khalili
870
0
0
04 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
276
2
0
03 Apr 2025
Aligned Better, Listen Better for Audio-Visual Large Language Models
International Conference on Learning Representations (ICLR), 2025
Yuxin Guo
Shuailei Ma
Shijie Ma
Xiaoyi Bao
Chen-Wei Xie
Kecheng Zheng
Tingyu Weng
Siyang Sun
Yun Zheng
Wei Zou
MLLM
AuLLM
295
7
0
02 Apr 2025
Understanding Co-speech Gestures in-the-wild
Sindhu B. Hegde
KR Prajwal
Taein Kwon
Andrew Zisserman
SLR
351
2
0
28 Mar 2025
MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Saron Samuel
Dan DeGenaro
Jimena Guallar-Blasco
Kate Sanders
Oluwaseun Eisape
...
David Etter
Efsun Kayi
Matthew Wiesner
Kenton W. Murray
Reno Kriz
439
4
0
26 Mar 2025
Can Text-to-Video Generation help Video-Language Alignment?
Computer Vision and Pattern Recognition (CVPR), 2025
Luca Zanella
Goran Frehse
Willi Menapace
Sergey Tulyakov
Yiming Wang
Elisa Ricci
DiffM
VGen
306
1
0
24 Mar 2025
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Boshen Xu
Yuting Mei
Xinbi Liu
Sipeng Zheng
Qin Jin
VLM
MDE
495
2
0
19 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
663
8
0
19 Mar 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
Computer Vision and Pattern Recognition (CVPR), 2025
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
323
1
0
17 Mar 2025
Language-guided Open-world Video Anomaly Detection under Weak Supervision
Zihao Liu
Xiaoyu Wu
Jianqin Wu
Xuxu Wang
Linlin Yang
232
4
0
17 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
Margret Keuper
Hideki Tanaka
Masao Utiyama
Mary Dabre
Steffen Eger
Simone Paolo Ponzetto
558
2
0
14 Mar 2025
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Yuanxin Liu
Rui Zhu
Shuhuai Ren
Jiacong Wang
Haoyuan Guo
Xu Sun
Lu Jiang
811
2
0
13 Mar 2025
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
Ali Vosoughi
Dimitra Emmanouilidou
H. Gamper
429
2
0
12 Mar 2025
Memory-enhanced Retrieval Augmentation for Long Video Understanding
Huaying Yuan
Zhengyang Liang
Minhao Qin
Hongjin Qian
Yan Shu
Zhicheng Dou
Ji-Rong Wen
Andrii Zadaianchuk
VOS
RALM
VLM
306
9
0
12 Mar 2025
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
Haoyu Zhang
Qiaohui Chu
Meng Liu
Yunxiao Wang
Bin Wen
Fan Yang
EgoV
438
12
0
12 Mar 2025
Continual Learning for Multiple Modalities
Hyundong Jin
Eunwoo Kim
CLL
434
0
0
11 Mar 2025
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
Jiayi Zhao
Fei Teng
Kai Luo
Guoqiang Zhao
Hui Yuan
Xu Zheng
Kailun Yang
VLM
330
9
0
04 Mar 2025
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
J.N. Zhang
Xuan Yang
Tianfu Wang
Yu Yao
Aleksandr Petiushko
B. Li
422
10
0
28 Feb 2025
Can Hallucination Correction Improve Video-Language Alignment?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Lingjun Zhao
Mingyang Xie
Paola Cascante-Bonilla
Hal Daumé III
Kwonjoon Lee
HILM
VLM
315
1
0
20 Feb 2025
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Rui Zhao
Qirui Yuan
Jinyu Li
Haofeng Hu
Yun Li
Chengyuan Zheng
Fei Gao
LRM
279
18
0
19 Feb 2025
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Yuqi Pang
Bowen Yang
Haoqin Tu
Yun Cao
Zeyu Zhang
LRM
MLLM
228
1
0
17 Feb 2025
Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yanhao Jia
Xinyi Wu
Hao Li
Qinglin Zhang
Yuxiao Hu
Shuai Zhao
Wenqi Fan
506
14
0
09 Feb 2025
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
International Conference on Learning Representations (ICLR), 2024
Jeremy Irvin
Emily Ruoyu Liu
Joyce Chuyi Chen
Ines Dormoy
Jinyoung Kim
Samar Khanna
Zhuo Zheng
Stefano Ermon
MLLM
VLM
402
43
0
28 Jan 2025
The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities
International Conference on Learning Representations (ICLR), 2025
Yongwei Che
Benjamin Eysenbach
290
1
0
20 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access (IEEE Access), 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
446
6
0
10 Jan 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
367
4
0
31 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Computer Vision and Pattern Recognition (CVPR), 2024
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
489
65
0
19 Dec 2024
Do Language Models Understand Time?
The Web Conference (WWW), 2024
Xi Ding
Lei Wang
840
10
0
18 Dec 2024
Gramian Multimodal Representation Learning and Alignment
International Conference on Learning Representations (ICLR), 2024
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
418
26
0
16 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
425
2
0
04 Dec 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong
Zhuoming Liu
Yin Li
Liwei Wang
406
19
0
04 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Computer Vision and Pattern Recognition (CVPR), 2024
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
400
25
0
02 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
590
5
0
01 Dec 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2024
Zongjian Li
Bin Lin
Yang Ye
Liuhan Chen
Xinhua Cheng
Shenghai Yuan
Li-xin Yuan
VGen
DiffM
502
30
0
26 Nov 2024
ReWind: Understanding Long Videos with Instructed Learnable Memory
Computer Vision and Pattern Recognition (CVPR), 2024
Anxhelo Diko
Tinghuai Wang
Wassim Swaileh
Shiyan Sun
Ioannis Patras
KELM
VLM
347
4
0
23 Nov 2024
Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge
Ruiyang Qin
Dancheng Liu
Gelei Xu
Zheyu Yan
Chenhui Xu
Yuting Hu
Xiaolin Hu
Jinjun Xiong
Yiyu Shi
AuLLM
499
2
0
21 Nov 2024
Generative Emotion Cause Explanation in Multimodal Conversations
International Conference on Multimedia Retrieval (ICMR), 2024
Lin Wang
Xiaocui Yang
Shi Feng
Daling Wang
Yifei Zhang
Zhitao Zhang
446
1
0
01 Nov 2024
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Computer Vision and Pattern Recognition (CVPR), 2024
Reno Kriz
Kate Sanders
David Etter
Kenton W. Murray
Cameron Carpenter
...
Alexander Martin
Ronald Colaianni
Nolan King
Eugene Yang
Benjamin Van Durme
VGen
434
6
0
15 Oct 2024
Deep Correlated Prompting for Visual Recognition with Missing Modalities
Neural Information Processing Systems (NeurIPS), 2024
Lianyu Hu
Tongkai Shi
Wei Feng
Fanhua Shang
Liang Wan
VLM
402
11
0
09 Oct 2024
Human-in-the-loop Reasoning For Traffic Sign Detection: Collaborative Approach Yolo With Video-llava
Mehdi Azarafza
Fatima Idrees
Ali Ehteshami Bejnordi
Charles Steinmetz
Stefan Henkler
A. Rettberg
305
2
0
07 Oct 2024
Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM
488
184
0
06 Oct 2024
LLaVA-Video: Video Instruction Tuning With Synthetic Data
Yuanhan Zhang
Jinming Wu
W. Li
Bo Li
Zejun Ma
Ziwei Liu
Chunyuan Li
SyDa
VGen
484
361
0
03 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Luan Tuyen Chau
Yao-Yi Chiang
Alfred Hero
397
3
0
02 Oct 2024
Designing Interfaces for Multimodal Vector Search Applications
Owen Pendrigh Elliott
Tom Hamer
Jesse Clark
167
0
0
18 Sep 2024
One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli
Andrey Barsky
Mohamed Ali Souibgui
Artemis LLabres
Marco Bertini
Dimosthenis Karatzas
312
7
0
14 Sep 2024
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
Bohao Xing
Zitong Yu
Xin Liu
Kaishen Yuan
Qilang Ye
Weicheng Xie
Huanjing Yue
Jingyu Yang
Heikki Kälviäinen
185
23
0
21 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
L. Wang
Jianwu Dang
Jianhua Tao
AI4TS
289
7
0
11 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
International Journal of Computer Vision (IJCV), 2024
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
331
23
0
08 Aug 2024