1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,231 papers shown
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
568
4
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
98
0
0
31 Oct 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
International Conference on Learning Representations (ICLR), 2024
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
336
6
0
31 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM
MLLM
LRM
225
71
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
250
5
0
29 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
242
9
0
28 Oct 2024
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest
Xupeng Chen
Zhixin Lai
Kangrui Ruan
Shichu Chen
Jiaxiang Liu
Zuozhu Liu
557
16
0
27 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
Applied Soft Computing (Appl. Soft Comput.), 2024
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
207
3
0
24 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
International Journal of Computer Vision (IJCV), 2024
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
255
3
0
23 Oct 2024
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2024
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
199
0
0
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
International Conference on Learning Representations (ICLR), 2024
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
572
12
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
203
1
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
ACM Multimedia (ACM MM), 2022
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
337
9
0
16 Oct 2024
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Lawrence Yunliang Chen
Hexiang Hu
Ruotong Wang
Yiran Chen
Zifeng Wang
...
Pranav Shyam
Tianyi Zhou
Heng-Chiao Huang
Ming-Hsuan Yang
Boqing Gong
133
8
0
16 Oct 2024
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
International Conference on Learning Representations (ICLR), 2024
Xinyan Chen
Jianfei Yang
326
9
0
14 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
257
10
0
14 Oct 2024
Leveraging Customer Feedback for Multi-modal Insight Extraction
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Sandeep Sricharan Mukku
Abinesh Kanagarajan
Pushpendu Ghosh
Chetan Aggarwal
162
0
0
13 Oct 2024
nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder
AAAI Conference on Artificial Intelligence (AAAI), 2024
Maksim Kuznetsov
Airat Valiev
Alex Aliper
Daniil Polykovskiy
E. Tutubalina
Rim Shayakhmetov
Z. Miftahutdinov
185
3
0
11 Oct 2024
A social context-aware graph-based multimodal attentive learning framework for disaster content classification during emergencies: a benchmark dataset and method
Expert systems with applications (ESWA), 2024
Shahid Shafi Dar
Mohammad Zia Ur Rehman
Karan Bais
Mohammed Abdul Haseeb
Nagendra Kumara
159
19
0
11 Oct 2024
Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey
Zihan Yu
Tianxiao Li
Yuxin Zhu
Rongze Pan
208
4
0
10 Oct 2024
Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jianxing Yu
Shiqi Wang
Han Yin
Zhenlong Sun
Ruobing Xie
Bo Zhang
Yanghui Rao
CML
155
0
0
10 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
127
1
0
10 Oct 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection
Guankun Wang
Han Xiao
Huxin Gao
Renrui Zhang
Long Bai
Xiaoxiao Yang
Zhen Li
Hongsheng Li
Hongliang Ren
203
9
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
200
3
0
09 Oct 2024
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
International Conference on Learning Representations (ICLR), 2024
Zeman Li
Xinwei Zhang
Peilin Zhong
Yuan Deng
Meisam Razaviyayn
Vahab Mirrokni
274
8
0
09 Oct 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sungnyun Kim
Haofu Liao
Srikar Appalaraju
Peng Tang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
Stefano Soatto
252
4
0
04 Oct 2024
Multi-modal clothing recommendation model based on large model and VAE enhancement
Artificial Intelligence and Cloud Computing Conference (AICC), 2024
Bingjie Huang
Qingyi Lu
Shuaishuai Huang
Xue-she Wang
Haowei Yang
219
7
0
03 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
341
1
0
01 Oct 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Neural Information Processing Systems (NeurIPS), 2024
Heeseong Shin
Chaehyun Kim
Sunghwan Hong
Seokju Cho
Anurag Arnab
Paul Hongsuck Seo
Seungryong Kim
VLM
231
9
0
30 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Asian Conference on Computer Vision (ACCV), 2024
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
232
6
0
28 Sep 2024
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
European Conference on Computer Vision (ECCV), 2024
Zhenghao Peng
Wenjie Luo
Yiren Lu
Tianyi Shen
Cole Gulino
Ari Seff
Justin Fu
167
22
0
26 Sep 2024
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
ACM Conference on Recommender Systems (RecSys), 2024
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
197
16
0
26 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
414
20
0
26 Sep 2024
PASS: Path-selective State Space Model for Event-based Recognition
Jiazhou Zhou
Kanghao Chen
Lei Zhang
Lin Wang
307
0
0
25 Sep 2024
DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
European Conference on Computer Vision (ECCV), 2024
Soojin Jang
Jungmin Yun
Junehyoung Kwon
Eunju Lee
Youngbin Kim
325
7
0
24 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions, and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
434
9
0
23 Sep 2024
LARE: Latent Augmentation using Regional Embedding with Vision-Language Model
Machine Learning with Applications (MLWA), 2024
Kosuke Sakurai
Tatsuya Ishii
Ryotaro Shimizu
Linxin Song
Masayuki Goto
VLM
220
1
0
19 Sep 2024
Multi-Cohort Framework with Cohort-Aware Attention and Adversarial Mutual-Information Minimization for Whole Slide Image Classification
Sharon Peled
Y. Maruvka
Moti Freiman
179
1
0
17 Sep 2024
Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation
Qilong Zhangli
Di Liu
Abhishek Aich
Dimitris Metaxas
S. Schulter
183
1
0
15 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
211
19
0
15 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
186
1
0
12 Sep 2024
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
Guimin Hu
Yi Xin
Weimin Lyu
Haojian Huang
Chang Sun
Zehan Zhu
Lin Gui
Ruichu Cai
Erik Cambria
Hasti Seifi
320
15
0
11 Sep 2024
What to align in multimodal contrastive learning?
International Conference on Learning Representations (ICLR), 2024
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
305
26
0
11 Sep 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang
Jinhao Chen
Zhengxiao Du
Wenmeng Yu
Weihan Wang
Wenyi Hong
Zhihuan Jiang
Bin Xu
Yuxiao Dong
Jie Tang
VLM
LRM
163
14
0
10 Sep 2024
VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery
Mohammadmahdi Honarmand
Muhammad Abdullah Jamal
Omid Mohareri
328
5
0
07 Sep 2024
MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality
Ruiting Dai
Yuqiao Tan
Lisi Mo
Tao He
Ke Qin
Shuang Liang
VLM
209
4
0
07 Sep 2024
Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Yujie Wang
Shenhan Zhu
Fangcheng Fu
Xupeng Miao
Jie Zhang
Juan Zhu
Fan Hong
Yongbin Li
Bin Cui
113
0
0
05 Sep 2024
Vision-Language Navigation with Continual Learning
Zhiyuan Li
Yanfeng Lv
Ziqin Tu
Di Shang
Hong Qiao
239
3
0
04 Sep 2024
CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding
Ivana Beňová
Michal Gregor
Albert Gatt
276
1
0
02 Sep 2024
HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution
Masoomeh Aslahishahri
Jordan R. Ubbens
Ian Stavness
229
1
0
30 Aug 2024