arXiv:1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,233 papers shown
Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic
Wenjia Zhang
Lin Gui
Yulan He
143
43
0
04 Sep 2021
Multimodal Conditionality for Natural Language Generation
Michael Sollami
Aashish Jain
154
10
0
02 Sep 2021
Point-of-Interest Type Prediction using Text and Images
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Danae Sánchez Villegas
Nikolaos Aletras
234
15
0
01 Sep 2021
WebQA: Multihop and Multimodal QA
Computer Vision and Pattern Recognition (CVPR), 2021
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
389
122
0
01 Sep 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
190
20
0
01 Sep 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
229
0
0
28 Aug 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
ACM Multimedia (ACM MM), 2021
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
211
25
0
25 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
252
71
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
International Conference on Learning Representations (ICLR), 2021
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
1.0K
927
0
24 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
IEEE International Conference on Computer Vision (ICCV), 2021
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
258
7
0
23 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
IEEE International Conference on Computer Vision (ICCV), 2021
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
254
180
0
22 Aug 2021
Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks
Hung Q. Vo
Pengyu Yuan
T. He
Stephen T. C. Wong
H. Nguyen
203
5
0
21 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
137
11
0
21 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
231
172
0
20 Aug 2021
Knowledge Perceived Multi-modal Pretraining in E-commerce
Yushan Zhu
Huaixiao Tou
Wen Zhang
Ganqiang Ye
Hui Chen
Ningyu Zhang
Huajun Chen
262
38
0
20 Aug 2021
Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach
Chuanbo Hu
Minglei Yin
Bin Liu
Xin Li
Yanfang Ye
119
18
0
19 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
195
37
0
18 Aug 2021
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
237
21
0
16 Aug 2021
MMChat: Multi-Modal Chat Dataset on Social Media
Yinhe Zheng
Guanyi Chen
Xin Liu
K. Lin
338
38
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
177
59
0
16 Aug 2021
Video Transformer for Deepfake Detection with Incremental Learning
ACM Multimedia (ACM MM), 2021
Sohail Ahmed Khan
Hang Dai
ViT
215
77
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MA
MedIm
118
42
0
10 Aug 2021
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia
Qiaozi Gao
Jesse Thomason
Govind Thattai
Gaurav Sukhatme
LM&Ro
372
84
0
10 Aug 2021
Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair Recognition
IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021
Ziwei Xu
Guangzhi Wang
Yongkang Wong
Mohan S. Kankanhalli
179
35
0
10 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
IEEE International Conference on Computer Vision (ICCV), 2021
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
298
306
0
09 Aug 2021
Disentangling Hate in Online Memes
ACM Multimedia (ACM MM), 2021
Rui Cao
Ziqing Fan
Roy Ka-wei Lee
Wen-Haw Chong
Jing Jiang
289
106
0
09 Aug 2021
Detecting Propaganda Techniques in Memes
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Dimitar Dimitrov
Bishr Bin Ali
Shaden Shaar
Firoj Alam
Fabrizio Silvestri
Hamed Firooz
Preslav Nakov
Giovanni Da San Martino
255
104
0
07 Aug 2021
Interpretable Visual Understanding with Cognitive Attention Network
International Conference on Artificial Neural Networks (ICANN), 2021
Xuejiao Tang
Wenbin Zhang
Yi Yu
Kea Turner
Hanyu Wang
Mengyu Wang
Eirini Ntoutsi
290
19
0
06 Aug 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
ACM Multimedia (ACM MM), 2021
Yulin Li
Yuxi Qian
Yuchen Yu
Xiameng Qin
Chengquan Zhang
Yan Liu
Kun Yao
Junyu Han
Jingtuo Liu
Errui Ding
316
140
0
06 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
IEEE International Conference on Computer Vision (ICCV), 2021
Shiyang Feng
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Jiaming Song
ViT
274
375
0
05 Aug 2021
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation
Zaid Khan
Y. Fu
178
182
0
03 Aug 2021
Representation learning for neural population activity with Neural Data Transformers
Joel Ye
C. Pandarinath
AI4TS
AI4CE
417
80
0
02 Aug 2021
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal
Or Patashnik
Haggai Maron
Gal Chechik
Daniel Cohen-Or
CLIP
VLM
308
286
0
02 Aug 2021
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
ObjD
208
34
0
31 Jul 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
IEEE International Conference on Computer Vision (ICCV), 2021
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
297
82
0
30 Jul 2021
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
Information Fusion (Inf. Fusion), 2021
Anil Rahate
Rahee Walambe
S. Ramanna
K. Kotecha
429
179
0
29 Jul 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
ACM Computing Surveys (CSUR), 2021
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
798
4,996
0
28 Jul 2021
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe
Keld T. Lundgaard
VLM
186
3
0
27 Jul 2021
Language Grounding with 3D Objects
Conference on Robot Learning (CoRL), 2021
Jesse Thomason
Mohit Shridhar
Yonatan Bisk
Chris Paxton
Luke Zettlemoyer
LM&Ro
224
55
0
26 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
IEEE International Conference on Computer Vision (ICCV), 2021
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
357
151
0
26 Jul 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Tongtong Liu
Fangxiang Feng
Caixia Yuan
79
16
0
22 Jul 2021
DRDF: Determining the Importance of Different Multimodal Information with Dual-Router Dynamic Framework
ACM Multimedia (ACM MM), 2021
Haiwen Hong
Xuan Jin
Yin Zhang
Yunqing Hu
Jingfeng Zhang
Yuan He
Hui Xue
MoE
131
0
0
21 Jul 2021
Neural Variational Learning for Grounded Language Acquisition
IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2021
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLM
SSL
GAN
DRL
256
2
0
20 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
226
0
0
20 Jul 2021
Separating Skills and Concepts for Novel Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2021
Spencer Whitehead
Hui Wu
Heng Ji
Rogerio Feris
Kate Saenko
CoGe
192
38
0
19 Jul 2021
Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Nyoungwoo Lee
Suwon Shin
Jaegul Choo
Ho‐Jin Choi
S. Myaeng
180
32
0
19 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Neural Information Processing Systems (NeurIPS), 2021
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
865
2,536
0
16 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
306
229
0
15 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
464
358
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
521
480
0
13 Jul 2021