1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,223 papers shown
Title
Coreferential Reasoning Learning for Language Representation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Deming Ye
Yankai Lin
Jiaju Du
Zhenghao Liu
Peng Li
Maosong Sun
Zhiyuan Liu
193
184
0
15 Apr 2020
Relation Transformer Network
Rajat Koner
Poulami Sinhamahapatra
Volker Tresp
ViT
283
35
0
13 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
603
2,115
0
13 Apr 2020
An Entropy Clustering Approach for Assessing Visual Question Difficulty
IEEE Access (IEEE Access), 2020
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
Shun'ichi Satoh
OOD
AAML
249
1
0
12 Apr 2020
Rephrasing visual questions by specifying the entropy of the answer distribution
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
S. Satoh
OOD
132
2
0
10 Apr 2020
Multimodal Categorization of Crisis Events in Social Media
Computer Vision and Pattern Recognition (CVPR), 2020
Mahdi Abavisani
Liwei Wu
Shengli Hu
Joel R. Tetreault
A. Jaimes
236
109
0
10 Apr 2020
Learning to Scale Multilingual Representations for Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2020
Andrea Burns
Donghyun Kim
Derry Wijaya
Kate Saenko
Bryan A. Plummer
162
36
0
09 Apr 2020
Context-Aware Group Captioning via Self-Attention and Contrastive Features
Computer Vision and Pattern Recognition (CVPR), 2020
Zhuowan Li
Quan Hung Tran
Long Mai
Zhe Lin
Alan Yuille
VLM
147
50
0
07 Apr 2020
TAPAS: Weakly Supervised Table Parsing via Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Jonathan Herzig
Pawel Krzysztof Nowak
Thomas Müller
Francesco Piccinno
Julian Martin Eisenschlos
LMTD
RALM
421
767
0
05 Apr 2020
Generating Rationales in Visual Question Answering
Hammad A. Ayyubi
Md. Mehrab Tanjim
Julian McAuley
G. Cottrell
LRM
107
6
0
04 Apr 2020
XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Yaobo Liang
Nan Duan
Yeyun Gong
Ning Wu
Fenfei Guo
...
Shuguang Liu
Fan Yang
Daniel Fernando Campos
Rangan Majumder
Ming Zhou
ELM
VLM
276
367
0
03 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
346
467
0
02 Apr 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
Computer Vision and Pattern Recognition (CVPR), 2020
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
MLLM
VGen
226
75
0
25 Mar 2020
Pre-trained Models for Natural Language Processing: A Survey
Science China Technological Sciences (Sci China Technol Sci), 2020
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
889
1,597
0
18 Mar 2020
Deconfounded Image Captioning: A Causal Retrospect
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Xu Yang
Hanwang Zhang
Jianfei Cai
CML
159
145
0
09 Mar 2020
Cross-modal Learning for Multi-modal Video Categorization
Palash Goyal
Saurabh Sahu
Shalini Ghosh
Chul Lee
238
10
0
07 Mar 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Natural Language Processing and Chinese Computing (NLPCC), 2020
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM
VLM
211
84
0
03 Mar 2020
Visual Commonsense R-CNN
Computer Vision and Pattern Recognition (CVPR), 2020
Tan Wang
Jianqiang Huang
Hanwang Zhang
Qianru Sun
SSL
ObjD
CML
200
275
0
27 Feb 2020
What BERT Sees: Cross-Modal Transfer for Visual Question Generation
Thomas Scialom
Patrick Bordes
Paul-Alexis Dray
Jacopo Staiano
Patrick Gallinari
175
7
0
25 Feb 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Computer Vision and Pattern Recognition (CVPR), 2020
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
253
319
0
25 Feb 2020
Measuring Social Biases in Grounded Vision and Language Embeddings
North American Chapter of the Association for Computational Linguistics (NAACL), 2020
Candace Ross
Boris Katz
Andrei Barbu
253
69
0
20 Feb 2020
Contextual Lensing of Universal Sentence Representations
J. Kiros
114
5
0
20 Feb 2020
VQA-LOL: Visual Question Answering under the Lens of Logic
European Conference on Computer Vision (ECCV), 2020
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
CoGe
184
78
0
19 Feb 2020
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Findings (Findings), 2020
Zhangyin Feng
Daya Guo
Duyu Tang
Nan Duan
Xiaocheng Feng
...
Linjun Shou
Bing Qin
Ting Liu
Daxin Jiang
Ming Zhou
787
3,286
0
19 Feb 2020
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo
Lei Ji
Botian Shi
Haoyang Huang
Nan Duan
Tianrui Li
Jason Li
Xilin Chen
Ming Zhou
VLM
312
419
0
15 Feb 2020
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge
Gabriel Ilharco
Roy Schwartz
Ali Farhadi
Hannaneh Hajishirzi
Noah A. Smith
251
662
0
15 Feb 2020
Exploiting Temporal Coherence for Multi-modal Video Categorization
Palash Goyal
Saurabh Sahu
Shalini Ghosh
Chul Lee
130
1
0
07 Feb 2020
Retrospective Reader for Machine Reading Comprehension
AAAI Conference on Artificial Intelligence (AAAI), 2020
Zhuosheng Zhang
Junjie Yang
Hai Zhao
RALM
297
236
0
27 Jan 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
329
273
0
22 Jan 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
Pattern Recognition (Pattern Recognit.), 2020
M. Farazi
Salman H. Khan
Nick Barnes
176
18
0
20 Jan 2020
In Defense of Grid Features for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2020
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
291
351
0
10 Jan 2020
Visual Question Answering on 360° Images
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Shih-Han Chou
Wei-Lun Chao
Wei-Sheng Lai
Min Sun
Ming-Hsuan Yang
122
27
0
10 Jan 2020
Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Lei Shi
Shijie Geng
Kai Shuang
Chiori Hori
Songxiang Liu
Shiyang Feng
Sen Su
225
12
0
03 Jan 2020
All-in-One Image-Grounded Conversational Agents
Da Ju
Kurt Shuster
Y-Lan Boureau
Jason Weston
LLMAG
133
9
0
28 Dec 2019
Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection
Computer Vision and Pattern Recognition (CVPR), 2019
Sara Beery
Guanhang Wu
V. Rathod
Ronny Votel
Jonathan Huang
ObjD
232
124
0
07 Dec 2019
Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
European Conference on Artificial Intelligence (ECAI), 2019
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
160
15
0
06 Dec 2019
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
European Conference on Computer Vision (ECCV), 2019
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
276
119
0
05 Dec 2019
15 Keypoints Is All You Need
Computer Vision and Pattern Recognition (CVPR), 2019
Michael Snower
Asim Kadav
Farley Lai
H. Graf
VOT
3DH
254
50
0
05 Dec 2019
12-in-1: Multi-Task Vision and Language Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2019
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM
ObjD
271
498
0
05 Dec 2019
Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Computer Vision and Image Understanding (CVIU), 2019
Federico Landi
Lorenzo Baraldi
Marcella Cornia
M. Corsini
Rita Cucchiara
LM&Ro
200
30
0
27 Nov 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
235
7
0
26 Nov 2019
Learning to Learn Words from Visual Scenes
Dídac Surís
Dave Epstein
Heng Ji
Shih-Fu Chang
Carl Vondrick
VLM
CLIP
SSL
OffRL
125
4
0
25 Nov 2019
Temporal Reasoning via Audio Question Answering
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
Haytham M. Fayek
Justin Johnson
124
59
0
21 Nov 2019
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Computer Vision and Pattern Recognition (CVPR), 2019
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
337
264
0
18 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Computer Vision and Pattern Recognition (CVPR), 2019
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
285
220
0
14 Nov 2019
Unsupervised Pre-training for Natural Language Generation: A Literature Review
Yuanxin Liu
Zheng Lin
SSL
AI4CE
102
5
0
13 Nov 2019
The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
J. Dean
152
83
0
13 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2019
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
231
394
0
10 Nov 2019
Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li
Jing Jiang
118
0
0
10 Nov 2019
The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Kurt Shuster
Da Ju
Stephen Roller
Emily Dinan
Y-Lan Boureau
Jason Weston
226
84
0
09 Nov 2019