1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,232 papers shown
Polysemy Deciphering Network for Robust Human-Object Interaction Detection
International Journal of Computer Vision (IJCV), 2020
Xubin Zhong
Changxing Ding
X. Qu
Dacheng Tao
344
63
0
07 Aug 2020
ConvBERT: Improving BERT with Span-based Dynamic Convolution
Neural Information Processing Systems (NeurIPS), 2020
Zihang Jiang
Weihao Yu
Daquan Zhou
Yunpeng Chen
Jiashi Feng
Shuicheng Yan
347
199
0
06 Aug 2020
Word meaning in minds and machines
Brenden M. Lake
G. Murphy
NAI
369
140
0
04 Aug 2020
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
254
171
0
04 Aug 2020
HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm
Md. Mofijul Islam
Tariq Iqbal
158
94
0
03 Aug 2020
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2020
Liu Yang
VLM
176
5
0
02 Aug 2020
Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea
Qiaozhu Mei
354
29
0
31 Jul 2020
Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval
British Machine Vision Conference (BMVC), 2020
Aneeshan Sain
A. Bhunia
Yongxin Yang
Tao Xiang
Yi-Zhe Song
307
58
0
29 Jul 2020
Pre-training for Video Captioning Challenge 2020 Summary
Yingwei Pan
Jun Xu
Yehao Li
Ting Yao
Tao Mei
84
1
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Shiyang Feng
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
171
29
0
26 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
European Conference on Computer Vision (ECCV), 2020
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
205
94
0
23 Jul 2020
Analogical Reasoning for Visually Grounded Language Acquisition
Bo Wu
Haoyu Qin
Alireza Zareian
Carl Vondrick
Shih-Fu Chang
136
10
0
22 Jul 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
IEEE transactions on multimedia (TMM), 2020
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
338
118
0
19 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
European Conference on Computer Vision (ECCV), 2020
Christopher Thomas
Adriana Kovashka
259
43
0
16 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
European Conference on Computer Vision (ECCV), 2020
K. Gouthaman
Anurag Mittal
372
88
0
13 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
210
61
0
05 Jul 2020
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Wanrong Zhu
Xinze Wang
Tsu-Jui Fu
An Yan
P. Narayana
Kazoo Sone
Sugato Basu
Wenjie Wang
353
38
0
01 Jul 2020
Modality-Agnostic Attention Fusion for visual search with text feedback
Eric Dodds
Jack Culpepper
Simão Herdade
Yang Zhang
K. Boakye
EgoV
259
86
0
30 Jun 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
418
400
0
30 Jun 2020
Ontology-guided Semantic Composition for Zero-Shot Learning
Jiaoyan Chen
Freddy Lecue
Yuxia Geng
Jeff Z. Pan
Huajun Chen
VLM
202
18
0
30 Jun 2020
Improving VQA and its Explanations by Comparing Competing Explanations
Jialin Wu
Liyan Chen
Raymond J. Mooney
FAtt
AAML
210
18
0
28 Jun 2020
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le
Guosheng Lin
218
31
0
27 Jun 2020
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
Polina Zablotskaia
E. Dominici
Leonid Sigal
Andreas M. Lehrmann
OCL
277
20
0
25 Jun 2020
Comprehensive Information Integration Modeling Framework for Video Titling
Knowledge Discovery and Data Mining (KDD), 2020
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Leilei Gan
171
41
0
24 Jun 2020
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Saeed Amizadeh
Hamid Palangi
Oleksandr Polozov
Yichen Huang
K. Koishida
NAI
LRM
332
70
0
20 Jun 2020
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Corentin Dancette
Rémi Cadène
Xinlei Chen
Matthieu Cord
207
3
0
17 Jun 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta
Arash Vahdat
Gal Chechik
Xiaodong Yang
Jan Kautz
Derek Hoiem
ObjD
SSL
306
157
0
17 Jun 2020
Learning Visual Commonsense for Robust Scene Graph Generation
Alireza Zareian
Zhecan Wang
Haoxuan You
Shih-Fu Chang
400
312
0
17 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Computer Vision and Pattern Recognition (CVPR), 2020
Karan Desai
Justin Johnson
SSL
VLM
504
467
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Neural Information Processing Systems (NeurIPS), 2020
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
363
537
0
11 Jun 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni
Haoyang Huang
Lin Su
Edward Cui
Taroon Bharti
Lijuan Wang
Jianfeng Gao
Dongdong Zhang
Nan Duan
285
7
0
04 Jun 2020
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding
ACM Multimedia (ACM MM), 2020
Peng Zhang
Yunlu Xu
Zhanzhan Cheng
Shiliang Pu
Jing Lu
Liang Qiao
Yi Niu
Leilei Gan
SyDa
254
110
0
27 May 2020
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
D. Gao
Linbo Jin
Ben Chen
Minghui Qiu
Peng Li
Yi Wei
Yitao Hu
Haozhe Jasper Wang
OOD
212
147
0
20 May 2020
Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text
Felix Hill
Soňa Mokrá
Nathaniel Wong
Tim Harley
LM&Ro
221
90
0
19 May 2020
IMoJIE: Iterative Memory-Based Joint Open Information Extraction
Keshav Kolluru
Samarth Aggarwal
Vipul Rathore
Mausam
Soumen Chakrabarti
VLM
175
76
0
17 May 2020
Adaptive Transformers for Learning Multimodal Representations
Prajjwal Bhargava
112
5
0
15 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
268
139
0
15 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang
Hai Zhao
Rui Wang
216
66
0
13 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
134
37
0
12 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
332
763
0
10 May 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
133
74
0
08 May 2020
MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis
Devamanyu Hazarika
Roger Zimmermann
Soujanya Poria
361
970
0
07 May 2020
Cross-media Structured Common Space for Multimedia Event Extraction
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Pengfei Yu
Alireza Zareian
Qi Zeng
Spencer Whitehead
Di Lu
Heng Ji
Shih-Fu Chang
186
116
0
05 May 2020
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Siva Reddy
ObjD
163
56
0
04 May 2020
Visually Grounded Continual Learning of Compositional Phrases
Xisen Jin
Junyi Du
Arka Sadhu
Ram Nevatia
Xiang Ren
CLL
255
4
0
02 May 2020
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
410
14
0
01 May 2020
Visuo-Linguistic Question Answering (VLQA) Challenge
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
CoGe
138
1
0
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
706
539
0
01 May 2020
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Zarana Parekh
Jason Baldridge
Daniel Cer
Austin Waters
Yinfei Yang
275
68
0
30 Apr 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
European Conference on Computer Vision (ECCV), 2020
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
445
261
0
30 Apr 2020