Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,088 papers shown
Title
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang
Tan Jiang
Tan Wang
Kun Kuang
Zhou Zhao
Jianke Zhu
Jin Yu
Hongxia Yang
Fei Wu
OOD
20
85
0
16 Aug 2020
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Fei Wu
6
34
0
16 Aug 2020
Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
Shamane Siriwardhana
Andrew Reis
Rivindu Weerasekera
Suranga Nanayakkara
19
112
0
15 Aug 2020
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
24
7
0
14 Aug 2020
A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning
Mathieu Seurin
Florian Strub
Philippe Preux
Olivier Pietquin
13
5
0
07 Aug 2020
Polysemy Deciphering Network for Robust Human-Object Interaction Detection
Xubin Zhong
Changxing Ding
X. Qu
Dacheng Tao
6
58
0
07 Aug 2020
ConvBERT: Improving BERT with Span-based Dynamic Convolution
Zihang Jiang
Weihao Yu
Daquan Zhou
Yunpeng Chen
Jiashi Feng
Shuicheng Yan
21
155
0
06 Aug 2020
Word meaning in minds and machines
Brenden Lake
G. Murphy
NAI
15
117
0
04 Aug 2020
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
11
159
0
04 Aug 2020
HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm
Md. Mofijul Islam
Tariq Iqbal
12
78
0
03 Aug 2020
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
Liu Yang
VLM
16
5
0
02 Aug 2020
Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea
Qiaozhu Mei
32
30
0
31 Jul 2020
Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval
Aneeshan Sain
A. Bhunia
Yongxin Yang
Tao Xiang
Yi-Zhe Song
16
49
0
29 Jul 2020
Pre-training for Video Captioning Challenge 2020 Summary
Yingwei Pan
Jun Xu
Yehao Li
Ting Yao
Tao Mei
16
1
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
25
29
0
26 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
A. Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
17
85
0
23 Jul 2020
Analogical Reasoning for Visually Grounded Language Acquisition
Bo Wu
Haoyu Qin
Alireza Zareian
Carl Vondrick
Shih-Fu Chang
6
9
0
22 Jul 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
42
93
0
19 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
15
41
0
16 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
42
78
0
13 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
21
57
0
05 Jul 2020
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Wanrong Zhu
X. Wang
Tsu-jui Fu
An Yan
P. Narayana
Kazoo Sone
Sugato Basu
W. Wang
21
33
0
01 Jul 2020
Modality-Agnostic Attention Fusion for visual search with text feedback
Eric Dodds
Jack Culpepper
Simão Herdade
Yang Zhang
K. Boakye
EgoV
11
71
0
30 Jun 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
11
375
0
30 Jun 2020
Ontology-guided Semantic Composition for Zero-Shot Learning
Jiaoyan Chen
Freddy Lecue
Yuxia Geng
Jeff Z. Pan
Huajun Chen
VLM
6
15
0
30 Jun 2020
Improving VQA and its Explanations \\ by Comparing Competing Explanations
Jialin Wu
Liyan Chen
Raymond J. Mooney
FAtt
AAML
33
17
0
28 Jun 2020
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le
S. Hoi
18
28
0
27 Jun 2020
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
Polina Zablotskaia
E. Dominici
Leonid Sigal
Andreas M. Lehrmann
OCL
19
19
0
25 Jun 2020
Comprehensive Information Integration Modeling Framework for Video Titling
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Fei Wu
17
40
0
24 Jun 2020
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Saeed Amizadeh
Hamid Palangi
Oleksandr Polozov
Yichen Huang
K. Koishida
NAI
LRM
31
58
0
20 Jun 2020
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Corentin Dancette
Rémi Cadène
Xinlei Chen
Matthieu Cord
8
3
0
17 Jun 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta
Arash Vahdat
Gal Chechik
Xiaodong Yang
Jan Kautz
Derek Hoiem
ObjD
SSL
34
139
0
17 Jun 2020
Learning Visual Commonsense for Robust Scene Graph Generation
Alireza Zareian
Zhecan Wang
Haoxuan You
Shih-Fu Chang
19
312
0
17 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
14
432
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
24
485
0
11 Jun 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni
Haoyang Huang
Lin Su
Edward Cui
Taroon Bharti
Lijuan Wang
Jianfeng Gao
Dongdong Zhang
Nan Duan
8
7
0
04 Jun 2020
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding
Peng Zhang
Yunlu Xu
Zhanzhan Cheng
Shiliang Pu
Jing Lu
Liang Qiao
Yi Niu
Fei Wu
SyDa
6
102
0
27 May 2020
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
D. Gao
Linbo Jin
Ben Chen
Minghui Qiu
Peng Li
Yi Wei
Y. Hu
Haozhe Jasper Wang
OOD
9
133
0
20 May 2020
Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text
Felix Hill
Soňa Mokrá
Nathaniel Wong
Tim Harley
LM&Ro
6
81
0
19 May 2020
IMoJIE: Iterative Memory-Based Joint Open Information Extraction
Keshav Kolluru
Samarth Aggarwal
Vipul Rathore
Mausam
Soumen Chakrabarti
VLM
6
71
0
17 May 2020
Adaptive Transformers for Learning Multimodal Representations
Prajjwal Bhargava
14
4
0
15 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
14
127
0
15 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang
Hai Zhao
Rui Wang
16
62
0
13 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
32
36
0
12 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
23
577
0
10 May 2020
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
11
69
0
08 May 2020
MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis
Devamanyu Hazarika
Roger Zimmermann
Soujanya Poria
19
662
0
07 May 2020
Cross-media Structured Common Space for Multimedia Event Extraction
Manling Li
Alireza Zareian
Qi Zeng
Spencer Whitehead
Di Lu
Heng Ji
Shih-Fu Chang
10
102
0
05 May 2020
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Siva Reddy
ObjD
10
52
0
04 May 2020
Visually Grounded Continual Learning of Compositional Phrases
Xisen Jin
Junyi Du
Arka Sadhu
Ram Nevatia
Xiang Ren
CLL
14
4
0
02 May 2020
Previous
1
2
3
...
39
40
41
42
Next