arXiv:1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,088 papers shown
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
17
118
0
03 Jun 2021
TVDIM: Enhancing Image Self-Supervised Pretraining via Noisy Text Data
Pengda Qin
Yuhong Li
Kefeng Deng
Qiang Wu
11
1
0
03 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
19
45
0
03 Jun 2021
More Identifiable yet Equally Performant Transformers for Text Classification
Rishabh Bhardwaj
Navonil Majumder
Soujanya Poria
Eduard H. Hovy
11
6
0
02 Jun 2021
Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
Claudio Gennaro
Stéphane Marchand-Maillet
14
7
0
01 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAML
VLM
20
70
0
01 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Wei Lin
Lin Qu
Jingren Zhou
Hongxia Yang
MoE
31
24
0
31 May 2021
Dual-stream Network for Visual Recognition
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Peng Gao
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
18
63
0
31 May 2021
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
Jiaqi Chen
Jianheng Tang
Jinghui Qin
Xiaodan Liang
Lingbo Liu
Eric P. Xing
Liang Lin
AIMat
14
157
0
30 May 2021
Modeling Text-visual Mutual Dependency for Multi-modal Dialog Generation
Shuhe Wang
Yuxian Meng
Xiaofei Sun
Fei Wu
Rongbin Ouyang
Rui Yan
Tianwei Zhang
Jiwei Li
21
15
0
30 May 2021
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis via Non-Autoregressive Generative Transformers
Zhu Zhang
Jianxin Ma
Chang Zhou
Rui Men
Zhikang Li
Ming Ding
Jie Tang
Jingren Zhou
Hongxia Yang
25
46
0
29 May 2021
Maintaining Common Ground in Dynamic Environments
Takuma Udagawa
Akiko Aizawa
19
11
0
29 May 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren
Junyang Lin
Guangxiang Zhao
Rui Men
An Yang
Jingren Zhou
Xu Sun
Hongxia Yang
18
36
0
28 May 2021
Maria: A Visual Experience Powered Conversational Agent
Zujie Liang
Huang Hu
Can Xu
Chongyang Tao
Xiubo Geng
Yining Chen
Fan Liang
Daxin Jiang
23
29
0
27 May 2021
Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
S. McCrae
Kehan Wang
A. Zakhor
28
15
0
26 May 2021
Understanding Mobile GUI: from Pixel-Words to Screen-Sentences
Jingwen Fu
Xiaoyi Zhang
Yuwang Wang
Wenjun Zeng
Sam Yang
Grayson Hilliard
21
14
0
25 May 2021
Enhance Multimodal Model Performance with Data Augmentation: Facebook Hateful Meme Challenge Solution
Yang Li
Zi-xin Zhang
Hutchin Huang
19
1
0
25 May 2021
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
Tao Tu
Q. Ping
Govind Thattai
Gökhan Tür
Premkumar Natarajan
24
18
0
24 May 2021
Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
Jong Hak Moon
HyunGyung Lee
W. Shin
Young-Hak Kim
E. Choi
MedIm
19
151
0
24 May 2021
Human-centric Relation Segmentation: Dataset and Solution
Si Liu
Zitian Wang
Yulu Gao
Lejian Ren
Yue Liao
Guanghui Ren
Bo Li
Shuicheng Yan
11
10
0
24 May 2021
Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning
Kun Yan
Zied Bouraoui
Ping Wang
Shoaib Jameel
Steven Schockaert
22
21
0
21 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
21
129
0
20 May 2021
Pathdreamer: A World Model for Indoor Navigation
Jing Yu Koh
Honglak Lee
Yinfei Yang
Jason Baldridge
Peter Anderson
26
79
0
18 May 2021
Parallel Attention Network with Sequence Matching for Video Grounding
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Joey Tianyi Zhou
Rick Siow Mong Goh
16
40
0
18 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
40
440
0
18 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
21
137
0
17 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
K. Ueki
21
3
0
16 May 2021
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich
Cordelia Schmid
Chen Sun
LM&Ro
24
193
0
13 May 2021
Video Corpus Moment Retrieval with Contrastive Learning
Hao Zhang
Aixin Sun
Wei Jing
Guoshun Nan
Liangli Zhen
Joey Tianyi Zhou
Rick Siow Mong Goh
33
81
0
13 May 2021
Connecting What to Say With Where to Look by Modeling Human Attention Traces
Zihang Meng
Licheng Yu
Ning Zhang
Tamara L. Berg
Babak Damavandi
Vikas Singh
Amy Bearman
32
25
0
12 May 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
21
4
0
12 May 2021
Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal
C. Kennington
LM&Ro
38
0
0
10 May 2021
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Mathew Monfort
SouYoung Jin
Alexander H. Liu
David Harwath
Rogerio Feris
James Glass
Aude Oliva
8
60
0
10 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Erik Cambria
54
267
0
10 May 2021
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
40
18
0
02 May 2021
Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads
Chenyu Gao
Qi Zhu
Peng Wang
Qi Wu
8
2
0
30 Apr 2021
Comparing Visual Reasoning in Humans and AI
Shravan Murlidaran
W. Wang
M. Eckstein
24
1
0
29 Apr 2021
A First Look: Towards Explainable TextVQA Models via Visual and Textual Explanations
Varun Nagaraj Rao
Xingjian Zhen
K. Hovsepian
Mingwei Shen
24
17
0
29 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe-nan Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
28
153
0
26 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
57
858
0
26 Apr 2021
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images
Dimitar Dimitrov
Bishr Bin Ali
Shaden Shaar
Firoj Alam
Fabrizio Silvestri
Hamed Firooz
Preslav Nakov
Giovanni Da San Martino
13
103
0
25 Apr 2021
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
30
36
0
24 Apr 2021
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Tianrui Guan
Jun Wang
Shiyi Lan
Rohan Chandra
Zuxuan Wu
Larry S. Davis
Dinesh Manocha
ViT
3DPC
21
118
0
24 Apr 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
103
53
0
23 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
19
1,221
0
22 Apr 2021
Comprehensive Multi-Modal Interactions for Referring Image Segmentation
Kanishk Jain
Vineet Gandhi
11
17
0
21 Apr 2021
Understanding Synonymous Referring Expressions via Contrastive Features
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
ObjD
19
4
0
20 Apr 2021
Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle
Sivan Doveh
Amit Alfassy
J. Shtok
Guy Lev
...
Kate Saenko
S. Ullman
Raja Giryes
Rogerio Feris
Leonid Karlinsky
35
23
0
20 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
C. Miao
Houqiang Li
28
41
0
19 Apr 2021
BM-NAS: Bilevel Multimodal Neural Architecture Search
Yihang Yin
Siyu Huang
Xiang Zhang
32
27
0
19 Apr 2021