Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,088 papers shown
Title
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
24
56
0
27 Oct 2020
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Radhika Dua
Sai Srinivas Kancheti
V. Balasubramanian
LRM
30
22
0
24 Oct 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL
VLM
64
12
0
24 Oct 2020
Multilingual Speech Translation with Efficient Finetuning of Pretrained Models
Xian Li
Changhan Wang
Yun Tang
C. Tran
Yuqing Tang
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
19
6
0
24 Oct 2020
Can images help recognize entities? A study of the role of images for Multimodal NER
Shuguang Chen
Gustavo Aguilar
Leonardo Neves
Thamar Solorio
EgoV
37
33
0
23 Oct 2020
GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
Nicole Peinelt
Marek Rei
Maria Liakata
14
2
0
23 Oct 2020
ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding
Minjeong Kim
Gyuwan Kim
Sang-Woo Lee
Jung-Woo Ha
VLM
24
34
0
23 Oct 2020
Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Simon Stepputtis
Joseph Campbell
Mariano Phielipp
Stefan Lee
Chitta Baral
H. B. Amor
LM&Ro
114
192
0
22 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
20
39,174
0
22 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
A. Schwing
Tamir Hazan
51
89
0
21 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
18
6
0
19 Oct 2020
Towards Data Distillation for End-to-end Spoken Conversational Question Answering
Chenyu You
Nuo Chen
Fenglin Liu
Dongchao Yang
Yuexian Zou
6
45
0
18 Oct 2020
Knowledge-Grounded Dialogue Generation with Pre-trained Language Models
Xueliang Zhao
Wei Yu Wu
Can Xu
Chongyang Tao
Dongyan Zhao
Rui Yan
175
191
0
17 Oct 2020
Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
13
2
0
17 Oct 2020
Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning
Wanyun Cui
Guangyu Zheng
Wei Wang
SSL
14
21
0
16 Oct 2020
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović
Chandra Bhagavatula
J. S. Park
Ronan Le Bras
Noah A. Smith
Yejin Choi
ReLM
LRM
18
62
0
15 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Mohit Bansal
CLIP
11
120
0
14 Oct 2020
A Multi-Modal Method for Satire Detection using Textual and Visual Cues
Lily Li
Or Levi
Pedram Hosseini
David A. Broniatowski
9
21
0
13 Oct 2020
CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
Fuli Luo
Pengcheng Yang
Shicheng Li
Xuancheng Ren
Xu Sun
VLM
SSL
10
16
0
13 Oct 2020
Contrast and Classify: Training Robust VQA Models
Yash Kant
A. Moudgil
Dhruv Batra
Devi Parikh
Harsh Agrawal
19
5
0
13 Oct 2020
Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph
Jingkang Yang
Weirong Chen
Litong Feng
Xiaopeng Yan
Huabin Zheng
Wayne Zhang
NoLa
19
13
0
12 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
13
5
0
10 Oct 2020
comp-syn: Perceptually Grounded Word Embeddings with Color
Bhargav Srinivasa Desikan
Tasker Hull
E. Nadler
Douglas Guilbeault
Aabir Abubaker Kar
Mark Chu
Donald Ruggiero Lo Sardo
6
7
0
08 Oct 2020
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar
Xingdi Yuan
Marc-Alexandre Côté
Yonatan Bisk
Adam Trischler
Matthew J. Hausknecht
LM&Ro
LLMAG
16
392
0
08 Oct 2020
Multi-label classification of promotions in digital leaflets using textual and visual information
R. Arroyo
David Jiménez-Cabello
Javier Martínez-Cebrián
6
3
0
07 Oct 2020
ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization
Tzuf Paz-Argaman
Y. Atzmon
Gal Chechik
Reut Tsarfaty
VLM
16
32
0
07 Oct 2020
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang
Hexiang Hu
Vihan Jain
Eugene Ie
Fei Sha
6
21
0
06 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
20
242
0
06 Oct 2020
Pathological Visual Question Answering
Xuehai He
Zhuo Cai
Wenlan Wei
Yichen Zhang
Luntian Mou
Eric P. Xing
P. Xie
62
24
0
06 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
11
2
0
05 Oct 2020
Multi-Modal Open-Domain Dialogue
Kurt Shuster
Eric Michael Smith
Da Ju
Jason Weston
AI4CE
28
42
0
02 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang
Hang Jiang
Yasuhide Miura
Christopher D. Manning
C. Langlotz
MedIm
13
730
0
02 Oct 2020
Learning Object Detection from Captions via Textual Scene Attributes
Achiya Jerbi
Roei Herzig
Jonathan Berant
Gal Chechik
Amir Globerson
22
21
0
30 Sep 2020
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
Tao Yu
Chien-Sheng Wu
Xi Victoria Lin
Bailin Wang
Y. Tan
Xinyi Yang
Dragomir R. Radev
R. Socher
Caiming Xiong
LMTD
11
247
0
29 Sep 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
L. Zhang
Jianfeng Gao
Zicheng Liu
VLM
8
56
0
28 Sep 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM
MLLM
19
102
0
23 Sep 2020
Preserving Integrity in Online Social Networks
A. Halevy
Cristian Canton Ferrer
Hao Ma
Umut Ozertem
Patrick Pantel
Marzieh Saeidi
Fabrizio Silvestri
Ves Stoyanov
17
57
0
22 Sep 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
OOD
14
139
0
18 Sep 2020
A Multimodal Memes Classification: A Survey and Open Research Issues
Tariq Habib Afridi
A. Alam
Muhammad Numan Khan
Jawad Khan
Young-Koo Lee
19
34
0
17 Sep 2020
Multi-modal Summarization for Video-containing Documents
Xiyan Fu
Jun Wang
Zhenglu Yang
20
23
0
17 Sep 2020
Machine Learning for Temporal Data in Finance: Challenges and Opportunities
J. Wittenbach
Learning McLean
Virginia Brian
AI4TS
4
1
0
11 Sep 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
Khyathi Raghavi Chandu
Piyush Sharma
Soravit Changpinyo
Ashish V. Thapliyal
Radu Soricut
DiffM
VLM
21
3
0
10 Sep 2020
Investigating Gender Bias in BERT
Rishabh Bhardwaj
Navonil Majumder
Soujanya Poria
25
105
0
10 Sep 2020
Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations
Meng-Jiun Chiou
Roger Zimmermann
Jiashi Feng
13
1
0
10 Sep 2020
Video Moment Retrieval via Natural Language Queries
Xinli Yu
Mohsen Malmir
C. He
Yue Liu
Rex Wu
9
1
0
04 Sep 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
6
63
0
03 Sep 2020
Practical Cross-modal Manifold Alignment for Grounded Language
A. Nguyen
Luke E. Richards
Gaoussou Youssouf Kebe
Edward Raff
Kasra Darvish
Frank Ferraro
Cynthia Matuszek
6
4
0
01 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
16
8
0
31 Aug 2020
A Survey of Visual Analytics Techniques for Machine Learning
Jun Yuan
Changjian Chen
Weikai Yang
Mengchen Liu
Jiazhi Xia
Shixia Liu
19
215
0
21 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
19
12
0
18 Aug 2020
Previous
1
2
3
...
38
39
40
41
42
Next