1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,232 papers shown
AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models
Journal of Biomedical Informatics (JBI), 2021
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
LM&MA
MedIm
389
192
0
16 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
217
36
0
16 Apr 2021
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Taichi Iki
Akiko Aizawa
VLM
237
22
0
16 Apr 2021
Exploring Visual Engagement Signals for Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Menglin Jia
Zuxuan Wu
A. Reiter
Claire Cardie
Serge Belongie
Ser-Nam Lim
174
15
0
15 Apr 2021
Learning Zero-Shot Multifaceted Visually Grounded Word Embeddings via Multi-Task Training
Conference on Computational Natural Language Learning (CoNLL), 2021
Hassan Shahmohammadi
Hendrik P. A. Lensch
R. Baayen
194
19
0
15 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
International Conference on Learning Representations (ICLR), 2021
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
279
210
0
13 Apr 2021
Disentangled Motif-aware Graph Learning for Phrase Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zongshen Mu
Siliang Tang
Jie Tan
Qiang Yu
Yueting Zhuang
GNN
251
38
0
13 Apr 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
550
547
0
12 Apr 2021
FreSaDa: A French Satire Data Set for Cross-Domain Satire Detection
IEEE International Joint Conference on Neural Network (IJCNN), 2021
Radu Tudor Ionescu
Adrian-Gabriel Chifu
158
14
0
10 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
IEEE International Conference on Computer Vision (ICCV), 2021
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
239
79
0
09 Apr 2021
Exploiting Natural Language for Efficient Risk-Aware Multi-robot SaR Planning
IEEE Robotics and Automation Letters (RA-L), 2021
Vikram Shree
B. Asfora
Rachel Zheng
Samantha Hong
Jacopo Banfi
M. Campbell
120
13
0
08 Apr 2021
Video Question Answering with Phrases via Semantic Roles
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Arka Sadhu
Kan Chen
Ram Nevatia
177
16
0
08 Apr 2021
How Transferable are Reasoning Patterns in VQA?
Computer Vision and Pattern Recognition (CVPR), 2021
Corentin Kervadec
Theo Jaunet
G. Antipov
M. Baccouche
Romain Vuillemot
Christian Wolf
LRM
149
29
0
08 Apr 2021
Multimodal Fusion Refiner Networks
Sethuraman Sankaran
David Yang
Ser-Nam Lim
OffRL
172
8
0
08 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2021
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
332
91
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
428
302
0
07 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
IEEE International Conference on Computer Vision (ICCV), 2021
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
283
116
0
05 Apr 2021
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA
IEEE International Symposium on Biomedical Imaging (ISBI), 2021
Yash Khare
Viraj Bagal
Minesh Mathew
Adithi Devi
U. Priyakumar
C. V. Jawahar
MedIm
280
172
0
03 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
301
32
0
02 Apr 2021
Towards General Purpose Vision Systems
Computer Vision and Pattern Recognition (CVPR), 2021
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
275
55
0
01 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2021
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
235
107
0
01 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
196
7
0
01 Apr 2021
A Survey on Natural Language Video Localization
Xinfang Liu
Xiushan Nie
Zhifang Tan
Jie Guo
Yilong Yin
248
9
0
01 Apr 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
IEEE International Conference on Computer Vision (ICCV), 2021
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIP
VLM
390
1,373
0
31 Mar 2021
Diagnosing Vision-and-Language Navigation: What Really Matters
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Wanrong Zhu
Yuankai Qi
P. Narayana
Kazoo Sone
Sugato Basu
Xinze Wang
Qi Wu
Miguel P. Eckstein
Wenjie Wang
LM&Ro
233
55
0
30 Mar 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
329
160
0
30 Mar 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Computer Vision and Pattern Recognition (CVPR), 2021
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
350
134
0
30 Mar 2021
Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays
Xiaosong Wang
Ziyue Xu
Leo K. Tam
Dong Yang
Daguang Xu
ViT
MedIm
140
25
0
30 Mar 2021
Domain-robust VQA with diverse datasets and methods but no target labels
Computer Vision and Pattern Recognition (CVPR), 2021
Ruotong Wang
Tristan D. Maidment
Ahmad Diab
Adriana Kovashka
R. Hwa
OOD
300
25
0
29 Mar 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Hila Chefer
Shir Gur
Lior Wolf
ViT
358
412
0
29 Mar 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
IEEE International Conference on Computer Vision (ICCV), 2021
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
ViT
306
373
0
29 Mar 2021
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Song Liu
Haoqi Fan
Shengsheng Qian
Yiru Chen
Wenkui Ding
Zhongyuan Wang
343
166
0
28 Mar 2021
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021
Man Luo
Shailaja Keyur Sampat
Riley Tallman
Yankai Zeng
Manuha Vancha
Akarshan Sajja
Chitta Baral
155
11
0
28 Mar 2021
Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models
Applied AI Letters (AA), 2021
Arijit Ray
Michael Cogswell
Xiaoyu Lin
Kamran Alipour
Ajay Divakaran
Yi Yao
Giedrius Burachas
FAtt
153
5
0
26 Mar 2021
Understanding Robustness of Transformers for Image Classification
IEEE International Conference on Computer Vision (ICCV), 2021
Srinadh Bhojanapalli
Ayan Chakrabarti
Daniel Glasner
Daliang Li
Thomas Unterthiner
Andreas Veit
ViT
313
472
0
26 Mar 2021
Describing and Localizing Multiple Changes with Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Yue Qiu
Shintaro Yamamoto
Kodai Nakashima
Ryota Suzuki
K. Iwata
Hirokatsu Kataoka
Y. Satoh
240
91
0
25 Mar 2021
Visual Grounding Strategies for Text-Only Natural Language Processing
Damien Sileo
103
9
0
25 Mar 2021
VLGrammar: Grounded Grammar Induction of Vision and Language
IEEE International Conference on Computer Vision (ICCV), 2021
Yining Hong
Qing Li
Song-Chun Zhu
Siyuan Huang
VLM
177
26
0
24 Mar 2021
Scene-Intuitive Agent for Remote Embodied Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Xiangru Lin
Guanbin Li
Yizhou Yu
LM&Ro
190
60
0
24 Mar 2021
Multi-Modal Answer Validation for Knowledge-Based VQA
AAAI Conference on Artificial Intelligence (AAAI), 2021
Jialin Wu
Jiasen Lu
Ashish Sabharwal
Roozbeh Mottaghi
377
167
0
23 Mar 2021
Instance-level Image Retrieval using Reranking Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Fuwen Tan
Jiangbo Yuan
Vicente Ordonez
ViT
357
107
0
22 Mar 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Transactions of the Association for Computational Linguistics (TACL), 2021
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
307
61
0
22 Mar 2021
DeepViT: Towards Deeper Vision Transformer
Daquan Zhou
Bingyi Kang
Xiaojie Jin
Linjie Yang
Xiaochen Lian
Zihang Jiang
Qibin Hou
Jiashi Feng
ViT
348
604
0
22 Mar 2021
Incorporating Convolution Designs into Visual Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
300
566
0
22 Mar 2021
MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation
IEEE International Conference on Robotics and Automation (ICRA), 2021
Zachary Seymour
Kowshik Thopalli
Niluthpol Chowdhury Mithun
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
3DPC
152
20
0
21 Mar 2021
Let Your Heart Speak in its Mother Tongue: Multilingual Captioning of Cardiac Signals
Dani Kiyasseh
T. Zhu
David Clifton
238
0
0
19 Mar 2021
Variational Knowledge Distillation for Disease Classification in Chest X-Rays
Information Processing in Medical Imaging (IPMI), 2021
Tom van Sonsbeek
Xiantong Zhen
M. Worring
Ling Shao
86
17
0
19 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
278
36
0
18 Mar 2021
Few-Shot Visual Grounding for Natural Human-Robot Interaction
Georgios Tziafas
S. Kasaei
202
7
0
17 Mar 2021
On the Role of Images for Analyzing Claims in Social Media
Gullal Singh Cheema
Sherzod Hakimov
Eric Müller-Budack
Ralph Ewerth
260
10
0
17 Mar 2021