ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.02265
  4. Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSLVLM
ArXiv (abs)PDFHTML

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,233 papers shown
Supervised Contrastive Learning for Multimodal Unreliable News Detection
  in COVID-19 Pandemic
Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic
Wenjia Zhang
Lin Gui
Yulan He
143
43
0
04 Sep 2021
Multimodal Conditionality for Natural Language Generation
Multimodal Conditionality for Natural Language Generation
Michael Sollami
Aashish Jain
154
10
0
02 Sep 2021
Point-of-Interest Type Prediction using Text and Images
Point-of-Interest Type Prediction using Text and ImagesConference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Danae Sánchez Villegas
Nikolaos Aletras
234
15
0
01 Sep 2021
WebQA: Multihop and Multimodal QA
WebQA: Multihop and Multimodal QAComputer Vision and Pattern Recognition (CVPR), 2021
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
389
122
0
01 Sep 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language
  Representations
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language RepresentationsConference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
190
20
0
01 Sep 2021
On the Significance of Question Encoder Sequence Model in the
  Out-of-Distribution Performance in Visual Question Answering
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
229
0
0
28 Aug 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual
  Pre-training
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-trainingACM Multimedia (ACM MM), 2021
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
211
25
0
25 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
252
71
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak SupervisionInternational Conference on Learning Representations (ICLR), 2021
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
1.0K
927
0
24 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
TACo: Token-aware Cascade Contrastive Learning for Video-Text AlignmentIEEE International Conference on Computer Vision (ICCV), 2021
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
258
7
0
23 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language
  Modeling Network
From Two to One: A New Scene Text Recognizer with Visual Language Modeling NetworkIEEE International Conference on Computer Vision (ICCV), 2021
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
254
180
0
22 Aug 2021
Multimodal Breast Lesion Classification Using Cross-Attention Deep
  Networks
Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks
Hung Q. Vo
Pengyu Yuan
T. He
Stephen T. C. Wong
H. Nguyen
203
5
0
21 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
137
11
0
21 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
231
172
0
20 Aug 2021
Knowledge Perceived Multi-modal Pretraining in E-commerce
Knowledge Perceived Multi-modal Pretraining in E-commerce
Yushan Zhu
Huaixiao Tou
Wen Zhang
Ganqiang Ye
Hui Chen
Ningyu Zhang
Huajun Chen
262
38
0
20 Aug 2021
Detection of Illicit Drug Trafficking Events on Instagram: A Deep
  Multimodal Multilabel Learning Approach
Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach
Chuanbo Hu
Minglei Yin
Bin Liu
Xin Li
Yanfang Ye
119
18
0
19 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal
  Analytics
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
195
37
0
18 Aug 2021
Who's Waldo? Linking People Across Text and Images
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
237
21
0
16 Aug 2021
MMChat: Multi-Modal Chat Dataset on Social Media
MMChat: Multi-Modal Chat Dataset on Social Media
Yinhe Zheng
Guanyi Chen
Xin Liu
K. Lin
338
38
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and
  Intra-modal Knowledge Integration
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
177
59
0
16 Aug 2021
Video Transformer for Deepfake Detection with Incremental Learning
Video Transformer for Deepfake Detection with Incremental LearningACM Multimedia (ACM MM), 2021
Sohail Ahmed Khan
Hang Dai
ViT
215
77
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease
  Diagnosis
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MAMedIm
118
42
0
10 Aug 2021
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual
  Task Completion
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia
Qiaozi Gao
Jesse Thomason
Govind Thattai
Gaurav Sukhatme
LM&Ro
372
84
0
10 Aug 2021
Relation-aware Compositional Zero-shot Learning for Attribute-Object
  Pair Recognition
Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair RecognitionIEEE transactions on multimedia (IEEE Trans. Multimedia), 2021
Ziwei Xu
Guangzhi Wang
Yongkang Wong
Mohan S. Kankanhalli
179
35
0
10 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language
  Models
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language ModelsIEEE International Conference on Computer Vision (ICCV), 2021
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
298
306
0
09 Aug 2021
Disentangling Hate in Online Memes
Disentangling Hate in Online MemesACM Multimedia (ACM MM), 2021
Rui Cao
Ziqing Fan
Roy Ka-wei Lee
Wen-Haw Chong
Jing Jiang
289
106
0
09 Aug 2021
Detecting Propaganda Techniques in Memes
Detecting Propaganda Techniques in MemesAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Dimitar Dimitrov
Bishr Bin Ali
Shaden Shaar
Firoj Alam
Fabrizio Silvestri
Hamed Firooz
Preslav Nakov
Giovanni Da San Martino
255
104
0
07 Aug 2021
Interpretable Visual Understanding with Cognitive Attention Network
Interpretable Visual Understanding with Cognitive Attention NetworkInternational Conference on Artificial Neural Networks (ICANN), 2021
Xuejiao Tang
Wenbin Zhang
Yi Yu
Kea Turner
Hanyu Wang
Mengyu Wang
Eirini Ntoutsi
290
19
0
06 Aug 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
StrucTexT: Structured Text Understanding with Multi-Modal TransformersACM Multimedia (ACM MM), 2021
Yulin Li
Yuxi Qian
Yuchen Yu
Xiameng Qin
Chengquan Zhang
Yan Liu
Kun Yao
Junyu Han
Jingtuo Liu
Errui Ding
316
140
0
06 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
Fast Convergence of DETR with Spatially Modulated Co-AttentionIEEE International Conference on Computer Vision (ICCV), 2021
Shiyang Feng
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Jiaming Song
ViT
274
375
0
05 Aug 2021
Exploiting BERT For Multimodal Target Sentiment Classification Through
  Input Space Translation
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation
Zaid Khan
Y. Fu
178
182
0
03 Aug 2021
Representation learning for neural population activity with Neural Data
  Transformers
Representation learning for neural population activity with Neural Data Transformers
Joel Ye
C. Pandarinath
AI4TSAI4CE
417
80
0
02 Aug 2021
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal
Or Patashnik
Haggai Maron
Gal Chechik
Daniel Cohen-Or
CLIPVLM
308
286
0
02 Aug 2021
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding
Word2Pix: Word to Pixel Cross Attention Transformer in Visual GroundingIEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
ObjD
208
34
0
31 Jul 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval
  via Cross-modal Pretraining
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal PretrainingIEEE International Conference on Computer Vision (ICCV), 2021
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
297
82
0
30 Jul 2021
Multimodal Co-learning: Challenges, Applications with Datasets, Recent
  Advances and Future Directions
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future DirectionsInformation Fusion (Inf. Fusion), 2021
Anil Rahate
Rahee Walambe
S. Ramanna
K. Kotecha
429
179
0
29 Jul 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods
  in Natural Language Processing
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language ProcessingACM Computing Surveys (CSUR), 2021
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLMSyDa
798
4,996
0
28 Jul 2021
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe
Keld T. Lundgaard
VLM
186
3
0
27 Jul 2021
Language Grounding with 3D Objects
Language Grounding with 3D ObjectsConference on Robot Learning (CoRL), 2021
Jesse Thomason
Mohit Shridhar
Yonatan Bisk
Chris Paxton
Luke Zettlemoyer
LM&Ro
224
55
0
26 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Spatial-Temporal Transformer for Dynamic Scene Graph GenerationIEEE International Conference on Computer Vision (ICCV), 2021
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
357
151
0
26 Jul 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Multi-stage Pre-training over Simplified Multimodal Pre-training ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Tongtong Liu
Fangxiang Feng
Caixia Yuan
79
16
0
22 Jul 2021
DRDF: Determining the Importance of Different Multimodal Information
  with Dual-Router Dynamic Framework
DRDF: Determining the Importance of Different Multimodal Information with Dual-Router Dynamic FrameworkACM Multimedia (ACM MM), 2021
Haiwen Hong
Xuan Jin
Yin Zhang
Yunqing Hu
Jingfeng Zhang
Yuan He
Hui Xue
MoE
131
0
0
21 Jul 2021
Neural Variational Learning for Grounded Language Acquisition
Neural Variational Learning for Grounded Language AcquisitionIEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2021
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLMSSLGANDRL
256
2
0
20 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded
  Language Learning
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
226
0
0
20 Jul 2021
Separating Skills and Concepts for Novel Visual Question Answering
Separating Skills and Concepts for Novel Visual Question AnsweringComputer Vision and Pattern Recognition (CVPR), 2021
Spencer Whitehead
Hui Wu
Heng Ji
Rogerio Feris
Kate Saenko
CoGe
192
38
0
19 Jul 2021
Constructing Multi-Modal Dialogue Dataset by Replacing Text with
  Semantically Relevant Images
Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant ImagesAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Nyoungwoo Lee
Suwon Shin
Jaegul Choo
Ho‐Jin Choi
S. Myaeng
180
32
0
19 Jul 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum DistillationNeural Information Processing Systems (NeurIPS), 2021
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
865
2,536
0
16 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
306
229
0
15 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
From Show to Tell: A Survey on Deep Learning-based Image CaptioningIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DVVLMMLLM
464
358
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIPVLMMLLM
521
480
0
13 Jul 2021
Previous
123...353637...434445
Next
Page 36 of 45
Pageof 45