ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.02265
  4. Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for
  Vision-and-Language Tasks

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSLVLM
ArXiv (abs)PDFHTML

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,232 papers shown
Multimodal Conditionality for Natural Language Generation
Multimodal Conditionality for Natural Language Generation
Michael Sollami
Aashish Jain
117
10
0
02 Sep 2021
Point-of-Interest Type Prediction using Text and Images
Point-of-Interest Type Prediction using Text and ImagesConference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Danae Sánchez Villegas
Nikolaos Aletras
202
15
0
01 Sep 2021
WebQA: Multihop and Multimodal QA
WebQA: Multihop and Multimodal QAComputer Vision and Pattern Recognition (CVPR), 2021
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
382
113
0
01 Sep 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language
  Representations
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language RepresentationsConference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
173
20
0
01 Sep 2021
On the Significance of Question Encoder Sequence Model in the
  Out-of-Distribution Performance in Visual Question Answering
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
228
0
0
28 Aug 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual
  Pre-training
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-trainingACM Multimedia (ACM MM), 2021
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
211
25
0
25 Aug 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
232
71
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
SimVLM: Simple Visual Language Model Pretraining with Weak SupervisionInternational Conference on Learning Representations (ICLR), 2021
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLMMLLM
871
915
0
24 Aug 2021
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment
TACo: Token-aware Cascade Contrastive Learning for Video-Text AlignmentIEEE International Conference on Computer Vision (ICCV), 2021
Jianwei Yang
Yonatan Bisk
Jianfeng Gao
238
154
0
23 Aug 2021
From Two to One: A New Scene Text Recognizer with Visual Language
  Modeling Network
From Two to One: A New Scene Text Recognizer with Visual Language Modeling NetworkIEEE International Conference on Computer Vision (ICCV), 2021
Yuxin Wang
Hongtao Xie
Shancheng Fang
Jing Wang
Shenggao Zhu
Yongdong Zhang
VLM
247
175
0
22 Aug 2021
Multimodal Breast Lesion Classification Using Cross-Attention Deep
  Networks
Multimodal Breast Lesion Classification Using Cross-Attention Deep Networks
Hung Q. Vo
Pengyu Yuan
T. He
Stephen T. C. Wong
H. Nguyen
167
5
0
21 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
136
11
0
21 Aug 2021
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Airbert: In-domain Pretraining for Vision-and-Language Navigation
Pierre-Louis Guhur
Makarand Tapaswi
Shizhe Chen
Ivan Laptev
Cordelia Schmid
LM&Ro
213
167
0
20 Aug 2021
Knowledge Perceived Multi-modal Pretraining in E-commerce
Knowledge Perceived Multi-modal Pretraining in E-commerce
Yushan Zhu
Huaixiao Tou
Wen Zhang
Ganqiang Ye
Hui Chen
Ningyu Zhang
Huajun Chen
250
38
0
20 Aug 2021
Detection of Illicit Drug Trafficking Events on Instagram: A Deep
  Multimodal Multilabel Learning Approach
Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach
Chuanbo Hu
Minglei Yin
Bin Liu
Xin Li
Yanfang Ye
107
16
0
19 Aug 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal
  Analytics
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
194
37
0
18 Aug 2021
Who's Waldo? Linking People Across Text and Images
Who's Waldo? Linking People Across Text and Images
Claire Yuqing Cui
Apoorv Khandelwal
Yoav Artzi
Noah Snavely
Hadar Averbuch-Elor
208
21
0
16 Aug 2021
MMChat: Multi-Modal Chat Dataset on Social Media
MMChat: Multi-Modal Chat Dataset on Social Media
Yinhe Zheng
Guanyi Chen
Xin Liu
K. Lin
337
38
0
16 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and
  Intra-modal Knowledge Integration
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
177
58
0
16 Aug 2021
Video Transformer for Deepfake Detection with Incremental Learning
Video Transformer for Deepfake Detection with Incremental LearningACM Multimedia (ACM MM), 2021
Sohail Ahmed Khan
Hang Dai
ViT
212
78
0
11 Aug 2021
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease
  Diagnosis
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor
Mozhdeh Rouhsedaghat
Liunian Harold Li
Aichi Chien
C.-C. Jay Kuo
Fabien Scalzo
Kai-Wei Chang
LM&MAMedIm
105
40
0
10 Aug 2021
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual
  Task Completion
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Alessandro Suglia
Qiaozi Gao
Jesse Thomason
Govind Thattai
Gaurav Sukhatme
LM&Ro
316
84
0
10 Aug 2021
Relation-aware Compositional Zero-shot Learning for Attribute-Object
  Pair Recognition
Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair RecognitionIEEE transactions on multimedia (IEEE Trans. Multimedia), 2021
Ziwei Xu
Guangzhi Wang
Yongkang Wong
Mohan S. Kankanhalli
173
35
0
10 Aug 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language
  Models
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language ModelsIEEE International Conference on Computer Vision (ICCV), 2021
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
297
300
0
09 Aug 2021
Disentangling Hate in Online Memes
Disentangling Hate in Online MemesACM Multimedia (ACM MM), 2021
Rui Cao
Ziqing Fan
Roy Ka-wei Lee
Wen-Haw Chong
Jing Jiang
199
105
0
09 Aug 2021
Detecting Propaganda Techniques in Memes
Detecting Propaganda Techniques in MemesAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Dimitar Dimitrov
Bishr Bin Ali
Shaden Shaar
Firoj Alam
Fabrizio Silvestri
Hamed Firooz
Preslav Nakov
Giovanni Da San Martino
175
103
0
07 Aug 2021
Interpretable Visual Understanding with Cognitive Attention Network
Interpretable Visual Understanding with Cognitive Attention NetworkInternational Conference on Artificial Neural Networks (ICANN), 2021
Xuejiao Tang
Wenbin Zhang
Yi Yu
Kea Turner
Hanyu Wang
Mengyu Wang
Eirini Ntoutsi
281
19
0
06 Aug 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
StrucTexT: Structured Text Understanding with Multi-Modal TransformersACM Multimedia (ACM MM), 2021
Yulin Li
Yuxi Qian
Yuchen Yu
Xiameng Qin
Chengquan Zhang
Yan Liu
Kun Yao
Junyu Han
Jingtuo Liu
Errui Ding
309
139
0
06 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
Fast Convergence of DETR with Spatially Modulated Co-AttentionIEEE International Conference on Computer Vision (ICCV), 2021
Shiyang Feng
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Jiaming Song
ViT
273
373
0
05 Aug 2021
Exploiting BERT For Multimodal Target Sentiment Classification Through
  Input Space Translation
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation
Zaid Khan
Y. Fu
174
180
0
03 Aug 2021
Representation learning for neural population activity with Neural Data
  Transformers
Representation learning for neural population activity with Neural Data Transformers
Joel Ye
C. Pandarinath
AI4TSAI4CE
397
79
0
02 Aug 2021
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal
Or Patashnik
Haggai Maron
Gal Chechik
Daniel Cohen-Or
CLIPVLM
302
281
0
02 Aug 2021
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding
Word2Pix: Word to Pixel Cross Attention Transformer in Visual GroundingIEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Heng Zhao
Qiufeng Wang
Yew-Soon Ong
ObjD
205
33
0
31 Jul 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval
  via Cross-modal Pretraining
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal PretrainingIEEE International Conference on Computer Vision (ICCV), 2021
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
291
79
0
30 Jul 2021
Multimodal Co-learning: Challenges, Applications with Datasets, Recent
  Advances and Future Directions
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future DirectionsInformation Fusion (Inf. Fusion), 2021
Anil Rahate
Rahee Walambe
S. Ramanna
K. Kotecha
402
176
0
29 Jul 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods
  in Natural Language Processing
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language ProcessingACM Computing Surveys (CSUR), 2021
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLMSyDa
795
4,889
0
28 Jul 2021
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe
Keld T. Lundgaard
VLM
146
3
0
27 Jul 2021
Language Grounding with 3D Objects
Language Grounding with 3D ObjectsConference on Robot Learning (CoRL), 2021
Jesse Thomason
Mohit Shridhar
Yonatan Bisk
Chris Paxton
Luke Zettlemoyer
LM&Ro
220
55
0
26 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Spatial-Temporal Transformer for Dynamic Scene Graph GenerationIEEE International Conference on Computer Vision (ICCV), 2021
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
317
150
0
26 Jul 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Multi-stage Pre-training over Simplified Multimodal Pre-training ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Tongtong Liu
Fangxiang Feng
Caixia Yuan
79
16
0
22 Jul 2021
DRDF: Determining the Importance of Different Multimodal Information
  with Dual-Router Dynamic Framework
DRDF: Determining the Importance of Different Multimodal Information with Dual-Router Dynamic FrameworkACM Multimedia (ACM MM), 2021
Haiwen Hong
Xuan Jin
Yin Zhang
Yunqing Hu
Jingfeng Zhang
Yuan He
Hui Xue
MoE
121
0
0
21 Jul 2021
Neural Variational Learning for Grounded Language Acquisition
Neural Variational Learning for Grounded Language AcquisitionIEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2021
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLMSSLGANDRL
247
2
0
20 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded
  Language Learning
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
178
0
0
20 Jul 2021
Separating Skills and Concepts for Novel Visual Question Answering
Separating Skills and Concepts for Novel Visual Question AnsweringComputer Vision and Pattern Recognition (CVPR), 2021
Spencer Whitehead
Hui Wu
Heng Ji
Rogerio Feris
Kate Saenko
CoGe
185
38
0
19 Jul 2021
Constructing Multi-Modal Dialogue Dataset by Replacing Text with
  Semantically Relevant Images
Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant ImagesAnnual Meeting of the Association for Computational Linguistics (ACL), 2021
Nyoungwoo Lee
Suwon Shin
Jaegul Choo
Ho‐Jin Choi
S. Myaeng
179
31
0
19 Jul 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum DistillationNeural Information Processing Systems (NeurIPS), 2021
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
839
2,493
0
16 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
294
224
0
15 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
From Show to Tell: A Survey on Deep Learning-based Image CaptioningIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DVVLMMLLM
437
355
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIPVLMMLLM
515
472
0
13 Jul 2021
FairyTailor: A Multimodal Generative Framework for Storytelling
FairyTailor: A Multimodal Generative Framework for Storytelling
Eden Bensaid
Mauro Martino
Benjamin Hoover
Hendrik Strobelt
LRM
177
24
0
13 Jul 2021
Previous
123...353637...434445
Next
Page 36 of 45
Pageof 45