ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL
    VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,088 papers shown
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq R. Joty
Caiming Xiong
S. Hoi
FaML
53
1,884
0
16 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
29
158
0
15 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
58
254
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
FairyTailor: A Multimodal Generative Framework for Storytelling
Eden Bensaid
Mauro Martino
Benjamin Hoover
Hendrik Strobelt
LRM
18
17
0
13 Jul 2021
End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
11
51
0
12 Jul 2021
MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition
Shuang Wu
Xiaoning Song
Zhenhua Feng
30
113
0
12 Jul 2021
BERT-like Pre-training for Symbolic Piano Music Classification Tasks
Yi-Hui Chou
I-Chun Chen
Chin-Jui Chang
Joann Ching
Yi-Hsuan Yang
30
25
0
12 Jul 2021
Zero-Shot Compositional Concept Learning
Guangyue Xu
Parisa Kordjamshidi
J. Chai
CoGe
81
19
0
12 Jul 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
45
5,044
0
07 Jul 2021
Deep Learning for Embodied Vision Navigation: A Survey
Fengda Zhu
Yi Zhu
Vincent CS Lee
Xiaodan Liang
Xiaojun Chang
EgoV
LM&Ro
34
0
0
07 Jul 2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Zineng Tang
Jaemin Cho
Hao Tan
Mohit Bansal
VLM
30
29
0
06 Jul 2021
PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling
Xiaoxue Zang
Lijuan Liu
Maria Wang
Yang Song
Hao Zhang
Jindong Chen
VLM
21
55
0
06 Jul 2021
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Xuejiao Tang
Xin Huang
Wenbin Zhang
T. Child
Qiong Hu
Zhen Liu
Ji Zhang
LRM
19
18
0
04 Jul 2021
Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots
Shintaro Ishikawa
K. Sugiura
23
10
0
02 Jul 2021
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
Motonari Kambara
K. Sugiura
ViT
16
6
0
02 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
52
94
0
01 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
31
37
0
01 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
25
541
0
30 Jun 2021
The Values Encoded in Machine Learning Research
Abeba Birhane
Pratyusha Kalluri
Dallas Card
William Agnew
Ravit Dotan
Michelle Bao
25
273
0
29 Jun 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
22
2
0
28 Jun 2021
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Trung Bui
Kyomin Jung
VLM
19
44
0
26 Jun 2021
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
44
45
0
26 Jun 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
53
749
0
25 Jun 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
22
88
0
25 Jun 2021
A Picture May Be Worth a Hundred Words for Visual Question Answering
Yusuke Hirota
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Ittetsu Taniguchi
Takao Onoye
ViT
8
5
0
25 Jun 2021
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability
Andrew Wang
Aman Chadha
CML
11
5
0
25 Jun 2021
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Keda Lu
Bo Fang
Kuan-Yu Chen
ViT
14
2
0
24 Jun 2021
DocFormer: End-to-End Transformer for Document Understanding
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
27
270
0
22 Jun 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krähenbühl
VLM
ViT
36
165
0
21 Jun 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks
Lin Su
Nan Duan
Edward Cui
Lei Ji
Chenfei Wu
Huaishao Luo
Yongfei Liu
Ming Zhong
Taroon Bharti
Arun Sacheti
VLM
19
19
0
18 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
32
209
0
17 Jun 2021
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks
Aida Nematzadeh
25
114
0
16 Jun 2021
A Fair and Comprehensive Comparison of Multimodal Tweet Sentiment Analysis Methods
Gullal Singh Cheema
Sherzod Hakimov
Eric Müller-Budack
Ralph Ewerth
14
19
0
16 Jun 2021
Vision-Language Navigation with Random Environmental Mixup
Chong Liu
Fengda Zhu
Xiaojun Chang
Xiaodan Liang
Zongyuan Ge
Yi-Dong Shen
LM&Ro
48
86
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
37
813
0
14 Jun 2021
Assessing Multilingual Fairness in Pre-trained Multimodal Representations
Jialu Wang
Yang Liu
X. Wang
EGVM
23
35
0
12 Jun 2021
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan
Jieting Chen
Yuqing Song
Shizhe Chen
Qin Jin
13
0
0
11 Jun 2021
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Wanrong Zhu
X. Wang
An Yan
M. Eckstein
W. Wang
16
7
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
8
274
0
09 Jun 2021
Bayesian Attention Belief Networks
Shujian Zhang
Xinjie Fan
Bo Chen
Mingyuan Zhou
BDL
22
30
0
09 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
17
35
0
08 Jun 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Tianlong Chen
Yu Cheng
Zhe Gan
Lu Yuan
Lei Zhang
Zhangyang Wang
ViT
13
216
0
08 Jun 2021
BERTGEN: Multi-task Generation through BERT
Faidon Mitzalis
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
VLM
19
7
0
07 Jun 2021
SelfDoc: Self-Supervised Document Representation Learning
Peizhao Li
Jiuxiang Gu
Jason Kuen
Vlad I. Morariu
Handong Zhao
R. Jain
Varun Manjunatha
Hongfu Liu
ViT
SSL
14
158
0
07 Jun 2021
Oriented Object Detection with Transformer
Teli Ma
Mingyuan Mao
Honghui Zheng
Peng Gao
Xiaodi Wang
Shumin Han
Errui Ding
Baochang Zhang
David Doermann
ViT
14
39
0
06 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
10
187
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
22
372
0
04 Jun 2021
Human-Adversarial Visual Question Answering
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OOD
EgoV
AAML
18
60
0
04 Jun 2021
Scalable Transformers for Neural Machine Translation
Peng Gao
Shijie Geng
Yu Qiao
Xiaogang Wang
Jifeng Dai
Hongsheng Li
28
13
0
04 Jun 2021