ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL
    VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,088 papers shown
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
24
56
0
27 Oct 2020
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Radhika Dua
Sai Srinivas Kancheti
V. Balasubramanian
LRM
30
22
0
24 Oct 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL
VLM
64
12
0
24 Oct 2020
Multilingual Speech Translation with Efficient Finetuning of Pretrained Models
Xian Li
Changhan Wang
Yun Tang
C. Tran
Yuqing Tang
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
19
6
0
24 Oct 2020
Can images help recognize entities? A study of the role of images for Multimodal NER
Shuguang Chen
Gustavo Aguilar
Leonardo Neves
Thamar Solorio
EgoV
37
33
0
23 Oct 2020
GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
Nicole Peinelt
Marek Rei
Maria Liakata
14
2
0
23 Oct 2020
ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding
Minjeong Kim
Gyuwan Kim
Sang-Woo Lee
Jung-Woo Ha
VLM
24
34
0
23 Oct 2020
Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Simon Stepputtis
Joseph Campbell
Mariano Phielipp
Stefan Lee
Chitta Baral
H. B. Amor
LM&Ro
114
192
0
22 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
20
39,174
0
22 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
A. Schwing
Tamir Hazan
51
89
0
21 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumder
Soujanya Poria
Roger Zimmermann
Amir Zadeh
18
6
0
19 Oct 2020
Towards Data Distillation for End-to-end Spoken Conversational Question Answering
Chenyu You
Nuo Chen
Fenglin Liu
Dongchao Yang
Yuexian Zou
6
45
0
18 Oct 2020
Knowledge-Grounded Dialogue Generation with Pre-trained Language Models
Xueliang Zhao
Wei Yu Wu
Can Xu
Chongyang Tao
Dongyan Zhao
Rui Yan
175
191
0
17 Oct 2020
Answer-checking in Context: A Multi-modal FullyAttention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
13
2
0
17 Oct 2020
Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning
Wanyun Cui
Guangyu Zheng
Wei Wang
SSL
14
21
0
16 Oct 2020
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović
Chandra Bhagavatula
J. S. Park
Ronan Le Bras
Noah A. Smith
Yejin Choi
ReLM
LRM
18
62
0
15 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Mohit Bansal
CLIP
11
120
0
14 Oct 2020
A Multi-Modal Method for Satire Detection using Textual and Visual Cues
Lily Li
Or Levi
Pedram Hosseini
David A. Broniatowski
9
21
0
13 Oct 2020
CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
Fuli Luo
Pengcheng Yang
Shicheng Li
Xuancheng Ren
Xu Sun
VLM
SSL
10
16
0
13 Oct 2020
Contrast and Classify: Training Robust VQA Models
Yash Kant
A. Moudgil
Dhruv Batra
Devi Parikh
Harsh Agrawal
19
5
0
13 Oct 2020
Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph
Jingkang Yang
Weirong Chen
Litong Feng
Xiaopeng Yan
Huabin Zheng
Wayne Zhang
NoLa
19
13
0
12 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
13
5
0
10 Oct 2020
comp-syn: Perceptually Grounded Word Embeddings with Color
Bhargav Srinivasa Desikan
Tasker Hull
E. Nadler
Douglas Guilbeault
Aabir Abubaker Kar
Mark Chu
Donald Ruggiero Lo Sardo
6
7
0
08 Oct 2020
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar
Xingdi Yuan
Marc-Alexandre Côté
Yonatan Bisk
Adam Trischler
Matthew J. Hausknecht
LM&Ro
LLMAG
16
392
0
08 Oct 2020
Multi-label classification of promotions in digital leaflets using textual and visual information
R. Arroyo
David Jiménez-Cabello
Javier Martínez-Cebrián
6
3
0
07 Oct 2020
ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization
Tzuf Paz-Argaman
Y. Atzmon
Gal Chechik
Reut Tsarfaty
VLM
16
32
0
07 Oct 2020
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang
Hexiang Hu
Vihan Jain
Eugene Ie
Fei Sha
6
21
0
06 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
20
242
0
06 Oct 2020
Pathological Visual Question Answering
Xuehai He
Zhuo Cai
Wenlan Wei
Yichen Zhang
Luntian Mou
Eric P. Xing
P. Xie
62
24
0
06 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
11
2
0
05 Oct 2020
Multi-Modal Open-Domain Dialogue
Kurt Shuster
Eric Michael Smith
Da Ju
Jason Weston
AI4CE
28
42
0
02 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang
Hang Jiang
Yasuhide Miura
Christopher D. Manning
C. Langlotz
MedIm
13
730
0
02 Oct 2020
Learning Object Detection from Captions via Textual Scene Attributes
Achiya Jerbi
Roei Herzig
Jonathan Berant
Gal Chechik
Amir Globerson
22
21
0
30 Sep 2020
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
Tao Yu
Chien-Sheng Wu
Xi Victoria Lin
Bailin Wang
Y. Tan
Xinyi Yang
Dragomir R. Radev
R. Socher
Caiming Xiong
LMTD
11
247
0
29 Sep 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
L. Zhang
Jianfeng Gao
Zicheng Liu
VLM
8
56
0
28 Sep 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM
MLLM
19
102
0
23 Sep 2020
Preserving Integrity in Online Social Networks
A. Halevy
Cristian Canton Ferrer
Hao Ma
Umut Ozertem
Patrick Pantel
Marzieh Saeidi
Fabrizio Silvestri
Ves Stoyanov
17
57
0
22 Sep 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
OOD
14
139
0
18 Sep 2020
A Multimodal Memes Classification: A Survey and Open Research Issues
Tariq Habib Afridi
A. Alam
Muhammad Numan Khan
Jawad Khan
Young-Koo Lee
19
34
0
17 Sep 2020
Multi-modal Summarization for Video-containing Documents
Xiyan Fu
Jun Wang
Zhenglu Yang
20
23
0
17 Sep 2020
Machine Learning for Temporal Data in Finance: Challenges and Opportunities
J. Wittenbach
Learning McLean
Virginia Brian
AI4TS
4
1
0
11 Sep 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
Khyathi Raghavi Chandu
Piyush Sharma
Soravit Changpinyo
Ashish V. Thapliyal
Radu Soricut
DiffM
VLM
21
3
0
10 Sep 2020
Investigating Gender Bias in BERT
Rishabh Bhardwaj
Navonil Majumder
Soujanya Poria
25
105
0
10 Sep 2020
Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations
Meng-Jiun Chiou
Roger Zimmermann
Jiashi Feng
13
1
0
10 Sep 2020
Video Moment Retrieval via Natural Language Queries
Xinli Yu
Mohsen Malmir
C. He
Yue Liu
Rex Wu
9
1
0
04 Sep 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
6
63
0
03 Sep 2020
Practical Cross-modal Manifold Alignment for Grounded Language
A. Nguyen
Luke E. Richards
Gaoussou Youssouf Kebe
Edward Raff
Kasra Darvish
Frank Ferraro
Cynthia Matuszek
6
4
0
01 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
16
8
0
31 Aug 2020
A Survey of Visual Analytics Techniques for Machine Learning
Jun Yuan
Changjian Chen
Weikai Yang
Mengchen Liu
Jiazhi Xia
Shixia Liu
19
215
0
21 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
19
12
0
18 Aug 2020