Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
    VLM

Papers citing "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"

21 / 1,171 papers shown
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Transactions of the Association for Computational Linguistics (TACL), 2020
Emanuele Bugliarello
Robert Bamler
Naoaki Okazaki
Desmond Elliott
160
124
0
30 Nov 2020
A Recurrent Vision-and-Language BERT for Navigation
Computer Vision and Pattern Recognition (CVPR), 2020
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
222
358
0
26 Nov 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
205
71
0
25 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
145
3
0
18 Nov 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL
VLM
147
12
0
24 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
221
6
0
19 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Joey Tianyi Zhou
CLIP
139
126
0
14 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
72
3
0
05 Oct 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
Guang Dai
Jianfeng Gao
Zicheng Liu
VLM
163
58
0
28 Sep 2020
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
98
7
0
14 Aug 2020
Decomposing Generation Networks with Structure Prediction for Recipe Generation
Pattern Recognition (Pattern Recognit.), 2020
Hao Wang
Guosheng Lin
Chunyan Miao
73
3
0
27 Jul 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
275
398
0
30 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Computer Vision and Pattern Recognition (CVPR), 2020
Karan Desai
Justin Johnson
SSL
VLM
344
457
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Neural Information Processing Systems (NeurIPS), 2020
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
297
527
0
11 Jun 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni
Haoyang Huang
Lin Su
Edward Cui
Taroon Bharti
Lijuan Wang
Jianfeng Gao
Dongdong Zhang
Nan Duan
154
7
0
04 Jun 2020
TIME: Text and Image Mutual-Translation Adversarial Networks
AAAI Conference on Artificial Intelligence (AAAI), 2020
Bingchen Liu
Kunpeng Song
Yizhe Zhu
Gerard de Melo
Ahmed Elgammal
93
34
0
27 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
365
531
0
01 May 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Natural Language Processing and Chinese Computing (NLPCC), 2020
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM
VLM
167
84
0
03 Mar 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
Pattern Recognition (Pattern Recognit.), 2020
M. Farazi
Salman H. Khan
Nick Barnes
148
18
0
20 Jan 2020
Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models
Information Fusion (Inf. Fusion), 2020
Jiamei Sun
Sebastian Lapuschkin
Wojciech Samek
Alexander Binder
FAtt
294
34
0
04 Jan 2020
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Difei Gao
Ruiping Wang
Shiguang Shan
Xilin Chen
CoGe
LRM
177
36
0
08 Aug 2019