Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
    VLM

Papers citing "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"

50 / 1,171 papers shown
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du
Haoxin Li
Xu Guo
Boyang Li
191
2
0
05 Dec 2023
How to Configure Good In-Context Sequence for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2023
Li Li
Jiawei Peng
Huiyi Chen
Chongyang Gao
Xu Yang
MLLM
156
35
0
04 Dec 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Cong Yang
Zuchao Li
Lefei Zhang
99
49
0
02 Dec 2023
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Ying Nie
Wei He
Kai Han
Yehui Tang
Tianyu Guo
Fanyi Du
Yunhe Wang
VLM
144
5
0
01 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM, MLLM
225
68
0
30 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
220
1
0
28 Nov 2023
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiayun Luo
Siddhesh Khandelwal
Leonid Sigal
Boyang Albert Li
MLLM, VLM
364
11
0
28 Nov 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM, LRM
233
147
0
27 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM, ELM, VLM
633
1,413
0
27 Nov 2023
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Computer Vision and Pattern Recognition (CVPR), 2023
Jiaxuan Li
D. Vo
Akihiro Sugimoto
Hideki Nakayama
KELM, VLM
205
35
0
27 Nov 2023
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
Yunxin Li
Baotian Hu
Wei Wang
Xiaochun Cao
Min Zhang
96
6
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
204
12
0
27 Nov 2023
Vamos: Versatile Action Models for Video Understanding
European Conference on Computer Vision (ECCV), 2023
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
282
32
0
22 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yangyi Chen
Xingyao Wang
Pengfei Yu
Derek Hoiem
Heng Ji
164
13
0
22 Nov 2023
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
European Conference on Computer Vision (ECCV), 2023
Meng Chu
Zhedong Zheng
Wei Ji
Tingyu Wang
Tat-Seng Chua
159
20
0
21 Nov 2023
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang
Xiaoqi Zhao
Jiaming Zuo
Lihe Zhang
Huchuan Lu
VLM, ObjD
223
11
0
19 Nov 2023
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
285
94
0
16 Nov 2023
AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method
Mohamaed Foued Ayedi
Hiba Ben Salem
Soulaimen Hammami
Ahmed Ben Said
Rateb Jabbar
Achraf Chabbouh
188
4
0
16 Nov 2023
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Hefeng Wu
Weifeng Chen
Zhibin Liu
Tianshui Chen
Zhiguang Chen
Liang Lin
144
18
0
15 Nov 2023
Fast Certification of Vision-Language Models Using Incremental Randomized Smoothing
Ashutosh Nirala
Ameya Joshi
Chinmay Hegde
S Sarkar
VLM
208
0
0
15 Nov 2023
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder
Abdelrahman Mohamed
Fakhraddin Alwajih
El Moatez Billah Nagoudi
Alcides Alcoba Inciarte
Muhammad Abdul-Mageed
VLM, MLLM
99
11
0
15 Nov 2023
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
WonJun Moon
Sangeek Hyun
Subeen Lee
Jae-Pil Heo
176
12
0
15 Nov 2023
Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jingbiao Mei
Jinghong Chen
Weizhe Lin
Bill Byrne
Marcus Tomalin
VLM
113
15
0
14 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
120
7
0
09 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
136
9
0
07 Nov 2023
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
Neural Information Processing Systems (NeurIPS), 2023
Cheng Cheng
Lin Song
Ruoyi Xue
Hang Wang
Hongbin Sun
Yixiao Ge
Ying Shan
VLM, ObjD
230
42
0
07 Nov 2023
MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Guangyue Xu
Parisa Kordjamshidi
Joyce Chai
117
2
0
02 Nov 2023
CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
Neural Information Processing Systems (NeurIPS), 2023
A. Fuller
K. Millard
James R. Green
167
116
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Information Fusion (Inf. Fusion), 2023
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
264
64
0
01 Nov 2023
Class Incremental Learning with Pre-trained Vision-Language Models
Xialei Liu
Xusheng Cao
Haori Lu
Jia-Wen Xiao
Andrew D. Bagdanov
Ming-Ming Cheng
VLM
200
16
0
31 Oct 2023
Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ahmed Sabir
Lluís Padró
167
1
0
29 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Neural Information Processing Systems (NeurIPS), 2023
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Yaoyao Liu
CoGe
178
17
0
27 Oct 2023
Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction
ACM Multimedia (ACM MM), 2023
Xuming Hu
Junzhe Chen
Aiwei Liu
Shiao Meng
Lijie Wen
Philip S. Yu
141
25
0
25 Oct 2023
UniMAP: Universal SMILES-Graph Representation Learning
Shikun Feng
Lixin Yang
Wei-Ying Ma
Yanyan Lan
OffRL
122
9
0
22 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
193
0
0
20 Oct 2023
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution
Xiangru Jian
Yimu Wang
147
6
0
20 Oct 2023
Multi-level Contrastive Learning for Script-based Character Understanding
Dawei Li
Hengyuan Zhang
Yanran Li
Shiping Yang
164
17
0
20 Oct 2023
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Kecen Li
Chen Gong
Zhixiang Li
Yuzhong Zhao
Xinwen Hou
Tianhao Wang
254
16
0
19 Oct 2023
Grounded and Well-rounded: A Methodological Approach to the Study of Cross-modal and Cross-lingual Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Timothee Mickus
Elaine Zosa
Denis Paperno
110
0
0
18 Oct 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Xiangru Jian
Bo Xue
126
20
0
17 Oct 2023
Matrix Compression via Randomized Low Rank and Low Precision Factorization
Neural Information Processing Systems (NeurIPS), 2023
R. Saha
Varun Srivastava
Mert Pilanci
177
30
0
17 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yangyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
248
10
0
17 Oct 2023
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Computer Vision and Pattern Recognition (CVPR), 2023
Yangyang Guo
Guangzhi Wang
Mohan S. Kankanhalli
153
7
0
16 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
207
9
0
16 Oct 2023
CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes
Yulei Qin
Xingyu Chen
Chunjiang Ge
Chaoyou Fu
Yun Gu
Ke Li
Xing Sun
Rongrong Ji
148
3
0
15 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
216
1
0
14 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
H. Chen
Yue Jiang
Yingya Zhang
Jing Dong
Caifeng Shan
149
2
0
12 Oct 2023
Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation
Pattern Recognition (Pattern Recogn.), 2023
Xueye Zheng
Yunhao Luo
Pengyuan Zhou
Lin Wang
153
25
0
11 Oct 2023
SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network
Neural Networks (Neural Netw.), 2023
Changze Lv
Tianlong Li
Changze Lv
Yufei Gu
Jianhan Xu
Cenyuan Zhang
Muling Wu
Xiaoqing Zheng
Xuanjing Huang
CLIP, VLM
422
3
0
10 Oct 2023
Controllable Chest X-Ray Report Generation from Longitudinal Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Francesco Dalla Serra
Chaoyang Wang
Fani Deligianni
Jeffrey Stephen Dalton
Alison Q. O'Neil
MedIm
143
22
0
09 Oct 2023