Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2004.06165
Cited By
v1
v2
v3
v4
v5 (latest)
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"
50 / 1,171 papers shown
Title
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du
Haoxin Li
Xu Guo
Boyang Li
191
2
0
05 Dec 2023
How to Configure Good In-Context Sequence for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2023
Li Li
Jiawei Peng
Huiyi Chen
Chongyang Gao
Xu Yang
MLLM
156
35
0
04 Dec 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Cong Yang
Zuchao Li
Lefei Zhang
99
49
0
02 Dec 2023
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Ying Nie
Wei He
Kai Han
Yehui Tang
Tianyu Guo
Fanyi Du
Yunhe Wang
VLM
144
5
0
01 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
225
68
0
30 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
220
1
0
28 Nov 2023
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiayun Luo
Siddhesh Khandelwal
Leonid Sigal
Boyang Albert Li
MLLM
VLM
364
11
0
28 Nov 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM
LRM
233
147
0
27 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
633
1,413
0
27 Nov 2023
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Computer Vision and Pattern Recognition (CVPR), 2023
Jiaxuan Li
D. Vo
Akihiro Sugimoto
Hideki Nakayama
KELM
VLM
205
35
0
27 Nov 2023
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
Yunxin Li
Baotian Hu
Wei Wang
Xiaochun Cao
Min Zhang
96
6
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
204
12
0
27 Nov 2023
Vamos: Versatile Action Models for Video Understanding
European Conference on Computer Vision (ECCV), 2023
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
282
32
0
22 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yangyi Chen
Xingyao Wang
Pengfei Yu
Derek Hoiem
Heng Ji
164
13
0
22 Nov 2023
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
European Conference on Computer Vision (ECCV), 2023
Meng Chu
Zhedong Zheng
Wei Ji
Tingyu Wang
Tat-Seng Chua
159
20
0
21 Nov 2023
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang
Xiaoqi Zhao
Jiaming Zuo
Lihe Zhang
Huchuan Lu
VLM
ObjD
223
11
0
19 Nov 2023
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
285
94
0
16 Nov 2023
AI Recommendation System for Enhanced Customer Experience: A Novel Image-to-Text Method
Mohamaed Foued Ayedi
Hiba Ben Salem
Soulaimen Hammami
Ahmed Ben Said
Rateb Jabbar
Achraf Chabbouh
188
4
0
16 Nov 2023
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Hefeng Wu
Weifeng Chen
Zhibin Liu
Tianshui Chen
Zhiguang Chen
Liang Lin
144
18
0
15 Nov 2023
Fast Certification of Vision-Language Models Using Incremental Randomized Smoothing
Ashutosh Nirala
Ameya Joshi
Chinmay Hegde
S Sarkar
VLM
208
0
0
15 Nov 2023
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder
Abdelrahman Mohamed
Fakhraddin Alwajih
El Moatez Billah Nagoudi
Alcides Alcoba Inciarte
Muhammad Abdul-Mageed
VLM
MLLM
99
11
0
15 Nov 2023
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
WonJun Moon
Sangeek Hyun
Subeen Lee
Jae-Pil Heo
176
12
0
15 Nov 2023
Improving Hateful Meme Detection through Retrieval-Guided Contrastive Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jingbiao Mei
Jinghong Chen
Weizhe Lin
Bill Byrne
Marcus Tomalin
VLM
113
15
0
14 Nov 2023
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang
Rui Xu
Ye Guo
Peixiang Huang
Yiru Chen
Wenkui Ding
Zhongyuan Wang
Hong Zhou
LRM
120
7
0
09 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
136
9
0
07 Nov 2023
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model
Neural Information Processing Systems (NeurIPS), 2023
Cheng Cheng
Lin Song
Ruoyi Xue
Hang Wang
Hongbin Sun
Yixiao Ge
Ying Shan
VLM
ObjD
230
42
0
07 Nov 2023
MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Guangyue Xu
Parisa Kordjamshidi
Joyce Chai
117
2
0
02 Nov 2023
CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
Neural Information Processing Systems (NeurIPS), 2023
A. Fuller
K. Millard
James R. Green
167
116
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Information Fusion (Inf. Fusion), 2023
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
264
64
0
01 Nov 2023
Class Incremental Learning with Pre-trained Vision-Language Models
Xialei Liu
Xusheng Cao
Haori Lu
Jia-Wen Xiao
Andrew D. Bagdanov
Ming-Ming Cheng
VLM
200
16
0
31 Oct 2023
Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ahmed Sabir
Lluís Padró
167
1
0
29 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Neural Information Processing Systems (NeurIPS), 2023
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Yaoyao Liu
CoGe
178
17
0
27 Oct 2023
Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction
ACM Multimedia (ACM MM), 2023
Xuming Hu
Junzhe Chen
Aiwei Liu
Shiao Meng
Lijie Wen
Philip S. Yu
141
25
0
25 Oct 2023
UniMAP: Universal SMILES-Graph Representation Learning
Shikun Feng
Lixin Yang
Wei-Ying Ma
Yanyan Lan
OffRL
122
9
0
22 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
193
0
0
20 Oct 2023
InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution
Xiangru Jian
Yimu Wang
147
6
0
20 Oct 2023
Multi-level Contrastive Learning for Script-based Character Understanding
Dawei Li
Hengyuan Zhang
Yanran Li
Shiping Yang
164
17
0
20 Oct 2023
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Kecen Li
Chen Gong
Zhixiang Li
Yuzhong Zhao
Xinwen Hou
Tianhao Wang
254
16
0
19 Oct 2023
Grounded and Well-rounded: A Methodological Approach to the Study of Cross-modal and Cross-lingual Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Timothee Mickus
Elaine Zosa
Denis Paperno
110
0
0
18 Oct 2023
Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yimu Wang
Xiangru Jian
Bo Xue
126
20
0
17 Oct 2023
Matrix Compression via Randomized Low Rank and Low Precision Factorization
Neural Information Processing Systems (NeurIPS), 2023
R. Saha
Varun Srivastava
Mert Pilanci
177
30
0
17 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
248
10
0
17 Oct 2023
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Computer Vision and Pattern Recognition (CVPR), 2023
Yangyang Guo
Guangzhi Wang
Mohan S. Kankanhalli
153
7
0
16 Oct 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
207
9
0
16 Oct 2023
CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes
Yulei Qin
Xingyu Chen
Chunjiang Ge
Chaoyou Fu
Yun Gu
Ke Li
Xing Sun
Rongrong Ji
148
3
0
15 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
216
1
0
14 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
H. Chen
Yue Jiang
Yingya Zhang
Jing Dong
Caifeng Shan
149
2
0
12 Oct 2023
Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation
Pattern Recognition (Pattern Recogn.), 2023
Xueye Zheng
Yunhao Luo
Pengyuan Zhou
Lin Wang
153
25
0
11 Oct 2023
SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network
Neural Networks (Neural Netw.), 2023
Changze Lv
Tianlong Li
Changze Lv
Yufei Gu
Jianhan Xu
Cenyuan Zhang
Muling Wu
Xiaoqing Zheng
Xuanjing Huang
CLIP
VLM
422
3
0
10 Oct 2023
Controllable Chest X-Ray Report Generation from Longitudinal Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Francesco Dalla Serra
Chaoyang Wang
Fani Deligianni
Jeffrey Stephen Dalton
Alison Q. OÑeil
MedIm
143
22
0
09 Oct 2023
Previous
1
2
3
...
5
6
7
...
22
23
24
Next