Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2004.06165
Cited By
v1
v2
v3
v4
v5 (latest)
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"
50 / 1,171 papers shown
Title
Generalization Boosted Adapter for Open-Vocabulary Segmentation
Wenhao Xu
Changwei Wang
Xuxiang Feng
Rongtao Xu
Longzhao Huang
Zherui Zhang
Li Guo
Shibiao Xu
VLM
211
6
0
13 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
134
1
0
12 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
191
0
0
06 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning
ACM Multimedia (MM), 2024
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
133
1
0
29 Aug 2024
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
Yuka Ko
Sheng Li
Chao-Han Huck Yang
Tatsuya Kawahara
AuLLM
89
5
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
158
2
0
28 Aug 2024
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis
IEEE International Joint Conference on Neural Network (IJCNN), 2024
Aishik Nagar
Shantanu Jaiswal
Cheston Tan
ReLM
LRM
90
18
0
27 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
British Machine Vision Conference (BMVC), 2024
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
184
6
0
26 Aug 2024
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
Jiwei Guan
Tianyu Ding
Longbing Cao
Lei Pan
Chen Wang
Xi Zheng
AAML
203
3
0
24 Aug 2024
Efficient and Versatile Robust Fine-Tuning of Zero-shot Models
European Conference on Computer Vision (ECCV), 2024
Sungyeon Kim
Boseung Jeong
Donghyun Kim
Suha Kwak
VLM
142
7
0
11 Aug 2024
Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation
ACM Multimedia (MM), 2024
Huilin Tian
Jingke Meng
Wei-Shi Zheng
Yuan-Ming Li
Junkai Yan
Yunong Zhang
181
5
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
European Conference on Computer Vision (ECCV), 2024
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
205
2
0
07 Aug 2024
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models
ACM Multimedia (MM), 2024
Haonan Zheng
Wen Jiang
Xinyang Deng
Wenrui Li
VLM
AAML
133
4
0
06 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
370
7
0
01 Aug 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
European Conference on Computer Vision (ECCV), 2024
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
123
10
0
29 Jul 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Junyan Li
Delin Chen
Tianle Cai
Peihao Chen
Yining Hong
Zhenfang Chen
Yikang Shen
Chuang Gan
VLM
210
6
0
29 Jul 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
147
0
0
28 Jul 2024
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil
Zheda Mai
Justin Lee
Zihe Wang
Kerrie Cheng
Jingyan Bai
Ye Liu
A. Chowdhury
Wei-Lun Chao
CoGe
VLM
232
31
0
23 Jul 2024
HAPFI: History-Aware Planning based on Fused Information
Sujin Jeon
Suyeon Shin
Byoung-Tak Zhang
116
0
0
23 Jul 2024
Benchmark Granularity and Model Robustness for Image-Text Retrieval
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
197
0
0
21 Jul 2024
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou
Yicong Hong
Zun Wang
Xin Eric Wang
Qi Wu
LM&Ro
192
60
0
17 Jul 2024
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding
Minghui Wu
Chenxu Zhao
Anyang Su
Donglin Di
Tianyu Fu
...
Min He
Ya Gao
Meng Ma
Kun Yan
Ping Wang
201
5
0
11 Jul 2024
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
Yuxin Chen
Zongyang Ma
Ziqi Zhang
Chen Ma
Chunfeng Yuan
Bing Li
Junfu Pu
Ying Shan
Xiaojuan Qi
Weiming Hu
114
2
0
10 Jul 2024
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
Yijia Xiao
Edward Sun
Tianyu Liu
Wei Wang
LRM
134
94
0
06 Jul 2024
SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
Sayan Nag
Koustava Goswami
Srikrishna Karanam
198
6
0
02 Jul 2024
The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning
Shaobo Cui
Zhijing Jin
Bernhard Schölkopf
Boi Faltings
CML
LRM
150
7
0
27 Jun 2024
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
Heng Li
Heng Li
Zhi-Qi Cheng
Yifei Dong
Yuxuan Zhou
Jun-Yan He
Jingdong Sun
Teruko Mitamura
Alexander G. Hauptmann
LM&Ro
184
15
0
27 Jun 2024
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao
Kevin Liu
152
0
0
25 Jun 2024
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
258
10
0
24 Jun 2024
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
167
24
0
20 Jun 2024
IWISDM: Assessing instruction following in multimodal models at scale
Xiaoxuan Lei
Lucas Gomez
Hao Yuan Bai
P. Bashivan
VLM
276
2
0
20 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
304
48
0
16 Jun 2024
Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
Shuyang Lin
Tong Jia
Hao Wang
Bowen Ma
Mingyuan Li
Dongyue Chen
VLM
ObjD
138
2
0
16 Jun 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
175
3
0
16 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
147
1
0
13 Jun 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Miaosen Zhang
Yixuan Wei
Zhen Xing
Yifei Ma
Zuxuan Wu
...
Zheng Zhang
Jingdong Sun
Chong Luo
Xin Geng
Baining Guo
VLM
146
2
0
13 Jun 2024
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
Samar Fares
Klea Ziu
Toluwani Aremu
Nikita Durasov
Martin Takáč
Pascal Fua
Karthik Nandakumar
Ivan Laptev
VLM
AAML
153
8
0
13 Jun 2024
ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
Kam Woh Ng
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
142
2
0
12 Jun 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang
Wei Lin
M. Jehanzeb Mirza
Jacob A. Hansen
Sivan Doveh
...
Trevor Darrel
Chuang Gan
Aude Oliva
Rogerio Feris
Leonid Karlinsky
CoGe
LRM
133
15
0
12 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
268
23
0
08 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
146
8
0
01 Jun 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal
Nakul Agarwal
Shao-Yuan Lo
Kwonjoon Lee
218
26
0
30 May 2024
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Zou
Kai-Wei Chang
Wei Wang
SyDa
VLM
LRM
178
67
0
30 May 2024
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
Rui Yang
Shuang Wang
Yi Han
Yuanheng Li
Dong Zhao
Dou Quan
Yanhe Guo
Licheng Jiao
159
7
0
29 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
388
9
0
28 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
203
0
0
27 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
208
1
0
27 May 2024
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao
Feng Liu
Yue Liu
Mingxiang Liao
Chen Gong
QiXiang Ye
Fang Wan
ObjD
85
4
0
25 May 2024
LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image
Ruikai Cui
Xibin Song
Weixuan Sun
Senbo Wang
Weizhe Liu
...
Taizhang Shang
Yang Li
Nick Barnes
Hongdong Li
Pan Ji
3DV
128
7
0
24 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Wanrong Zhu
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
305
59
0
24 May 2024
Previous
1
2
3
4
5
6
...
22
23
24
Next