ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2004.06165
  4. Cited By
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
v1v2v3v4v5 (latest)

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
    VLM
ArXiv (abs)PDFHTML

Papers citing "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"

50 / 1,171 papers shown
Title
Generalization Boosted Adapter for Open-Vocabulary Segmentation
Generalization Boosted Adapter for Open-Vocabulary Segmentation
Wenhao Xu
Changwei Wang
Xuxiang Feng
Rongtao Xu
Longzhao Huang
Zherui Zhang
Li Guo
Shibiao Xu
VLM
211
6
0
13 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGeVLM
134
1
0
12 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive
  Differentiation of Normal and Abnormal Attributes
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal AttributesIEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
191
0
0
06 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning
See or Guess: Counterfactually Regularized Image CaptioningACM Multimedia (MM), 2024
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
133
1
0
29 Aug 2024
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with
  Multi-Pass Augmented Generative Error Correction
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
Yuka Ko
Sheng Li
Chao-Han Huck Yang
Tatsuya Kawahara
AuLLM
89
5
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DVVLM
158
2
0
28 Aug 2024
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and
  Analysis
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and AnalysisIEEE International Joint Conference on Neural Network (IJCNN), 2024
Aishik Nagar
Shantanu Jaiswal
Cheston Tan
ReLMLRM
90
18
0
27 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based
  Optimization
Revisiting Image Captioning Training Paradigm via Direct CLIP-based OptimizationBritish Machine Vision Conference (BMVC), 2024
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
184
6
0
26 Aug 2024
Probing the Robustness of Vision-Language Pretrained Models: A
  Multimodal Adversarial Attack Approach
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach
Jiwei Guan
Tianyu Ding
Longbing Cao
Lei Pan
Chen Wang
Xi Zheng
AAML
203
3
0
24 Aug 2024
Efficient and Versatile Robust Fine-Tuning of Zero-shot Models
Efficient and Versatile Robust Fine-Tuning of Zero-shot ModelsEuropean Conference on Computer Vision (ECCV), 2024
Sungyeon Kim
Boseung Jeong
Donghyun Kim
Suha Kwak
VLM
142
7
0
11 Aug 2024
Loc4Plan: Locating Before Planning for Outdoor Vision and Language
  Navigation
Loc4Plan: Locating Before Planning for Outdoor Vision and Language NavigationACM Multimedia (MM), 2024
Huilin Tian
Jingke Meng
Wei-Shi Zheng
Yuan-Ming Li
Junkai Yan
Yunong Zhang
181
5
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language
  Modeling
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language ModelingEuropean Conference on Computer Vision (ECCV), 2024
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
205
2
0
07 Aug 2024
Sample-agnostic Adversarial Perturbation for Vision-Language
  Pre-training Models
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training ModelsACM Multimedia (MM), 2024
Haonan Zheng
Wen Jiang
Xinyang Deng
Wenrui Li
VLMAAML
133
4
0
06 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
ExpertAF: Expert Actionable Feedback from VideoComputer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
370
7
0
01 Aug 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger
  Visual Cues
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual CuesEuropean Conference on Computer Vision (ECCV), 2024
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
123
10
0
29 Jul 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
FlexAttention for Efficient High-Resolution Vision-Language ModelsEuropean Conference on Computer Vision (ECCV), 2024
Junyan Li
Delin Chen
Tianle Cai
Peihao Chen
Yining Hong
Zhenfang Chen
Yikang Shen
Chuang Gan
VLM
210
6
0
29 Jul 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
147
0
0
28 Jul 2024
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil
Zheda Mai
Justin Lee
Zihe Wang
Kerrie Cheng
Jingyan Bai
Ye Liu
A. Chowdhury
Wei-Lun Chao
CoGeVLM
232
31
0
23 Jul 2024
HAPFI: History-Aware Planning based on Fused Information
HAPFI: History-Aware Planning based on Fused Information
Sujin Jeon
Suyeon Shin
Byoung-Tak Zhang
116
0
0
23 Jul 2024
Benchmark Granularity and Model Robustness for Image-Text Retrieval
Benchmark Granularity and Model Robustness for Image-Text Retrieval
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
197
0
0
21 Jul 2024
NavGPT-2: Unleashing Navigational Reasoning Capability for Large
  Vision-Language Models
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou
Yicong Hong
Zun Wang
Xin Eric Wang
Qi Wu
LM&Ro
192
60
0
17 Jul 2024
Hypergraph Multi-modal Large Language Model: Exploiting EEG and
  Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video
  Understanding
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding
Minghui Wu
Chenxu Zhao
Anyang Su
Donglin Di
Tianyu Fu
...
Min He
Ya Gao
Meng Ma
Kun Yan
Ping Wang
201
5
0
11 Jul 2024
How to Make Cross Encoder a Good Teacher for Efficient Image-Text
  Retrieval?
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
Yuxin Chen
Zongyang Ma
Ziqi Zhang
Chen Ma
Chunfeng Yuan
Bing Li
Junfu Pu
Ying Shan
Xiaojuan Qi
Weiming Hu
114
2
0
10 Jul 2024
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual
  Contexts
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
Yijia Xiao
Edward Sun
Tianyu Liu
Wei Wang
LRM
134
94
0
06 Jul 2024
SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring
  Expression Segmentation
SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
Sayan Nag
Koustava Goswami
Srikrishna Karanam
198
6
0
02 Jul 2024
The Odyssey of Commonsense Causality: From Foundational Benchmarks to
  Cutting-Edge Reasoning
The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning
Shaobo Cui
Zhijing Jin
Bernhard Schölkopf
Boi Faltings
CMLLRM
150
7
0
27 Jun 2024
Human-Aware Vision-and-Language Navigation: Bridging Simulation to
  Reality with Dynamic Human Interactions
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
Heng Li
Heng Li
Zhi-Qi Cheng
Yifei Dong
Yuxuan Zhou
Jun-Yan He
Jingdong Sun
Teruko Mitamura
Alexander G. Hauptmann
LM&Ro
184
15
0
27 Jun 2024
Figuring out Figures: Using Textual References to Caption Scientific
  Figures
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao
Kevin Liu
152
0
0
25 Jun 2024
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
Michal Golovanevsky
William Rudman
Vedant Palit
Ritambhara Singh
Carsten Eickhoff
258
10
0
24 Jun 2024
Unveiling the Spectrum of Data Contamination in Language Models: A
  Survey from Detection to Remediation
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
167
24
0
20 Jun 2024
IWISDM: Assessing instruction following in multimodal models at scale
IWISDM: Assessing instruction following in multimodal models at scale
Xiaoxuan Lei
Lucas Gomez
Hao Yuan Bai
P. Bashivan
VLM
276
2
0
20 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human
  Preferences
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
304
48
0
16 Jun 2024
Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
Shuyang Lin
Tong Jia
Hao Wang
Bowen Ma
Mingyuan Li
Dongyue Chen
VLMObjD
138
2
0
16 Jun 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge
  with Retrieved Tags
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
175
3
0
16 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLMLRM
147
1
0
13 Jun 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks
  and Algorithms
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Miaosen Zhang
Yixuan Wei
Zhen Xing
Yifei Ma
Zuxuan Wu
...
Zheng Zhang
Jingdong Sun
Chong Luo
Xin Geng
Baining Guo
VLM
146
2
0
13 Jun 2024
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
Samar Fares
Klea Ziu
Toluwani Aremu
Nikita Durasov
Martin Takáč
Pascal Fua
Karthik Nandakumar
Ivan Laptev
VLMAAML
153
8
0
13 Jun 2024
ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
Kam Woh Ng
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
142
2
0
12 Jun 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang
Wei Lin
M. Jehanzeb Mirza
Jacob A. Hansen
Sivan Doveh
...
Trevor Darrel
Chuang Gan
Aude Oliva
Rogerio Feris
Leonid Karlinsky
CoGeLRM
133
15
0
12 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAMLVLM
268
23
0
08 Jun 2024
Image Captioning via Dynamic Path Customization
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Weihao Ye
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
146
8
0
01 Jun 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action
  Anticipation using Large Video-Language Models
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal
Nakul Agarwal
Shao-Yuan Lo
Kwonjoon Lee
218
26
0
30 May 2024
Enhancing Large Vision Language Models with Self-Training on Image
  Comprehension
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Zou
Kai-Wei Chang
Wei Wang
SyDaVLMLRM
178
67
0
30 May 2024
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing
  Image-Text Retrieval
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
Rui Yang
Shuang Wang
Yi Han
Yuanheng Li
Dong Zhao
Dou Quan
Yanhe Guo
Licheng Jiao
159
7
0
29 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLMISeg
388
9
0
28 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias
  Towards Vision-Language Tasks
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
203
0
0
27 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
208
1
0
27 May 2024
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao
Feng Liu
Yue Liu
Mingxiang Liao
Chen Gong
QiXiang Ye
Fang Wan
ObjD
85
4
0
25 May 2024
LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction
  from Single Image
LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image
Ruikai Cui
Xibin Song
Weixuan Sun
Senbo Wang
Weizhe Liu
...
Taizhang Shang
Yang Li
Nick Barnes
Hongdong Li
Pan Ji
3DV
128
7
0
24 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Wanrong Zhu
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
305
59
0
24 May 2024
Previous
123456...222324
Next