ResearchTrend.AI
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
    VLM

Papers citing "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"

50 / 1,171 papers shown
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
371
6
0
26 Mar 2025
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Ziming Wei
Bingqian Lin
Yunshuang Nie
Jiaqi Chen
Shikui Ma
Hang Xu
Xiaodan Liang
298
2
0
23 Mar 2025
HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions
Yifei Dong
Fengyi Wu
Qi He
Heng Li
...
Yuxuan Zhou
Jingdong Sun
Zhi-Qi Cheng
Alexander G. Hauptmann
LM&Ro
186
1
0
18 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
194
0
0
18 Mar 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Yin
Guangzong Si
Zilei Wang
181
4
0
17 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
207
0
0
11 Mar 2025
A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions
Rahul Nair
Bhanu Tokas
Neel Shah
215
0
0
10 Mar 2025
TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction
Chao Wang
Weiwei Fu
Yang Zhou
MLLM
VLM
249
2
0
06 Mar 2025
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2025
Tianyu Huai
Jie Zhou
Xingjiao Wu
Qin Chen
Qingchun Bai
Ze Zhou
Liang He
MoE
185
7
0
01 Mar 2025
Data Distributional Properties As Inductive Bias for Systematic Generalization
Computer Vision and Pattern Recognition (CVPR), 2025
Felipe del-Rio
Alain Raymond-Sáez
Daniel Florea
Rodrigo Toro Icarte
Julio Hurtado
Cristian B. Calderon
Á. Soto
AI4CE
272
1
0
27 Feb 2025
MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification
Tianze Zhang
Shu Shen
Chao Chen
281
0
0
27 Feb 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
203
6
0
26 Feb 2025
Can Hallucination Correction Improve Video-Language Alignment?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Lingjun Zhao
Mingyang Xie
Paola Cascante-Bonilla
Hal Daumé III
Kwonjoon Lee
HILM
VLM
239
1
0
20 Feb 2025
Color Universal Design Neural Network for the Color Vision Deficiencies
Sunyong Seo
Jinho Park
272
2
0
12 Feb 2025
Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions
AAAI Conference on Artificial Intelligence (AAAI), 2024
Prajwal Gatti
Kshitij Parikh
Dhriti Prasanna Paul
Manish Gupta
Anand Mishra
396
3
0
12 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
273
10
0
11 Feb 2025
Efficient Vision Language Model Fine-tuning for Text-based Person Anomaly Search
The Web Conference (WWW), 2025
J. He
Shengeng Tang
Ao Liu
Lechao Cheng
Jingjing Wu
Yanyan Wei
159
2
0
05 Feb 2025
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen
Qi Yang
Kun Ding
Tianying Wang
Gang Shen
Fei Li
Qiyuan Cao
Shiming Xiang
VLM
127
1
0
29 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
271
2
0
13 Jan 2025
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
269
0
0
25 Dec 2024
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Gautier Evennou
Antoine Chaffin
Vivien Chappelier
Ewa Kijak
DiffM
200
1
0
20 Dec 2024
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o
AAAI Conference on Artificial Intelligence (AAAI), 2024
Tony Cheng Tong
Sirui He
Z. Shao
Dit-Yan Yeung
192
12
0
18 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Knowledge Discovery and Data Mining (KDD), 2024
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
191
7
0
17 Dec 2024
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track
D. Gupta
Dina Demner-Fushman
LM&MA
155
1
0
15 Dec 2024
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Haoyu Jiang
Zhi-Qi Cheng
Gabriel Moreira
Jiawen Zhu
Yuxuan Zhou
Bukun Ren
Jun-Yan He
Jingdong Sun
Xian-Sheng Hua
VLM
207
0
0
14 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
247
4
0
13 Dec 2024
Composed Image Retrieval for Training-Free Domain Conversion
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Nikos Efthymiadis
Bill Psomas
Zakaria Laskar
Konstantinos Karantzalos
Yannis Avrithis
Ondřej Chum
Giorgos Tolias
236
2
0
04 Dec 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Neural Information Processing Systems (NeurIPS), 2024
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
247
4
0
20 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
173
5
0
17 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
407
2
0
15 Nov 2024
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
392
4
0
14 Nov 2024
ViTOC: Vision Transformer and Object-aware Captioner
Feiyang Huang
251
2
0
09 Nov 2024
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
Neural Information Processing Systems (NeurIPS), 2024
Jaeyoo Park
Jin Young Choi
Jeonghyung Park
Bohyung Han
VLM
47
7
0
08 Nov 2024
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yang Liu
Sensen Gao
Qing Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang
Xiaochun Cao
AAML
143
5
0
04 Nov 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Neural Information Processing Systems (NeurIPS), 2024
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
159
4
0
31 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
205
5
0
29 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
International Journal of Computer Vision (IJCV), 2024
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
181
2
0
23 Oct 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
108
0
0
22 Oct 2024
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
IEEE Transactions on Multimedia (IEEE TMM), 2024
Muhe Ding
Yang Ma
Pengda Qin
Yue Yu
Yuhong Li
Liqiang Nie
140
2
0
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
International Conference on Learning Representations (ICLR), 2024
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
420
12
0
18 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
ACM Multimedia (ACM MM), 2022
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
265
9
0
16 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
225
11
0
14 Oct 2024
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
207
6
0
12 Oct 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
International Journal of Computer Vision (IJCV), 2024
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
175
8
0
09 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
265
1
0
01 Oct 2024
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Joshua Forster Feinglass
Yezhou Yang
131
0
0
30 Sep 2024
Multi-modal Generative AI: Multi-modal LLMs, Diffusions and the Unification
X. Wang
Yuwei Zhou
Bin Huang
Hong Chen
Wenwu Zhu
DiffM
285
1
0
23 Sep 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
243
22
0
21 Sep 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Yanbei Jiang
Krista A. Ehinger
Jey Han Lau
SLR
185
6
0
17 Sep 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
209
0
0
14 Sep 2024