Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.19651
Cited By
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
28 March 2024
Kai Zhang
Yi Luan
Hexiang Hu
Kenton Lee
Siyuan Qiao
Wenhu Chen
Yu-Chuan Su
Ming-Wei Chang
VLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
27 / 27 papers shown
Title
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Tiancheng Gu
Kaicheng Yang
Ziyong Feng
Xingjun Wang
Yanzhao Zhang
Dingkun Long
Yingda Chen
Weidong Cai
Jiankang Deng
VLM
82
0
0
24 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
61
0
0
11 Apr 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
41
6
0
08 Apr 2025
IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval
Bangwei Liu
Yicheng Bao
Shaohui Lin
Xuhong Wang
Xin Tan
Y. Wang
Yuan Xie
Chaochao Lu
55
0
0
01 Apr 2025
good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval
Pranavi Kolouju
Eric Xing
Robert Pless
Nathan Jacobs
Abby Stylianou
3DV
55
0
0
22 Mar 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
39
1
0
21 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Yu Su
Chenglong Wang
42
1
0
13 Mar 2025
Data-Efficient Generalization for Zero-shot Composed Image Retrieval
Zining Chen
Zhicheng Zhao
Fei Su
Xiaoqin Zhang
Shijian Lu
VLM
40
0
0
07 Mar 2025
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Benjamin Schneider
Florian Kerschbaum
Wenhu Chen
50
0
0
01 Mar 2025
SuperRAG: Beyond RAG with Layout-Aware Graph Modeling
Jeff Yang
Duy-Khanh Vu
Minh-Tien Nguyen
Xuan-Quang Nguyen
Linh Nguyen
H. Le
3DV
65
0
0
28 Feb 2025
Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
Lang Huang
Qiyu Wu
Zhongtao Miao
T. Yamasaki
50
0
0
27 Feb 2025
A Comprehensive Survey on Composed Image Retrieval
Xuemeng Song
Haoqiang Lin
Haokun Wen
Bohan Hou
Mingzhu Xu
Liqiang Nie
42
1
0
19 Feb 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
Ze Liu
Zhengyang Liang
Junjie Zhou
Zheng Liu
Defu Lian
OffRL
53
0
0
17 Feb 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
51
18
0
03 Jan 2025
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang
Xiaoting Qin
J. Zhang
Jing Yu
Gaopeng Gou
Gang Xiong
Qingwei Ling
Saravan Rajmohan
Dongmei Zhang
Qi Wu
LRM
66
1
0
15 Dec 2024
Composed Image Retrieval for Training-Free Domain Conversion
Nikos Efthymiadis
Bill Psomas
Zakaria Laskar
Konstantinos Karantzalos
Yannis Avrithis
Ondřej Chum
Giorgos Tolias
65
0
0
04 Dec 2024
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
Y. Li
Fan Ma
Yi Yang
133
2
0
24 Nov 2024
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel
Abhiram Kusumba
Sheng Cheng
Changhoon Kim
Tejas Gokhale
Chitta Baral
Yezhou Yang
CoGe
CLIP
50
7
0
04 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin
Chankyu Lee
M. Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Wei Ping
57
10
0
04 Nov 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
54
4
0
10 Oct 2024
Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
Kevin Dela Rosa
21
0
0
27 May 2024
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
Geonmo Gu
Sanghyuk Chun
Wonjae Kim
HeeJae Jun
Yoohoon Kang
Sangdoo Yun
DiffM
16
50
0
21 Mar 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hexiang Hu
Yi Luan
Yang Chen
Urvashi Khandelwal
Mandar Joshi
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
VLM
43
54
0
22 Feb 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
1