Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Computer Vision and Pattern Recognition (CVPR), 2025
21 March 2025
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu

Papers citing "Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval"

CrossJEPA: Cross-Modal Joint-Embedding Predictive Architecture for Efficient 3D Representation Learning from 2D Images
Avishka Perera
Kumal Hewagamage
Saeedha Nazar
Kavishka Abeywardana
Hasitha Gallella
Ranga Rodrigo
Mohamed Afham
23 Nov 2025
Self-Correction Distillation for Structured Data Question Answering
Yushan Zhu
Wen Zhang
Long Jin
Mengshu Sun
Ling Zhong
...
Juan-Zi Li
Lei Liang
Chong Long
Chao Deng
Junlan Feng
11 Nov 2025
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Lin Lin
Jiefeng Long
Zhihe Wan
Y. Wang
Dingkang Yang
...
Yan Qiu
Haiyang Yu
Xiao Liang
Hongsheng Li
Chao Feng
14 Oct 2025
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
Weihuang Lin
Yiwei Ma
Jinfa Huang
Xiaoshuai Sun
Rongrong Ji
09 Oct 2025
HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
Jun Li
Jinpeng Wang
Chaolei Tan
Niu Lian
Long Chen
Yaowei Wang
Min Zhang
Shu-Tao Xia
Bin Chen
23 Jul 2025
DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval
Yuxin Yang
Yinan Zhou
Yuxin Chen
Ziqi Zhang
Zongyang Ma
...
Bing Li
Lin Song
Jun Gao
Peng Li
Weiming Hu
23 May 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Tiancheng Gu
Kaicheng Yang
Ziyong Feng
Xingjun Wang
Yanzhao Zhang
Dingkun Long
Yingda Chen
Weidong Cai
Jiankang Deng
24 Apr 2025
Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Haoqiang Lin
Haokun Wen
Xuemeng Song
Meng Liu
Yupeng Hu
Liqiang Nie
25 Mar 2025
Composed Multi-modal Retrieval: A Survey of Approaches and Applications
Kun Zhang
Jingyu Li
Zhiyu Li
Jingjing Zhang
F. Li
...
Nan Chen
Lei Zhang
Yongdong Zhang
Zhendong Mao
S. Kevin Zhou
03 Mar 2025
A Comprehensive Survey on Composed Image Retrieval
Xuemeng Song
Haoqiang Lin
Haokun Wen
Bohan Hou
Mingzhu Xu
Liqiang Nie
19 Feb 2025
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Yuanmin Tang
Xiaoting Qin
Jing Zhang
Jing Yu
Gaopeng Gou
Gang Xiong
Qingwei Ling
Saravan Rajmohan
Dongmei Zhang
Qi Wu
15 Dec 2024
Pseudo-triplet Guided Few-shot Composed Image Retrieval
Bohan Hou
Haoqiang Lin
Haokun Wen
Meng Liu
Xuemeng Song
08 Jul 2024
Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs
Huaying Zhang
Rintaro Yanagi
Ren Togo
Takahiro Ogawa
Miki Haseyama
27 Jun 2024
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Young Kyun Jang
Dat Huynh
Ashish Shah
Wen-Kai Chen
Ser-Nam Lim
01 May 2024
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang
Donghyun Kim
Zihang Meng
Dat Huynh
Ser-Nam Lim
23 Apr 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang
Yi Luan
Hexiang Hu
Kenton Lee
Siyuan Qiao
Wenhu Chen
Yu-Chuan Su
Ming-Wei Chang
28 Mar 2024
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2024
Yuchen Suo
Fan Ma
Linchao Zhu
Yi Yang
24 Mar 2024
Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval
Yongchao Du
Min Wang
Wen-gang Zhou
Shuping Hui
Houqiang Li
03 Mar 2024
Language-only Efficient Training of Zero-shot Composed Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2024
Geonmo Gu
Sanghyuk Chun
Wonjae Kim
Yoohoon Kang
Sangdoo Yun
04 Dec 2023
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
13 Nov 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik
Karsten Roth
Goran Frehse
Zeynep Akata
13 Oct 2023
Learning Interactive Real-World Simulators
International Conference on Learning Representations (ICLR), 2024
Mengjiao Yang
Yilun Du
Kamyar Ghasemipour
Jonathan Tompson
Leslie Kaelbling
Dale Schuurmans
Pieter Abbeel
09 Oct 2023
Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yuanmin Tang
Jiahao Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Yue Hu
Qi Wu
28 Sep 2023
GeneCIS: A Benchmark for General Conditional Image Similarity
Computer Vision and Pattern Recognition (CVPR), 2023
S. Vaze
Nicolas Carion
Ishan Misra
13 Jun 2023
Zero-Shot Composed Image Retrieval with Textual Inversion
IEEE International Conference on Computer Vision (ICCV), 2023
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Alberto Del Bimbo
27 Mar 2023
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
Geonmo Gu
Sanghyuk Chun
Wonjae Kim
HeeJae Jun
Yoohoon Kang
Sangdoo Yun
21 Mar 2023
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
Kuniaki Saito
Kihyuk Sohn
Xiang Zhang
Chun-Liang Li
Chen-Yu Lee
Kate Saenko
Tomas Pfister
06 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
30 Jan 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Computer Vision and Pattern Recognition (CVPR), 2023
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
19 Jan 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Neural Information Processing Systems (NeurIPS), 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
29 Apr 2022
Conditional Prompt Learning for Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
10 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
International Conference on Machine Learning (ICML), 2022
Junnan Li
Dongxu Li
Caiming Xiong
Steven C. H. Hoi
28 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2022
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
20 Dec 2021
High Fidelity Visualization of What Your Self-Supervised Representation Knows About
Florian Bordes
Randall Balestriero
Pascal Vincent
16 Dec 2021
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie
Zheng Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Jingdong Sun
Han Hu
18 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
11 Nov 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
03 Sep 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
IEEE International Conference on Computer Vision (ICCV), 2021
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
09 Aug 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
22 Oct 2020
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Dan Hendrycks
Steven Basart
Norman Mu
Saurav Kadavath
Frank Wang
...
Samyak Parajuli
Mike Guo
Basel Alomair
Jacob Steinhardt
Justin Gilmer
29 Jun 2020
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
28 May 2020
ReZero is All You Need: Fast Convergence at Large Depth
Conference on Uncertainty in Artificial Intelligence (UAI), 2021
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
10 Mar 2020
Dream to Control: Learning Behaviors by Latent Imagination
International Conference on Learning Representations (ICLR), 2020
Danijar Hafner
Timothy Lillicrap
Jimmy Ba
Mohammad Norouzi
03 Dec 2019
Composing Text and Image for Image Retrieval - An Empirical Odyssey
Nam S. Vo
Lu Jiang
Chen Sun
Kevin Patrick Murphy
Li Li
Li Fei-Fei
James Hays
18 Dec 2018
Microsoft COCO: Common Objects in Context
European Conference on Computer Vision (ECCV), 2014
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
01 May 2014