2004.01804
Cited By
v1
v2 (latest)
Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval
Computer Vision and Pattern Recognition (CVPR), 2020
3 April 2020
Tobias Weyand
A. Araújo
Bingyi Cao
Jack Sim
ArXiv (abs)
PDF
HTML
Github (794★)
Papers citing
"Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval"
50 / 246 papers shown
Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models
Shamima Hossain
LRM
176
0
0
25 Nov 2025
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
121
0
0
25 Nov 2025
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
Y. Wang
Z. Liu
Ziyi Wang
Pengfei Liu
Han Hu
Yongming Rao
LRM
408
0
0
19 Nov 2025
Real-World Adverse Weather Image Restoration via Dual-Level Reinforcement Learning with High-Quality Cold Start
Fuyang Liu
Jiaqi Xu
Xiaowei Hu
AI4CE
132
0
0
07 Nov 2025
Instance-Level Composed Image Retrieval
Bill Psomas
George Retsinas
Nikos Efthymiadis
P. Filntisis
Yannis Avrithis
Petros Maragos
Ondřej Chum
Giorgos Tolias
163
1
0
29 Oct 2025
FineVision: Open Data Is All You Need
Luis Wiedmann
Orr Zohar
Amir Mahla
Xiaohan Wang
Rui Li
Thibaud Frere
Leandro von Werra
Aritra Roy Gosthipaty
Andrés Marafioti
VLM
195
13
0
20 Oct 2025
CaMiT: A Time-Aware Car Model Dataset for Classification and Generation
Frédéric LIN
Biruk Abere Ambaw
Adrian Daniel Popescu
Hejer Ammar
Romaric Audigier
Hervé Le Borgne
VLM
AI4TS
288
0
0
20 Oct 2025
An Experimental Study of Real-Life LLM-Proposed Performance Improvements
Lirong Yi
Gregory Gay
Philipp Leitner
82
2
0
17 Oct 2025
EgMM-Corpus: A Multimodal Vision-Language Dataset for Egyptian Culture
Mohamed Gamil
Abdelrahman Elsayed
Abdelrahman Lila
Ahmed Gad
Hesham Abdelgawad
Mohamed Aref
Ahmed Fares
VLM
92
0
0
17 Oct 2025
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
Yuyang Hong
Jiaqi Gu
Qi Yang
Lubin Fan
Yue-bo Wu
Ying Wang
Kun Ding
Shiming Xiang
Jieping Ye
202
3
0
16 Oct 2025
Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning
Hongkuan Zhou
Lavdim Halilaj
Sebastian Monka
Stefan Schmid
Yuqicheng Zhu
Jingcheng Wu
Nadeem Nazer
Steffen Staab
VLM
137
0
0
15 Oct 2025
Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales
Zhaofang Qian
Hardy Chen
Zeyu Wang
Li Zhang
Zijun Wang
...
Xianfeng Tang
Zeyu Zheng
Haoqin Tu
Cihang Xie
Yuyin Zhou
LRM
102
1
0
13 Oct 2025
Instance-Level Generation for Representation Learning
Yankun Wu
Zakaria Laskar
Giorgos Kordopatis-Zilos
Noa Garcia
Giorgos Tolias
141
0
0
10 Oct 2025
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
Tajamul Ashraf
Umair Nawaz
Abdelrahman M. Shaker
Rao Muhammad Anwer
Philip Torr
Fahad Shahbaz Khan
Salman Khan
227
0
0
09 Oct 2025
The Overlooked Value of Test-time Reference Sets in Visual Place Recognition
Mubariz Zaffar
Liangliang Nan
Sebastian Scherer
Julian F. P. Kooij
113
0
0
04 Oct 2025
UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data
Yujian Yuan
Changjie Wu
Xinyuan Chang
S. Wang
Hang Zhang
Shiyi Liang
Shuang Zeng
Mu Xu
Ning Guo
156
2
0
26 Sep 2025
EfficientDepth: A Fast and Detail-Preserving Monocular Depth Estimation Model
Andrii Litvynchuk
Ivan Livinsky
Anand Ravi
N. Kalantari
Andrii Tsarov
MDE
182
0
0
26 Sep 2025
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks
Mingsong Li
Lin Liu
Hongjun Wang
Haoxing Chen
Xijun Gu
Shizhan Liu
Dong Gong
Junbo Zhao
Zhenzhong Lan
Jianguo Li
144
0
0
18 Sep 2025
Improving Alignment in LVLMs with Debiased Self-Judgment
Sihan Yang
Chenhang Cui
Zihao Zhao
Yiyang Zhou
Weilong Yan
Ying Wei
Huaxiu Yao
217
0
0
28 Aug 2025
Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models
Oliver Grainge
Sania Waheed
Jack Stilgoe
Michael Milford
Shoaib Ehsan
90
1
0
27 Aug 2025
Can VLMs Recall Factual Associations From Visual References?
Dhananjay Ashok
Ashutosh Chaubey
Hirona J. Arai
Jonathan May
Jesse Thomason
88
0
0
22 Aug 2025
PCHands: PCA-based Hand Pose Synergy Representation on Manipulators with N-DoF
En Yen Puang
Federico Ceola
Giulia Pasquale
Lorenzo Natale
110
0
0
11 Aug 2025
Large Language Models Facilitate Vision Reflection in Image Classification
Guoyuan An
JaeYoon Kim
SungEui Yoon
VLM
94
0
0
02 Aug 2025
Meta CLIP 2: A Worldwide Scaling Recipe
Yung-Sung Chuang
Yang Li
Dong Wang
Ching-Feng Yeh
Kehan Lyu
...
Zhuang Liu
Saining Xie
Anuj Kumar
Shang-Wen Li
Hu Xu
CLIP
VLM
369
16
0
29 Jul 2025
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
Duong T. Tran
T. Tran
M. Hauswirth
Danh Le-Phuoc
189
2
0
22 Jul 2025
UniLGL: Learning Uniform Place Recognition for FOV-limited/Panoramic LiDAR Global Localization
Hongming Shen
Xun Chen
Yulin Hui
Zhenyu Wu
Wei Wang
Qiyang Lyu
Tianchen Deng
Danwei W. Wang
253
2
0
16 Jul 2025
Flow-Anything: Learning Real-World Optical Flow Estimation from Large-Scale Single-view Images
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yingping Liang
Ying Fu
Yutao Hu
Wenqi Shao
Jiaming Liu
Debing Zhang
155
3
0
09 Jun 2025
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
Jiulong Wu
Zhengliang Shi
Shuaiqiang Wang
J. Huang
Dawei Yin
Lingyong Yan
Min Cao
Min Zhang
MLLM
332
3
0
04 Jun 2025
UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images
Junhuan Liu
San Jiang
Wei Ge
Wei Huang
Bingxuan Guo
Qingquan Li
133
0
0
28 May 2025
VIBE: Vector Index Benchmark for Embeddings
Elias Jääsaari
Ville Hyvönen
Matteo Ceccarello
Teemu Roos
Martin Aumüller
VLM
343
2
0
23 May 2025
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
Hao Wang
Pinzhi Huang
Jihan Yang
Saining Xie
Daisuke Kawahara
474
1
0
21 May 2025
Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment
Yang Hu
Runchen Wang
Stephen Chong Zhao
Xuhui Zhan
Do Hun Kim
Mark Wallace
David A. Tovar
VLM
279
0
0
20 May 2025
Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
Maria-Teresa De Rosa Palmini
Eva Cetinic
262
0
0
18 May 2025
Artifacts of Idiosyncracy in Global Street View Data
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Tim Alpherts
Sennay Ghebreab
Nanne van Noord
3DPC
184
1
0
16 May 2025
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
Daniel A. P. Oliveira
David Martins de Matos
VGen
241
1
0
15 May 2025
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wei Yang
Jingjing Fu
Rongpin Wang
Jinyu Wang
Lei Song
Jiang Bian
351
5
0
10 May 2025
Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Yushuai Sun
Zikun Zhou
Shihong Deng
Yaowei Wang
Jun Yu
Guangming Lu
Wenjie Pei
263
0
0
16 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
Kenneth Enevoldsen
Niklas Muennighoff
VLM
491
6
0
14 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zhanzhou Feng
Shiliang Zhang
370
1
0
12 Apr 2025
Boosting multi-demographic federated learning for chest radiograph analysis using general-purpose self-supervised representations
Mahshad Lotfinia
Arash Tayebiarasteh
Samaneh Samiei
Mehdi Joodaki
Soroosh Tayebi Arasteh
336
0
0
11 Apr 2025
Taxonomy-Aware Evaluation of Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Vésteinn Snæbjarnarson
Kevin Du
Niklas Stoehr
Serge Belongie
Robert Bamler
Nico Lang
Stella Frank
283
4
0
07 Apr 2025
LOCORE: Image Re-ranking with Long-Context Sequence Modeling
Computer Vision and Pattern Recognition (CVPR), 2025
Zilin Xiao
Pavel Suma
Ayush Sachdeva
Hao-Jen Wang
Giorgos Kordopatis-Zilos
Giorgos Tolias
Vicente Ordonez
264
2
0
27 Mar 2025
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
377
12
0
26 Mar 2025
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Computer Vision and Pattern Recognition (CVPR), 2025
Yingping Liang
Yutao Hu
Wenqi Shao
Ying Fu
MDE
291
8
0
21 Mar 2025
Prototype Perturbation for Relaxing Alignment Constraints in Backward-Compatible Learning
Zikun Zhou
Yushuai Sun
Wenjie Pei
Xuzhao Li
Yaowei Wang
CLL
334
1
0
19 Mar 2025
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Pietro Michiardi
450
7
0
18 Mar 2025
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
Chong Chen
Zonghao Guo
Yang Li
Xiaoyi Feng
Maosong Sun
VLM
LRM
296
13
0
17 Mar 2025
Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
Michael Green
Matan Levy
Issar Tzachor
Dvir Samuel
N. Darshan
Rami Ben-Ari
267
0
0
10 Mar 2025
Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search
Daniel de Souza Severo
Giuseppe Ottaviano
Matthew Muckley
Karen Ullrich
Matthijs Douze
MQ
294
1
0
16 Jan 2025
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
Xingyi He
Hao Yu
Sida Peng
Dongli Tan
Zehong Shen
Hujun Bao
Xiaowei Zhou
291
22
0
13 Jan 2025