Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.19312
Cited By
DocMMIR: A Framework for Document Multi-modal Information Retrieval
25 May 2025
Zirui Li
Siwei Wu
Xingyu Wang
Yi Zhou
Yizhi Li
Chenghua Lin
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DocMMIR: A Framework for Document Multi-modal Information Retrieval"
13 / 13 papers shown
Title
Qwen3 Technical Report
An Yang
A. Li
Baosong Yang
Beichen Zhang
Binyuan Hui
...
Zekun Wang
Zeyu Cui
Zhenru Zhang
Zhenhong Zhou
Zihan Qiu
LLMAG
OSLM
LRM
68
35
0
14 May 2025
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Michael Tschannen
A. Gritsenko
Xiao Wang
Muhammad Ferjad Naeem
Ibrahim Alabdulmohsin
...
Basil Mustafa
Olivier J. Hénaff
Jeremiah Harmsen
Andreas Steiner
Xiaohua Zhai
VLM
103
54
0
21 Feb 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
108
24
0
03 Jan 2025
Unified Multi-Modal Interleaved Document Representation for Information Retrieval
Jaewoo Lee
Joonho Ko
Jinheon Baek
Soyeong Jeong
Sung Ju Hwang
66
0
0
03 Oct 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma
Yuhang Zang
Liangyu Chen
Meiqi Chen
Yizhu Jiao
...
Liangming Pan
Yu-Gang Jiang
Jiaqi Wang
Yixin Cao
Aixin Sun
ELM
RALM
VLM
57
27
0
01 Jul 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu
Yizhi Li
Kang Zhu
Ge Zhang
Yiming Liang
...
Wenhu Chen
Wenhao Huang
Noura Al Moubayed
Jie Fu
Chenghua Lin
56
13
0
24 Jan 2024
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei
Yang Chen
Haonan Chen
Hexiang Hu
Ge Zhang
Jie Fu
Alan Ritter
Wenhu Chen
56
63
0
28 Nov 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
70
1,076
0
27 Mar 2023
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM
CLIP
103
776
0
14 Dec 2022
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
239
313
0
02 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
673
28,659
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
390
3,778
0
11 Feb 2021
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
246
43,290
0
01 May 2014
1