jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images

11 December 2024
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
    VLM

Papers citing "jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images"

50 / 51 papers shown
One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution
Yushun Fang
Yuxiang Chen
S. Yin
Qiang Hu
Jiangchao Yao
Ya Zhang
Xiaoyun Zhang
Y. Wang
168
0
0
21 Nov 2025
Shortcutting Pre-trained Flow Matching Diffusion Models is Almost Free Lunch
Xu Cai
Yang Wu
Qianli Chen
Haoran Wu
Lichuan Xiang
Hongkai Wen
76
0
0
15 Oct 2025
PRISM: Product Retrieval In Shopping Carts using Hybrid Matching
Arda Kabadayi
Senem Velipasalar
Jiajing Chen
64
0
0
18 Sep 2025
UniFGVC: Universal Training-Free Few-Shot Fine-Grained Vision Classification via Attribute-Aware Multimodal Retrieval
Hongyu Guo
Kuan Zhu
Xiangzhao Hao
Haiyun Guo
Ming Tang
Jinqiao Wang
VLM
84
0
0
06 Aug 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
169
3
0
28 Jul 2025
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
Hao Wang
Pinzhi Huang
Jihan Yang
Saining Xie
Daisuke Kawahara
403
1
0
21 May 2025
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Yifan Wu
Lutao Yan
Yizhang Zhu
Yinan Mei
Jiannan Wang
Nan Tang
Yuyu Luo
356
4
0
15 May 2025
One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image
Ezzeldin Shereen
Dan Ristea
Shae McFadden
V. Mavroudis
Chris Hicks
424
1
0
02 Apr 2025
Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings
IEEE Access, 2025
Jonghyun Lee
Dojun Park
Jiwoo Lee
Hoekeon Choi
Sung-Eun Lee
236
3
0
10 Mar 2025
MOHPER: Multi-objective Hyperparameter Optimization Framework for E-commerce Retrieval System
Jungbae Park
Heonseok Jang
314
0
0
07 Mar 2025
Towards Text-Image Interleaved Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xin Zhang
Ziqi Dai
Yuchen Ren
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Hao Fei
Jun Yu
Wenjie Li
Min Zhang
145
1
0
18 Feb 2025
MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Qinhan Yu
Zhiyou Xiao
Binghui Li
Zhengren Wang
Chong Chen
Feiyu Xiong
RALM
VLM
768
1
0
06 Feb 2025
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
International Conference on Learning Representations (ICLR), 2024
Sheng-Chieh Lin
Chankyu Lee
Mohammad Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Ming-Yu Liu
609
68
0
04 Nov 2024
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xin Zhang
Yanzhao Zhang
Dingkun Long
Wen Xie
Ziqi Dai
...
Pengjun Xie
Fei Huang
Meishan Zhang
Wenjie Li
Min Zhang
270
214
0
29 Jul 2024
Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP
Sedigheh Eslami
Gerard de Melo
VLM
244
12
0
25 Jun 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Simon Schrodi
David T. Hoffmann
Max Argus
Volker Fischer
Thomas Brox
VLM
374
4
0
11 Apr 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
European Conference on Computer Vision (ECCV), 2024
Beichen Zhang
Pan Zhang
Xiao-wen Dong
Yuhang Zang
Yuan Liu
CLIP
VLM
362
256
0
22 Mar 2024
Multilingual E5 Text Embeddings: A Technical Report
Liang Wang
Nan Yang
Xiaolong Huang
Linjun Yang
Rangan Majumder
Furu Wei
141
299
0
08 Feb 2024
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jianlv Chen
Shitao Xiao
Peitian Zhang
Kun Luo
Defu Lian
Zheng Liu
886
893
0
05 Feb 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
European Conference on Computer Vision (ECCV), 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Yuan Liu
Feng Zhao
Dahua Lin
MLLM
VLM
307
907
0
21 Nov 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
330
2,857
0
10 Oct 2023
NLLB-CLIP -- train performant multilingual image retrieval model on a budget
Alexander Visheratin
VLM
316
24
0
04 Sep 2023
SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs
Sheng Li
Nima Tajbakhsh
MLLM
167
68
0
07 Aug 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Neural Information Processing Systems (NeurIPS), 2023
Ibrahim Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
470
85
0
22 May 2023
Rethinking Benchmarks for Cross-modal Image-text Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Wei Chen
Linli Yao
Qin Jin
VLM
190
22
0
21 Apr 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
643
699
0
27 Mar 2023
Sigmoid Loss for Language Image Pre-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
1.1K
2,064
0
27 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Image and Vision Computing (IVC), 2023
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
344
383
0
20 Mar 2023
Towards Complex Document Understanding By Discrete Reasoning
ACM Multimedia (ACM MM), 2022
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
223
81
0
25 Jul 2022
No Language Left Behind: Scaling Human-Centered Machine Translation
NLLB Team
Marta R. Costa-jussá
James Cross
Onur Çelebi
Maha Elbayad
...
Alexandre Mourachko
C. Ropers
Safiyyah Saleem
Holger Schwenk
Jeff Wang
MoE
517
1,546
0
11 Jul 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
759
3,156
0
27 May 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
391
102
0
25 May 2022
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
Neural Information Processing Systems (NeurIPS), 2022
Weixin Liang
Yuhui Zhang
Yongchan Kwon
Serena Yeung
James Zou
VLM
349
575
0
03 Mar 2022
LiT: Zero-Shot Transfer with Locked-image text Tuning
Computer Vision and Pattern Recognition (CVPR), 2021
Xiaohua Zhai
Xiao Wang
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
544
653
0
15 Nov 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Neural Information Processing Systems (NeurIPS), 2021
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven Hoi
FaML
733
2,404
0
16 Jul 2021
LoRA: Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2021
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
1.5K
14,676
0
17 Jun 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
717
3,733
0
20 Apr 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2021
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
450
380
0
02 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.9K
39,712
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
International Conference on Machine Learning (ICML), 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
1.2K
4,753
0
11 Feb 2021
Understanding the Behaviour of Contrastive Loss
Computer Vision and Pattern Recognition (CVPR), 2020
Feng Wang
Huaping Liu
SSL
381
788
0
15 Dec 2020
Towards Zero-shot Cross-lingual Image Retrieval
Pranav Aggarwal
Ajinkya Kale
VLM
208
29
0
24 Nov 2020
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
North American Chapter of the Association for Computational Linguistics (NAACL), 2020
Yingqi Qu
Yuchen Ding
Jing Liu
Kai Liu
Ruiyang Ren
Xin Zhao
Daxiang Dong
Hua Wu
Haifeng Wang
RALM
OffRL
425
673
0
16 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Machine Learning in Health Care (MLHC), 2020
Yuhao Zhang
Hang Jiang
Yasuhide Miura
Christopher D. Manning
C. Langlotz
MedIm
566
921
0
02 Oct 2020
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
554
1,048
0
01 Jul 2020
A Simple Framework for Contrastive Learning of Visual Representations
International Conference on Machine Learning (ICML), 2020
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
1.1K
21,918
0
13 Feb 2020
Unsupervised Cross-lingual Representation Learning at Scale
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
421
7,457
0
05 Nov 2019
Learning Dense Representations for Entity Retrieval
Conference on Computational Natural Language Learning (CoNLL), 2019
D. Gillick
Sayali Kulkarni
L. Lansing
Alessandro Presta
Jason Baldridge
Eugene Ie
Diego Garcia-Olano
RALM
236
216
0
23 Sep 2019
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRL
SSL
1.6K
11,962
0
10 Jul 2018
Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning
Janarthanan Rajendran
Mitesh M. Khapra
A. Chandar
Balaraman Ravindran
242
57
0
13 Oct 2015