ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.16915
  4. Cited By
Multilingual Diversity Improves Vision-Language Representations

Multilingual Diversity Improves Vision-Language Representations

27 May 2024
Thao Nguyen
Matthew Wallingford
Sebastin Santy
Wei-Chiu Ma
Sewoong Oh
Ludwig Schmidt
Pang Wei Koh
Ranjay Krishna
    VLM
ArXivPDFHTML

Papers citing "Multilingual Diversity Improves Vision-Language Representations"

8 / 8 papers shown
Title
Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages
Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages
Marco Salmè
R. Sicilia
Paolo Soda
V. Guarrasi
53
0
0
02 May 2025
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Zheyuan Zhang
Fengyuan Hu
Jayjun Lee
Freda Shi
Parisa Kordjamshidi
Joyce Chai
Ziqiao Ma
48
11
0
22 Oct 2024
Semantic and Expressive Variation in Image Captions Across Languages
Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye
Sebastin Santy
Jena D. Hwang
Amy X. Zhang
Ranjay Krishna
VLM
46
3
0
22 Oct 2023
Does Progress On Object Recognition Benchmarks Improve Real-World
  Generalization?
Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?
Megan Richards
Polina Kirichenko
Diane Bouchacourt
Mark Ibrahim
VLM
64
11
0
24 Jul 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Da Yin
Liunian Harold Li
Ziniu Hu
Nanyun Peng
Kai-Wei Chang
83
52
0
14 Sep 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
1