ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.12038
  4. Cited By
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across
  Languages

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

23 August 2023
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
Qi-An Chen
Tianyu Yu
Han Wu
Yue Zhao
Haoye Zhang
Xu Han
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
    MLLM
    VLM
ArXivPDFHTML

Papers citing "Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages"

19 / 19 papers shown
Title
CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval
Hang Yu
Jiahao Wen
Zhedong Zheng
41
0
0
26 Apr 2025
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
Shota Onohara
Atsuyuki Miyai
Yuki Imajuku
Kazuki Egashira
Jeonghun Baek
Xiang Yue
Graham Neubig
Kiyoharu Aizawa
OSLM
71
1
0
22 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
63
25
0
04 Oct 2024
T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task
T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task
Xindi Tong
Yujin Zhu
Shijian Fan
Liang Xu
54
1
0
26 Sep 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
57
4
0
22 Aug 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
85
73
0
17 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
106
13
0
01 Jul 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
34
5
0
29 Jun 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Yanjie Wang
Yuliang Liu
Hao Liu
Xiang Bai
Can Huang
32
21
0
20 May 2024
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from
  Fine-grained Correctional Human Feedback
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
130
176
0
01 Dec 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Fan Bao
Shen Nie
Kaiwen Xue
Chongxuan Li
Shiliang Pu
Yaole Wang
Gang Yue
Yue Cao
Hang Su
Jun Zhu
DiffM
188
147
0
12 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
515
0
02 Jan 2023
Multilingual Multimodal Learning with Machine Translated Text
Multilingual Multimodal Learning with Machine Translated Text
Chen Qiu
Dan Oneaţă
Emanuele Bugliarello
Stella Frank
Desmond Elliott
38
13
0
24 Oct 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Visually Grounded Reasoning across Languages and Cultures
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
E. Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
92
167
0
28 Sep 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
1