ResearchTrend.AI
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

1 October 2023
Tianyu Yu
Jinyi Hu
Yuan Yao
Haoye Zhang
Yue Zhao
Chongyi Wang
Shan Wang
Yinxv Pan
Jiao Xue
Dahai Li
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
    VLM
    MLLM

Papers citing "Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants"

19 / 19 papers shown
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
86
0
0
25 Apr 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Z. Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Y. Wang
44
0
0
19 Feb 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
Yuping Wang
Peiran Li
Ruizheng Bai
Y. Wang
Chengxuan Qian
Huaxiu Yao
Zhengzhong Tu
87
6
0
18 Feb 2025
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
72
4
0
08 Dec 2024
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Yang Bai
Yang Zhou
Jun Zhou
Rick Siow Mong Goh
Daniel Ting
Yong Liu
VLM
44
0
0
09 Oct 2024
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang
Jianxin Liang
Yuxuan Wang
Huishuai Zhang
Dongyan Zhao
41
1
0
02 Sep 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
43
3
0
24 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
86
139
0
29 Apr 2024
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Wenyi Xiao
Ziwei Huang
Leilei Gan
Wanggui He
Haoyuan Li
Zhelun Yu
Hao Jiang
Fei Wu
Linchao Zhu
MLLM
37
22
0
22 Apr 2024
Exploring Perceptual Limitation of Multimodal Large Language Models
Jiarui Zhang
Jinyi Hu
Mahyar Khayatkhoei
Filip Ilievski
Maosong Sun
LRM
29
10
0
12 Feb 2024
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan
Min Bai
Weifeng Chen
Xiong Zhou
Qixing Huang
Erran L. Li
VLM
21
18
0
09 Feb 2024
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
130
177
0
01 Dec 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
29
48
0
23 Aug 2023
MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
Y. Li
Tingwei Lu
Yinghui Li
Tianyu Yu
Shulin Huang
Haitao Zheng
Rui Zhang
Jun Yuan
37
11
0
27 Jul 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
39
759
0
23 Jun 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
206
900
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
265
4,229
0
30 Jan 2023
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,081
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,693
0
11 Feb 2021