ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.19716
  4. Cited By
Enhancing Large Vision Language Models with Self-Training on Image
  Comprehension

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

30 May 2024
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Y. Zou
Kai-Wei Chang
Wei Wang
    SyDa
    VLM
    LRM
ArXivPDFHTML

Papers citing "Enhancing Large Vision Language Models with Self-Training on Image Comprehension"

37 / 37 papers shown
Title
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Xiaokun Wang
Chris
Jiangbo Pei
Wei Shen
Yi Peng
...
Ai Jian
Tianyidan Xie
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
LRM
9
0
0
12 May 2025
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li
Zhi Gao
Bofei Zhang
Yapeng Mi
Xiaojian Ma
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LLMAG
65
0
0
30 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou
Z. Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Mohit Bansal
Huaxiu Yao
52
0
0
27 Apr 2025
VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
Yogesh Kulkarni
Pooyan Fazli
27
0
0
18 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
X. Wang
Z. Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
VLM
LRM
57
1
0
10 Apr 2025
Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning
Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning
Zhihan Zhang
Yixin Cao
Lizi Liao
23
0
0
03 Apr 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Y. Wang
Shengqiong Wu
Y. Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
69
7
0
16 Mar 2025
PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models
Zilu Guo
Hongbin Lin
Zhihao Yuan
C. Zheng
Pengshuo Qiu
Dongzhi Jiang
Renrui Zhang
Chun-Mei Feng
Zhen Li
MLLM
3DV
79
1
0
13 Mar 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye
Xianhan Zeng
Fu Li
C. Li
Haoqi Fan
CoGe
69
0
0
10 Mar 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
Yuping Wang
Peiran Li
Ruizheng Bai
Y. Wang
Chengxuan Qian
Huaxiu Yao
Zhengzhong Tu
75
6
0
18 Feb 2025
On the robustness of ChatGPT in teaching Korean Mathematics
On the robustness of ChatGPT in teaching Korean Mathematics
Phuong-Nam Nguyen
Quang Nguyen-The
An Vu-Minh
Diep-Anh Nguyen
Xuan-Lam Pham
RALM
32
0
0
17 Feb 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
W. Zhang
Kai Chen
D. Lin
Jiaqi Wang
VLM
68
17
0
21 Jan 2025
Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations
Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations
Archita Srivastava
Abhas Kumar
Rajesh Kumar
Prabhakar Srinivasan
28
0
0
08 Jan 2025
Diving into Self-Evolving Training for Multimodal Reasoning
Diving into Self-Evolving Training for Multimodal Reasoning
Wei Liu
Junlong Li
Xiwen Zhang
Fan Zhou
Yu Cheng
Junxian He
ReLM
LRM
30
3
0
23 Dec 2024
Optimizing Vision-Language Interactions Through Decoder-Only Models
Optimizing Vision-Language Interactions Through Decoder-Only Models
Kaito Tanaka
Benjamin Tan
Brian Wong
VLM
79
0
0
14 Dec 2024
EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual
  Editing
EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
Umar Khalid
Hasan Iqbal
Azib Farooq
Nazanin Rahnavard
Jing Hua
...
H. Iqbal
Azib Farooq
Nazanin Rahnavard
Jing Hua
Chen Chen
60
0
0
13 Dec 2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware
  Multimodal Preference Optimization
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu
Peng Xia
Yun-Qing Li
Hongtu Zhu
Sheng Wang
Huaxiu Yao
87
1
0
09 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
83
2
0
01 Dec 2024
Efficient Self-Improvement in Multimodal Large Language Models: A
  Model-Level Judge-Free Approach
Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach
Shijian Deng
Wentian Zhao
Yu-Jhe Li
Kun Wan
Daniel Miranda
Ajinkya Kale
Yapeng Tian
LRM
62
0
0
26 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
44
45
1
15 Nov 2024
Vision-Language Models Can Self-Improve Reasoning via Reflection
Vision-Language Models Can Self-Improve Reasoning via Reflection
Kanzhi Cheng
Yantao Li
Fangzhi Xu
Jianbing Zhang
Hao Zhou
Yang Liu
ReLM
LRM
31
16
0
30 Oct 2024
Unraveling and Mitigating Safety Alignment Degradation of
  Vision-Language Models
Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
Qin Liu
Chao Shang
Ling Liu
Nikolaos Pappas
Jie Ma
Neha Anna John
Srikanth Doss Kadarundalagi Raghuram Doss
Lluís Marquez
Miguel Ballesteros
Yassine Benajiba
21
3
0
11 Oct 2024
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance
  in Mathematical Reasoning
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
Ayush Singh
Mansi Gupta
Shivank Garg
Abhinav Kumar
Vansh Agrawal
ReLM
LRM
17
0
0
08 Oct 2024
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Elmira Amirloo
J. Fauconnier
Christoph Roesmann
Christian Kerl
Rinu Boney
...
Zirui Wang
Afshin Dehghan
Yinfei Yang
Zhe Gan
Peter Grasch
20
6
0
02 Jul 2024
STLLaVA-Med: Self-Training Large Language and Vision Assistant for
  Medical
STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical
Guohao Sun
Can Qin
Huazhu Fu
Linwei Wang
Zhiqiang Tao
LM&MA
19
3
0
28 Jun 2024
mDPO: Conditional Preference Optimization for Multimodal Large Language
  Models
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Fei Wang
Wenxuan Zhou
James Y. Huang
Nan Xu
Sheng Zhang
Hoifung Poon
Muhao Chen
54
15
0
17 Jun 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with
  General Preferences
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
139
113
0
04 Apr 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
116
106
0
08 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
150
437
0
02 Feb 2024
Self-Rewarding Language Models
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
215
291
0
18 Jan 2024
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and
  the Case of Information Extraction
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction
Martin Josifoski
Marija Sakota
Maxime Peyrard
Robert West
SyDa
54
76
0
07 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan A. Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
146
131
0
28 May 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
174
342
0
13 Jul 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
1