ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

26 September 2023
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
Linke Ouyang
Zhiyuan Zhao
Haodong Duan
Songyang Zhang
Shuangrui Ding
Wenwei Zhang
Hang Yan
Xinyu Zhang
Wei Li
Jingwen Li
Kai-xiang Chen
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
    MLLM

Papers citing "InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition"

50 / 184 papers shown
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
Shuo Li
Tao Ji
Xiaoran Fan
Linsheng Lu
L. Yang
...
Y. Wang
Xiaohui Zhao
Tao Gui
Qi Zhang
Xuanjing Huang
35
0
0
15 Oct 2024
Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation
Shun Qian
Bingquan Liu
Chengjie Sun
Zhen Xu
Baoxun Wang
26
0
0
14 Oct 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
Qidong Huang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Jiaqi Wang
Dahua Lin
Weiming Zhang
Nenghai Yu
33
1
0
09 Oct 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav
Rohan Tomar
Garvit Jain
Chirag Ahooja
Shubham Chaudhary
Charles Elkan
28
0
0
05 Oct 2024
Unified Multi-Modal Interleaved Document Representation for Information Retrieval
Jaewoo Lee
Joonho Ko
Jinheon Baek
Soyeong Jeong
Sung Ju Hwang
20
1
0
03 Oct 2024
Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment
Kai Liu
Ziqing Zhang
Wenbo Li
Renjing Pei
Fenglong Song
Xiaohong Liu
Linghe Kong
Yulun Zhang
VLM
28
0
0
03 Oct 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
Shitian Zhao
Renrui Zhang
Xu Luo
Yan Wang
Shanghang Zhang
Peng Gao
18
0
0
01 Oct 2024
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering
Jiacong Wang
Bohong Wu
Haiyong Jiang
Xun Zhou
Xin Xiao
Haoyuan Guo
Jun Xiao
VLM
VGen
36
4
0
30 Sep 2024
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
Zicheng Zhang
Ziheng Jia
H. Wu
Chunyi Li
Zijian Chen
...
Wei Sun
Xiaohong Liu
Xiongkuo Min
Weisi Lin
Guangtao Zhai
17
7
0
30 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
15
15
0
26 Sep 2024
Phantom of Latent for Large Language and Vision Models
Byung-Kwan Lee
Sangyun Chung
Chae Won Kim
Beomchan Park
Yong Man Ro
VLM
LRM
39
3
0
23 Sep 2024
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang
Jianxin Liang
Yuxuan Wang
Huishuai Zhang
Dongyan Zhao
39
1
0
02 Sep 2024
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou
Haote Yang
Dairong Chen
Junyan Ye
Tianyi Bai
Jinhua Yu
Songyang Zhang
Dahua Lin
Conghui He
Weijia Li
VLM
50
3
0
30 Aug 2024
GlaLSTM: A Concurrent LSTM Stream Framework for Glaucoma Detection via Biomarker Mining
Cheng Huang
Weizheng Xie
Jian Zhou
Karanjit S Kooner
Karanjit Kooner
Yishen Liu
33
1
0
28 Aug 2024
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
Qihang Ge
Wei Sun
Yu Zhang
Yunhao Li
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
38
4
0
26 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
34
7
0
23 Aug 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
L. Wang
Rong Jin
Tieniu Tan
OffRL
46
35
0
23 Aug 2024
Quality Assessment in the Era of Large Models: A Survey
Zicheng Zhang
Yingjie Zhou
Chunyi Li
Baixuan Zhao
Xiaohong Liu
Guangtao Zhai
29
10
0
17 Aug 2024
PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology
Xiaomin Wu
Rui Xu
Pengchen Wei
Wenkang Qin
Peixiang Huang
Ziheng Li
Lin Luo
LM&MA
23
2
0
13 Aug 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Ran He
Rongrong Ji
Yunsheng Wu
Caifeng Shan
Xing Sun
MLLM
27
79
0
09 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
41
9
0
09 Aug 2024
Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs
Peng Ding
Jingyu Wu
M. Girolami
Dan Ma
Xuezhi Cao
Xunliang Cai
Shi Chen
T. J. Sullivan
Shujian Huang
AAML
VLM
MLLM
18
4
0
02 Aug 2024
Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection
Jiahao Wang
Mingxuan Li
Haichen Luo
Jinguo Zhu
A. Yang
M. Rong
Xiaohua Wang
23
3
0
27 Jul 2024
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Wan-Cyuan Fan
Yen-Chun Chen
Mengchen Liu
Lu Yuan
Leonid Sigal
36
4
0
19 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
79
73
0
17 Jul 2024
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Pengxiang Li
Zhi Gao
Bofei Zhang
Tao Yuan
Yuwei Wu
Mehrtash Harandi
Yunde Jia
Song-Chun Zhu
Qing Li
VLM
MLLM
35
2
0
16 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM
LRM
VLM
19
3
0
16 Jul 2024
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang
Xinpeng Ding
Chunwei Wang
J. N. Han
Yulong Liu
Hengshuang Zhao
Hang Xu
Lu Hou
Wei Zhang
Xiaodan Liang
VLM
18
8
0
11 Jul 2024
SEED-Story: Multimodal Long Story Generation with Large Language Model
Shuai Yang
Yuying Ge
Yang Li
Yukang Chen
Yixiao Ge
Ying Shan
Yingcong Chen
VGen
DiffM
71
25
0
11 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
35
10
0
08 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
17
25
0
04 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
43
98
0
03 Jul 2024
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao
Aishan Liu
QianJia Cheng
Zhenfei Yin
Siyuan Liang
Jiapeng Li
Jing Shao
Xianglong Liu
Dacheng Tao
24
4
0
30 Jun 2024
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
Weihong Zhong
Xiaocheng Feng
Liang Zhao
Qiming Li
Lei Huang
Yuxuan Gu
Weitao Ma
Yuan Xu
Bing Qin
MLLM
36
9
0
30 Jun 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Di Zhang
Xi Li
MoE
39
2
0
28 Jun 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Tao Zhang
Xiangtai Li
Hao Fei
Haobo Yuan
Shengqiong Wu
Shunping Ji
Chen Change Loy
Shuicheng Yan
LRM
MLLM
VLM
47
44
0
27 Jun 2024
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
Xuyang Wu
Yuan Wang
Hsin-Tai Wu
Zhiqiang Tao
Yi Fang
VLM
27
7
0
25 Jun 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao
Xiangtai Li
Haodong Duan
Haian Huang
Yining Li
Kai Chen
Hua Yang
VLM
MLLM
37
10
0
25 Jun 2024
Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text
Xinyang Li
Zhangyu Lai
Linning Xu
Yansong Qu
Liujuan Cao
Shengchuan Zhang
Bo Dai
Rongrong Ji
VGen
45
8
0
25 Jun 2024
Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models
Bei Yan
Jie Zhang
Zheng Yuan
Shiguang Shan
Xilin Chen
VLM
23
4
0
24 Jun 2024
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification
Gregor Geigle
Radu Timofte
Goran Glavas
21
9
0
20 Jun 2024
VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model
Jie Zhang
Sibo Wang
Xiangkui Cao
Zheng Yuan
Shiguang Shan
Xilin Chen
Wen Gao
VLM
14
8
0
20 Jun 2024
TroL: Traversal of Layers for Large Language and Vision Models
Byung-Kwan Lee
Sangyun Chung
Chae Won Kim
Beomchan Park
Yong Man Ro
24
6
0
18 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
19
22
0
18 Jun 2024
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
Dantong Niu
Yuvan Sharma
Giscard Biamby
Jerome Quenum
Yutong Bai
Baifeng Shi
Trevor Darrell
Roei Herzig
LM&Ro
VLM
28
20
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
68
22
0
17 Jun 2024
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
38
4
0
13 Jun 2024
Multimodal Table Understanding
Mingyu Zheng
Xinwei Feng
Q. Si
Qiaoqiao She
Zheng-Shen Lin
Wenbin Jiang
Weiping Wang
LMTD
VLM
22
14
0
12 Jun 2024
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
Shimin Chen
Yitian Yuan
Shaoxiang Chen
Zequn Jie
Lin Ma
VLM
21
3
0
12 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
24
4
0
11 Jun 2024