ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.12793
  4. Cited By
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

21 November 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Jiaqi Wang
Feng Zhao
Dahua Lin
    MLLM
    VLM
ArXivPDFHTML

Papers citing "ShareGPT4V: Improving Large Multi-Modal Models with Better Captions"

50 / 467 papers shown
Title
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and
  Instruction-Tuning Dataset for LVLMs
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Ziyu Liu
Tao Chu
Yuhang Zang
Xilin Wei
Xiaoyi Dong
...
Zijian Liang
Yuanjun Xiong
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
35
34
0
17 Jun 2024
Unveiling Encoder-Free Vision-Language Models
Unveiling Encoder-Free Vision-Language Models
Haiwen Diao
Yufeng Cui
Xiaotong Li
Yueze Wang
Huchuan Lu
Xinlong Wang
VLM
53
28
0
17 Jun 2024
Exploring the Role of Large Language Models in Prompt Encoding for
  Diffusion Models
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Bingqi Ma
Zhuofan Zong
Guanglu Song
Hongsheng Li
Yu Liu
30
20
0
17 Jun 2024
On Efficient Language and Vision Assistants for Visually-Situated
  Natural Language Understanding: What Matters in Reading and Reasoning
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
Geewook Kim
Minjoon Seo
VLM
36
2
0
17 Jun 2024
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with
  Geometric Image Generation
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Shihao Cai
Keqin Bao
Hangyu Guo
Jizhi Zhang
Jun Song
Bo Zheng
39
15
0
17 Jun 2024
Generative Visual Instruction Tuning
Generative Visual Instruction Tuning
Jefferson Hernandez
Ruben Villegas
Vicente Ordonez
VLM
38
3
0
17 Jun 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge
  with Retrieved Tags
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
42
2
0
16 Jun 2024
From Pixels to Prose: A Large Dataset of Dense Image Captions
From Pixels to Prose: A Large Dataset of Dense Image Captions
Vasu Singla
Kaiyu Yue
Sukriti Paul
Reza Shirkavand
Mayuka Jayawardhana
Alireza Ganjdanesh
Heng Huang
A. Bhatele
Gowthami Somepalli
Tom Goldstein
3DV
VLM
28
22
0
14 Jun 2024
First Multi-Dimensional Evaluation of Flowchart Comprehension for
  Multimodal Large Language Models
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models
Enming Zhang
Ruobing Yao
Huanyong Liu
Junhui Yu
Jiale Wang
ELM
LRM
49
0
0
14 Jun 2024
Towards Vision-Language Geo-Foundation Model: A Survey
Towards Vision-Language Geo-Foundation Model: A Survey
Yue Zhou
Litong Feng
Yiping Ke
Xue Jiang
Junchi Yan
Xue Yang
Wayne Zhang
42
15
0
13 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
L. Wang
Rong Jin
34
40
0
12 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
41
35
0
12 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images
  Interleaved with Text
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
56
20
0
12 Jun 2024
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities
  in Large Vision-Language Models
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
Shimin Chen
Yitian Yuan
Shaoxiang Chen
Zequn Jie
Lin Ma
VLM
27
3
0
12 Jun 2024
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
Zijian Chen
Wei Sun
Yuan Tian
Jun Jia
Zicheng Zhang
Jiarui Wang
Ru Huang
Xiongkuo Min
Guangtao Zhai
Wenjun Zhang
EGVM
50
10
0
10 Jun 2024
Vript: A Video Is Worth Thousands of Words
Vript: A Video Is Worth Thousands of Words
Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao
VGen
69
22
0
10 Jun 2024
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision
  Language Models
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Peng Xia
Ze Chen
Juanxi Tian
Yangrui Gong
Ruibo Hou
...
Jimeng Sun
Zongyuan Ge
Gang Li
James Zou
Huaxiu Yao
MU
VLM
61
31
0
10 Jun 2024
M3GIA: A Cognition Inspired Multilingual and Multimodal General
  Intelligence Ability Benchmark
M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark
Wei Song
Yadong Li
Jianhua Xu
Guowei Wu
Lingfeng Ming
...
Weihua Luo
Houyi Li
Yi Du
Fangda Guo
Kaicheng Yu
ELM
LRM
39
7
0
08 Jun 2024
An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal
  Large Language Models
An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models
Xiongtao Zhou
Jie He
Yuhua Ke
Guangyao Zhu
Víctor Gutiérrez-Basulto
Jeff Z. Pan
40
11
0
07 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
61
31
0
07 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and
  Effective for LMMs
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
22
9
0
06 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better
  Captions
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
72
142
0
06 Jun 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
Junjie Zhou
Zheng Liu
Shitao Xiao
Bo Zhao
Yongping Xiong
45
21
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
35
6
0
05 Jun 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Yu Qiao
Hongsheng Li
Peng Gao
52
44
0
05 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Y. Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
25
9
0
04 Jun 2024
Dimba: Transformer-Mamba Diffusion Models
Dimba: Transformer-Mamba Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Youqiang Zhang
Junshi Huang
Mamba
62
16
0
03 Jun 2024
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language
  Model
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model
Kezhen Chen
Rahul Thapa
Rahul Chalamala
Ben Athiwaratkun
S. Song
James Y. Zou
VLM
50
0
0
03 Jun 2024
Collaborative Novel Object Discovery and Box-Guided Cross-Modal
  Alignment for Open-Vocabulary 3D Object Detection
Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
41
6
0
02 Jun 2024
Bootstrap3D: Improving 3D Content Creation with Synthetic Data
Bootstrap3D: Improving 3D Content Creation with Synthetic Data
Zeyi Sun
Tong Wu
Pan Zhang
Yuhang Zang
Xiao-wen Dong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
34
0
0
31 May 2024
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
Ovis: Structural Embedding Alignment for Multimodal Large Language Model
Shiyin Lu
Yang Li
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
Han-Jia Ye
VLM
MLLM
53
35
0
31 May 2024
InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced
  Visual Understanding
InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding
Huaxiang Zhang
Yaojia Mu
Guo-Niu Zhu
Zhongxue Gan
43
2
0
31 May 2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Andreas Koukounas
Georgios Mastrapas
Michael Gunther
Bo Wang
Scott Martens
...
Saahil Ognawala
Susana Guzman
Maximilian Werk
Nan Wang
Han Xiao
VLM
25
15
0
30 May 2024
NoiseBoost: Alleviating Hallucination with Noise Perturbation for
  Multimodal Large Language Models
NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models
Kai Wu
Boyuan Jiang
Zhengkai Jiang
Qingdong He
Donghao Luo
Shengzhi Wang
Qingwen Liu
Chengjie Wang
VLM
MLLM
30
3
0
30 May 2024
Benchmarking and Improving Detail Image Caption
Benchmarking and Improving Detail Image Caption
Hongyuan Dong
Jiawen Li
Bohong Wu
Jiacong Wang
Yuan Zhang
Haoyuan Guo
VLM
MLLM
35
16
0
29 May 2024
The Evolution of Multimodal Model Architectures
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
43
14
0
28 May 2024
Seeing the Image: Prioritizing Visual Correlation by Contrastive
  Alignment
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Xin Xiao
Bohong Wu
Jiacong Wang
Chunyuan Li
Xun Zhou
Haoyuan Guo
VLM
34
7
0
28 May 2024
Visual Anchors Are Strong Information Aggregators For Multimodal Large
  Language Model
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
Haogeng Liu
Quanzeng You
Xiaotian Han
Yongfei Liu
Huaibo Huang
Ran He
Hongxia Yang
31
2
0
28 May 2024
Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning
Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning
Dongjie Chen
Kartik Patwari
Zhengfeng Lai
S. Cheung
Chen-Nee Chuah
Chen-Nee Chuah
45
0
0
28 May 2024
Privacy-Aware Visual Language Models
Privacy-Aware Visual Language Models
Laurens Samson
Nimrod Barazani
S. Ghebreab
Yukiyasu Asano
PILM
VLM
42
1
0
27 May 2024
A Survey of Multimodal Large Language Model from A Data-centric
  Perspective
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
36
0
26 May 2024
CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot
  Classification
CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification
Qijie Wang
Guandu Liu
Bin Wang
VLM
27
2
0
26 May 2024
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal
  Models
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Chunjiang Ge
Sijie Cheng
Ziming Wang
Jiale Yuan
Yuan Gao
Jun Song
Shiji Song
Gao Huang
Bo Zheng
MLLM
VLM
26
17
0
24 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for
  Multimodal Large Language Models
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
45
3
0
24 May 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision
  Models
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Byung-Kwan Lee
Chae Won Kim
Beomchan Park
Yonghyun Ro
MLLM
LRM
36
17
0
24 May 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Tianyi Zhou
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
62
33
0
24 May 2024
Calibrated Self-Rewarding Vision Language Models
Calibrated Self-Rewarding Vision Language Models
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun-Qing Li
Linjun Zhang
Huaxiu Yao
VLM
77
26
0
23 May 2024
Unveiling the Tapestry of Consistency in Large Vision-Language Models
Unveiling the Tapestry of Consistency in Large Vision-Language Models
Yuan Zhang
Fei Xiao
Tao Huang
Chun-Kai Fan
Hongyuan Dong
Jiawen Li
Jiacong Wang
Kuan Cheng
Shanghang Zhang
Haoyuan Guo
40
7
0
23 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment
  Capability
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
47
7
0
23 May 2024
Dense Connector for MLLMs
Dense Connector for MLLMs
Huanjin Yao
Wenhao Wu
Taojiannan Yang
Yuxin Song
Mengxi Zhang
Haocheng Feng
Yifan Sun
Zhiheng Li
Wanli Ouyang
Jingdong Wang
MLLM
VLM
39
16
0
22 May 2024
Previous
123...106789
Next