ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.05525
  4. Cited By
DeepSeek-VL: Towards Real-World Vision-Language Understanding

DeepSeek-VL: Towards Real-World Vision-Language Understanding

8 March 2024
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
Bo Liu
Jingxiang Sun
Tongzheng Ren
Zhuoshu Li
Hao-Yu Yang
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
    VLM
ArXivPDFHTML

Papers citing "DeepSeek-VL: Towards Real-World Vision-Language Understanding"

50 / 222 papers shown
Title
Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning
Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning
Zexian Yang
Dian Li
Dayan Wu
Gang Liu
Weiping Wang
MLLM
LRM
36
0
0
12 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
74
1
0
05 May 2025
DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation
DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation
Zixuan Chen
Junhui Yin
Yangtao Chen
Jing Huo
Pinzhuo Tian
Jieqi Shi
Yiwen Hou
Y. Li
Yang Gao
33
0
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
81
0
0
30 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
X. Li
MLLM
71
0
0
29 Apr 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
81
0
0
25 Apr 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang
Wenliang Zheng
Aashrith Madasu
Peng Shi
Ryo Kamoi
...
Ranran Haoran Zhang
Avitej Iyer
Renze Lou
Wenpeng Yin
Rui Zhang
63
0
0
25 Apr 2025
V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
V2^22R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi Ren Fung
43
0
0
23 Apr 2025
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris
Yichen Wei
Yi Peng
X. Wang
Weijie Qiu
...
Jianhao Zhang
Y. Hao
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
AI4TS
SyDa
LRM
VLM
74
0
0
23 Apr 2025
TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving
TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving
Daocheng Fu
Zijun Chen
Renqiu Xia
Qi Liu
Yuan Feng
...
Peng Gao
Junchi Yan
Botian Shi
Bo Zhang
Yu Qiao
28
0
0
22 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
57
0
0
20 Apr 2025
ChartQA-X: Generating Explanations for Charts
ChartQA-X: Generating Explanations for Charts
Shamanthak Hegde
Pooyan Fazli
H. Seifi
12
0
0
17 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Weidong Zhang
Mengli Zhu
...
Pengfei Li
C. Li
Junhao Zhu
Hao-Yu Yang
Shiliang Sun
26
1
0
16 Apr 2025
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching
Xinli Yue
Jianhui Sun
Junda Lu
Liangchao Yao
Fan Xia
Tianyi Wang
Fengyun Rao
Jing Lyu
Yuetang Deng
21
0
0
16 Apr 2025
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang
Arjun Chandra
Aoming Liu
Venkatesh Saligrama
Boqing Gong
MLLM
VLM
45
0
0
13 Apr 2025
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
J. Wu
Hao Yang
Xinhua Zeng
Guibing He
Z. Chen
Z. Li
X. Zhang
Yangyang Ma
Run Fang
Yang Liu
LRM
70
0
0
12 Apr 2025
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment
Sijing Wu
Yunhao Li
Ziwen Xu
Yixuan Gao
Huiyu Duan
Wei Sun
Guangtao Zhai
39
1
0
12 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Y. Liu
Qi Wang
Fuzheng Zhang
VLM
53
1
0
10 Apr 2025
Investigating Vision-Language Model for Point Cloud-based Vehicle Classification
Investigating Vision-Language Model for Point Cloud-based Vehicle Classification
Yiqiao Li
Jie Wei
Camille Kamga
26
0
0
10 Apr 2025
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao
Xuqi Liu
Zhongqi Yue
Y. Wu
Shuang Chen
Juncheng Billy Li
Siliang Tang
Fei Wu
Tat-Seng Chua
Yueting Zhuang
OffRL
LRM
39
1
0
09 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
34
4
0
07 Apr 2025
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
Yimu Wang
Mozhgan Nasr Azadani
Sean Sedwards
Krzysztof Czarnecki
MLLM
MoE
52
0
0
07 Apr 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
S. Chen
Jingjing Chen
Lin Ma
Yu Jiang
26
2
0
06 Apr 2025
SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding
SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding
Yimin Wei
Aoran Xiao
Yexian Ren
Yuting Zhu
Hongruixuan Chen
J. Xia
Naoto Yokoya
VLM
66
0
0
04 Apr 2025
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media
Zhuohan Ge
Nicole Hu
Darian Li
Yubo Wang
Shihao Qi
Yuming Xu
Han Shi
J. Zhang
AI4MH
56
0
0
03 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
68
1
0
02 Apr 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
Runhui Huang
Chunwei Wang
Junwei Yang
Guansong Lu
Yunlong Yuan
...
Lu Hou
Wei Zhang
Lanqing Hong
Hengshuang Zhao
Hang Xu
MLLM
81
1
0
02 Apr 2025
FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics
FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics
Yixuan Li
Yu Tian
Yipo Huang
Wei Lu
Shiqi Wang
Weisi Lin
Anderson de Rezende Rocha
54
0
0
31 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
82
2
0
27 Mar 2025
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang
Xusen Ma
Xianxu Hou
Meidan Ding
Yudong Li
Junliang Chen
Wenting Chen
Xiaoyang Peng
LinLin Shen
CVBM
71
0
0
27 Mar 2025
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
Dongchen Lu
Yuyao Sun
Zilu Zhang
Leping Huang
Jianliang Zeng
Mao Shu
Huo Cao
39
0
0
27 Mar 2025
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
Alex Jinpeng Wang
Linjie Li
Z. Yang
Lijuan Wang
Min Li
DiffM
68
0
0
26 Mar 2025
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
Ivan Sviridov
Amina Miftakhova
Artemiy Tereshchenko
Galina Zubkova
Pavel Blinov
Andrey Savchenko
LM&MA
29
0
0
26 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
90
0
0
26 Mar 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao
Yuwei Niu
Fanqing Meng
Hao Li
Changyao Tian
...
Dianqi Li
X. Zhu
Li Yuan
Jifeng Dai
Yu Cheng
MLLM
72
0
0
25 Mar 2025
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu
Diankun Zhang
Zongchuang Zhao
Jianfeng Cui
Dingkang Liang
Chong Zhang
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
38
2
0
25 Mar 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
Jiaqi Liao
Z. Yang
Linjie Li
Dianqi Li
Kevin Qinghong Lin
Yu-Xi Cheng
Lijuan Wang
MLLM
LRM
57
0
0
25 Mar 2025
Towards Online Multi-Modal Social Interaction Understanding
Towards Online Multi-Modal Social Interaction Understanding
X. Li
Shijian Deng
Bolin Lai
Weiguo Pian
James M. Rehg
Yapeng Tian
41
0
0
25 Mar 2025
Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models
Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models
Y. Zhang
Chunwang Zou
Bo Wang
Jing Qin
51
0
0
24 Mar 2025
Visual Persona: Foundation Model for Full-Body Human Customization
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam
Soowon Son
Zhan Xu
Jing Shi
Difan Liu
Feng Liu
Aashish Misraa
Seungryong Kim
Yang Zhou
DiffM
37
0
0
19 Mar 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang
Chenghui Lv
Xian Li
Shichao Dong
Huadong Li
Kelu Yao
Chao Li
Wenqi Shao
Ping Luo
59
0
0
19 Mar 2025
Can Large Vision Language Models Read Maps Like a Human?
Can Large Vision Language Models Read Maps Like a Human?
Shuo Xing
Zezhou Sun
Shuangyu Xie
Kaiyuan Chen
Yanjia Huang
Yuping Wang
Jiachen Li
Dezhen Song
Zhengzhong Tu
58
2
0
18 Mar 2025
Federated Continual Instruction Tuning
Federated Continual Instruction Tuning
Haiyang Guo
Fanhu Zeng
Fei Zhu
Wenzhuo Liu
Da-Han Wang
Jian Xu
Xu-Yao Zhang
Cheng-Lin Liu
CLL
FedML
63
1
0
17 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
61
0
0
16 Mar 2025
Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs
Wenzhuo Xu
Zhipeng Wei
Xiongtao Sun
Deyue Zhang
Dongdong Yang
Quanchen Zou
X. Zhang
AAML
47
0
0
10 Mar 2025
Should VLMs be Pre-trained with Image Data?
Sedrick Scott Keh
Jean-Pierre Mercat
S. Gadre
Kushal Arora
Igor Vasiljevic
...
Shuran Song
Russ Tedrake
Thomas Kollar
Ludwig Schmidt
Achal Dave
VLM
49
0
0
10 Mar 2025
PointVLA: Injecting the 3D World into Vision-Language-Action Models
Chengmeng Li
Junjie Wen
Yan Peng
Yaxin Peng
Feifei Feng
Y. X. Zhu
3DPC
69
2
0
10 Mar 2025
Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios
Chenglu Pan
Xiaogang Xu
Ganggui Ding
Y. Zhang
Wenbo Li
Jiarong Xu
Qingbiao Wu
55
0
0
10 Mar 2025
12345
Next