ResearchTrend.AI
DeepSeek-VL: Towards Real-World Vision-Language Understanding

8 March 2024
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
Bo Liu
Jingxiang Sun
Tongzheng Ren
Zhuoshu Li
Hao-Yu Yang
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
    VLM

Papers citing "DeepSeek-VL: Towards Real-World Vision-Language Understanding"

50 / 225 papers shown
PointVLA: Injecting the 3D World into Vision-Language-Action Models
Chengmeng Li
Junjie Wen
Yan Peng
Yaxin Peng
Feifei Feng
Y. X. Zhu
3DPC
69
2
0
10 Mar 2025
Should VLMs be Pre-trained with Image Data?
Sedrick Scott Keh
Jean-Pierre Mercat
S. Gadre
Kushal Arora
Igor Vasiljevic
...
Shuran Song
Russ Tedrake
Thomas Kollar
Ludwig Schmidt
Achal Dave
VLM
49
0
0
10 Mar 2025
Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs
Wenzhuo Xu
Zhipeng Wei
Xiongtao Sun
Deyue Zhang
Dongdong Yang
Quanchen Zou
X. Zhang
AAML
47
0
0
10 Mar 2025
Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving
Enming Zhang
Peizhe Gong
Xingyuan Dai
Yisheng Lv
Q. Miao
MLLM
ELM
60
0
0
09 Mar 2025
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
Cong Chen
Mingyu Liu
Chenchen Jing
Y. Zhou
Fengyun Rao
Hao Chen
Bo Zhang
Chunhua Shen
MLLM
AAML
VLM
52
2
0
09 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
K. Zhang
ELM
92
0
0
09 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
68
0
0
08 Mar 2025
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Xudong Lu
Haohao Gao
Renshou Wu
Shuai Ren
Xiaoxin Chen
Hongsheng Li
Fangyuan Li
ELM
49
0
0
08 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
60
0
0
08 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
85
2
0
08 Mar 2025
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
Junbo Zhao
Ting Zhang
Jiayu Sun
Mi Tian
Hua Huang
36
0
0
07 Mar 2025
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data
Wenhao Wang
Zijie Yu
Rui Ye
J. Zhang
S. Chen
Yanfeng Wang
FedML
45
0
0
07 Mar 2025
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning
Qing Zhou
Tao Yang
Junyu Gao
W. Ni
Junzheng Wu
Qi Wang
48
0
0
06 Mar 2025
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions
Jun Yu Li
Che Liu
Wenjia Bai
Rossella Arcucci
Cosmin I. Bercea
Julia A. Schnabel
39
0
0
05 Mar 2025
Are Large Vision Language Models Good Game Players?
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
ELM
LRM
94
3
0
04 Mar 2025
Advancing vision-language models in front-end development via data synthesis
Tong Ge
Yashu Liu
Jieping Ye
Tianyi Li
Chao Wang
67
0
0
03 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
69
1
0
03 Mar 2025
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang
Chenwei Xie
Haiyang Wang
Xiaoyi Bao
Tingyu Weng
Pandeng Li
Yun Zheng
Liwei Wang
ObjD
VLM
54
0
0
03 Mar 2025
An evaluation of DeepSeek Models in Biomedical Natural Language Processing
Zaifu Zhan
Shuang Zhou
Huixue Zhou
Jiawen Deng
Yu Hou
Jeremy Yeung
Rui Zhang
ELM
49
0
0
01 Mar 2025
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Shaharukh Khan
Ayush Tarun
Ali Faraz
Palash Kamble
Vivek Dahiya
Praveen Kumar Pokala
Ashish Kulkarni
Chandra Khatri
Abhinav Ravi
Shubham Agarwal
80
0
0
27 Feb 2025
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration
Minjie Zhu
Y. X. Zhu
Jinming Li
Zhongyi Zhou
Junjie Wen
Xiaoyu Liu
Chaomin Shen
Yaxin Peng
Feifei Feng
LM&Ro
76
2
0
26 Feb 2025
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts
Zhenghao Liu
Xingsheng Zhu
Tianshuo Zhou
Xinyi Zhang
Xiaoyuan Yi
Yukun Yan
Yu Gu
Ge Yu
Maosong Sun
RALM
VLM
38
0
0
24 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
86
2
0
24 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
114
1
0
21 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
77
8
0
21 Feb 2025
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
Zirui Song
Jingpu Yang
Yuan Huang
Jonathan Tonglet
Zeyu Zhang
Tao Cheng
Meng Fang
Iryna Gurevych
X. Chen
LRM
65
1
0
19 Feb 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
Yuping Wang
Peiran Li
Ruizheng Bai
Y. Wang
Chengxuan Qian
Huaxiu Yao
Zhengzhong Tu
87
6
0
18 Feb 2025
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities
Hanbin Wang
Xiaoxuan Zhou
Zhipeng Xu
Keyuan Cheng
Yuxin Zuo
Kai Tian
Jingwei Song
Junting Lu
Wenhui Hu
Xueyang Liu
LRM
MLLM
76
1
0
17 Feb 2025
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
Linjie Mu
Zhongzhen Huang
Shengqian Qin
Yakun Zhu
S. Zhang
Xiaofan Zhang
38
0
0
17 Feb 2025
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
Jiamin Su
Yibo Yan
Fangteng Fu
H. Zhang
Jingheng Ye
Xiang Liu
Jiahao Huo
Huiyu Zhou
Xuming Hu
ELM
52
0
0
17 Feb 2025
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models
Samuel Stevens
Wei-Lun Chao
T. Berger-Wolf
Yu-Chuan Su
VLM
72
2
0
10 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
68
8
0
04 Feb 2025
Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs
Hongliang Li
Jiaxin Zhang
Wenhui Liao
Dezhi Peng
Kai Ding
Lianwen Jin
OffRL
MQ
71
0
0
31 Jan 2025
ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models
Jingwei Yi
Junhao Yin
Ju Xu
Peng Bao
Y. Wang
Wei Fan
H. Wang
42
0
0
20 Jan 2025
LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models
Mozhgan Nasr Azadani
James Riddell
Sean Sedwards
Krzysztof Czarnecki
MLLM
VLM
44
2
0
13 Jan 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Xuanle Zhao
Xianzhen Luo
Qi Shi
C. L. P. Chen
Shuo Wang
Wanxiang Che
Zhiyuan Liu
Maosong Sun
MLLM
54
2
0
11 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Yiyao Yu
Xinzhe Ni
Zicheng Lin
Jin Zeng
Yujiu Yang
LRM
68
12
0
08 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
86
11
0
06 Jan 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang
Yuchang Su
Yiming Liu
Xiaohan Wang
James Burgess
...
Josiah Aklilu
Alejandro Lozano
Anjiang Wei
Ludwig Schmidt
Serena Yeung-Levy
50
3
0
06 Jan 2025
Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion
Hebin Wang
Yangning Li
Yinghui Li
Hai-Tao Zheng
Wenhao Jiang
Hong-Gee Kim
37
0
0
03 Jan 2025
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang
Hang Zhang
Xin Li
Jiashuo Sun
Yongliang Shen
Weiming Lu
Deli Zhao
Yueting Zhuang
Lidong Bing
VLM
37
2
0
01 Jan 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
102
1
0
20 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Y. Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VLM
81
0
0
18 Dec 2024
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng
Qiguang Chen
Jin Zhang
Hao Fei
Xiaocheng Feng
Wanxiang Che
Min Li
L. Qin
VLM
MLLM
LRM
75
3
0
17 Dec 2024
Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning
Hai-Ming Xu
Qi Chen
Lei Wang
Lingqiao Liu
62
1
0
14 Dec 2024
Falcon-UI: Understanding GUI Before Following User Instructions
Huawen Shen
Chang-Shu Liu
Gengluo Li
Xinlong Wang
Yu Zhou
Can Ma
Xiangyang Ji
LLMAG
77
4
0
12 Dec 2024
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
Bo Li
Shaolin Zhu
Lijie Wen
VLM
74
0
0
10 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
94
5
0
05 Dec 2024
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu
Yuheng Ding
Bingxuan Li
Pan Lu
Da Yin
Kai-Wei Chang
Nanyun Peng
LRM
100
3
0
03 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024