ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.17811
  4. Cited By
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

29 January 2025
Xiaokang Chen
Zhiyu Wu
Xingchao Liu
Zizheng Pan
Wen Liu
Zhenda Xie
X. Yu
Chong Ruan
    AI4TS
ArXivPDFHTML

Papers citing "Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling"

38 / 88 papers shown
Title
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Wei Chen
Xin Yan
Bin Wen
Fan Yang
Yan Li
Di Zhang
Long Chen
MLLM
127
0
0
09 Apr 2025
Transfer between Modalities with MetaQueries
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
65
12
0
08 Apr 2025
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model
Congcong Wen
Yiting Lin
Xiaokang Qu
Nan Li
Yong Liao
Hui Lin
Xiang Li
50
0
0
07 Apr 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
Tian Jin
Jingjing Chen
Lin Ma
Yu Jiang
67
5
0
06 Apr 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Wulin Xie
Yize Zhang
Chaoyou Fu
Yang Shi
Bingyan Nie
Hongkai Chen
Zheng Zhang
Liang Wang
Tieniu Tan
63
2
0
04 Apr 2025
CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ)
CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ)
Abhilekh Borah
Hasnat Md Abdullah
Kangda Wei
Ruihong Huang
33
0
0
04 Apr 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Zhiyuan Yan
Junyan Ye
Weijia Li
Zilong Huang
Shenghai Yuan
Xiangyang He
Kaiqing Lin
Jun-Jian He
Conghui He
Li Yuan
MLLM
EGVM
115
16
0
03 Apr 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
Runhui Huang
Chunwei Wang
Junwei Yang
Guansong Lu
Yunlong Yuan
...
Lu Hou
Wei Zhang
Lanqing Hong
Hengshuang Zhao
Hang Xu
MLLM
109
5
0
02 Apr 2025
A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models
A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models
Leander Girrbach
Stephan Alaniz
Genevieve Smith
Zeynep Akata
85
0
0
30 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Wentao Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
350
5
0
27 Mar 2025
UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning
UGen: Unified Autoregressive Multimodal Model with Progressive Vocabulary Learning
Hongxuan Tang
Hao Liu
Xinyan Xiao
55
1
0
27 Mar 2025
Can Video Diffusion Model Reconstruct 4D Geometry?
Can Video Diffusion Model Reconstruct 4D Geometry?
Jinjie Mai
Wenxuan Zhu
Haozhe Liu
Bing Li
Cheng Zheng
Jürgen Schmidhuber
Bernard Ghanem
VGen
MDE
109
0
0
27 Mar 2025
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing
Fan Qi
Yu Duan
Changsheng Xu
DiffM
66
0
0
27 Mar 2025
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
87
0
0
27 Mar 2025
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
Ling Zhong
Yujing Lu
Jing Yang
Weiming Li
Peng Wei
Yongheng Wang
Manni Duan
Qing Zhang
104
2
0
25 Mar 2025
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
Ziyue Wang
Junde Wu
Linghan Cai
Chang Han Low
Xihong Yang
Qiaxuan Li
Yueming Jin
LRM
89
2
0
21 Mar 2025
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings
Zonghao Ying
Guangyi Zheng
Yongxin Huang
Deyue Zhang
Wenxin Zhang
Quanchen Zou
Aishan Liu
Xianglong Liu
Dacheng Tao
ELM
94
9
0
19 Mar 2025
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Botian Shi
Ding Wang
67
1
0
17 Mar 2025
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Lijie Fan
Luming Tang
Siyang Qin
Tianhong Li
Xuan S. Yang
...
Tao Zhu
Michael Rubinstein
Michalis Raptis
Deqing Sun
Radu Soricut
69
6
0
17 Mar 2025
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He
Bo Cheng
Yuhang Ma
Qingxiang Jia
Shanyuan Liu
Ao Ma
Xiaoyu Wu
Liebucha Wu
Dawei Leng
Yuhui Yin
DiffM
VLM
107
0
0
13 Mar 2025
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content
Bhavik Chandna
Mariam Aboujenane
Usman Naseem
70
0
0
13 Mar 2025
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Chen Chen
Rui Qian
Wenze Hu
Tsu-Jui Fu
Jialing Tong
...
Lezhi Li
Bowen Zhang
Alex Schwing
Wei Liu
Yue Yang
84
0
0
13 Mar 2025
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He
Yuanyu He
Shaoxuan He
Feng Chen
Hong Zhou
Kai Zhang
Bohan Zhuang
69
2
0
12 Mar 2025
Should VLMs be Pre-trained with Image Data?
Sedrick Scott Keh
Jean Mercat
S. Gadre
Kushal Arora
Igor Vasiljevic
...
Shuran Song
Russ Tedrake
Thomas Kollar
Ludwig Schmidt
Achal Dave
VLM
77
0
0
10 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
134
52
0
10 Mar 2025
V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation
Guiwei Zhang
Tianyu Zhang
Mohan Zhou
Yalong Bai
Biye Li
108
0
0
10 Mar 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Yuwei Niu
Munan Ning
Mengren Zheng
Weiyang Jin
Bin Lin
...
Jiaqi Liao
Chaoran Feng
Kunpeng Ning
Bin Zhu
Li Yuan
EGVM
98
23
0
10 Mar 2025
FunBench: Benchmarking Fundus Reading Skills of MLLMs
Qijie Wei
Kaiheng Qian
Xirong Li
82
1
0
02 Mar 2025
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models
Qiao Yan
Yuchen Yuan
Xiaowei Hu
Yihan Wang
Jiaqi Xu
Jinpeng Li
Chi-Wing Fu
Pheng Ann Heng
MLLM
VLM
LM&MA
71
0
0
28 Feb 2025
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
Pengzhi Li
Pengfei Yu
Zide Liu
Wei He
Xuhao Pan
Xudong Rao
Tao Wei
Wei Chen
VLM
120
0
0
25 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM
LRM
64
2
0
23 Feb 2025
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
William Rudman
Michal Golovanesky
Amir Bar
Vedant Palit
Yann LeCun
Carsten Eickhoff
Ritambhara Singh
LRM
92
3
0
21 Feb 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
Yuping Wang
Peiran Li
Ruizheng Bai
Yansen Wang
Chan-wei Hu
Chengxuan Qian
Huaxiu Yao
Zhengzhong Tu
135
6
0
18 Feb 2025
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
Jian Jia
Jingtong Gao
Ben Xue
Junhao Wang
Qingpeng Cai
Quan Chen
Xiangyu Zhao
Peng Jiang
Kun Gai
OffRL
103
1
0
18 Feb 2025
DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities
DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities
Chashi Mahiul Islam
Samuel Jacob Chacko
Preston Horne
Xiuwen Liu
125
1
0
11 Feb 2025
Next Patch Prediction for Autoregressive Visual Generation
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
172
10
0
19 Dec 2024
S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack
S4^44ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack
Yongxiang Liu
Bowen Peng
Li Liu
Xuzhao Li
268
0
0
13 Oct 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
125
167
0
29 Apr 2024
Previous
12