arXiv:2403.08857 (v3, latest)
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
13 March 2024
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLM
EGVM
Papers citing
"DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation"
41 / 41 papers shown
Title
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
Ziyun Zeng
Hang Hua
Jiebo Luo
KELM
LM&Ro
LRM
202
0
0
26 Nov 2025
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion
Zhuo Li
Junjia Liu
Zhipeng Dong
Tao Teng
Quentin Rouxel
D. Caldwell
Fei Chen
64
0
0
18 Nov 2025
ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following
Seungmin Han
Haeun Kwon
Ji-jun Park
Taeyang Yoon
LRM
64
1
0
21 Aug 2025
GenTune: Toward Traceable Prompts to Improve Controllability of Image Refinement in Environment Design
ACM Symposium on User Interface Software and Technology (UIST), 2025
Wen-Fan Wang
Ting-Ying Lee
Chien-Ting Lu
Che-Wei Hsu
Nil Ponsa Campany
Yu-Mei Chen
Mike Y. Chen
Bing-Yu Chen
DiffM
103
1
0
21 Aug 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Yiming Ren
Zhiqiang Lin
Yu Li
Gao Meng
Weiyun Wang
...
Zicheng Lin
Jifeng Dai
Yujiu Yang
Wenhai Wang
Ruihang Chu
112
3
0
17 Jul 2025
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li
Huiqiang Jiang
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Jianfeng Gao
Yue Yang
Lili Qiu
274
16
0
22 Apr 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang
Wei Cheng
Sijin Chen
Xianfang Zeng
Jiaxu Zhang
Liao Wang
Gang Yu
Jiabo He
Xingjun Ma
Yu Jiang
VLM
388
19
0
08 Apr 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Neural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
647
113
0
03 Jan 2025
Unbounded: A Generative Infinite Game of Character Life Simulation
International Conference on Learning Representations (ICLR), 2024
Jialu Li
Yuanzhen Li
Neal Wadhwa
Yael Pritch
David E. Jacobs
Michael Rubinstein
Mohit Bansal
Nataniel Ruiz
VGen
AI4CE
244
11
0
24 Oct 2024
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng
Xi Lu
Hanhui Li
Khun Loun Zai
Baiqiao Yin
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
VGen
309
15
0
03 Jun 2024
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Zhimin Li
Jianwei Zhang
Qin Lin
Jiangfeng Xiong
Yanxin Long
...
Wei Liu
Dingyong Wang
Yong Yang
Jie Jiang
Qinglin Lu
ViT
217
211
0
14 May 2024
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng
Baiqiao Yin
Kaixin Cai
Minbin Huang
Hanhui Li
...
Yue Li
Yifei Li
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
MLLM
282
17
0
29 Apr 2024
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
International Conference on Machine Learning (ICML), 2024
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Tengjiao Wang
CoGe
DiffM
421
186
0
22 Jan 2024
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
Enze Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
484
46
0
17 Dec 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chunyuan Li
MLLM
VLM
154
45
0
01 Nov 2023
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations
Nuo Chen
Zinan Zheng
Ning Wu
Ming Gong
Yangqiu Song
Dongmei Zhang
Jia Li
LRM
340
58
0
31 Oct 2023
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
Zeqiang Lai
Xizhou Zhu
Jifeng Dai
Yu Qiao
Wenhai Wang
MLLM
DiffM
172
24
0
11 Oct 2023
Improved Baselines with Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
516
3,978
0
05 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
International Conference on Learning Representations (ICLR), 2023
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
141
174
0
02 Oct 2023
NExT-GPT: Any-to-Any Multimodal LLM
International Conference on Machine Learning (ICML), 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
262
685
0
11 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
5.2K
14,855
0
18 Jul 2023
Emu: Generative Pretraining in Multimodality
International Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
281
154
0
11 Jul 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hang Zhang
Xin Li
Lidong Bing
MLLM
498
1,434
0
05 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
719
6,327
0
29 May 2023
Generating Images with Multimodal Language Models
Neural Information Processing Systems (NeurIPS), 2023
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
271
319
0
26 May 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
862
7,083
0
17 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
2.6K
17,188
0
27 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
International Conference on Machine Learning (ICML), 2023
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
326
148
0
31 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
1.1K
6,368
0
30 Jan 2023
Scalable Diffusion Models with Transformers
IEEE International Conference on Computer Vision (ICCV), 2022
William S. Peebles
Saining Xie
GNN
1.3K
3,938
0
19 Dec 2022
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jiazhan Feng
Qingfeng Sun
Can Xu
Lu Wang
Yaming Yang
Chongyang Tao
Dongyan Zhao
Qingwei Lin
223
66
0
10 Nov 2022
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
735
4,292
0
02 May 2022
PaLM: Scaling Language Modeling with Pathways
Journal of Machine Learning Research (JMLR), 2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
1.1K
7,275
0
05 Apr 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
2.1K
13,906
0
28 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2021
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
DiffM
1.2K
20,333
0
20 Dec 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.9K
39,712
0
26 Feb 2021
Score-Based Generative Modeling through Stochastic Differential Equations
International Conference on Learning Representations (ICLR), 2020
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffM
SyDa
1.0K
8,511
0
26 Nov 2020
Denoising Diffusion Implicit Models
International Conference on Learning Representations (ICLR), 2020
Jiaming Song
Chenlin Meng
Stefano Ermon
VLM
DiffM
1.1K
9,859
0
06 Oct 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
3.9K
24,679
0
19 Jun 2020
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction
IEEE International Conference on Computer Vision (ICCV), 2018
Alaaeldin El-Nouby
Shikhar Sharma
Hannes Schulz
Devon Hjelm
Layla El Asri
Samira Ebrahimi Kahou
Yoshua Bengio
Graham W. Taylor
VLM
242
127
0
24 Nov 2018
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang
Tao Xu
Hongsheng Li
Shaoting Zhang
Xiaogang Wang
Xiaolei Huang
Dimitris N. Metaxas
GAN
295
2,869
0
10 Dec 2016