ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.08857
  4. Cited By
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
v1v2v3 (latest)

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

13 March 2024
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
    MLLMEGVM
ArXiv (abs)PDFHTMLHuggingFace (3 upvotes)

Papers citing "DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation"

41 / 41 papers shown
Title
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
Ziyun Zeng
Hang Hua
Jiebo Luo
KELMLM&RoLRM
202
0
0
26 Nov 2025
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion
Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion
Zhuo Li
Junjia Liu
Zhipeng Dong
Tao Teng
Quentin Rouxel
D. Caldwell
Fei Chen
64
0
0
18 Nov 2025
ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following
ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following
Seungmin Han
Haeun Kwon
Ji-jun Park
Taeyang Yoon
LRM
64
1
0
21 Aug 2025
GenTune: Toward Traceable Prompts to Improve Controllability of Image Refinement in Environment Design
GenTune: Toward Traceable Prompts to Improve Controllability of Image Refinement in Environment DesignACM Symposium on User Interface Software and Technology (UIST), 2025
Wen-Fan Wang
Ting-Ying Lee
Chien-Ting Lu
Che-Wei Hsu
Nil Ponsa Campany
Yu-Mei Chen
Mike Y. Chen
Bing-Yu Chen
DiffM
103
1
0
21 Aug 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Yiming Ren
Zhiqiang Lin
Yu Li
Gao Meng
Weiyun Wang
...
Zicheng Lin
Jifeng Dai
Yujiu Yang
Wenhai Wang
Ruihang Chu
112
3
0
17 Jul 2025
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li
Huiqiang Jiang
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Jianfeng Gao
Yue Yang
Lili Qiu
274
16
0
22 Apr 2025
OmniSVG: A Unified Scalable Vector Graphics Generation Model
OmniSVG: A Unified Scalable Vector Graphics Generation Model
Yiying Yang
Wei Cheng
Sijin Chen
Xianfang Zeng
Jiaxu Zhang
Liao Wang
Gang Yu
Jiabo He
Xingjun Ma
Yu Jiang
VLM
388
19
0
08 Apr 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language TasksNeural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLMVLMLRM
647
113
0
03 Jan 2025
Unbounded: A Generative Infinite Game of Character Life Simulation
Unbounded: A Generative Infinite Game of Character Life SimulationInternational Conference on Learning Representations (ICLR), 2024
Jialu Li
Yuanzhen Li
Neal Wadhwa
Yael Pritch
David E. Jacobs
Michael Rubinstein
Joey Tianyi Zhou
Nataniel Ruiz
VGenAI4CE
244
11
0
24 Oct 2024
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng
Xi Lu
Hanhui Li
Khun Loun Zai
Baiqiao Yin
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffMVGen
309
15
0
03 Jun 2024
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with
  Fine-Grained Chinese Understanding
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Zhimin Li
Jianwei Zhang
Qin Lin
Jiangfeng Xiong
Yanxin Long
...
Wei Liu
Dingyong Wang
Yong Yang
Jie Jiang
Qinglin Lu
ViT
217
211
0
14 May 2024
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng
Baiqiao Yin
Kaixin Cai
Minbin Huang
Hanhui Li
...
Yue Li
Yifei Li
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffMMLLM
282
17
0
29 Apr 2024
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and
  Generating with Multimodal LLMs
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMsInternational Conference on Machine Learning (ICML), 2024
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Tengjiao Wang
CoGeDiffM
421
186
0
22 Jan 2024
A Survey of Reasoning with Foundation Models
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
Enze Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLMLRMAI4CE
484
46
0
17 Dec 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation,
  Generation and Editing
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLMVLM
154
45
0
01 Nov 2023
Breaking Language Barriers in Multilingual Mathematical Reasoning:
  Insights and Observations
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations
Polydoros Giannouris
Zinan Zheng
Ning Wu
Ming Gong
Yangqiu Song
Dongmei Zhang
Jia Li
LRM
340
58
0
31 Oct 2023
Mini-DALLE3: Interactive Text to Image by Prompting Large Language
  Models
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
Zeqiang Lai
Xizhou Zhu
Jifeng Dai
Yu Qiao
Wenhai Wang
MLLMDiffM
172
24
0
11 Oct 2023
Improved Baselines with Visual Instruction Tuning
Improved Baselines with Visual Instruction TuningComputer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLMMLLM
516
3,978
0
05 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
Making LLaMA SEE and Draw with SEED TokenizerInternational Conference on Learning Representations (ICLR), 2023
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
141
174
0
02 Oct 2023
NExT-GPT: Any-to-Any Multimodal LLM
NExT-GPT: Any-to-Any Multimodal LLMInternational Conference on Machine Learning (ICML), 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
262
685
0
11 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MHALM
5.2K
14,855
0
18 Jul 2023
Emu: Generative Pretraining in Multimodality
Emu: Generative Pretraining in MultimodalityInternational Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
281
154
0
11 Jul 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video
  Understanding
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video UnderstandingConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hang Zhang
Xin Li
Lidong Bing
MLLM
498
1,434
0
05 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward
  Model
Direct Preference Optimization: Your Language Model is Secretly a Reward ModelNeural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
719
6,327
0
29 May 2023
Generating Images with Multimodal Language Models
Generating Images with Multimodal Language ModelsNeural Information Processing Systems (NeurIPS), 2023
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
271
319
0
26 May 2023
Visual Instruction Tuning
Visual Instruction TuningNeural Information Processing Systems (NeurIPS), 2023
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
862
7,083
0
17 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALMPILM
2.6K
17,188
0
27 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Grounding Language Models to Images for Multimodal Inputs and OutputsInternational Conference on Machine Learning (ICML), 2023
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
326
148
0
31 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language ModelsInternational Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
1.1K
6,368
0
30 Jan 2023
Scalable Diffusion Models with Transformers
Scalable Diffusion Models with TransformersIEEE International Conference on Computer Vision (ICCV), 2022
William S. Peebles
Saining Xie
GNN
1.3K
3,938
0
19 Dec 2022
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal
  Open-domain Conversation
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain ConversationAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Jiazhan Feng
Qingfeng Sun
Can Xu
Lu Wang
Yaming Yang
Chongyang Tao
Dongyan Zhao
Qingwei Lin
223
66
0
10 Nov 2022
OPT: Open Pre-trained Transformer Language Models
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLMOSLMAI4CE
735
4,292
0
02 May 2022
PaLM: Scaling Language Modeling with Pathways
PaLM: Scaling Language Modeling with PathwaysJournal of machine learning research (JMLR), 2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILMLRM
1.1K
7,275
0
05 Apr 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language ModelsNeural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&RoLRMAI4CEReLM
2.1K
13,906
0
28 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion ModelsComputer Vision and Pattern Recognition (CVPR), 2021
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
DiffM
1.2K
20,333
0
20 Dec 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language SupervisionInternational Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIPVLM
1.9K
39,712
0
26 Feb 2021
Score-Based Generative Modeling through Stochastic Differential
  Equations
Score-Based Generative Modeling through Stochastic Differential EquationsInternational Conference on Learning Representations (ICLR), 2020
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffMSyDa
1.0K
8,511
0
26 Nov 2020
Denoising Diffusion Implicit Models
Denoising Diffusion Implicit ModelsInternational Conference on Learning Representations (ICLR), 2020
Jiaming Song
Chenlin Meng
Stefano Ermon
VLMDiffM
1.1K
9,859
0
06 Oct 2020
Denoising Diffusion Probabilistic Models
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
3.9K
24,679
0
19 Jun 2020
Tell, Draw, and Repeat: Generating and Modifying Images Based on
  Continual Linguistic Instruction
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic InstructionIEEE International Conference on Computer Vision (ICCV), 2018
Alaaeldin El-Nouby
Shikhar Sharma
Hannes Schulz
Devon Hjelm
Layla El Asri
Samira Ebrahimi Kahou
Yoshua Bengio
Graham W.Taylor
VLM
242
127
0
24 Nov 2018
StackGAN: Text to Photo-realistic Image Synthesis with Stacked
  Generative Adversarial Networks
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang
Tao Xu
Jiaming Song
Shaoting Zhang
Xiaogang Wang
Xiaolei Huang
Dimitris N. Metaxas
GAN
295
2,869
0
10 Dec 2016
1