
Title |
|---|
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection. International Conference on Learning Representations (ICLR), 2025 |
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting. AAAI Conference on Artificial Intelligence (AAAI), 2024 |
ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction. Hyungjin Chung, Dohun Lee, Jong Chul Ye |
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning. International Conference on Learning Representations (ICLR), 2024 |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. International Conference on Learning Representations (ICLR), 2024. Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou |