2312.02252
Cited By
StoryGPT-V: Large Language Models as Consistent Story Visualizers
4 December 2023
Xiaoqian Shen
Mohamed Elhoseiny
VLM
Papers citing "StoryGPT-V: Large Language Models as Consistent Story Visualizers"
14 / 14 papers shown
Title
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
34
41
0
27 Apr 2025
Consistent Subject Generation via Contrastive Instantiated Concepts
Lee Hsin-Ying
Kelvin Chan
Ming Yang
DiffM
56
0
0
31 Mar 2025
MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models
Sarfaroz Yunusov
Hamza Sidat
Ali Emami
33
0
0
20 Sep 2024
SEED-Story: Multimodal Long Story Generation with Large Language Model
Shuai Yang
Yuying Ge
Yang Li
Yukang Chen
Yixiao Ge
Ying Shan
Yingcong Chen
VGen
DiffM
48
1
0
11 Jul 2024
Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
Fei Shen
Hu Ye
Sibo Liu
Jun Zhang
Cong Wang
Xiao Han
Wei Yang
49
1
0
02 Jul 2024
Masked Generative Story Transformer with Character Guidance and Caption Augmentation
Christos Papadimitriou
Giorgos Filandrianos
Maria Lymperaiou
Giorgos Stamou
DiffM
54
1
0
13 Mar 2024
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
126
280
0
14 Oct 2023
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Chaoning Zhang
Fachrina Dewi Puspitasari
Sheng Zheng
Chenghao Li
Yu Qiao
...
Caiyan Qin
François Rameau
Lik-Hang Lee
Sung-Ho Bae
Choong Seon Hong
VLM
34
1
0
12 May 2023
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi
Wei Xiong
Zhe-nan Lin
H. J. Jung
DiffM
100
172
0
06 Apr 2023
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu
Yuechen Zhang
Wenbo Li
Zhe-nan Lin
Jiaya Jia
DiffM
VGen
106
131
0
08 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
207
1,899
0
30 Jan 2023
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
239
3,790
0
24 Feb 2021
Imagine This! Scripts to Compositions to Videos
Tanmay Gupta
Dustin Schwenk
Ali Farhadi
Derek Hoiem
Aniruddha Kembhavi
CoGe
VGen
83
76
0
10 Apr 2018
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
200
9,999
0
18 May 2015