STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives

Abstract

This paper introduces StoryAnchors, a unified framework for generating high-quality, multi-scene story frames with strong temporal consistency. The framework employs a bidirectional story generator that integrates both past and future contexts to ensure temporal consistency, character continuity, and smooth scene transitions throughout the narrative. Specific conditions are introduced to distinguish story frame generation from standard video synthesis, facilitating greater scene diversity and enhancing narrative richness. To further improve generation quality, StoryAnchors integrates Multi-Event Story Frame Labeling and Progressive Story Frame Training, enabling the model to capture both overarching narrative flow and event-level dynamics. This approach supports the creation of editable and expandable story frames, allowing for manual modifications and the generation of longer, more complex sequences. Extensive experiments show that StoryAnchors outperforms existing open-source models in key areas such as consistency, narrative coherence, and scene diversity. Its performance in narrative consistency and story richness is also on par with GPT-4o. Ultimately, StoryAnchors pushes the boundaries of story-driven frame generation, offering a scalable, flexible, and highly editable foundation for future research.

@article{wang2025_2505.08350,
  title={StoryAnchors: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives},
  author={Bo Wang and Haoyang Huang and Zhiyin Lu and Fengyuan Liu and Guoqing Ma and Jianlong Yuan and Yuan Zhang and Nan Duan},
  journal={arXiv preprint arXiv:2505.08350},
  year={2025}
}