Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols

International Conference on Human Factors in Computing Systems (CHI), 2025

23 January 2025

John Joon Young Chung

Main: 21 Pages

23 Figures

Bibliography: 1 Page

2 Tables

Appendix: 1 Page

Abstract

We introduce Toyteller, an AI-powered storytelling system in which users generate a mix of story text and visuals by directly manipulating character symbols, as if playing with toys. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) as an input that lets users steer story text generation and (2) as a visual output format that accompanies story text. We enabled motion-steered text generation and text-steered motion generation by mapping motions and text onto a shared semantic space, which large language models and motion generation models can use as a translational layer. Technical evaluations showed that Toyteller outperforms a competitive baseline, GPT-4o. Our user study found that toy-playing helps users express intentions that are difficult to verbalize. However, motions alone could not express all user intentions, which suggests combining them with other modalities such as language. We discuss the design space of toy-playing interactions and implications for technical HCI research on human-AI interaction.
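To make the idea of a shared semantic space more concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a motion has already been encoded as a vector in the same space as a few text labels, and it picks the closest label by cosine similarity so that the label could steer a text prompt. The vectors, the labels, and the names `TEXT_SPACE` and `nearest_label` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical, hand-made 3-d vectors standing in for learned text embeddings.
TEXT_SPACE = {
    "chase": [0.9, 0.1, 0.0],
    "avoid": [-0.8, 0.2, 0.1],
    "play together": [0.1, 0.9, 0.2],
}


def nearest_label(motion_vec, space=TEXT_SPACE):
    """Return the text label whose vector is most similar to the motion vector."""
    return max(space, key=lambda label: cosine(motion_vec, space[label]))


# Assumed output of some motion encoder for one symbol approaching another.
motion_vec = [0.7, 0.2, 0.1]
label = nearest_label(motion_vec)

# The retrieved label could then be placed in a prompt for a language model.
prompt = f"Story steer: the characters' interaction reads as '{label}'."
print(prompt)
```

The same lookup could run in the other direction (from a text vector to the nearest stored motion), which is one simple way to read the "translational layer" described above.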
