Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Siyan Chen
Yanfei Chen
Ying Chen
Zhuo Chen
Feng Cheng
Xuyan Chi
Jian Cong
Qinpeng Cui
Qide Dong
Junliang Fan
Jing Fang
Zetao Fang
Chengjian Feng
Han Feng
Mingyuan Gao
Yu Gao
Qiushan Guo
Boyang Hao
Qingkai Hao
Bibo He
Qian He
Tuyen Hoang
Ruoqing Hu
Xi Hu
Weilin Huang
Zhaoyang Huang
Zhongyi Huang
Siqi Jiang
Wei Jiang
Yunpu Jiang
Zhuo Jiang
Ashley Kim
Jianan Kong
Zhichao Lai
Shanshan Lao
Ai Li
Feiya Li
Gen Li
Huixia Li
JiaShi Li
Liang Li
Ming Li
Tao Li
Xian Li
Xiaojie Li
Xiaoyang Li
Xingxing Li
Yameng Li
Yifu Li
Yiying Li
Chao Liang
Ying Liang
Zhiqiang Liang
Wang Liao
Yalin Liao
Heng Lin
Kengyu Lin
Shanchuan Lin
Xi Lin
Zhijie Lin
Feng Ling
Fangfang Liu
Gaohong Liu
Jiawei Liu
Jie Liu
Shouda Liu
Shu Liu
Sichao Liu
Songwei Liu
Xin Liu
Xue Liu
Yibo Liu
Zikun Liu
Zuxi Liu
Junlin Lyu
Lecheng Lyu
Qian Lyu
Han Mu
Xiaonan Nie
Jingzhe Ning
Xitong Pan
Yanghua Peng
Lianke Qin
Xueqiong Qu
Yuxi Ren
Yuchen Shen
Guang Shi
Lei Shi
Yan Song
Yinglong Song
Fan Sun
Li Sun
Renfei Sun
Zeyu Sun
Wenjing Tang
Zirui Tao
Feng Wang
Furui Wang
Jinran Wang
Junkai Wang
Ke Wang
Kexin Wang
Qingyi Wang
Rui Wang
Sen Wang
Shuai Wang
Tingru Wang
Weichen Wang
Xin Wang
Yanhui Wang
Yue Wang
Yuping Wang
Yuxuan Wang
Ziyu Wang
Guoqiang Wei
Wanru Wei
Di Wu
Guohong Wu
Hanjie Wu
Jian Wu
Jie Wu
Ruolan Wu
Xinglong Wu
Yonghui Wu
Ruiqi Xia
Liang Xiang
Fei Xiao
XueFeng Xiao
Pan Xie
Shuangyi Xie
Shuang Xu
Jinlan Xue
Bangbang Yang
Ceyuan Yang
Jiaqi Yang
Runkai Yang
Tao Yang
Yang Yang
Yihang Yang
ZhiXian Yang
Ziyan Yang
Yifan Yao
Zilyu Ye
Bowen Yu
Chujie Yuan
Linxiao Yuan
Sichun Zeng
Weihong Zeng
Xuejiao Zeng
Yan Zeng
Chuntao Zhang
Heng Zhang
Jingjie Zhang
Kuo Zhang
Liang Zhang
Liying Zhang
Manlin Zhang
Ting Zhang
Weida Zhang
Xiaohe Zhang
Xinyan Zhang
Yan Zhang
Yuan Zhang
Zixiang Zhang
Fengxuan Zhao
Huating Zhao
Yang Zhao
Hao Zheng
Jianbin Zheng
Xiaozheng Zheng
Yangyang Zheng
Yijie Zheng
Jiexin Zhou
Kuan Zhu
Shenhan Zhu
Wenjia Zhu
Benhui Zou
Feilong Zuo
et al. (78 additional authors not shown)
Main: 8 pages
6 figures
Bibliography: 1 page
Appendix: 2 pages
Abstract

Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by more than 10×. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine.
