Title |
---|
![]() MIO: A Foundation Model on Multimodal Tokens Zekun Wang King Zhu Chunpu Xu Wangchunshu Zhou Jiaheng Liu ...Yuanxing Zhang Ge Zhang Ke Xu Jie Fu Wenhao Huang |
![]() SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model Carlos Hernandez-Olivan Marc Delcroix Tsubasa Ochiai Daisuke Niizumi Naohiro Tawara Tomohiro Nakatani Shoko Araki |
![]() Large Language Models are Strong Audio-Visual Speech Recognition Learners Umberto Cappellazzo Minsu Kim Honglie Chen Pingchuan Ma Stavros Petridis Daniele Falavigna Alessio Brutti Maja Pantic |
![]() Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models Li-Wei Chen Takuya Higuchi He Bai Ahmed Hussen Abdelaziz Alexander Rudnicky Shinji Watanabe Tatiana Likhomanenko B. Theobald Zakaria Aldeneh |