Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

4 June 2024

Jacob Whitehill

Papers citing "Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing"


From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke, Ben Peters, Sonal Sannigrahi, Anil Keshwani, Tsz Kin Lam, Bruno Martins, Marcely Zanon Boito, André F. T. Martins
13 Mar 2025

Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan, V. Trinh, Vivek Voleti, Jacob Whitehill
13 Sep 2024