2402.16021
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
25 February 2024
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
Papers citing "TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages"
5 / 5 papers shown
Seeing What You Say: Expressive Image Generation from Speech
Jiyoung Lee
S. Park
Sanghyuk Chun
Soo-Whan Chung
DiffM
VGen
300
1
0
05 Nov 2025
Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
Jongmin Jung
Dongmin Kim
Sihun Lee
Seola Cho
Hyungjoon Soh
Irmak Bukey
Chris Donahue
Dasaem Jeong
258
0
0
19 May 2025
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
Spoken Language Technology Workshop (SLT), 2024
Shih-Heng Wang
Jiatong Shi
Chien-yu Huang
Shinji Watanabe
Hung-yi Lee
255
2
0
27 Nov 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
266
2
0
01 Aug 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
309
8
0
04 Jun 2024