TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

25 February 2024
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
arXiv: 2402.16021

Papers citing "TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages"

5 / 5 papers shown
Seeing What You Say: Expressive Image Generation from Speech
Jiyoung Lee
S. Park
Sanghyuk Chun
Soo-Whan Chung
DiffM, VGen
05 Nov 2025
Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
Jongmin Jung
Dongmin Kim
Sihun Lee
Seola Cho
Hyungjoon Soh
Irmak Bukey
Chris Donahue
Dasaem Jeong
19 May 2025
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
Spoken Language Technology Workshop (SLT), 2024
Shih-Heng Wang
Jiatong Shi
Chien-yu Huang
Shinji Watanabe
Hung-yi Lee
27 Nov 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
01 Aug 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
04 Jun 2024