ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

11 October 2020
Changhan Wang
Yun Tang
Xutai Ma
Anne Wu
Sravya Popuri
Dmytro Okhonko
J. Pino
    VLM
    LRM
arXiv: 2010.05171

Papers citing "fairseq S2T: Fast Speech-to-Text Modeling with fairseq"

50 / 57 papers shown
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
43
0
0
08 May 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
171
0
0
05 Feb 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
61
0
0
01 Feb 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
36
0
0
04 Jan 2025
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
54
1
0
03 Nov 2024
Speechworthy Instruction-tuned Language Models
Hyundong Justin Cho
Nicolaas Jedema
Leonardo F. R. Ribeiro
Karishma Sharma
Pedro Szekely
Alessandro Moschitti
Ruben Janssen
Jonathan May
ALM
44
1
0
23 Sep 2024
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Qingkai Fang
Shaolei Zhang
Zhengrui Ma
Min Zhang
Yang Feng
VLM
43
1
0
11 Jun 2024
Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023
Sara Papi
Marco Gaido
Matteo Negri
43
7
0
27 Sep 2023
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Matthew Raffel
Lizhong Chen
9
5
0
03 Jul 2023
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
Matthew Raffel
Drew Penney
Lizhong Chen
16
3
0
03 Jul 2023
End-to-End Simultaneous Speech Translation with Differentiable Segmentation
Shaolei Zhang
Yang Feng
20
17
0
25 May 2023
Improving Metrics for Speech Translation
Claudio Paonessa
Dominik Frefel
Manfred Vogel
23
1
0
22 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
15
23
0
19 May 2023
HeySQuAD: A Spoken Question Answering Dataset
Yijing Wu
Sai Krishna Rallabandi
R. Srinivasamurthy
Parag Dakle
Alolika Gon
Preethi Raghavan
29
4
0
26 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
H. Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
19
20
0
10 Apr 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
17
3
0
21 Feb 2023
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Ioannis Tsiamas
José A. R. Fonollosa
Marta R. Costa-jussà
41
6
0
19 Dec 2022
A large-scale and PCR-referenced vocal audio dataset for COVID-19
Jobie Budd
Kieran Baker
E. Karoune
H. Coppock
Selina Patel
...
D. Pigoli
Stephen J. Roberts
Josef Packham
T. Thornley
Chris Holmes
27
5
0
15 Dec 2022
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao
Yudong Li
Cheng-An Hou
Jing-xin Zhao
Rong Tian
...
Xingwu Sun
Zhanhui Kang
Xiaoyong Du
Linlin Shen
Kimmo Yan
VLM
41
23
0
13 Dec 2022
Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
Yuhao Zhang
Chen Xu
Bojie Hu
Chunliang Zhang
Tong Xiao
Jingbo Zhu
24
15
0
04 Dec 2022
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
VLM
39
8
0
27 Oct 2022
RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise
Jinming Zhao
Haomiao Yang
Gholamreza Haffari
Ehsan Shareghi
VLM
16
2
0
16 Oct 2022
Direct Speech Translation for Automatic Subtitling
Sara Papi
Marco Gaido
Alina Karakanta
Mauro Cettolo
Matteo Negri
Marco Turchi
54
11
0
27 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
24
1
0
17 Sep 2022
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation
Jinming Zhao
Haomiao Yang
Ehsan Shareghi
Gholamreza Haffari
42
19
0
03 Jul 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
Kyle Kastner
Aaron Courville
32
0
0
30 Jun 2022
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Yichao Du
Weizhi Wang
Zhirui Zhang
Boxing Chen
Tong Xu
Jun Xie
Enhong Chen
51
18
0
23 May 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
29
24
0
20 May 2022
Who Are We Talking About? Handling Person Names in Speech Translation
Marco Gaido
Matteo Negri
Marco Turchi
23
7
0
13 May 2022
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
Marco Gaido
Sara Papi
Dennis Fucci
G. Fiameni
Matteo Negri
Marco Turchi
31
19
0
05 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
45
37
0
02 May 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
9
23
0
08 Apr 2022
Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
41
26
0
08 Apr 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri
Peng-Jen Chen
Changhan Wang
J. Pino
Yossi Adi
Jiatao Gu
Wei-Ning Hsu
Ann Lee
25
56
0
06 Apr 2022
An Analysis of Semantically-Aligned Speech-Text Embeddings
M. Huzaifah
Ivan Kukanov
25
7
0
04 Apr 2022
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang
Hung-yi Lee
27
13
0
22 Mar 2022
STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
Qingkai Fang
Rong Ye
Lei Li
Yang Feng
Mingxuan Wang
22
95
0
20 Mar 2022
Speech Resources in the Tamasheq Language
Marcely Zanon Boito
Fethi Bougares
Florentin Barbier
Souhir Gahbiche
Loïc Barrault
Mickael Rouvier
Yannick Estève
28
14
0
13 Jan 2022
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
Tiezheng Yu
Rita Frieske
Peng-Tao Xu
Samuel Cahyawijaya
Cheuk Tung Shadow Yiu
...
Elham J. Barezi
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
RALM
42
9
0
07 Jan 2022
Voice Quality and Pitch Features in Transformer-Based Speech Recognition
Guillermo Cámbara
Jordi Luque
Mireia Farrús
19
0
0
21 Dec 2021
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
28
142
0
15 Dec 2021
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation
Danni Liu
Changhan Wang
Hongyu Gong
Xutai Ma
Yun Tang
J. Pino
17
4
0
15 Oct 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
H. Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
65
11
0
27 Sep 2021
Multi-Sentence Resampling: A Simple Approach to Alleviate Dataset Length Bias and Beam-Search Degradation
Ivan Provilkov
A. Malinin
16
4
0
13 Sep 2021
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Patrick Esser
Robin Rombach
A. Blattmann
Björn Ommer
DiffM
24
156
0
19 Aug 2021
Simultaneous Speech Translation for Live Subtitling: from Delay to Display
Alina Karakanta
Sara Papi
Matteo Negri
Marco Turchi
25
10
0
19 Jul 2021
Dealing with training and test segmentation mismatch: FBK@IWSLT2021
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
31
6
0
23 Jun 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
37
30
0
02 May 2021
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
SSL
25
460
0
02 Jan 2021