arXiv: 2110.15018
TorchAudio: Building Blocks for Audio and Speech Processing

28 October 2021
Yao-Yuan Yang
Moto Hira
Zhaoheng Ni
Anjali Chourdia
Artyom Astafurov
Caroline Chen
Ching-Feng Yeh
Christian Puhrsch
David Pollack
Dmitriy Genzel
Donny Greenberg
Edward Z. Yang
Jason Lian
Jay Mahadeokar
Jeff Hwang
Ji Chen
Peter Goldsborough
Prabhat Roy
Sean Narenthiran
Shinji Watanabe
Soumith Chintala
Vincent Quenneville-Bélair
Yangyang Shi

Papers citing "TorchAudio: Building Blocks for Audio and Speech Processing"

27 / 27 papers shown
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge
Yi Zhu
C. Goel
Surya Koppisetti
Trang Tran
Ankur Kumar
Gaurav Bharaj
AAML
28
0
0
09 Oct 2024
SCOREQ: Speech Quality Assessment with Contrastive Regression
Alessandro Ragano
Jan Skoglund
Andrew Hines
38
6
0
09 Oct 2024
Lightweight Transducer Based on Frame-Level Criterion
Genshun Wan
Mengzhi Wang
Tingzhi Mao
Hang Chen
Z. Ye
36
1
0
05 Sep 2024
An Automated Approach to Collecting and Labeling Time Series Data for Event Detection Using Elastic Node Hardware
Tianheng Ling
Islam Mansour
Chao Qian
Gregor Schiele
20
0
0
06 Jul 2024
TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning
Nemin Wu
Qian Cao
Zhangyu Wang
Zeping Liu
Yanlin Qi
...
Stefano Ermon
T. Ganu
A. Nambi
Ni Lao
Gengchen Mai
61
15
0
21 Jun 2024
AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly
Roshan Nayak
Rakshith Rao
Ujan Deb
AP Prathosh
24
1
0
11 May 2024
Toward end-to-end interpretable convolutional neural networks for waveform signals
Linh Vu
Thu Tran
Wern-Han Lim
Raphael Phan
28
1
0
03 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
19
0
15 Apr 2024
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang
James R. Glass
25
3
0
15 Nov 2023
Soundbay: Deep Learning Framework for Marine Mammals and Bioacoustic Research
Noam Bressler
Michael Faran
Amit Galor
Michael Moshe Michelashvili
Tomer Nachshon
Noa Weiss
28
0
0
07 Nov 2023
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Tian-Hao Zhang
Dinghao Zhou
Guiping Zhong
Jiaming Zhou
Baoxiang Li
10
3
0
26 Jul 2023
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring
Juan Sebastián Canas
Maria Paula Toro-Gómez
L. S. M. Sugai
H. Benítez-Restrepo
J. Rudas
...
José Luiz Massao Moreira Sugai
Carolina Emília dos Santos
R. Bastos
Diego Llusia
J. Ulloa
28
18
0
11 Jul 2023
Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes
Kevin Glocker
Aaricia Herygers
Munir Georges
21
4
0
07 Jun 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
H. Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
19
20
0
10 Apr 2023
Imitator: Personalized Speech-driven 3D Facial Animation
Balamurugan Thambiraja
I. Habibie
S. Aliakbarian
Darren Cosker
Christian Theobalt
Justus Thies
CVBM
39
49
0
30 Dec 2022
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
13
18
0
29 Dec 2022
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features
T. U. K. Reddy
Sahukari Chaitanya Varun
Kota Pranav Kumar Sankala Sreekanth
K. Murty
18
0
0
05 Dec 2022
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun
Erik McDermott
Roger Hsiao
19
1
0
29 Nov 2022
GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning
Gaku Narita
Junichi Shimizu
Taketo Akama
GAN
21
11
0
10 Nov 2022
Denoising neural networks for magnetic resonance spectroscopy
Natalie Klein
Amber J. Day
Harris Mason
M. Malone
Sinead Williamson
11
1
0
31 Oct 2022
Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications
Bastiaan Tamm
Helena Balabin
Rik Vandenberghe
Hugo Van hamme
29
9
0
01 Oct 2022
Decoding speech perception from non-invasive brain recordings
Alexandre Défossez
Charlotte Caucheteux
Jérémy Rapin
Ori Kabeli
J. King
30
115
0
25 Aug 2022
The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing
Dawit Mureja Argaw
Fabian Caba Heilbron
Joon-Young Lee
Markus Woodson
In So Kweon
VGen
37
22
0
20 Jul 2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Marco Jiralerspong
Gauthier Gidel
VLM
14
3
0
25 Jun 2022
AugLy: Data Augmentations for Robustness
Zoe Papakipos
Joanna Bitton
AAML
29
52
0
17 Jan 2022
DDSP: Differentiable Digital Signal Processing
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
83
372
0
14 Jan 2020
NeMo: a toolkit for building AI applications using Neural Modules
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
...
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen
185
291
0
14 Sep 2019