NeMo: a toolkit for building AI applications using Neural Modules

14 September 2019
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
Boris Ginsburg
Samuel Kriman
Stanislav Beliaev
Vitaly Lavrukhin
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen
arXiv:1909.09577 | GitHub (14646★)

Papers citing "NeMo: a toolkit for building AI applications using Neural Modules"

50 / 199 papers shown
Mind the Motions: Benchmarking Theory-of-Mind in Everyday Body Language
Seungbeen Lee
Jinhong Jeong
Donghyun Kim
Yejin Son
Youngjae Yu
113
1
0
19 Nov 2025
Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
Máté Gedeon
Piroska Zsófia Barta
Péter Mihajlik
Tekla Etelka Gráczi
Anna Kohári
Katalin Mády
88
1
0
17 Nov 2025
Open Source State-Of-the-Art Solution for Romanian Speech Recognition
Gabriel Pirlogeanu
Alexandru-Lucian Georgescu
Horia Cucu
101
0
0
05 Nov 2025
ReFESS-QI: Reference-Free Evaluation For Speech Separation With Joint Quality And Intelligibility Scoring
Ari Frummer
Helin Wang
Tianyu Cao
Adi Arbel
Yuval Sieradzki
Oren Gal
Jesus Villalba
Thomas Thebaud
Najim Dehak
132
1
0
23 Oct 2025
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets
Manolis Mylonas
Charalampia Zerva
Evlampios Apostolidis
Vasileios Mezaris
131
3
0
07 Oct 2025
Contrastive Mutual Information Learning: Toward Robust Representations without Positive-Pair Augmentations
Micha Livne
SSL
144
0
0
25 Sep 2025
WolBanking77: Wolof Banking Speech Intent Classification Dataset
Abdou Karim Kandji
Frédéric Precioso
Cheikh Ba
Samba Ndiaye
Augustin Ndione
234
0
0
23 Sep 2025
Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective
Seokjin Go
Joongun Park
Spandan More
Hanjiang Wu
Irene Wang
Aaron Jezghani
Tushar Krishna
Divya Mahajan
249
2
0
12 Sep 2025
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
Terry Yue Zhuo
Dingmin Wang
Hantian Ding
Varun Kumar
Zijian Wang
ELM
215
4
0
25 Aug 2025
Improving French Synthetic Speech Quality via SSML Prosody Control
Nassima Ould Ouali
Awais Hussain Sani
Ruben Bueno
Jonah Dauvet
Tim Luka Horstmann
Eric Moulines
100
0
0
24 Aug 2025
FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
Lilit Grigoryan
Vladimir Bataev
Nikolay Karpov
A. Andrusenko
Vitaly Lavrukhin
Boris Ginsburg
185
1
0
10 Aug 2025
MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding
Yu Xi
Haoyu Li
Xiaoyu Gu
Yidi Jiang
Kai Yu
344
2
0
01 Jul 2025
Improving Named Entity Transcription with Contextual LLM-based Revision
V. Trinh
Xinlu He
Jacob Whitehill
KELM
331
1
0
12 Jun 2025
Joint ASR and Speaker Role Tagging with Serialized Output Training
Anfeng Xu
Tiantian Feng
Zengyi Qin
246
0
0
12 Jun 2025
Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM
MedIm
241
3
0
10 Jun 2025
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
Yu Nakagome
Michael Hentschel
240
0
0
02 Jun 2025
Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
Lilit Grigoryan
Vladimir Bataev
A. Andrusenko
Hainan Xu
Vitaly Lavrukhin
Boris Ginsburg
214
2
0
30 May 2025
Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
Yu Xi
Xiaoyu Gu
Haoyu Li
Jun Song
Bo Zheng
Kai Yu
154
0
0
30 May 2025
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
Ke Hu
Krishna Puvvada
Elena Rastorgueva
Zhiwen Chen
He Huang
Shuoyang Ding
Kunal Dhawan
Hainan Xu
Jagadeesh Balam
Boris Ginsburg
178
2
0
21 May 2025
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
Ke Hu
Ehsan Hosseini-Asl
Chen Chen
Edresson Casanova
Subhankar Ghosh
Piotr Żelasko
Zhiwen Chen
Jia-Nan Li
Jagadeesh Balam
Boris Ginsburg
AuLLM
642
0
0
21 May 2025
WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection
Hainan Xu
Vladimir Bataev
Lilit Grigoryan
Boris Ginsburg
213
0
0
19 May 2025
LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
Danilo de Oliveira
Julius Richter
Tal Peer
Timo Gerkmann
DiffM
436
2
0
16 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
351
3
0
01 May 2025
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
Daria Gitman
Igor Gitman
Evelina Bakhturina
SyDa
216
0
0
01 May 2025
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
Mikhail Osipov
418
2
0
16 Apr 2025
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Beilong Tang
Bang Zeng
Ming Li
AI4TS
291
6
0
10 Apr 2025
Visual-Aware Speech Recognition for Noisy Scenarios
Lakshmipathi Balaji
Karan Singla
212
0
0
09 Apr 2025
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Vladimir Bataev
390
0
0
09 Apr 2025
OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery
Vignesh Prabhakar
Md Amirul Islam
Adam Atanas
Longji Xu
J. N. Han
...
Rucha Apte
Robert Clark
Kang Xu
Zihan Wang
Kai Liu
LRM
581
16
0
22 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
331
1
0
11 Mar 2025
Contextual Cues in Machine Translation: Investigating the Potential of Multi-Source Input Strategies in LLMs and NMT Systems
Lia Shahnazaryan
P. Simianer
Joern Wuebker
253
0
0
10 Mar 2025
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C. Xie
Shuo Cai
Wenjun Wang
Pengxiang Li
Zhijie Sang
...
Xiaotian Han
Jianbo Yuan
Shengyu Zhang
Leilei Gan
Hongxia Yang
LRM
320
2
0
17 Feb 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
413
40
0
28 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
288
4
0
10 Jan 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
328
1
0
10 Jan 2025
Open Universal Arabic ASR Leaderboard
Yingzhi Wang
Anas Alhmoud
Muhammad Alqurishi
ELM
178
6
0
18 Dec 2024
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen-An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
305
13
0
11 Nov 2024
SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile
Ruisi Zhang
Tianyu Liu
Will Feng
Andrew Gu
Sanket Purandare
Wanchao Liang
Francisco Massa
331
5
0
01 Nov 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yifan Peng
Krishna Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
422
8
0
23 Oct 2024
DENOASR: Debiasing ASRs through Selective Denoising
Anand Rai
S. Jaiswal
Shubham Prakash
Bendi Pragnya Sree
Animesh Mukherjee
250
3
0
22 Oct 2024
How much do contextualized representations encode long-range context?
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Simeng Sun
Cheng-Ping Hsieh
337
0
0
16 Oct 2024
Upcycling Large Language Models into Mixture of Experts
Ethan He
Syeda Nahida Akter
R. Prenger
V. Korthikanti
Zijie Yan
Tong Liu
Shiqing Fan
Ashwath Aithal
Mohammad Shoeybi
Bryan Catanzaro
MoE
439
32
0
10 Oct 2024
Transducer Consistency Regularization for Speech to Text Applications
Spoken Language Technology Workshop (SLT), 2024
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
305
0
0
09 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
International Conference on Learning Representations (ICLR), 2024
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
975
1
0
03 Oct 2024
Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Kaden Uhlig
Joern Wuebker
Raphael Reinauer
John DeNero
394
0
0
26 Sep 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Nohil Park
Heeseung Kim
Che Hyun Lee
Jooyoung Choi
Jiheum Yeom
Sungroh Yoon
173
3
0
24 Sep 2024
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jiheum Yeom
Heeseung Kim
Jooyoung Choi
Che Hyun Lee
Nohil Park
Sungroh Yoon
129
1
0
24 Sep 2024
EMMeTT: Efficient Multimodal Machine Translation Training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Piotr Żelasko
Zhehuai Chen
Mengru Wang
Daniel Galvez
Oleksii Hrinchuk
Shuoyang Ding
Ke Hu
Jagadeesh Balam
Vitaly Lavrukhin
Boris Ginsburg
193
4
0
20 Sep 2024
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space
Sebastião Quintas
Isabelle Ferrané
Thomas Pellegrini
251
0
0
19 Sep 2024
Chain-of-Thought Prompting for Speech Translation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ke Hu
Zhehuai Chen
Chao-Han Huck Yang
Piotr Żelasko
Oleksii Hrinchuk
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
LRM
457
16
0
17 Sep 2024