Deep Speech: Scaling up end-to-end speech recognition

17 December 2014

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

Showing 50 of 768 citing papers

Exploring State Space and Reasoning by Elimination in Tsetlin Machines

Ahmed K. Kadhim

Ole-Christoffer Granmo

Lei Jiao

Rishad Shafik

269

12 Jul 2024

Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

Chao Shen

146

27 Jun 2024

NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation

Niu Guanchen

3DH

275

17 Jun 2024

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Eungbeom Kim

Hantae Kim

Kyogu Lee

186

12 Jun 2024

Embedded Distributed Inference of Deep Neural Networks: A Systematic Review

Federico Nicolás Peccia

Oliver Bringmann

246

06 May 2024

Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

Aditya Chakravarty

167

02 May 2024

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

354

23 Apr 2024

Towards Fast Setup and High Throughput of GPU Serverless Computing

Jingwen Leng

135

23 Apr 2024

Effective internal language model training and fusion for factorized transducer model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ozlem Kalinli

195

02 Apr 2024

PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

245

31 Mar 2024

FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts

Kazuki Kawamura

Jun Rekimoto

138

26 Mar 2024

Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data

Yuxuan Li

Sarthak Kumar Maharana

Yunhui Guo

AAML

294

15 Mar 2024

SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation

Computer Speech and Language (CSL), 2024

192

13 Mar 2024

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

209

02 Mar 2024

Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning

148

29 Feb 2024

Representing Online Handwriting for Recognition in Large Vision-Language Models

288

23 Feb 2024

The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese

113

12 Feb 2024

Arabic Synonym BERT-based Adversarial Examples for Text Classification

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

Norah M. Alshahrani

Saied Alshahrani

Esma Wali

Jeanna Neefe Matthews

AAML

168

05 Feb 2024

Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege

IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024

184

28 Jan 2024

NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

155

23 Jan 2024

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

107

05 Jan 2024

PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition

ACM Multimedia Asia (MA), 2023

138

13 Dec 2023

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

...

473

13 Dec 2023

Relational Deep Learning: Graph Representation Learning on Relational Databases

193

07 Dec 2023

MyPortrait: Morphable Prior-Guided Personalized Portrait Generation

171

05 Dec 2023

3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing

Balamurugan Thambiraja

284

01 Dec 2023

MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI

20 Nov 2023

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

156

15 Nov 2023

Automatic Disfluency Detection from Untranscribed Speech

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Amrit Romana

K. Koishida

E. Provost

241

01 Nov 2023

Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements

234

01 Nov 2023

Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics

Valerio Francesco Puglisi

O. Giudice

Sebastiano Battiato

197

29 Oct 2023

Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control

Elif Bozkurt

196

25 Oct 2023

LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding

IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2023

Qu Yang

Malu Zhang

Jibin Wu

Kay Chen Tan

Haizhou Li

213

23 Oct 2023

No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation

Automatic Speech Recognition & Understanding (ASRU), 2023

189

10 Oct 2023

DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models

ACM Transactions on Graphics (TOG), 2023

Sheng Ye

364

30 Sep 2023

Emotional Listener Portrait: Neural Listener Head Generation with Emotion

IEEE International Conference on Computer Vision (ICCV), 2023

432

29 Sep 2023

Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution

Machine Translation Summit (MT Summit), 2023

134

27 Sep 2023

Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey

244

26 Sep 2023

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

Alexandre R. Ferreira

Cláudio E. C. Campelo

104

22 Sep 2023

A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Bengt J. Borgström

M. Brandstein

170

21 Sep 2023

AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition

158

20 Sep 2023

FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion

Motion in Games (MiG), 2023

333

20 Sep 2023

Uncertainty Estimation in Instance Segmentation with Star-convex Shapes

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

184

19 Sep 2023

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

234

16 Sep 2023

Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jeong Hun Yeo

269

15 Sep 2023

PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

International Symposium on Recent Advances in Intrusion Detection (RAID), 2023

Hanqing Guo

Guangjing Wang

Yuanda Wang

Bocheng Chen

Qiben Yan

Li Xiao

AAML

200

13 Sep 2023

DAD++: Improved Data-free Test Time Adversarial Defense

264

10 Sep 2023

Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis

248

31 Aug 2023

ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers

International Conference on Automated Software Engineering (ASE), 2023

Yi Liu

Yang Liu

148

30 Aug 2023

Compensating Removed Frequency Components: Thwarting Voice Spectrum Reduction Attacks

Network and Distributed System Security Symposium (NDSS), 2023

Shu Wang

Kun Sun

Qi Li

AAML

170

18 Aug 2023