Towards Learning a Universal Non-Semantic Representation of Speech

Interspeech (Interspeech), 2020
25 February 2020
Joel Shor
A. Jansen
Ronnie Maor
Oran Lang
Omry Tuval
Félix de Chaumont Quitry
Marco Tagliasacchi
Ira Shavitt
Dotan Emanuel
Yinnon A. Haviv
    SSL

Papers citing "Towards Learning a Universal Non-Semantic Representation of Speech"

50 / 107 papers shown
Generalizable Audio Spoofing Detection using Non-Semantic Representations
Arnab Das
Yassine El Kheir
Carlos Franzreb
Tim Herzig
Tim Polzehl
Sebastian Möller
233
1
0
29 Aug 2025
Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM, MedIm
327
3
0
10 Jun 2025
Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ibuki Kuroyanagi
Tatsuya Komatsu
SSL
197
2
0
25 May 2025
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Junyi Peng
Takanori Ashihara
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Shoko Araki
J. Černocký
ELM
327
3
0
10 May 2025
The order in speech disorder: a scoping review of state of the art machine learning methods for clinical speech classification
Birger Moëll
Fredrik Sand Aronsson
Per Östberg
Jonas Beskow
204
3
0
03 Mar 2025
Evaluation of Deep Audio Representations for Hearables
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Fabian Gröger
Pascal Baumann
Ludovic Amruthalingam
Laurent Simon
Ruksana Giurda
Simone Lionetti
431
1
0
10 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
487
1
0
05 Feb 2025
The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions
L. Gauder
Pablo Riera
A. Slachevsky
G. Forno
Adolfo M. Garcia
Luciana Ferrer
262
4
0
11 Sep 2024
STAB: Speech Tokenizer Assessment Benchmark
Shikhar Vashishth
Harman Singh
Shikhar Bharadwaj
Sriram Ganapathy
Chulayuth Asawaroengchai
Kartik Audhkhasi
Andrew Rosenberg
Ankur Bapna
Bhuvana Ramabhadran
250
5
0
04 Sep 2024
ICSD: An Open-source Dataset for Infant Cry and Snoring Detection
Qingyu Liu
Longfei Song
Dongxing Xu
Yanhua Long
381
3
0
20 Aug 2024
Predicting Heart Activity from Speech using Data-driven and Knowledge-based features
Gasser Elbanna
Z. Mostaani
Mathew Magimai.-Doss
SSL
263
4
0
10 Jun 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
420
3
0
16 Apr 2024
Exploring the Task-agnostic Trait of Self-supervised Learning in the Context of Detecting Mental Disorders
Rohan Kumar Gupta
Rohit Sinha
314
0
0
22 Mar 2024
Predicting Generalization of AI Colonoscopy Models to Unseen Data
Joel Shor
C. McNeil
Yotam Intrator
Joe Ledsam
H. Yamano
...
Masaaki Miyo
Eiji Oki
Ichiro Takemasa
Ehud Rivlin
Roman Goldenberg
260
0
0
14 Mar 2024
HeAR -- Health Acoustic Representations
Sebastien Baur
Zaid Nabulsi
Wei-Hung Weng
Jake Garrison
Louis Blankemeier
...
Shwetak N. Patel
S. Shetty
Shruthi Prabhakara
Monde Muyoyeta
Diego Ardila
LM&MA
334
28
0
04 Mar 2024
Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
Hamza Mahdi
Eptehal Nashnoush
Rami Saab
Arjun Balachandar
Rishit Dagli
Lucas X. Perri
H. Khosravani
441
1
0
07 Feb 2024
Relationship between auditory and semantic entrainment using Deep Neural Networks (DNN)
Jay Kejriwal
Štefan Beňuš
232
7
0
27 Dec 2023
The unreasonable effectiveness of AI CADe polyp detectors to generalize to new countries
Joel Shor
H. Yamano
Daisuke Tsurumaru
Yotam Intrator
Hiroki Kayama
...
Kaho Kobayashi
Eiji Oki
Roman Goldenberg
Ehud Rivlin
Ichiro Takemasa
CML
242
0
0
11 Dec 2023
Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia
Dimitris Gkoumas
Matthew Purver
Maria Liakata
238
3
0
15 Oct 2023
A Digital Language Coherence Marker for Monitoring Dementia
Dimitris Gkoumas
Adam Tsakalidis
Maria Liakata
201
2
0
14 Oct 2023
Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ben Maman
Johannes Zeitler
Meinard Müller
Amit H. Bermano
DiffM
222
7
0
21 Sep 2023
Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information
Sarwan Ali
206
0
0
20 Sep 2023
Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio
Workshop on Mobile Computing Systems and Applications (HotMobile), 2023
Forsad Al Hossain
Tanjid Hasan Tonmoy
A. Lover
George A. Corey
Mohammad Arif Ul Alam
Tauhidur Rahman
165
4
0
19 Sep 2023
EnCodecMAE: Leveraging neural codecs for universal audio representation learning
L. Pepino
Pablo Riera
Luciana Ferrer
297
12
0
14 Sep 2023
Optimizing Audio Augmentations for Contrastive Learning of Health-Related Acoustic Signals
Louis Blankemeier
Sebastien Baur
Wei-Hung Weng
Jake Garrison
Yossi Matias
Shruthi Prabhakara
Diego Ardila
Zaid Nabulsi
252
0
0
11 Sep 2023
PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords
Interspeech (Interspeech), 2023
Yong-Hyeok Lee
Namhyun Cho
266
32
0
31 Aug 2023
MASR: Multi-label Aware Speech Representation
Automatic Speech Recognition & Understanding (ASRU), 2023
Anjali Raj
Shikhar Bharadwaj
Sriram Ganapathy
Min Ma
Shikhar Vashishth
SSL
216
0
0
20 Jul 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
205
3
0
14 Jul 2023
Speech-based Age and Gender Prediction with Transformers
Felix Burkhardt
Johannes Wagner
H. Wierstorf
F. Eyben
Björn Schuller
163
30
0
29 Jun 2023
Female mosquito detection by means of AI techniques inside release containers in the context of a Sterile Insect Technique program
European Signal Processing Conference (EUSIPCO), 2023
Javier Naranjo-Alcazar
Jordi Grau-Haro
D. Almenar
P. Zuccarello
129
1
0
19 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech (Interspeech), 2023
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL, ELM
292
15
0
14 Jun 2023
Label Aware Speech Representation Learning For Language Identification
Interspeech (Interspeech), 2023
Shikhar Vashishth
Shikhar Bharadwaj
Sriram Ganapathy
Ankur Bapna
Min Ma
Wei Han
Vera Axelrod
Partha P. Talukdar
SSL
197
4
0
07 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xian Li
Nian Shao
Xiaofei Li
ViT, CLIP
442
52
0
07 Jun 2023
Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations
Interspeech (Interspeech), 2023
Salah Zaiem
Titouan Parcollet
S. Essid
235
2
0
01 Jun 2023
The Tunnel Effect: Building Data Representations in Deep Neural Networks
Neural Information Processing Systems (NeurIPS), 2023
Wojciech Masarczyk
M. Ostaszewski
Ehsan Imani
Razvan Pascanu
Piotr Miłoś
Tomasz Trzciński
414
35
0
31 May 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Interspeech (Interspeech), 2023
Wangyou Zhang
Y. Qian
292
12
0
25 May 2023
Happy or Evil Laughter? Analysing a Database of Natural Audio Samples
Aljoscha Dusterhoft
Felix Burkhardt
Björn W. Schuller
134
2
0
23 May 2023
Pengi: An Audio Language Model for Audio Tasks
Neural Information Processing Systems (NeurIPS), 2023
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM, AuLLM
528
268
0
19 May 2023
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
International Conference on Machine Learning (ICML), 2023
Wei-wei Lin
Chenhang He
Man-Wai Mak
Youzhi Tu
231
6
0
14 May 2023
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Kun Su
Judith Yue Li
Qingqing Huang
Dima Kuzmin
Joonseok Lee
...
Fei Sha
A. Jansen
Yu Wang
Mauro Verzetti
Timo I. Denk
VGen
249
26
0
11 May 2023
Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit
Shreya Ghosh
Zhixi Cai
Parul Gupta
Garima Sharma
Abhinav Dhall
Munawar Hayat
Tom Gedeon
215
4
0
09 May 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
465
3
0
12 Apr 2023
Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Nikolaos Antoniou
Athanasios Katsamanis
Theodoros Giannakopoulos
Shrikanth Narayanan
223
25
0
03 Apr 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
515
76
0
21 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
153
0
0
15 Mar 2023
Speech Intelligibility Classifiers from 550k Disordered Speech Samples
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Subhashini Venugopalan
Jimmy Tobin
Samuel J. Yang
Katie Seaver
Richard Cave
P. Jiang
Neil Zeghidour
Rus Heywood
Jordan R. Green
Michael P. Brenner
304
18
0
13 Mar 2023
Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
Clinical Natural Language Processing Workshop (ClinicalNLP), 2023
Joel Shor
R. Bi
Subhashini Venugopalan
Steven Ibara
Roman Goldenberg
Ehud Rivlin
AI4MH
304
13
0
10 Mar 2023
Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Bac Nguyen
Stefan Uhlich
Fabien Cardinaux
SSL
243
4
0
07 Mar 2023
Noise2Music: Text-conditioned Music Generation with Diffusion Models
Qingqing Huang
Daniel S. Park
Tao Wang
Timo I. Denk
Andy Ly
...
Jesse Engel
Quoc V. Le
William Chan
Zhifeng Chen
Wei Han
MGen, DiffM
492
253
0
08 Feb 2023
MusicLM: Generating Music From Text
A. Agostinelli
Timo I. Denk
Zalan Borsos
Jesse Engel
Mauro Verzetti
...
Adam Roberts
Marco Tagliasacchi
Matthew Sharifi
Neil Zeghidour
Christian Frank
MGen
1.1K
647
0
26 Jan 2023