ResearchTrend.AI

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder

3 March 2016
Yu-An Chung
Chao-Chung Wu
Chia-Hao Shen
Hung-yi Lee
Lin-Shan Lee
    AI4TS

Papers citing "Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder"

50 / 94 papers shown
Visually Grounded Speech Models have a Mutual Exclusivity Bias
Leanne Nortje
Dan Oneaţă
Yevgen Matusevych
Herman Kamper
SSL
47
0
0
20 Mar 2024
Acoustic models of Brazilian Portuguese Speech based on Neural Transformers
M. Gauy
Marcelo Finger
22
2
0
14 Dec 2023
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
37
4
0
27 Nov 2023
Spoken Word2Vec: Learning Skipgram Embeddings from Speech
Mohammad Amaan Sayeed
Hanan Aldarmaki
22
0
0
15 Nov 2023
Matching Latent Encoding for Audio-Text based Keyword Spotting
K. Nishu
Minsik Cho
Devang Naik
9
14
0
08 Jun 2023
Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili
C. Jacobs
Nathanaël Carraz Rakotonirina
E. Chimoto
Bruce A. Bassett
Herman Kamper
27
5
0
01 Jun 2023
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Hiroshi Sato
Ryo Masumura
Tsubasa Ochiai
Marc Delcroix
Takafumi Moriya
...
Kentaro Shinayama
Saki Mizuno
Mana Ihori
Tomohiro Tanaka
Nobukatsu Hojo
37
5
0
24 May 2023
Exploring How Generative Adversarial Networks Learn Phonological Representations
Jing Chen
Micha Elsner
GAN
19
3
0
21 May 2023
A Survey on Time-Series Pre-Trained Models
Qianli Ma
Ziqiang Liu
Zhenjing Zheng
Ziyang Huang
Siying Zhu
Zhongzhong Yu
James T. Kwok
AI4TS
31
50
0
18 May 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
26
150
0
03 Mar 2023
Supervised Acoustic Embeddings And Their Transferability Across Languages
Sreepratha Ram
Hanan Aldarmaki
SSL
24
3
0
03 Jan 2023
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
24
5
0
24 Nov 2022
Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach
Xulong Zhang
Jianzong Wang
Ning Cheng
Kexin Zhu
Jing Xiao
21
0
0
25 Oct 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
51
28
0
28 Sep 2022
Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network
Da-Rong Liu
Po-Chun Hsu
Yi-Chen Chen
Sung-Feng Huang
Shun-Po Chuang
Da-Yi Wu
Hung-yi Lee
GAN
15
7
0
29 Jul 2022
Towards Proper Contrastive Self-supervised Learning Strategies For Music Audio Representation
Jeong-Eun Choi
Seongwon Jang
Hyunsouk Cho
Sehee Chung
SSL
16
6
0
10 Jul 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
137
350
0
21 May 2022
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning
Robin Algayres
Adel Nabli
Benoît Sagot
Emmanuel Dupoux
SSL
23
8
0
11 Apr 2022
Adding Connectionist Temporal Summarization into Conformer to Improve Its Decoder Efficiency For Speech Recognition
N. J. Wang
Zongfeng Quan
Shaojun Wang
Jing Xiao
23
1
0
08 Apr 2022
Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings
Myunghun Jung
Hoirin Kim
19
3
0
30 Mar 2022
Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data
Gašper Beguš
Alan Zhou
SSL
27
4
0
22 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
19
11
0
01 Mar 2022
On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification
A. Sarkar
Zheng-Hua Tan
16
2
0
17 Jan 2022
Deep Spoken Keyword Spotting: An Overview
Iván López-Espejo
Zheng-Hua Tan
John H. L. Hansen
Jesper Jensen
21
100
0
20 Nov 2021
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Ankur Bapna
Yu-An Chung
Na Wu
Anmol Gulati
Ye Jia
J. Clark
Melvin Johnson
Jason Riesa
Alexis Conneau
Yu Zhang
VLM
61
94
0
20 Oct 2021
Interpreting intermediate convolutional layers in unsupervised acoustic word classification
Gašper Beguš
Alan Zhou
FAtt
SSL
33
5
0
05 Oct 2021
Modeling Dynamics of Facial Behavior for Mental Health Assessment
Minh Tran
Ellen R. Bradley
Michelle Matvey
J. Woolley
M. Soleymani
CVBM
17
3
0
23 Aug 2021
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
Jian Luo
Jianzong Wang
Ning Cheng
Jing Xiao
SSL
27
6
0
09 Jul 2021
Unsupervised Automatic Speech Recognition: A Review
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM
SSL
39
56
0
09 Jun 2021
A Novel Semi-supervised Framework for Call Center Agent Malpractice Detection via Neural Feature Learning
Şükrü Ozan
Leonardo O. Iheme
12
4
0
04 Jun 2021
Unsupervised Discriminative Learning of Sounds for Audio Event Classification
Sascha Hornauer
Ke Li
Stella X. Yu
Shabnam Ghaffarzadegan
Liu Ren
SSL
26
5
0
19 May 2021
Interpreting intermediate convolutional layers of generative CNNs trained on waveforms
Gašper Beguš
Alan Zhou
27
7
0
19 Apr 2021
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales
Jacob Andreas
Gašper Beguš
M. Bronstein
R. Diamant
Denley Delaney
...
D. Tchernov
P. Tønnesen
Antonio Torralba
Daniel M. Vogt
Robert J. Wood
43
10
0
17 Apr 2021
Utilizing Self-supervised Representations for MOS Prediction
Wei-Cheng Tseng
Chien-yu Huang
Wei-Tsung Kao
Yist Y. Lin
Hung-yi Lee
SSL
27
63
0
07 Apr 2021
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
Jingsong Wang
Yuxuan He
Chunyu Zhao
Qijie Shao
Wei-Wei Tu
Tom Ko
Hung-yi Lee
Lei Xie
26
4
0
31 Mar 2021
Broad-UNet: Multi-scale feature learning for nowcasting tasks
Jesús García Fernández
S. Mehrkanoon
27
66
0
12 Feb 2021
A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings
Lisa van Staden
Herman Kamper
SSL
31
16
0
14 Dec 2020
Acoustic span embeddings for multilingual query-by-example search
Yushi Hu
Shane Settle
Karen Livescu
RALM
17
8
0
24 Nov 2020
Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
Sung-Feng Huang
Shun-Po Chuang
Da-Rong Liu
Yi-Chen Chen
Gene-Ping Yang
Hung-yi Lee
SSL
39
14
0
29 Oct 2020
Probing Acoustic Representations for Phonetic Properties
Danni Ma
Neville Ryant
M. Liberman
25
45
0
25 Oct 2020
Contrastive Learning of General-Purpose Audio Representations
Aaqib Saeed
David Grangier
Neil Zeghidour
VLM
SSL
24
262
0
21 Oct 2020
Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication
Gašper Beguš
GAN
SSL
8
15
0
13 Sep 2020
Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder
Si-Ioi Ng
Tan Lee
9
7
0
07 Aug 2020
Evaluating computational models of infant phonetic learning across languages
Yevgen Matusevych
Thomas Schatz
Herman Kamper
Naomi H Feldman
Sharon Goldwater
19
14
0
06 Aug 2020
Evaluating the reliability of acoustic speech embeddings
Robin Algayres
Mohamed Salah Zaiem
Benoît Sagot
Emmanuel Dupoux
38
29
0
27 Jul 2020
Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings
Bowen Shi
Shane Settle
Karen Livescu
22
4
0
01 Jul 2020
CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks
Gašper Beguš
GAN
6
33
0
04 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
26
12
0
01 Jun 2020
Improved Speech Representations with Multi-Target Autoregressive Predictive Coding
Yu-An Chung
James R. Glass
SSL
15
56
0
11 Apr 2020
Analyzing autoencoder-based acoustic word embeddings
Yevgen Matusevych
Herman Kamper
Sharon Goldwater
30
12
0
03 Apr 2020