
Dawn of the transformer era in speech emotion recognition: closing the valence gap

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

14 March 2022

Johannes Wagner

Andreas Triantafyllopoulos

Björn W. Schuller


Papers citing "Dawn of the transformer era in speech emotion recognition: closing the valence gap"

50 / 130 papers shown

SAR-LM: Symbolic Audio Reasoning with Large Language Models Termeh Taheri Yinghao Ma Emmanouil Benetos AuLLM LRM 130 0 0 09 Nov 2025
Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware Stavros Mitsis Ermos Hadjikyriakos Humaid Ibrahim Savvas Neofytou Shashwat Raman James Myles Eiman Kanjo 74 0 0 20 Oct 2025
Switchboard-Affect: Emotion Perception Labels from Conversational Speech Amrit Romana Jaya Narain Tien Dung Tran Andrea Davis Jason Fong Ramya Rasipuram Vikramjit Mitra 60 1 0 14 Oct 2025
Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model Chung-Soo Ahn R. Rana Sunil Sivadas Carlos Busso Jagath Rajapakse 65 0 0 11 Oct 2025
Deceptive Exploration in Multi-armed Bandits I. Arda Vurankaya Mustafa O. Karabag Wesley A Suttle Jesse Milzman David Fridovich-Keil Ufuk Topcu 60 0 0 09 Oct 2025
SEER: The Span-based Emotion Evidence Retrieval Benchmark Aneesha Sampath Oya Aran E. Provost RALM LRM 132 0 0 03 Oct 2025
SynchroRaMa: Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding Phyo Thet Yee D. Kollias Sudeepta Mishra Abhinav Dhall VGen 88 2 0 24 Sep 2025
More Similar than Dissimilar: Modeling Annotators for Cross-Corpus Speech Emotion Recognition James Tavernor E. Provost 64 0 0 15 Sep 2025
Joint Effects of Argumentation Theory, Audio Modality and Data Enrichment on LLM-Based Fallacy Classification Hongxu Zhou Hylke Westerdijk Khondoker Ittehadul Islam 36 0 0 14 Sep 2025
Emoanti: audio anti-deepfake with refined emotion-guided representations Xiaokang Li Yicheng Gong Dinghao Zou Xin Cao Sunbowen Lee 80 0 0 13 Sep 2025
The MSP-Podcast Corpus John H. L. Hansen Reza Lotfian K. Sridhar Ali N. Salman Wei-Cheng Lin ... Abinay Reddy Naini Seong-Gyun Leem Luz Martinez-Lucas Huang-Cheng Chou Pravin Mote 68 3 0 11 Sep 2025
Speech-Based Depressive Mood Detection in the Presence of Multiple Sclerosis: A Cross-Corpus and Cross-Lingual Study Monica Gonzalez-Machorro U. Reichel Pascal Hecker Helly Hammer Hesam Sagha F. Eyben Robert Hoepner Björn Schuller 32 1 0 25 Aug 2025
EmoTale: An Enacted Speech-emotion Dataset in Danish Maja J. Hjuler Harald V. Skat-Rørdam Line H. Clemmensen Sneha Das 60 1 0 20 Aug 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition Hugo Thimonier Antony Perzo Renaud Seguier 96 1 0 19 Aug 2025
RankList -- A Listwise Preference Learning Framework for Predicting Subjective Preferences Abinay Reddy Naini Fernando Diaz John H. L. Hansen 72 0 0 13 Aug 2025
ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs Eray Eren Qingju Liu Hyeongwoo Kim Pablo Garrido Abeer Alwan 78 0 0 12 Aug 2025
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models Qiongqiong Wang Hardik B. Sailor Jeremy H.M Wong Tianchi Liu Shuo Sun Wenyu Zhang Muhammad Huzaifah Nancy F. Chen Ai Ti Aw AuLLM 79 1 0 10 Aug 2025
Charting 15 years of progress in deep learning for speech emotion recognition: A replication study Andreas Triantafyllopoulos A. Batliner B. Schuller AI4TS 125 0 0 04 Aug 2025
HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection Huaimin Wang Zhuoran Wang Roy Ka-wei Lee VLM 92 1 0 03 Aug 2025
Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition Cheng-Hung Hu Yusuke Yasuda Akifumi Yoshimoto Tomoki Toda 160 0 0 18 Jul 2025
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems Kexin Huang Qian Tu Liwei Fan Chenchen Yang Dong Zhang Shimin Li Zhaoye Fei Qinyuan Cheng Xipeng Qiu 147 4 0 19 Jun 2025
MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge Xin Jing Jiadong Wang Iosif Tsangko Andreas Triantafyllopoulos Björn Schuller 144 0 0 30 May 2025
Learning Annotation Consensus for Continuous Emotion Recognition Ibrahim Shoer E. Erzin 182 0 0 27 May 2025
Rhapsody: A Dataset for Highlight Detection in Podcasts Younghan Park Anuj Diwan David Harwath Eunsol Choi 184 0 0 26 May 2025
Contrastive Distillation of Emotion Knowledge from LLMs for Zero-Shot Emotion Recognition Minxue Niu E. Provost VLM 290 0 0 23 May 2025
Bridging Speech Emotion Recognition and Personality: Dataset and Temporal Interaction Condition Network Yuan Gao Hao Shi Yahui Fu Chenhui Chu Tatsuya Kawahara 208 0 0 20 May 2025
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits Tiantian Feng Jihwan Lee Anfeng Xu Yoonjeong Lee Thanathai Lertpetchpun ... Thomas Thebaud Laureano Moro-Velazquez D. Byrd Najim Dehak Zengyi Qin 210 9 0 20 May 2025
Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation Qiongqiong Wang Hardik B. Sailor Tianchi Liu Ai Ti Aw 232 4 0 19 May 2025
Multimodal Emotion Coupling via Speech-to-Facial and Bodily Gestures in Dyadic Interaction Von Ralph Dane Marquez Herbuela Yukie Nagai CVBM 75 0 0 08 May 2025
BLAB: Brutally Long Audio Bench Orevaoghene Ahia Martijn Bartelds Kabir Ahuja Hila Gonen Valentin Hofmann ... Noah Bennett Shinji Watanabe Noah A. Smith Yulia Tsvetkov Sachin Kumar AuLLM LM&MA VLM 402 2 0 05 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition Computer Speech and Language (CSL), 2025 Paige Tuttosi Mantaj Dhillon Luna Sang Shane Eastwood Poorvi Bhatia Quang Minh Dinh Avni Kapoor Yewon Jin Angelica Lim 295 2 0 30 Apr 2025
Spatiotemporal Emotional Synchrony in Dyadic Interactions: The Role of Speech Conditions in Facial and Vocal Affective Alignment Von Ralph Dane Marquez Herbuela Yukie Nagai 114 0 0 29 Apr 2025
Affect Models Have Weak Generalizability to Atypical Speech Jaya Narain Amrit Romana Vikramjit Mitra Colin S. Lea Shirley Ren 136 0 0 22 Apr 2025
Exploring Local Interpretable Model-Agnostic Explanations for Speech Emotion Recognition with Distribution-Shift IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025 Maja J. Hjuler Line H. Clemmensen Sneha Das FAtt 262 3 0 07 Apr 2025
Interactive Multimodal Fusion with Temporal Modeling Jun-chen Yu Yongqi Wang Lei Wang Yang Zheng Shengfan Xu 193 4 0 13 Mar 2025
Scaling Rich Style-Prompted Text-to-Speech Datasets Anuj Diwan Zhisheng Zheng David Harwath Eunsol Choi CLIP VLM 322 12 0 06 Mar 2025
Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025 Aneesha Sampath James Tavernor E. Provost 287 3 0 17 Feb 2025
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks Simon Rampp Andreas Triantafyllopoulos M. Milling Björn Schuller 454 1 0 16 Dec 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024 Deok-Hyeon Cho Hyung-Seok Oh Seung-Bin Kim Seong-Whan Lee 317 21 0 04 Nov 2024
NTU-NPU System for Voice Privacy 2024 Challenge Nikita Kuzmin Hieu-Thi Luong Jixun Yao Lei Xie Kong Aik Lee Eng Siong Chng 210 5 0 03 Oct 2024
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions Kun Zhou You Zhang Shengkui Zhao Hao Wang Zexu Pan ... Chongjia Ni Yukun Ma Trung Hieu Nguyen J. Yip Bin Ma 220 10 0 25 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 Huang-Cheng Chou Haibin Wu Hung-yi Lee Chi-Chun Lee 310 3 0 16 Sep 2024
Dynamics of Collective Group Affect: Group-level Annotations and the Multimodal Modeling of Convergence and Divergence N. Prabhu Maria Tsfasman Catharine Oertel Timo Gerkmann N. Lehmann-Willenbrock 104 2 0 13 Sep 2024
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 Xin Jing Kun Zhou Andreas Triantafyllopoulos Björn W. Schuller DiffM 156 6 0 10 Sep 2024
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization Spoken Language Technology Workshop (SLT), 2024 Zexin Cai Lin Zhang Ashi Garg Leibny Paola García-Perera Kevin Duh Sanjeev Khudanpur Nicholas Andrews 83 10 0 05 Sep 2024
Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition Dionyssos Kounadis-Bastian Oliver Schrufer Anna Derington H. Wierstorf F. Eyben Felix Burkhardt Björn Schuller 209 1 0 25 Aug 2024
The Whole Is Bigger Than the Sum of Its Parts: Modeling Individual Annotators to Capture Emotional Variability Interspeech (Interspeech), 2024 James Tavernor Yara S. El-Tawil E. Provost 129 3 0 21 Aug 2024
Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance Journal of Computational Science and Technology (JCST), 2024 M. Milling Shuo Liu Andreas Triantafyllopoulos Ilhan Aslan Björn W. Schuller 240 4 0 12 Aug 2024
Conditioning LLMs with Emotion in Neural Machine Translation International Workshop on Spoken Language Translation (IWSLT), 2024 Charles Brazier Jean-Luc Rouas CVBM 192 2 0 06 Aug 2024
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024 Seong-Gyun Leem Daniel Fulford J. Onnela David Gard John H. L. Hansen 183 2 0 25 Jul 2024