AST: Audio Spectrogram Transformer

5 April 2021
Yuan Gong
Yu-An Chung
James R. Glass
    ViT

Papers citing "AST: Audio Spectrogram Transformer"

50 / 142 papers shown
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
44
19
0
19 Jan 2024
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
23
17
0
27 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
21
64
0
07 Nov 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
37
6
0
19 Oct 2023
In-Context Learning for Few-Shot Molecular Property Prediction
Christopher Fifty
J. Leskovec
Sebastian Thrun
34
5
0
13 Oct 2023
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong
Bin Chen
Xiulong Liu
Paweł Polak
Peng Zhang
LRM
37
26
0
10 Oct 2023
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
27
2
0
08 Oct 2023
Efficient Supervised Training of Audio Transformers for Music Representation Learning
Pablo Alonso-Jiménez
Xavier Serra
Dmitry Bogdanov
ViT
24
3
0
28 Sep 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
24
0
0
28 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
26
66
0
25 Sep 2023
Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation
A. Sridhar
Yinyi Guo
Erik M. Visser
Rehana Mahfuz
24
5
0
06 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
25
0
0
30 Aug 2023
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
32
19
0
28 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
17
6
0
23 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
31
1
0
14 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
19
13
0
08 Aug 2023
Cascaded Cross-Modal Transformer for Request and Complaint Detection
Nicolae-Cătălin Ristea
Radu Tudor Ionescu
18
3
0
27 Jul 2023
A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis
Li Xiao
Xiuping Yang
Xinhong Li
Weiping Tu
Xiong Chen
Weiyan Yi
Jie Lin
Yuhong Yang
Yanzhen Ren
13
2
0
25 Jul 2023
Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer
Honglin Mu
Wentian Xia
Wanxiang Che
10
1
0
19 Jul 2023
Channel-Spatial-Based Few-Shot Bird Sound Event Detection
Lingwen Liu
Yuxuan Feng
Haitao Fu
Yajie Yang
Xin Pan
Chenlei Jin
20
0
0
18 Jun 2023
Learning Local to Global Feature Aggregation for Speech Emotion Recognition
Cheng Lu
Hailun Lian
Wenming Zheng
Yuan Zong
Yan Zhao
Sunan Li
ViT
11
7
0
02 Jun 2023
Contrastive Speech Mixup for Low-resource Keyword Spotting
Dianwen Ng
Ruixi Zhang
J. Yip
Chong Zhang
Yukun Ma
Trung Hieu Nguyen
Chongjia Ni
E. Chng
B. Ma
30
10
0
02 May 2023
Transformer-based Sequence Labeling for Audio Classification based on MFCCs
C. Sonali
S. ChinmayiB
A. Balasubramanian
24
0
0
30 Apr 2023
Denoising Cosine Similarity: A Theory-Driven Approach for Efficient Representation Learning
Takumi Nakagawa
Y. Sanada
Hiroki Waida
Yuhui Zhang
Yuichiro Wada
K. Takanashi
Tomonori Yamada
Takafumi Kanamori
DiffM
19
5
0
19 Apr 2023
β-Variational autoencoders and transformers for reduced-order modelling of fluid flows
Alberto Solera-Rico
Carlos Sanmiguel Vila
Miguel Gómez-López
Yuning Wang
Abdulrahman Almashjary
Scott T. M. Dawson
Ricardo Vinuesa
DRL
13
73
0
07 Apr 2023
Efficient Audio Captioning Transformer with Patchout and Text Guidance
Thodoris Kouzelis
Grigoris Bastas
Athanasios Katsamanis
Alexandros Potamianos
ViT
10
6
0
06 Apr 2023
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu
Xiangqing Shen
Rui Xia
19
5
0
05 Apr 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
27
1
0
21 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
35
22
0
19 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
21
22
0
14 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
16
10
0
12 Mar 2023
Heterogeneous Graph Learning for Acoustic Event Classification
A. Shirian
Mona Ahmadian
Krishna Somandepalli
T. Guha
25
2
0
05 Mar 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
16
4
0
03 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
27
4
0
03 Mar 2023
SemanticAC: Semantics-Assisted Framework for Audio Classification
Yicheng Xiao
Yue Ma
Shuyan Li
Hantao Zhou
Ran Liao
Xiu Li
13
8
0
12 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
17
1
0
07 Feb 2023
Does compressing activations help model parallel training?
S. Bian
Dacheng Li
Hongyi Wang
Eric P. Xing
Shivaram Venkataraman
19
4
0
06 Jan 2023
Federated Learning for Inference at Anytime and Anywhere
Zicheng Liu
Da Li
Javier Fernandez-Marques
Stefanos Laskaridis
Yan Gao
L. Dudziak
Stan Z. Li
S. Hu
Timothy M. Hospedales
FedML
16
5
0
08 Dec 2022
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
Ronghui Li
Junfan Zhao
Yachao Zhang
Mingyang Su
Zeping Ren
Han Zhang
Yansong Tang
Xiuhua Li
DiffM
23
50
0
07 Dec 2022
Music Instrument Classification Reprogrammed
Hsin-Hung Chen
Alexander Lerch
19
4
0
15 Nov 2022
The Birds Need Attention Too: Analysing usage of Self Attention in identifying bird calls in soundscapes
Chandra Kanth Nagesh
Abhishek Purushothama
11
2
0
14 Nov 2022
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
20
58
0
09 Nov 2022
Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block
Yunhao Chen
Yunjie Zhu
Zihui Yan
Yifan Huang
Zhen Ren
Jianlu Shen
Lifang Chen
20
9
0
05 Nov 2022
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming
Yun-Ning Hung
Chao-Han Huck Yang
Pin-Yu Chen
Alexander Lerch
21
17
0
02 Nov 2022
Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
M. Gauy
Marcelo Finger
16
7
0
25 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
20
0
0
22 Oct 2022
PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li
Xiaodan Lin
Jiaxin Yang
11
0
0
06 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
24
119
0
02 Oct 2022
An empirical study of weakly supervised audio tagging embeddings for general audio representations
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
32
1
0
30 Sep 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Mohit Bansal
VLM
49
28
0
28 Sep 2022