ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2004.14368
  4. Cited By
VGGSound: A Large-scale Audio-Visual Dataset

VGGSound: A Large-scale Audio-Visual Dataset

29 April 2020
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
ArXivPDFHTML

Papers citing "VGGSound: A Large-scale Audio-Visual Dataset"

50 / 138 papers shown
Title
CLARA: Multilingual Contrastive Learning for Audio Representation
  Acquisition
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
K. A. Noriy
Xiaosong Yang
Marcin Budka
Jian Jun Zhang
VLM
29
3
0
18 Oct 2023
Extending Multi-modal Contrastive Representations
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
31
5
0
13 Oct 2023
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language
  Models
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
VLM
CoGe
48
23
0
12 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
35
3
0
10 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large
  Language Models
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
41
12
0
09 Oct 2023
Joint Audio and Speech Understanding
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
30
69
0
25 Sep 2023
Sound Source Localization is All about Cross-Modal Alignment
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
36
18
0
19 Sep 2023
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable
  Diffusion
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion
Yujin Jeong
Won-Wha Ryoo
Seunghyun Lee
Dabin Seo
Wonmin Byeon
Sangpil Kim
Jinkyu Kim
DiffM
32
29
0
08 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
33
0
0
30 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Hong Li
Xingyu Li
Pengbo Hu
Yinuo Lei
Chunxiao Li
Yi Zhou
49
22
0
15 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
47
224
0
10 Aug 2023
UniBriVL: Robust Universal Representation and Generation of Audio Driven
  Diffusion Models
UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models
Sen Fang
Bowen Gao
Yangjian Wu
T. Teoh
DiffM
34
1
0
29 Jul 2023
A Unified Audio-Visual Learning Framework for Localization, Separation,
  and Recognition
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
38
21
0
30 May 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for
  Audio-to-Image Generation
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
39
21
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
53
116
0
18 May 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for
  Speech Emotion Recognition
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
28
6
0
22 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual
  Cross-Modal Pairs for Audiovisual Representation Learning
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
37
2
0
12 Apr 2023
Posthoc Interpretation via Quantization
Posthoc Interpretation via Quantization
Francesco Paissan
Cem Subakan
Mirco Ravanelli
MQ
24
6
0
22 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio
  Classification
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
37
22
0
19 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial
  Sample Generation
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
24
0
0
15 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet
  Tag-guided Synthetic Data
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
78
14
0
14 Mar 2023
Exploring Efficient-Tuned Learning Audio Representation Method from
  BriVL
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
Sen Fang
Yang Wu
Bowen Gao
Jingwen Cai
T. Teoh
DiffM
29
1
0
08 Mar 2023
Poisoning Web-Scale Training Datasets is Practical
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
36
182
0
20 Feb 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
Audio-Visual Contrastive Learning with Temporal Self-Supervision
Simon Jenni
Alexander Black
John Collomosse
SSL
31
15
0
15 Feb 2023
Balanced Audiovisual Dataset for Imbalance Analysis
Balanced Audiovisual Dataset for Imbalance Analysis
Wenke Xia
Xu Zhao
Xincheng Pang
Changqing Zhang
Di Hu
41
1
0
14 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
39
1
0
07 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
34
41
0
01 Feb 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
26
15
0
19 Jan 2023
Tackling the Cocktail Fork Problem for Separation and Transcription of
  Real-World Soundtracks
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks
Darius Petermann
Gordon Wichern
Aswin Shanmugam Subramanian
Zhong-Qiu Wang
Jonathan Le Roux
27
10
0
14 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled
  Videos
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
31
25
0
14 Dec 2022
Audiovisual Masked Autoencoders
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
44
43
0
09 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
50
0
0
05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
35
51
0
28 Nov 2022
Self-supervised remote sensing feature learning: Learning Paradigms,
  Challenges, and Future Works
Self-supervised remote sensing feature learning: Learning Paradigms, Challenges, and Future Works
Chao Tao
Ji Qi
Mingning Guo
Qing Zhu
Haifeng Li
SSL
34
56
0
15 Nov 2022
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion
  and Keyword-to-Caption Augmentation
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
K. Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
39
490
0
12 Nov 2022
I Hear Your True Colors: Image Guided Audio Generation
I Hear Your True Colors: Image Guided Audio Generation
Roy Sheffer
Yossi Adi
VLM
18
74
0
06 Nov 2022
MarginNCE: Robust Sound Localization with a Negative Margin
MarginNCE: Robust Sound Localization with a Negative Margin
Sooyoung Park
Arda Senocak
Joon Son Chung
SSL
27
13
0
03 Nov 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
AVES: Animal Vocalization Encoder based on Self-Supervision
Masato Hagiwara
CLIP
VLM
AI4TS
19
24
0
26 Oct 2022
Play It Back: Iterative Attention for Audio Recognition
Play It Back: Iterative Attention for Audio Recognition
Alexandros Stergiou
Dima Damen
42
4
0
20 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
41
7
0
19 Oct 2022
Learning Temporal Resolution in Spectrogram for Audio Classification
Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu
Xubo Liu
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
34
7
0
04 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
42
120
0
02 Oct 2022
AudioGen: Textually Guided Audio Generation
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
27
290
0
30 Sep 2022
The Efficacy of Self-Supervised Speech Models for Audio Representations
The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu
Chen-An Li
Tzu-Han Lin
Tsung-Yuan Hsu
Hung-yi Lee
37
5
0
26 Sep 2022
Language-based Audio Retrieval Task in DCASE 2022 Challenge
Huang Xie
Samuel Lipping
Tuomas Virtanen
70
18
0
20 Sep 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
85
64
0
30 Aug 2022
Robust Sound-Guided Image Manipulation
Robust Sound-Guided Image Manipulation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
26
7
0
30 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New
  Perspective
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
UAVM: Towards Unifying Audio and Visual Models
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
33
21
0
29 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
31
10
0
21 Jul 2022
Previous
123
Next