VGGSound: A Large-scale Audio-Visual Dataset


29 April 2020
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman

Papers citing "VGGSound: A Large-scale Audio-Visual Dataset"

38 / 138 papers shown
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jingyang Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
33
109
0
11 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
40
26
0
20 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
43
132
0
18 Jun 2022
Learning Representations for New Sound Classes With Continual Self-Supervised Learning
Zhepei Wang
Cem Subakan
Xilin Jiang
Junkai Wu
Efthymios Tzinis
Mirco Ravanelli
Paris Smaragdis
CLL
SSL
69
19
0
15 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
35
3
0
29 Apr 2022
Sound-Guided Semantic Video Generation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Chanyoung Kim
Wonjae Ryoo
Sang Ho Yoon
Hyunjun Cho
Jihyun Bae
Jinkyu Kim
Sangpil Kim
VGen
38
26
0
20 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
54
39
0
06 Apr 2022
Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization
Zhengqi Gao
Sucheng Ren
Zihui Xue
Siting Li
Hang Zhao
26
3
0
05 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
66
574
0
01 Apr 2022
Localizing Visual Sounds the Easy Way
Shentong Mo
Pedro Morgado
35
78
0
17 Mar 2022
Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs
Jingfei Xia
Mingchen Zhuge
Tiantian Geng
Shun Fan
Yuantai Wei
Zhenyu He
Feng Zheng
23
14
0
08 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
42
18
0
06 Mar 2022
ADIMA: Abuse Detection In Multilingual Audio
Vikram Gupta
Rini A. Sharon
Ramit Sawhney
Debdoot Mukherjee
23
20
0
16 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
18
25
0
13 Feb 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
24
41
0
22 Dec 2021
Computational bioacoustics with deep learning: a review and roadmap
D. Stowell
32
235
0
13 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
34
129
0
08 Dec 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
37
43
0
30 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
48
268
0
21 Oct 2021
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
32
122
0
17 Oct 2021
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Ian Palmer
Andrew Rouditchenko
Andrei Barbu
Boris Katz
James R. Glass
11
4
0
14 Oct 2021
Embed Everything: A Method for Efficiently Co-Embedding Multi-Modal Spaces
Sarah Di
Robin Yu
Amol Kapoor
21
0
0
09 Oct 2021
VPN: Video Provenance Network for Robust Content Attribution
Alexander Black
Tu Bui
Simon Jenni
Vishy Swaminathan
John Collomosse
36
9
0
21 Sep 2021
ARCA23K: An audio dataset for investigating open-set label noise
Turab Iqbal
Yin Cao
A. Bailey
Mark D. Plumbley
Wenwu Wang
31
4
0
19 Sep 2021
Learning to Cut by Watching Movies
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Guohao Li
VGen
58
20
0
09 Aug 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
48
544
0
30 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
40
11
0
18 Jun 2021
Representation Learning via Global Temporal Alignment and Cycle-Consistency
Isma Hadji
Konstantinos G. Derpanis
Allan D. Jepson
AI4TS
35
54
0
11 May 2021
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
21
77
0
05 May 2021
Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
Xu Jia
Kai Han
Yukun Zhu
Bradley Green
159
57
0
26 Apr 2021
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
26
66
0
05 Mar 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
71
45
0
26 Jan 2021
Visual Speech Enhancement Without A Real Visual Stream
Sindhu B. Hegde
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
DiffM
20
17
0
20 Dec 2020
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Arsha Nagrani
Joon Son Chung
Jaesung Huh
Andrew Brown
Ernesto Coto
Weidi Xie
Mitchell McLaren
D. Reynolds
Andrew Zisserman
21
74
0
12 Dec 2020
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
26
438
0
01 Oct 2020
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
266
2,242
0
14 Jun 2018