SoundNet: Learning Sound Representations from Unlabeled Video


27 October 2016
Y. Aytar
Carl Vondrick
Antonio Torralba
    SSL

Papers citing "SoundNet: Learning Sound Representations from Unlabeled Video"

20 / 120 papers shown
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
19
69
0
11 Apr 2019
DistInit: Learning Video Representations Without a Single Labeled Video
Rohit Girdhar
Du Tran
Lorenzo Torresani
Deva Ramanan
19
54
0
26 Jan 2019
Deep Learning for Human Affect Recognition: Insights and New Developments
Philipp V. Rouast
M. Adam
R. Chiong
24
167
0
09 Jan 2019
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
19
270
0
16 Aug 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
G. Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
16
125
0
21 Jun 2018
Weakly-supervised Visual Instrument-playing Action Detection in Videos
Jen-Yu Liu
Yi-Hsuan Yang
Shyh-Kang Jeng
19
13
0
05 May 2018
Multimodal Emotion Recognition for One-Minute-Gradual Emotion Challenge
Ziqi Zheng
Chenjie Cao
Xingwei Chen
Guoqiang Xu
24
19
0
03 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
13
140
0
02 May 2018
A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
R. Abreu
J. Santos
Eduardo Bezerra
13
8
0
28 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
527
0
09 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
14
422
0
23 Mar 2018
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
22
538
0
09 Jan 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Andrew Owens
Jiajun Wu
Josh H. McDermott
William T. Freeman
Antonio Torralba
SSL
22
177
0
20 Dec 2017
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
19
528
0
18 Dec 2017
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
13
53
0
05 Oct 2017
Audio Super Resolution using Neural Networks
Volodymyr Kuleshov
S. Enam
Stefano Ermon
SupR
16
126
0
02 Aug 2017
Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
M. Huzaifah
AI4TS
20
148
0
22 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
13
2,855
0
26 May 2017
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
66
1,460
0
08 Sep 2016
Acoustic Scene Classification
D. Barchiesi
D. Giannoulis
D. Stowell
Mark D. Plumbley
98
405
0
13 Nov 2014