PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective. Transactions of the International Society for Music Information Retrieval (TISMIR), 2025 |
T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 |
Pouring by Feel: An Analysis of Tactile and Proprioceptive Sensing for Accurate Pouring. IEEE International Conference on Robotics and Automation (ICRA), 2022 |
PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective. International Society for Music Information Retrieval Conference (ISMIR), 2023 |
PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring. IEEE International Conference on Computer Vision (ICCV), 2023 |
Listen, Think, and Understand. International Conference on Learning Representations (ICLR), 2023 |
ImageBind: One Embedding Space To Bind Them All. Computer Vision and Pattern Recognition (CVPR), 2023 |
Conditional Generation of Audio from Video via Foley Analogies. Computer Vision and Pattern Recognition (CVPR), 2023 |
Segment Anything. IEEE International Conference on Computer Vision (ICCV), 2023 |
MAViL: Masked Audio-Video Learners. Neural Information Processing Systems (NeurIPS), 2022 |
Audiovisual Masked Autoencoders. IEEE International Conference on Computer Vision (ICCV), 2022 |
Contrastive Audio-Visual Masked Autoencoder. International Conference on Learning Representations (ICLR), 2022 |
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning. Neural Information Processing Systems (NeurIPS), 2022 |
Sound Localization by Self-Supervised Time Delay Estimation. European Conference on Computer Vision (ECCV), 2022 |
Sound-Guided Semantic Video Generation. European Conference on Computer Vision (ECCV), 2022 |
Self-supervised Transparent Liquid Segmentation for Robotic Pouring. IEEE International Conference on Robotics and Automation (ICRA), 2022 |
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound. Computer Vision and Pattern Recognition (CVPR), 2022 |
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. International Conference on Learning Representations (ICLR), 2022 |
Emerging Properties in Self-Supervised Vision Transformers. IEEE International Conference on Computer Vision (ICCV), 2021 |
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. IEEE International Conference on Computer Vision (ICCV), 2021 |
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. Neural Information Processing Systems (NeurIPS), 2021 |
Localizing Visual Sounds the Hard Way. Computer Vision and Pattern Recognition (CVPR), 2021 |
Labelling unlabelled videos from scratch with multi-modal self-supervision. Neural Information Processing Systems (NeurIPS), 2020 |
Rescaling Egocentric Vision. International Journal of Computer Vision (IJCV), 2020 |
VGGSound: A Large-scale Audio-Visual Dataset. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020 |
Audio-Visual Instance Discrimination with Cross-Modal Agreement. Computer Vision and Pattern Recognition (CVPR), 2020 |
Robust Robotic Pouring using Audition and Haptics. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020 |
DDSP: Differentiable Digital Signal Processing. International Conference on Learning Representations (ICLR), 2020 |
Self-Supervised Learning by Cross-Modal Audio-Video Clustering. Neural Information Processing Systems (NeurIPS), 2019 |
Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019 |
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization. Neural Information Processing Systems (NeurIPS), 2018 |
Attention Is All You Need. Neural Information Processing Systems (NeurIPS), 2017 |
See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content. IEEE International Conference on Computer Vision (ICCV), 2017 |