The Sound of Water: Inferring Physical Properties from Pouring Liquids

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
18 November 2024
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
Andrew Zisserman

Papers citing "The Sound of Water: Inferring Physical Properties from Pouring Liquids"

50 / 57 papers shown
Segmenting Collision Sound Sources in Egocentric Videos
Kranti Parida
Omar Emara
Hazel Doughty
Dima Damen
VOS
265
0
0
17 Nov 2025
PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective
Transactions of the International Society for Music Information Retrieval (TISMIR), 2025
Alain Riou
Bernardo Torres
Ben Hayes
Stefan Lattner
Gaëtan Hadjeres
Gaël Richard
Geoffroy Peeters
264
4
0
02 Aug 2025
Learning to See Inside Opaque Liquid Containers using Speckle Vibrometry
Matan Kichler
Shai Bagon
Mark Sheinin
85
0
0
28 Jul 2025
T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yoonjin Chung
Junwon Lee
Juhan Nam
170
22
0
17 Jan 2024
Pouring by Feel: An Analysis of Tactile and Proprioceptive Sensing for Accurate Pouring
IEEE International Conference on Robotics and Automation (ICRA), 2022
Pedro Piacenza
Daewon Lee
Volkan Isler
188
9
0
27 Oct 2023
PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective
International Society for Music Information Retrieval Conference (ISMIR), 2023
Alain Riou
Stefan Lattner
Gaëtan Hadjeres
Geoffroy Peeters
249
38
0
05 Sep 2023
PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring
IEEE International Conference on Computer Vision (ICCV), 2023
Haitao Lin
Yanwei Fu
Xiangyang Xue
247
11
0
21 Jul 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM MLLM LRM
682
219
0
18 May 2023
ImageBind: One Embedding Space To Bind Them All
Computer Vision and Pattern Recognition (CVPR), 2023
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
553
1,303
0
09 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Computer Vision and Pattern Recognition (CVPR), 2023
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
205
60
0
17 Apr 2023
Segment Anything
IEEE International Conference on Computer Vision (ICCV), 2023
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM VLM
960
11,140
0
05 Apr 2023
MAViL: Masked Audio-Video Learners
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
322
73
0
15 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
317
56
0
09 Dec 2022
Contrastive Audio-Visual Masked Autoencoder
International Conference on Learning Representations (ICLR), 2022
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
395
166
0
02 Oct 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
314
114
0
16 Jun 2022
Sound Localization by Self-Supervised Time Delay Estimation
European Conference on Computer Vision (ECCV), 2022
Ziyang Chen
David Fouhey
Andrew Owens
SSL
254
23
0
26 Apr 2022
Sound-Guided Semantic Video Generation
European Conference on Computer Vision (ECCV), 2022
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Chanyoung Kim
Wonjae Ryoo
Sang Ho Yoon
Hyunjun Cho
Jihyun Bae
Jinkyu Kim
Sangpil Kim
VGen
331
42
0
20 Apr 2022
Self-supervised Transparent Liquid Segmentation for Robotic Pouring
IEEE International Conference on Robotics and Automation (ICRA), 2022
G. Narasimhan
Kai Zhang
Ben Eisner
Xingyu Lin
David Held
182
21
0
03 Mar 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Computer Vision and Pattern Recognition (CVPR), 2022
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
499
238
0
07 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
International Conference on Learning Representations (ICLR), 2022
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
365
417
0
05 Jan 2022
Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the TransProteus CGI dataset
S. Eppel
Haoping Xu
Yi Ru Wang
Alán Aspuru-Guzik
3DV DiffM
192
3
0
15 Sep 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP VLM
561
479
0
24 Jun 2021
Pouring Dynamics Estimation Using Gated Recurrent Units
Qi Zheng
145
2
0
08 May 2021
Emerging Properties in Self-Supervised Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Mathilde Caron
Hugo Touvron
Ishan Misra
Edouard Grave
Julien Mairal
Piotr Bojanowski
Armand Joulin
2.0K
7,910
0
29 Apr 2021
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
IEEE International Conference on Computer Vision (ICCV), 2021
Brian Chen
Andrew Rouditchenko
Kevin Duarte
Hilde Kuehne
Samuel Thomas
...
Rogerio Feris
David Harwath
James R. Glass
M. Picheny
Shih-Fu Chang
SSL
429
96
0
26 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Neural Information Processing Systems (NeurIPS), 2021
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
730
679
0
22 Apr 2021
Localizing Visual Sounds the Hard Way
Computer Vision and Pattern Recognition (CVPR), 2021
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
210
225
0
06 Apr 2021
Robot Gaining Accurate Pouring Skills through Self-Supervised Learning and Generalization
Yongqiang Huang
Juan Wilches
Yu Sun
SSL
231
28
0
19 Nov 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
Neural Information Processing Systems (NeurIPS), 2020
Yuki M. Asano
Mandela Patrick
Christian Rupprecht
Andrea Vedaldi
SSL
261
161
0
24 Jun 2020
Rescaling Egocentric Vision
International Journal of Computer Vision (IJCV), 2020
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
506
583
0
23 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
2.4K
7,387
0
20 Jun 2020
VGGSound: A Large-scale Audio-Visual Dataset
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
282
758
0
29 Apr 2020
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Computer Vision and Pattern Recognition (CVPR), 2020
Pedro Morgado
Nuno Vasconcelos
Ishan Misra
SSL
317
296
0
27 Apr 2020
Robust Robotic Pouring using Audition and Haptics
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Hongzhuo Liang
Chuangchuang Zhou
Shuang Li
Xiaojian Ma
Norman Hendrich
Timo Gerkmann
F. Sun
Marcus Stoffel
Jianwei Zhang
380
22
0
29 Feb 2020
DDSP: Differentiable Digital Signal Processing
International Conference on Learning Representations (ICLR), 2020
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
349
435
0
14 Jan 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Neural Information Processing Systems (NeurIPS), 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
495
461
0
28 Nov 2019
Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019
Hongzhuo Liang
Shuang Li
Xiaojian Ma
Norman Hendrich
Timo Gerkmann
Gang Hua
Jianwei Zhang
221
45
0
02 Mar 2019
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
364
820
0
06 Sep 2018
Liquid Pouring Monitoring via Rich Sensory Inputs
Tz-Ying Wu
Juan-Ting Lin
Tsun-Hsuan Wang
Chan-Wei Hu
Juan Carlos Niebles
Min Sun
146
5
0
06 Aug 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Neural Information Processing Systems (NeurIPS), 2018
Bruno Korbar
Du Tran
Lorenzo Torresani
372
501
0
30 Jun 2018
Deep Lip Reading: a comparison of models and an online application
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
168
134
0
15 Jun 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
593
796
0
10 Apr 2018
CREPE: A Convolutional Representation for Pitch Estimation
Jong Wook Kim
Justin Salamon
P. Li
J. P. Bello
260
435
0
17 Feb 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD VOS
338
554
0
18 Dec 2017
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
4.2K
161,538
0
12 Jun 2017
Learning to Pour
Yongqiang Huang
Yu Sun
134
18
0
25 May 2017
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
374
982
0
23 May 2017
Time-Contrastive Networks: Self-Supervised Learning from Video
P. Sermanet
Corey Lynch
Yevgen Chebotar
Jasmine Hsu
Eric Jang
S. Schaal
Sergey Levine
SSL
432
896
0
23 Apr 2017
Perceiving and Reasoning About Liquids Using Fully Convolutional Networks
Connor Schenck
Dieter Fox
262
35
0
05 Mar 2017
See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content
IEEE International Conference on Computer Vision (ICCV), 2017
Roozbeh Mottaghi
Connor Schenck
Dieter Fox
Ali Farhadi
182
49
0
10 Jan 2017