arXiv:2112.00007
Sound-Guided Semantic Image Manipulation
30 November 2021
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
Papers citing
"Sound-Guided Semantic Image Manipulation"
37 / 37 papers shown
Audio-Guided Visual Editing with Complex Multi-Modal Prompts
Hyeonyu Kim
Seokhoon Jeong
Seonghee Han
Chanhyuk Choi
Taehwan Kim
DiffM
174
0
0
28 Aug 2025
DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation
Vikram Singh
Kabir Malhotra
Rohan Desai
Ananya Shankaracharya
Priyadarshini Chatterjee
Krishnan Menon Iyer
MedIm
392
0
0
09 Aug 2025
VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation
Ayaan Nooruddin Siddiqui
Mahnoor Zaidi
Ayesha Nazneen Shahbaz
Priyadarshini Chatterjee
Krishnan Menon Iyer
309
0
0
09 Aug 2025
Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling
Aarav Mehta
Priya Deshmukh
Vikram Singh
Siddharth Malhotra
Krishnan Menon Iyer
Tanvi Iyer
MedIm
345
0
0
09 Aug 2025
Deeply Dual Supervised learning for melanoma recognition
Rujosh Polma
Krishnan Menon Iyer
278
0
0
04 Aug 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
235
0
0
14 Jul 2025
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
Hao Zhou
Xiaobao Guo
Yuzhe Zhu
A. Kong
DiffM
530
2
0
13 Mar 2025
Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation
Asian Conference on Computer Vision (ACCV), 2024
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
421
20
0
09 Oct 2024
Self-Supervised Audio-Visual Soundscape Stylization
European Conference on Computer Vision (ECCV), 2024
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
381
8
0
22 Sep 2024
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
Aashish Rai
Srinath Sridhar
DiffM
213
6
0
30 Jul 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
331
14
0
19 Jul 2024
Espresso: Robust Concept Filtering in Text-to-Image Models
Anudeep Das
Vasisht Duddu
Rui Zhang
Nadarajah Asokan
EGVM
614
14
0
30 Apr 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
387
29
0
17 Mar 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
327
124
0
31 Jan 2024
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
517
14
0
10 Oct 2023
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion
IEEE International Conference on Computer Vision (ICCV), 2023
Yujin Jeong
Won-Wha Ryoo
Seunghyun Lee
Dabin Seo
Wonmin Byeon
Sangpil Kim
Jinkyu Kim
DiffM
200
41
0
08 Sep 2023
Generating Realistic Images from In-the-wild Sounds
IEEE International Conference on Computer Vision (ICCV), 2023
Taegyeong Lee
Jeonghun Kang
Hyeonyu Kim
Taehwan Kim
DiffM
360
11
0
05 Sep 2023
Align, Adapt and Inject: Sound-guided Unified Image Generation
Yue Yang
Kaipeng Zhang
Yuying Ge
Wenqi Shao
Zeyue Xue
Yu Qiao
Ping Luo
DiffM
404
9
0
20 Jun 2023
Conditional Generation of Audio from Video via Foley Analogies
Computer Vision and Pattern Recognition (CVPR), 2023
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
251
64
0
17 Apr 2023
Soundini: Sound-Guided Diffusion for Natural Video Editing
Seung Hyun Lee
Si-Yeol Kim
Innfarn Yoo
Feng Yang
Donghyeon Cho
Youngseo Kim
Huiwen Chang
Jinkyu Kim
Sangpil Kim
VGen
DiffM
233
22
0
13 Apr 2023
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
IEEE International Conference on Computer Vision (ICCV), 2023
Moayed Haji-Ali
Andrew Bond
Tolga Birdal
Duygu Ceylan
Levent Karacan
Erkut Erdem
Aykut Erdem
VGen
DiffM
538
2
0
12 Apr 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
364
59
0
30 Mar 2023
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Can Qin
Ning Yu
Chen Xing
Shu Zhen Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
DiffM
444
27
0
17 Mar 2023
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Xufeng Zhao
Mengdi Li
C. Weber
Muhammad Burhan Hafez
S. Wermter
LLMAG
LM&Ro
LRM
427
74
0
14 Mar 2023
CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
Computer Vision and Pattern Recognition (CVPR), 2023
Ambareesh Revanur
Debraj Basu
Shradha Agrawal
Dhwanit Agarwal
Deepak Pai
200
9
0
09 Mar 2023
Cross-modal Face- and Voice-style Transfer
Naoya Takahashi
M. Singh
Yuki Mitsufuji
CVBM
303
2
0
27 Feb 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
International Conference on Learning Representations (ICLR), 2022
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
309
43
0
14 Dec 2022
Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection
Proceedings of the IEEE (Proc. IEEE), 2022
Junke Wang
Zhenxin Li
Chao Zhang
Yue Yu
Zuxuan Wu
Larry S. Davis
Yueping Jiang
AAML
234
10
0
12 Dec 2022
Touch and Go: Learning from Human-Collected Vision and Touch
Neural Information Processing Systems (NeurIPS), 2022
Fengyu Yang
Chenyang Ma
Jiacheng Zhang
Jing Zhu
Wenzhen Yuan
Andrew Owens
369
105
0
22 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
194
3
0
21 Nov 2022
GAN-based Facial Attribute Manipulation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yunfan Liu
Qi Li
Qiyao Deng
Zhen Sun
Mingcong Yang
CVBM
307
40
0
23 Oct 2022
Robust Sound-Guided Image Manipulation
Neural Networks (NN), 2022
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
382
8
0
30 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
332
76
0
20 Aug 2022
Learning Visual Styles from Audio-Visual Associations
European Conference on Computer Vision (ECCV), 2022
Tingle Li
Yichen Liu
Andrew Owens
Hang Zhao
DiffM
239
26
0
10 May 2022
Sound-Guided Semantic Video Generation
European Conference on Computer Vision (ECCV), 2022
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Chanyoung Kim
Wonjae Ryoo
Sang Ho Yoon
Hyunjun Cho
Jihyun Bae
Jinkyu Kim
Sangpil Kim
VGen
447
46
0
20 Apr 2022
Audio-to-Image Cross-Modal Generation
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
228
22
0
27 Sep 2021
GAN Inversion: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Weihao Xia
Yulun Zhang
Yujiu Yang
Jing-Hao Xue
Bolei Zhou
Ming-Hsuan Yang
DiffM
1.1K
617
0
14 Jan 2021