Sound-Guided Semantic Image Manipulation

30 November 2021
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
    DiffM

Papers citing "Sound-Guided Semantic Image Manipulation"

37 / 37 papers shown
Audio-Guided Visual Editing with Complex Multi-Modal Prompts
Hyeonyu Kim
Seokhoon Jeong
Seonghee Han
Chanhyuk Choi
Taehwan Kim
DiffM
174
0
0
28 Aug 2025
DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation
Vikram Singh
Kabir Malhotra
Rohan Desai
Ananya Shankaracharya
Priyadarshini Chatterjee
Krishnan Menon Iyer
MedIm
392
0
0
09 Aug 2025
VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation
Ayaan Nooruddin Siddiqui
Mahnoor Zaidi
Ayesha Nazneen Shahbaz
Priyadarshini Chatterjee
Krishnan Menon Iyer
309
0
0
09 Aug 2025
Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling
Aarav Mehta
Priya Deshmukh
Vikram Singh
Siddharth Malhotra
Krishnan Menon Iyer
Tanvi Iyer
MedIm
345
0
0
09 Aug 2025
Deeply Dual Supervised learning for melanoma recognition
Rujosh Polma
Krishnan Menon Iyer
278
0
0
04 Aug 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
235
0
0
14 Jul 2025
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
Hao Zhou
Xiaobao Guo
Yuzhe Zhu
A. Kong
DiffM
530
2
0
13 Mar 2025
Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation
Asian Conference on Computer Vision (ACCV), 2024
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
421
20
0
09 Oct 2024
Self-Supervised Audio-Visual Soundscape Stylization
European Conference on Computer Vision (ECCV), 2024
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM SSL
381
8
0
22 Sep 2024
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
Aashish Rai
Srinath Sridhar
DiffM
213
6
0
30 Jul 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
331
14
0
19 Jul 2024
Espresso: Robust Concept Filtering in Text-to-Image Models
Anudeep Das
Vasisht Duddu
Rui Zhang
Nadarajah Asokan
EGVM
614
14
0
30 Apr 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
387
29
0
17 Mar 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
327
124
0
31 Jan 2024
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
517
14
0
10 Oct 2023
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion
IEEE International Conference on Computer Vision (ICCV), 2023
Yujin Jeong
Won-Wha Ryoo
Seunghyun Lee
Dabin Seo
Wonmin Byeon
Sangpil Kim
Jinkyu Kim
DiffM
200
41
0
08 Sep 2023
Generating Realistic Images from In-the-wild Sounds
IEEE International Conference on Computer Vision (ICCV), 2023
Taegyeong Lee
Jeonghun Kang
Hyeonyu Kim
Taehwan Kim
DiffM
360
11
0
05 Sep 2023
Align, Adapt and Inject: Sound-guided Unified Image Generation
Yue Yang
Kaipeng Zhang
Yuying Ge
Wenqi Shao
Zeyue Xue
Yu Qiao
Ping Luo
DiffM
404
9
0
20 Jun 2023
Conditional Generation of Audio from Video via Foley Analogies
Computer Vision and Pattern Recognition (CVPR), 2023
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
251
64
0
17 Apr 2023
Soundini: Sound-Guided Diffusion for Natural Video Editing
Seung Hyun Lee
Si-Yeol Kim
Innfarn Yoo
Feng Yang
Donghyeon Cho
Youngseo Kim
Huiwen Chang
Jinkyu Kim
Sangpil Kim
VGen DiffM
233
22
0
13 Apr 2023
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
IEEE International Conference on Computer Vision (ICCV), 2023
Moayed Haji-Ali
Andrew Bond
Tolga Birdal
Duygu Ceylan
Levent Karacan
Erkut Erdem
Aykut Erdem
VGen DiffM
538
2
0
12 Apr 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM VGen
364
59
0
30 Mar 2023
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Can Qin
Ning Yu
Chen Xing
Shu Zhen Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
DiffM
444
27
0
17 Mar 2023
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Xufeng Zhao
Mengdi Li
C. Weber
Muhammad Burhan Hafez
S. Wermter
LLMAG LM&Ro LRM
427
74
0
14 Mar 2023
CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
Computer Vision and Pattern Recognition (CVPR), 2023
Ambareesh Revanur
Debraj Basu
Shradha Agrawal
Dhwanit Agarwal
Deepak Pai
200
9
0
09 Mar 2023
Cross-modal Face- and Voice-style Transfer
Naoya Takahashi
M. Singh
Yuki Mitsufuji
CVBM
303
2
0
27 Feb 2023
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
International Conference on Learning Representations (ICLR), 2022
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM CLIP
309
43
0
14 Dec 2022
Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection
Proceedings of the IEEE (Proc. IEEE), 2022
Junke Wang
Zhenxin Li
Chao Zhang
Yue Yu
Zuxuan Wu
Larry S. Davis
Yueping Jiang
AAML
234
10
0
12 Dec 2022
Touch and Go: Learning from Human-Collected Vision and Touch
Neural Information Processing Systems (NeurIPS), 2022
Fengyu Yang
Chenyang Ma
Jiacheng Zhang
Jing Zhu
Wenzhen Yuan
Andrew Owens
369
105
0
22 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
194
3
0
21 Nov 2022
GAN-based Facial Attribute Manipulation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yunfan Liu
Qi Li
Qiyao Deng
Zhen Sun
Mingcong Yang
CVBM
307
40
0
23 Oct 2022
Robust Sound-Guided Image Manipulation
Neural Networks (NN), 2022
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
382
8
0
30 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
332
76
0
20 Aug 2022
Learning Visual Styles from Audio-Visual Associations
European Conference on Computer Vision (ECCV), 2022
Tingle Li
Yichen Liu
Andrew Owens
Hang Zhao
DiffM
239
26
0
10 May 2022
Sound-Guided Semantic Video Generation
European Conference on Computer Vision (ECCV), 2022
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Chanyoung Kim
Wonjae Ryoo
Sang Ho Yoon
Hyunjun Cho
Jihyun Bae
Jinkyu Kim
Sangpil Kim
VGen
447
46
0
20 Apr 2022
Audio-to-Image Cross-Modal Generation
IEEE International Joint Conference on Neural Networks (IJCNN), 2021
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
228
22
0
27 Sep 2021
GAN Inversion: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Weihao Xia
Yulun Zhang
Yujiu Yang
Jing-Hao Xue
Bolei Zhou
Ming-Hsuan Yang
DiffM
1.1K
617
0
14 Jan 2021