Multimodal Variational Auto-encoder based Audio-Visual Segmentation

12 October 2023
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai

Papers citing "Multimodal Variational Auto-encoder based Audio-Visual Segmentation"

24 / 24 papers shown
Audio and Multiscale Visual Cues Driven Cross-modal Transformer for Idling Vehicle Detection
Xiwen Li
Ross T. Whitaker
Tolga Tasdizen
22
0
0
15 Apr 2025
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu
Peike Li
Liying Yang
Dadong Wang
Lincheng Li
Xin Yu
VOS
57
0
0
17 Mar 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu
Liying Yang
Peike Li
Dadong Wang
Lincheng Li
Xin Yu
VOS
91
0
0
17 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
D. Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
M. Wang
VLM
46
3
0
18 Nov 2024
Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach
Zhe Fu
Kanlun Wang
Wangjiaxuan Xin
Lina Zhou
Shi Chen
Yaorong Ge
Daniel Janies
Dongsong Zhang
19
9
0
16 Aug 2024
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
Zili Wang
Qi Yang
Linsu Shi
Jiazhong Yu
M. Tanveer
Fei Li
Shiming Xiang
VOS
14
1
0
03 Aug 2024
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma
Peiwen Sun
Yaoting Wang
Di Hu
VOS
33
7
0
16 Jul 2024
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
Jinxing Zhou
Dan Guo
Yuxin Mao
Yiran Zhong
Xiaojun Chang
Meng Wang
23
11
0
11 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
24
2
0
07 Jul 2024
SAVE: Segment Audio-Visual Easy way using Segment Anything Model
Khanh-Binh Nguyen
Chae Jung Park
VLM
VOS
18
1
0
02 Jul 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury
Sayan Nag
Subhrajyoti Dasgupta
Jun Chen
Mohamed Elhoseiny
Ruohan Gao
Dinesh Manocha
VLM
MLLM
27
9
0
01 Jul 2024
Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation
Juhyeong Seon
Woobin Im
Sebin Lee
Jumin Lee
Sung-eui Yoon
20
1
0
10 Jun 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya-Qin Zhang
Yanfeng Wang
23
9
0
17 Mar 2024
EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving
Jiacheng Lin
Jiajun Chen
Kunyu Peng
Xuan He
Zhiyong Li
Rainer Stiefelhagen
Kailun Yang
37
6
0
28 Feb 2024
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Zhen Qin
Weigao Sun
Dong Li
Xuyang Shen
Weixuan Sun
Yiran Zhong
60
21
0
09 Jan 2024
Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering
Zhangbin Li
Dan Guo
Jinxing Zhou
Jing Zhang
Meng Wang
13
11
0
20 Dec 2023
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang
Xing Nie
Tong Li
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VOS
15
4
0
11 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
6
13
0
02 Dec 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
18
2
0
28 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
10
3
0
10 Oct 2023
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao
Yuxin Mao
Bowen He
Xiaodong Han
Yuchao Dai
Yiran Zhong
VOS
VGen
20
29
0
16 Aug 2023
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen
Yuyuan Liu
Hu Wang
Fengbei Liu
Chong Wang
Helen Frazer
G. Carneiro
VOS
6
5
0
06 Apr 2023
Generative Transformer for Accurate and Reliable Salient Object Detection
Yuxin Mao
Jing Zhang
Zhexiong Wan
Yuchao Dai
Aixuan Li
Yun-Qiu Lv
Xinyu Tian
Deng-Ping Fan
Nick Barnes
ViT
63
30
0
20 Apr 2021
Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis
Wenmeng Yu
Hua Xu
Ziqi Yuan
Jiele Wu
SSL
45
430
0
09 Feb 2021