

ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
11 December 2020
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
    ViT

Papers citing "ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction"

23 / 23 papers shown
Simplifying Knowledge Transfer in Pretrained Models
Siddharth Jain
Shyamgopal Karthik
Vineet Gandhi
182
0
0
25 Oct 2025
The ISLab Solution to the Algonauts Challenge 2025: A Multimodal Deep Learning Approach to Brain Response Prediction
Andrea Corsico
Giorgia Rigamonti
Simone Zini
Simone Bianco
Paolo Napoletano
70
0
0
25 Jul 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak N. Araabi
1.0K
1
0
14 Apr 2025
Minimalistic Video Saliency Prediction via Efficient Decoder & Spatio Temporal Action Cues
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Rohit Girmaji
Siddharth Jain
Bhav Beri
Sarthak Bansal
Vineet Gandhi
ViT
234
6
0
01 Feb 2025
Relevance-guided Audio Visual Fusion for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Pan Gao
Moncef Gabbouj
363
2
0
18 Nov 2024
AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yizhuo Yang
Shenghai Yuan
Muqing Cao
Jianfei Yang
Lihua Xie
563
15
0
11 Nov 2024
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yunlong Tang
Gen Zhan
Li Yang
Yiting Liao
Chenliang Xu
VGen
DiffM
LRM
507
15
0
21 Aug 2024
Saliency Detection in Educational Videos: Analyzing the Performance of Current Models, Identifying Limitations and Advancement Directions
International Conference on Information and Knowledge Management (CIKM), 2024
Evelyn Navarrete
Ralph Ewerth
Anett Hoppe
164
3
0
08 Aug 2024
Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models
Fares Abawi
Di Fu
Stefan Wermter
351
1
0
05 May 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
International Conference on Pattern Recognition (ICPR), 2024
Morteza Moradi
Mohammad Moradi
Francesco Rundo
C. Spampinato
Ali Borji
S. Palazzo
250
4
0
03 Apr 2024
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Jun Xiong
Peng Zhang
Tao You
Chuanyue Li
Wei Huang
Yufei Zha
DiffM
243
16
0
02 Mar 2024
Transformer-based Video Saliency Prediction with High Temporal Dimension Decoding
Morteza Moradi
S. Palazzo
C. Spampinato
222
8
0
15 Jan 2024
UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection
Jun Xiong
Peng Zhang
Chuanyue Li
Wei Huang
Yufei Zha
Tao You
ViT
181
3
0
15 Sep 2023
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos
ACM Multimedia (ACM MM), 2023
Ziyuan Yang
Sucheng Ren
Zongwei Wu
Nanxuan Zhao
Junle Wang
Jing Qin
Shengfeng He
228
3
0
23 Aug 2023
Gated Driver Attention Predictor
Tianci Zhao
Xue Bai
Jianwu Fang
Jianru Xue
244
4
0
01 Aug 2023
TinyHD: Efficient Video Saliency Prediction with Heterogeneous Decoders using Hierarchical Maps Distillation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Feiyan Hu
S. Palazzo
Federica Proietto Salanitri
Giovanni Bellitto
Morteza Moradi
C. Spampinato
Kevin McGuinness
242
17
0
11 Jan 2023
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
323
73
0
20 Aug 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
253
38
0
20 Jun 2022
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
162
5
0
05 Nov 2021
A trained humanoid robot can perform human-like crossmodal social attention and conflict resolution
International Journal of Social Robotics (JSR), 2021
Di Fu
Fares Abawi
Hugo C. C. Carneiro
Matthias Kerzel
Ziwei Chen
Erik Strahl
Xun Liu
S. Wermter
482
10
0
02 Nov 2021
Spatio-Temporal Self-Attention Network for Video Saliency Prediction
IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021
Ziqiang Wang
Zhi Liu
Gongyang Li
Yang Wang
Tianhong Zhang
Lihua Xu
Jijun Wang
3DPC
410
63
0
24 Aug 2021
Temporal-Spatial Feature Pyramid for Video Saliency Detection
Qinyao Chang
Shiping Zhu
245
35
0
10 May 2021
Noise-Aware Video Saliency Prediction
British Machine Vision Conference (BMVC), 2021
Ekta Prashnani
Orazio Gallo
Joohwan Kim
Josef Spjut
P. Sen
I. Frosio
186
1
0
16 Apr 2021