ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction

11 December 2020
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
    ViT
arXiv: 2012.06170

Papers citing "ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction"

11 / 11 papers shown
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak Nadjar Araabi
118
0
0
14 Apr 2025
AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness
Yizhuo Yang
Shenghai Yuan
Muqing Cao
Jianfei Yang
Lihua Xie
51
7
0
11 Nov 2024
Saliency Detection in Educational Videos: Analyzing the Performance of Current Models, Identifying Limitations and Advancement Directions
Evelyn Navarrete
Ralph Ewerth
Anett Hoppe
21
0
0
08 Aug 2024
Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models
Fares Abawi
Di Fu
Stefan Wermter
30
0
0
05 May 2024
Transformer-based Video Saliency Prediction with High Temporal Dimension Decoding
Morteza Moradi
S. Palazzo
C. Spampinato
24
2
0
15 Jan 2024
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos
Ziyuan Yang
Sucheng Ren
Zongwei Wu
Nanxuan Zhao
Junle Wang
Jing Qin
Shengfeng He
30
2
0
23 Aug 2023
A trained humanoid robot can perform human-like crossmodal social attention and conflict resolution
Di Fu
Fares Abawi
Hugo C. C. Carneiro
Matthias Kerzel
Ziwei Chen
Erik Strahl
Xun Liu
S. Wermter
14
6
0
02 Nov 2021
Bringing Generalization to Deep Multi-View Pedestrian Detection
Jeet K. Vora
Swetanjal Dutta
Kanishk Jain
Shyamgopal Karthik
Vineet Gandhi
24
6
0
24 Sep 2021
Spatio-Temporal Self-Attention Network for Video Saliency Prediction
Ziqiang Wang
Zhi Liu
Gongyang Li
Yang Wang
Tianhong Zhang
Lihua Xu
Jijun Wang
3DPC
28
44
0
24 Aug 2021
Unified Image and Video Saliency Modeling
Richard Droste
Jianbo Jiao
J. A. Noble
53
157
0
11 Mar 2020