arXiv: 2401.04210
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
8 January 2024
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
Papers citing
"FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild"
9 / 9 papers shown
Title
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin
Guangzong Si
Zilei Wang
54
0
0
17 Mar 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Y. Li
W. Li
Z. Ma
Chao Zhang
LRM
MLLM
VLM
64
2
0
17 Feb 2025
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Haozhao Wang
Zhicheng Chen
Peilin Zhao
VLM
MLLM
61
18
0
04 Aug 2024
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
36
18
0
28 Oct 2022
SCAM! Transferring humans between images with Semantic Cross Attention Modulation
Nicolas Dufour
David Picard
Vicky Kalogeiton
36
13
0
10 Oct 2022
Multimodal Self-Supervised Learning of General Audio Representations
Luyu Wang
Pauline Luc
Adrià Recasens
Jean-Baptiste Alayrac
Aaron van den Oord
SSL
70
41
0
26 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
401
594
0
21 Jul 2020
Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features
Didan Deng
Yuqian Zhou
Jimin Pi
Bertram E. Shi
CVBM
12
25
0
02 May 2018