Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.02036
Cited By
Modality Shifting Attention Network for Multi-modal Video Question Answering
4 July 2020
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang-Dong Yoo
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Modality Shifting Attention Network for Multi-modal Video Question Answering"
15 / 15 papers shown
Title
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
41
3
0
03 Oct 2024
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
21
9
0
25 Oct 2023
Self-Supervised Visual Representation Learning via Residual Momentum
T. Pham
Axi Niu
Zhang Kang
Sultan Rizky Hikmawan Madjid
Jiajing Hong
Daehyeok Kim
Joshua Tian Jin Tee
Chang-Dong Yoo
SSL
43
6
0
17 Nov 2022
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Jiong Wang
Zhou Zhao
Weike Jin
18
0
0
08 Sep 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
20
18
0
01 Aug 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
36
227
0
16 Jun 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
34
33
0
10 May 2022
Relevance-based Margin for Contrastively-trained Video Retrieval Models
Alex Falcon
Swathikiran Sudhakaran
G. Serra
Sergio Escalera
O. Lanz
34
7
0
27 Apr 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
29
136
0
26 Mar 2022
All in One: Exploring Unified Video-Language Pre-training
Alex Jinpeng Wang
Yixiao Ge
Rui Yan
Yuying Ge
Xudong Lin
Guanyu Cai
Jianping Wu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
16
200
0
14 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
21
85
0
02 Mar 2022
End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
11
51
0
12 Jul 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang-Dong Yoo
18
26
0
24 Mar 2021
SCNet: Training Inference Sample Consistency for Instance Segmentation
Thang Vu
Haeyong Kang
Chang-Dong Yoo
ISeg
57
90
0
18 Dec 2020
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,465
0
06 Jun 2016
1