2009.08043
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA
AAAI Conference on Artificial Intelligence (AAAI), 2020
17 September 2020
Seonhoon Kim
Seohyeong Jeong
Eunbyul Kim
Inho Kang
Nojun Kwak
SSL
Papers citing
"Self-supervised pre-training and contrastive representation learning for multiple-choice video QA"
26 / 26 papers shown
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li
Chinthani Sugandhika
Yeo Keat Ee
Eric Peh
Hao Zhang
Hong Yang
Deepu Rajan
Basura Fernando
LRM
225
3
0
04 Aug 2025
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jean Park
Kuk Jin Jang
Basam Alasaly
Sriharsha Mopidevi
Andrew Zolensky
Eric Eaton
Insup Lee
Kevin Johnson
316
18
0
22 Aug 2024
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
Jianxin Liang
Xiaojun Meng
Yueqian Wang
Chang Liu
Qun Liu
Dongyan Zhao
237
15
0
21 Jul 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
271
4
0
11 Mar 2024
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai
Shengli Song
Shiqi Meng
Jingyang Li
Sitong Yan
Guangneng Hu
258
11
0
21 Dec 2023
Visual Commonsense based Heterogeneous Graph Contrastive Learning
Zongzhao Li
Xiangyu Zhu
Xi Zhang
Zhaoxiang Zhang
Zhen Lei
248
1
0
11 Nov 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
493
7
0
02 Nov 2023
Large Language Models are Temporal and Causal Reasoners for Video Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Dohwan Ko
Ji Soo Lee
Wooyoung Kang
Byungseok Roh
Hyunwoo J. Kim
LRM
454
60
0
24 Oct 2023
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
IEEE International Conference on Computer Vision (ICCV), 2023
Guangyi Chen
Xiao Liu
Guangrun Wang
Kun Zhang
Philip H.S. Torr
Xiaoping Zhang
Yansong Tang
396
32
0
16 Aug 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Jiaming Song
MLLM
214
38
0
15 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
406
1
0
04 Jun 2023
Contrastive Video Question Answering via Video Graph Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
324
57
0
27 Feb 2023
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Qinjie Zheng
Chaoyue Wang
Daqing Liu
Dadong Wang
Dacheng Tao
LRM
176
0
0
21 Nov 2022
Facial Video-based Remote Physiological Measurement via Self-supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zijie Yue
Miaojing Shi
Shuai Ding
CVBM
339
64
0
27 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Jihyeon Janel Lee
Wooyoung Kang
Eun-Sol Kim
CoGe
306
5
0
19 Oct 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Computer Vision and Pattern Recognition (CVPR), 2022
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
740
85
0
04 Sep 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Neural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
551
284
0
16 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
289
45
0
06 Jun 2022
Learning to Answer Visual Questions from Web Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
427
41
0
10 May 2022
Video Question Answering: Datasets, Algorithms and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
360
118
0
02 Mar 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
471
59
0
20 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Computer Vision and Pattern Recognition (CVPR), 2022
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
566
251
0
07 Jan 2022
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
486
245
0
24 Nov 2021
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Chenyu You
Polydoros Giannouris
Yuexian Zou
SSL
269
66
0
08 Sep 2021
MERLOT: Multimodal Neural Script Knowledge Models
Neural Information Processing Systems (NeurIPS), 2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
519
439
0
04 Jun 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
330
21
0
16 Apr 2021