ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.08043
  4. Cited By
Self-supervised pre-training and contrastive representation learning for
  multiple-choice video QA
v1v2 (latest)

Self-supervised pre-training and contrastive representation learning for multiple-choice video QA

AAAI Conference on Artificial Intelligence (AAAI), 2020
17 September 2020
Seonhoon Kim
Seohyeong Jeong
Eunbyul Kim
Inho Kang
Nojun Kwak
    SSL
ArXiv (abs)PDFHTML

Papers citing "Self-supervised pre-training and contrastive representation learning for multiple-choice video QA"

26 / 26 papers shown
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li
Chinthani Sugandhika
Yeo Keat Ee
Eric Peh
Hao Zhang
Hong Yang
Deepu Rajan
Basura Fernando
LRM
225
3
0
04 Aug 2025
Assessing Modality Bias in Video Question Answering Benchmarks with
  Multimodal Large Language Models
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language ModelsAAAI Conference on Artificial Intelligence (AAAI), 2024
Jean Park
Kuk Jin Jang
Basam Alasaly
Sriharsha Mopidevi
Andrew Zolensky
Eric Eaton
Insup Lee
Kevin Johnson
316
18
0
22 Aug 2024
End-to-End Video Question Answering with Frame Scoring Mechanisms and
  Adaptive Sampling
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
Jianxin Liang
Xiaojun Meng
Yueqian Wang
Chang Liu
Qun Liu
Dongyan Zhao
237
15
0
21 Jul 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual
  Clues
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
271
4
0
11 Mar 2024
Towards More Faithful Natural Language Explanation Using Multi-Level
  Contrastive Learning in VQA
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai
Shengli Song
Shiqi Meng
Jingyang Li
Sitong Yan
Guangneng Hu
258
11
0
21 Dec 2023
Visual Commonsense based Heterogeneous Graph Contrastive Learning
Visual Commonsense based Heterogeneous Graph Contrastive Learning
Zongzhao Li
Xiangyu Zhu
Xi Zhang
Zhaoxiang Zhang
Zhen Lei
248
1
0
11 Nov 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question
  Answering
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
493
7
0
02 Nov 2023
Large Language Models are Temporal and Causal Reasoners for Video
  Question Answering
Large Language Models are Temporal and Causal Reasoners for Video Question AnsweringConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Dohwan Ko
Ji Soo Lee
Wooyoung Kang
Byungseok Roh
Hyunwoo J. Kim
LRM
454
60
0
24 Oct 2023
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
Tem-adapter: Adapting Image-Text Pretraining for Video Question AnswerIEEE International Conference on Computer Vision (ICCV), 2023
Guangyi Chen
Xiao Liu
Guangrun Wang
Kun Zhang
Philip H.S.Torr
Xiaoping Zhang
Yansong Tang
396
32
0
16 Aug 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen
  Large Language Models
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Jiaming Song
MLLM
214
38
0
15 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
406
1
0
04 Jun 2023
Contrastive Video Question Answering via Video Graph Transformer
Contrastive Video Question Answering via Video Graph TransformerIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
324
57
0
27 Feb 2023
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Qinjie Zheng
Chaoyue Wang
Daqing Liu
Dadong Wang
Dacheng Tao
LRM
176
0
0
21 Nov 2022
Facial Video-based Remote Physiological Measurement via Self-supervised
  Learning
Facial Video-based Remote Physiological Measurement via Self-supervised LearningIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zijie Yue
Miaojing Shi
Shuai Ding
CVBM
339
64
0
27 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
Dense but Efficient VideoQA for Intricate Compositional ReasoningIEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Jihyeon Janel Lee
Wooyoung Kang
Eun-Sol Kim
CoGe
306
5
0
19 Oct 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked
  Visual Modeling
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual ModelingComputer Vision and Pattern Recognition (CVPR), 2022
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
740
85
0
04 Sep 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language
  Models
Zero-Shot Video Question Answering via Frozen Bidirectional Language ModelsNeural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
551
284
0
16 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning
  on Multimodal and Temporal Data
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
289
45
0
06 Jun 2022
Learning to Answer Visual Questions from Web Videos
Learning to Answer Visual Questions from Web VideosIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
427
41
0
10 May 2022
Video Question Answering: Datasets, Algorithms and Challenges
Video Question Answering: Datasets, Algorithms and ChallengesConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
360
118
0
02 Mar 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Temporal Sentence Grounding in Videos: A Survey and Future DirectionsIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
471
59
0
20 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and
  Sound
MERLOT Reserve: Neural Script Knowledge through Vision and Language and SoundComputer Vision and Pattern Recognition (CVPR), 2022
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
566
251
0
07 Jan 2022
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token
  Modeling
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
486
245
0
24 Nov 2021
Self-supervised Contrastive Cross-Modality Representation Learning for
  Spoken Question Answering
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question AnsweringConference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Chenyu You
Polydoros Giannouris
Yuexian Zou
SSL
269
66
0
08 Sep 2021
MERLOT: Multimodal Neural Script Knowledge Models
MERLOT: Multimodal Neural Script Knowledge ModelsNeural Information Processing Systems (NeurIPS), 2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLMLRM
519
439
0
04 Jun 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language
  Tasks
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language TasksNorth American Chapter of the Association for Computational Linguistics (NAACL), 2021
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
330
21
0
16 Apr 2021
1
Page 1 of 1