Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
11 March 2020
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang

Papers citing "Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning"

38 / 38 papers shown
Laugh, Relate, Engage: Stylized Comment Generation for Short Videos
Xuan Ouyang
Senan Wang
Bouzhou Wang
Siyuan Xiahou
Jinrong Zhou
Yuekang Li
VGen
131
0
0
05 Nov 2025
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta
A. Roy
Rama Chellappa
Nathaniel D. Bastian
Alvaro Velasquez
Susmit Jha
260
2
0
11 Jun 2025
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
312
8
0
11 Oct 2024
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
359
0
0
23 Jul 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
International Conference on Learning Representations (ICLR), 2024
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
515
10
0
10 Jun 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM
RALM
334
36
0
15 May 2024
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark
Zeyu Xi
Ge Shi
Xuefen Li
Junchi Yan
Zun Li
Lifang Wu
Zilin Liu
Liang Wang
262
1
0
25 Jan 2024
Commonsense for Zero-Shot Natural Language Video Localization
AAAI Conference on Artificial Intelligence (AAAI), 2023
Meghana Holla
Ismini Lourentzou
408
6
0
29 Dec 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
Neural Information Processing Systems (NeurIPS), 2023
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
503
594
0
17 Aug 2023
End-to-end Knowledge Retrieval with Multi-modal Queries
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Man Luo
Zhiyuan Fang
Tejas Gokhale
Yezhou Yang
Chitta Baral
VLM
253
32
0
01 Jun 2023
Fine-grained Audible Video Description
Computer Vision and Pattern Recognition (CVPR), 2023
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
217
23
0
27 Mar 2023
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
International Conference on Information and Knowledge Management (CIKM), 2023
Ji Qi
Jifan Yu
Teng Tu
Kunyu Gao
Yifan Xu
...
Juanzi Li
Jie Tang
Weidong Guo
Hui Liu
Yu-Syuan Xu
293
36
0
26 Mar 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
E. Davis
ELM
LRM
459
83
0
09 Feb 2023
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Maitreya Patel
Tejas Gokhale
Chitta Baral
Yezhou Yang
289
14
0
07 Nov 2022
Multimodal learning with graphs
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Yasha Ektefaie
George Dasoulas
Ayush Noori
Maha Farhat
Marinka Zitnik
720
154
0
07 Sep 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Neural Information Processing Systems (NeurIPS), 2022
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Joey Tianyi Zhou
Gabriel Stanovsky
Roy Schwartz
270
20
0
25 Jul 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Computer Vision and Pattern Recognition (CVPR), 2022
Jiangtong Li
Li Niu
Liqing Zhang
242
68
0
30 May 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
192
2
0
28 Apr 2022
Video Question Answering: Datasets, Algorithms and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
364
119
0
02 Mar 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM
HAI
317
175
0
18 Feb 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
European Conference on Computer Vision (ECCV), 2022
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
567
68
0
10 Feb 2022
Materialized Knowledge Bases from Commonsense Transformers
Shrestha Ghosh
Simon Razniewski
336
7
0
29 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
289
124
0
09 Dec 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
672
134
0
14 Sep 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning
ACM Multimedia (ACM MM), 2021
Weijiang Yu
Jian Liang
Lei Ji
Lu Li
Yuejian Fang
Nong Xiao
Nan Duan
232
11
0
05 Aug 2021
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability
Andrew Wang
Vasu Sharma
CML
271
5
0
25 Jun 2021
Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Minghan Shen
Pratyay Banerjee
Chitta Baral
SSL
340
5
0
26 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
375
71
0
24 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Computer Vision and Pattern Recognition (CVPR), 2021
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
515
817
0
18 May 2021
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Shailaja Keyur Sampat
Akshay Kumar
Yezhou Yang
Chitta Baral
197
26
0
13 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
IEEE International Conference on Computer Vision (ICCV), 2021
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
389
119
0
05 Apr 2021
WeaQA: Weak Supervision via Captions for Visual Question Answering
Findings (Findings), 2020
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
456
39
0
04 Dec 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
353
80
0
15 Oct 2020
Weak Supervision and Referring Attention for Temporal-Textual Association Learning
Zhiyuan Fang
Shu Kong
Zhe Wang
Charless C. Fowlkes
Yezhou Yang
178
20
0
21 Jun 2020
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Zhe Wang
Zhiyuan Fang
Jun Wang
Yezhou Yang
310
225
0
15 May 2020
memeBot: Towards Automatic Image Meme Generation
Aadhavan Sadasivam
K. Gunasekar
H. Davulcu
Yezhou Yang
133
17
0
30 Apr 2020
VQA-LOL: Visual Question Answering under the Lens of Logic
European Conference on Computer Vision (ECCV), 2020
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
CoGe
326
81
0
19 Feb 2020
Diverse Video Captioning Through Latent Variable Expansion
Pattern Recognition Letters (PR), 2019
Huanhou Xiao
Jinglun Shi
DiffM
417
15
0
26 Oct 2019