arXiv 2003.05162 (latest version: v4)
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
11 March 2020
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
Papers citing
"Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning"
38 / 38 papers shown
Laugh, Relate, Engage: Stylized Comment Generation for Short Videos
Xuan Ouyang
Senan Wang
Bouzhou Wang
Siyuan Xiahou
Jinrong Zhou
Yuekang Li
VGen
131
0
0
05 Nov 2025
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Ayush Gupta
A. Roy
Rama Chellappa
Nathaniel D. Bastian
Alvaro Velasquez
Susmit Jha
260
2
0
11 Jun 2025
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
312
8
0
11 Oct 2024
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
359
0
0
23 Jul 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
International Conference on Learning Representations (ICLR), 2024
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
515
10
0
10 Jun 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM
RALM
334
36
0
15 May 2024
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark
Zeyu Xi
Ge Shi
Xuefen Li
Junchi Yan
Zun Li
Lifang Wu
Zilin Liu
Liang Wang
262
1
0
25 Jan 2024
Commonsense for Zero-Shot Natural Language Video Localization
AAAI Conference on Artificial Intelligence (AAAI), 2023
Meghana Holla
Ismini Lourentzou
408
6
0
29 Dec 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
Neural Information Processing Systems (NeurIPS), 2023
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
503
594
0
17 Aug 2023
End-to-end Knowledge Retrieval with Multi-modal Queries
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Man Luo
Zhiyuan Fang
Tejas Gokhale
Yezhou Yang
Chitta Baral
VLM
253
32
0
01 Jun 2023
Fine-grained Audible Video Description
Computer Vision and Pattern Recognition (CVPR), 2023
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
217
23
0
27 Mar 2023
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
International Conference on Information and Knowledge Management (CIKM), 2023
Ji Qi
Jifan Yu
Teng Tu
Kunyu Gao
Yifan Xu
...
Juanzi Li
Jie Tang
Weidong Guo
Hui Liu
Yu-Syuan Xu
293
36
0
26 Mar 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
E. Davis
ELM
LRM
459
83
0
09 Feb 2023
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Maitreya Patel
Tejas Gokhale
Chitta Baral
Yezhou Yang
289
14
0
07 Nov 2022
Multimodal learning with graphs
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Yasha Ektefaie
George Dasoulas
Ayush Noori
Maha Farhat
Marinka Zitnik
720
154
0
07 Sep 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Neural Information Processing Systems (NeurIPS), 2022
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Joey Tianyi Zhou
Gabriel Stanovsky
Roy Schwartz
270
20
0
25 Jul 2022
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering
Computer Vision and Pattern Recognition (CVPR), 2022
Jiangtong Li
Li Niu
Liqing Zhang
242
68
0
30 May 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
192
2
0
28 Apr 2022
Video Question Answering: Datasets, Algorithms and Challenges
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
364
119
0
02 Mar 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM
HAI
317
175
0
18 Feb 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
European Conference on Computer Vision (ECCV), 2022
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
567
68
0
10 Feb 2022
Materialized Knowledge Bases from Commonsense Transformers
Shrestha Ghosh
Simon Razniewski
336
7
0
29 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
289
124
0
09 Dec 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
672
134
0
14 Sep 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning
ACM Multimedia (ACM MM), 2021
Weijiang Yu
Jian Liang
Lei Ji
Lu Li
Yuejian Fang
Nong Xiao
Nan Duan
232
11
0
05 Aug 2021
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability
Andrew Wang
Vasu Sharma
CML
271
5
0
25 Jun 2021
Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Minghan Shen
Pratyay Banerjee
Chitta Baral
SSL
340
5
0
26 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
375
71
0
24 May 2021
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
Computer Vision and Pattern Recognition (CVPR), 2021
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
515
817
0
18 May 2021
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Shailaja Keyur Sampat
Akshay Kumar
Yezhou Yang
Chitta Baral
197
26
0
13 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
IEEE International Conference on Computer Vision (ICCV), 2021
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
389
119
0
05 Apr 2021
WeaQA: Weak Supervision via Captions for Visual Question Answering
Findings (Findings), 2020
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
456
39
0
04 Dec 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
353
80
0
15 Oct 2020
Weak Supervision and Referring Attention for Temporal-Textual Association Learning
Zhiyuan Fang
Shu Kong
Zhe Wang
Charless C. Fowlkes
Yezhou Yang
178
20
0
21 Jun 2020
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Zhe Wang
Zhiyuan Fang
Jun Wang
Yezhou Yang
310
225
0
15 May 2020
memeBot: Towards Automatic Image Meme Generation
Aadhavan Sadasivam
K. Gunasekar
H. Davulcu
Yezhou Yang
133
17
0
30 Apr 2020
VQA-LOL: Visual Question Answering under the Lens of Logic
European Conference on Computer Vision (ECCV), 2020
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
CoGe
326
81
0
19 Feb 2020
Diverse Video Captioning Through Latent Variable Expansion
Pattern Recognition Letters (PR), 2019
Huanhou Xiao
Jinglun Shi
DiffM
417
15
0
26 Oct 2019