Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

11 March 2020

Yezhou Yang

Papers citing "Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning"

Laugh, Relate, Engage: Stylized Comment Generation for Short Videos

131

05 Nov 2025

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision

260

11 Jun 2025

Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

312

11 Oct 2024

Causal Understanding For Video Question Answering

Bhanu Prakash Reddy Guda

Tanmay Kulkarni

Adithya Sampath

Swarnashree Mysore Sathyendra

CML

359

23 Jul 2024

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

International Conference on Learning Representations (ICLR), 2024

515

10 Jun 2024

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Computer Vision and Pattern Recognition (CVPR), 2024

Chuang Gan

334

15 May 2024

Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark

262

25 Jan 2024

Commonsense for Zero-Shot Natural Language Video Localization

AAAI Conference on Artificial Intelligence (AAAI), 2023

Meghana Holla

Ismini Lourentzou

408

29 Dec 2023

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Neural Information Processing Systems (NeurIPS), 2023

K. Mangalam

Raiymbek Akshulakov

Jitendra Malik

503

594

17 Aug 2023

End-to-end Knowledge Retrieval with Multi-modal Queries

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Yezhou Yang

253

01 Jun 2023

Fine-grained Audible Video Description

Computer Vision and Pattern Recognition (CVPR), 2023

Zhen Qin

...

Yuchao Dai

Lingpeng Kong

Meng Wang

Yu Qiao

Yiran Zhong

VGen

217

27 Mar 2023

GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

International Conference on Information and Knowledge Management (CIKM), 2023

Ji Qi

Jifan Yu

Teng Tu

Kunyu Gao

Yifan Xu

...

Juanzi Li

293

26 Mar 2023

Benchmarks for Automated Commonsense Reasoning: A Survey

ACM Computing Surveys (ACM Comput. Surv.), 2023

E. Davis

ELM LRM

459

09 Feb 2023

CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Maitreya Patel

Tejas Gokhale

Chitta Baral

Yezhou Yang

289

07 Nov 2022

Multimodal learning with graphs

Nature Machine Intelligence (Nat. Mach. Intell.), 2022

720

154

07 Sep 2022

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

Neural Information Processing Systems (NeurIPS), 2022

Gabriel Stanovsky

270

25 Jul 2022

From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Computer Vision and Pattern Recognition (CVPR), 2022

Jiangtong Li

Li Niu

Liqing Zhang

242

30 May 2022

Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos

Arnav Chakravarthy

Zhiyuan Fang

Yezhou Yang

192

28 Apr 2022

Video Question Answering: Datasets, Algorithms and Challenges

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Wei Ji

364

119

02 Mar 2022

A Review on Methods and Applications in Multimodal Deep Learning

317

175

18 Feb 2022

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

European Conference on Computer Vision (ECCV), 2022

Yejin Choi

567

10 Feb 2022

Materialized Knowledge Bases from Commonsense Transformers

Shrestha Ghosh

Simon Razniewski

336

29 Dec 2021

Injecting Semantic Concepts into End-to-End Image Captioning

Xiaowei Hu

Yezhou Yang

Zicheng Liu

ViT VLM

289

124

09 Dec 2021

The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation

Marzena Karpinska

Nader Akoury

Mohit Iyyer

672

134

14 Sep 2021

Hybrid Reasoning Network for Video-based Commonsense Captioning

ACM Multimedia (ACM MM), 2021

232

05 Aug 2021

iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability

Andrew Wang

Vasu Sharma

CML

271

25 Jun 2021

Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

340

26 May 2021

Recent Advances and Trends in Multimodal Deep Learning: A Review

Xi Li

375

24 May 2021

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

Computer Vision and Pattern Recognition (CVPR), 2021

Junbin Xiao

Xindi Shang

Angela Yao

Tat-Seng Chua

515

817

18 May 2021

CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Shailaja Keyur Sampat

Akshay Kumar

Yezhou Yang

Chitta Baral

197

13 Apr 2021

Compressing Visual-linguistic Model via Knowledge Distillation

IEEE International Conference on Computer Vision (ICCV), 2021

Zhiyuan Fang

Jianfeng Wang

Xiaowei Hu

Lijuan Wang

Yezhou Yang

Zicheng Liu

VLM

389

119

05 Apr 2021

WeaQA: Weak Supervision via Captions for Visual Question Answering

Findings (Findings), 2020

Pratyay Banerjee

Tejas Gokhale

Yezhou Yang

Chitta Baral

456

04 Dec 2020

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

353

15 Oct 2020

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

Zhiyuan Fang

Shu Kong

Zhe Wang

Charless C. Fowlkes

Yezhou Yang

178

21 Jun 2020

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

Zhe Wang

Zhiyuan Fang

Jun Wang

Yezhou Yang

310

225

15 May 2020

memeBot: Towards Automatic Image Meme Generation

Aadhavan Sadasivam

K. Gunasekar

H. Davulcu

Yezhou Yang

133

30 Apr 2020

VQA-LOL: Visual Question Answering under the Lens of Logic

European Conference on Computer Vision (ECCV), 2020

Yezhou Yang

326

19 Feb 2020

Diverse Video Captioning Through Latent Variable Expansion

Pattern Recognition Letters (PR), 2019

Huanhou Xiao

Jinglun Shi

DiffM

417

26 Oct 2019