End-to-End Multimodal Representation Learning for Video Dialog

26 October 2022

Papers citing "End-to-End Multimodal Representation Learning for Video Dialog"

Title
HierSum: A Global and Local Attention Mechanism for Video Summarization | Apoorva Beedu, Irfan Essa | - | 41 0 0 | 25 Apr 2025
Mamba Fusion: Learning Actions Through Questioning | Zhikang Dong, Apoorva Beedu, Jason Sheinkopf, Irfan Essa | Mamba | 57 2 0 | 17 Sep 2024
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering | Jungin Park, Jiyoung Lee, K. Sohn | - | 123 99 0 | 29 Apr 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | AIMat | 716 6,724 0 | 26 Sep 2016