Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations

21 February 2022

Papers citing "Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations"

5 / 5 papers shown

Title
M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation Hongcheng Liu Pingjie Wang Yu Wang Yanfeng Wang 39 1 0 19 Feb 2024
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 280 1,981 0 09 Feb 2021
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues Hung Le Doyen Sahoo Nancy F. Chen S. Hoi 38 30 0 20 Oct 2020
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 294 10,216 0 16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Z. Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 716 6,743 0 26 Sep 2016