Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation

28 December 2021

Papers citing "Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation"

Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo, Jing Liu, Xinxin Zhu, Peng Yao, Shichen Lu, Hanqing Lu
ViT; 112; 189; 0; 19 Mar 2020

Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
66; 163; 0; 27 Aug 2019

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
AIMat; 716; 6,743; 0; 26 Sep 2016