arXiv:2112.14088
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
28 December 2021
Philipp Harzig
Moritz Einfalt
Rainer Lienhart
ViT
Papers citing
"Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation"
3 / 3 papers shown
Title
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
114
189
0
19 Mar 2020
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
66
163
0
27 Aug 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016