Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation. International Conference on Information Photonics (ICIP), 2021
A Comprehensive Review of the Video-to-Text Problem. Artificial Intelligence Review (AIR), 2021