Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events

2 December 2019

Papers citing "Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events"

4 / 4 papers shown

Title
Temporal and cross-modal attention for audio-visual zero-shot learning Otniel-Bogdan Mercea Thomas Hummel A. Sophia Koepke Zeynep Akata 38 25 0 20 Jul 2022
GAFX: A General Audio Feature eXtractor Zhaoyang Bu Han Zhang Xiaohu Zhu 30 0 0 19 Jul 2022
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 29 8 0 26 Oct 2021
A Transformer-based Audio Captioning Model with Keyword Estimation Yuma Koizumi Ryo Masumura Kyosuke Nishida Masahiro Yasuda Shoichiro Saito 18 54 0 01 Jul 2020