Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

16 December 2021

Yejin Choi

Papers citing "Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer"

Title | Authors | Tag | Counts | Date
Audio-Language Datasets of Scenes and Events: A Survey | Gijs Wijngaard, Elia Formisano, Michele Esposito, M. Dumontier | | 74, 2, 0 | 10 Jan 2025
Gramian Multimodal Representation Learning and Alignment | Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo, Danilo Comminiello | | 76, 0, 0 | 16 Dec 2024
Harvesting Event Schemas from Large Language Models | Jialong Tang, Hongyu Lin, Zhuoqun Li, Yaojie Lu, Xianpei Han, Le Sun | | 12, 4, 0 | 12 May 2023
CAT: Causal Audio Transformer for Audio Classification | Xiaoyu Liu, Hanlin Lu, Jianbo Yuan, Xinyu Li | ViT | 8, 21, 0 | 14 Mar 2023
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound | Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius | | 23, 39, 0 | 06 Apr 2022
Multimodal Self-Supervised Learning of General Audio Representations | Luyu Wang, Pauline Luc, Adrià Recasens, Jean-Baptiste Alayrac, Aaron van den Oord | SSL | 70, 41, 0 | 26 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong | ViT | 231, 573, 0 | 22 Apr 2021
Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | | 220, 4,424, 0 | 23 Jan 2020