Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.12397
Cited By
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios
21 May 2023
Yuanyuan Jiang
Jianqin Yin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios"
5 / 5 papers shown
Title
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
Yuanyuan Jiang
Jianqin Yin
38
1
0
13 May 2024
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
240
577
0
22 Apr 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
410
594
0
21 Jul 2020
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
282
39,170
0
01 Sep 2014
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
228
31,150
0
16 Jan 2013
1