1905.12681
What Makes Training Multi-Modal Classification Networks Hard?
29 May 2019
Weiyao Wang
Du Tran
Matt Feiszli
Papers citing
"What Makes Training Multi-Modal Classification Networks Hard?"
14 / 64 papers shown
Title
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
25
541
0
30 Jun 2021
VidHarm: A Clip Based Dataset for Harmful Content Detection
Johan Edstedt
Amanda Berg
M. Felsberg
Johan Karlsson
Francisca Benavente
Anette Novak
G. Pihlgren
14
2
0
15 Jun 2021
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
23
137
0
17 May 2021
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
J. Komorowski
Monika Wysoczanska
Tomasz Trzciński
16
55
0
12 Apr 2021
Offboard 3D Object Detection from Point Cloud Sequences
C. Qi
Yin Zhou
Mahyar Najibi
Pei Sun
Khoa T. Vo
Boyang Deng
Dragomir Anguelov
3DPC
30
174
0
08 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
48
973
0
04 Mar 2021
Trusted Multi-View Classification
Zongbo Han
Changqing Zhang
H. Fu
Joey Tianyi Zhou
EDL
21
164
0
03 Feb 2021
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
A. Schwing
Tamir Hazan
53
89
0
21 Oct 2020
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
C. Qi
Xinlei Chen
Or Litany
Leonidas J. Guibas
3DPC
195
248
0
29 Jan 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
194
205
0
23 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
27
251
0
10 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
23
428
0
28 Nov 2019
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
190
576
0
02 May 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,465
0
06 Jun 2016