2103.03206
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 790 papers shown
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
224
156
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
International Conference on Learning Representations (ICLR), 2022
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
233
54
0
17 Nov 2022
Token Turing Machines
Computer Vision and Pattern Recognition (CVPR), 2022
Michael S. Ryoo
K. Gopalakrishnan
Kumara Kahatapitiya
Ted Xiao
Kanishka Rao
Austin Stone
Yao Lu
Julian Ibarz
Anurag Arnab
239
28
0
16 Nov 2022
Latent Bottlenecked Attentive Neural Processes
International Conference on Learning Representations (ICLR), 2022
Leo Feng
Hossein Hajimirsadeghi
Yoshua Bengio
Mohamed Osama Ahmed
BDL
214
27
0
15 Nov 2022
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research
J. Bornschein
Alexandre Galashov
Ross Hemsley
Amal Rannen-Triki
Yutian Chen
...
Angeliki Lazaridou
Yee Whye Teh
Andrei A. Rusu
Razvan Pascanu
Marc'Aurelio Ranzato
OOD
VLM
AI4TS
325
20
0
15 Nov 2022
Efficient Speech Translation with Dynamic Latent Perceivers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
237
4
0
28 Oct 2022
A single-cell gene expression language model
Will Connell
Umair W Khan
Michael J. Keiser
115
11
0
25 Oct 2022
Solving Reasoning Tasks with a Slot Transformer
Ryan Faulkner
Daniel Zoran
LRM
147
1
0
20 Oct 2022
Play It Back: Iterative Attention for Audio Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Alexandros Stergiou
Dima Damen
192
5
0
20 Oct 2022
Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations
Neural Information Processing Systems (NeurIPS), 2022
Fukun Yin
Wen Liu
Zilong Huang
Pei Cheng
Tao Chen
Gang Yu
141
21
0
20 Oct 2022
Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Eli Bronstein
Mark Palatucci
Dominik Notz
Brandyn White
Alex Kuefler
...
Punit Shah
Evan Racah
Benjamin Frenkel
Shimon Whiteson
Drago Anguelov
287
68
0
18 Oct 2022
Improving Object-centric Learning with Query Optimization
International Conference on Learning Representations (ICLR), 2022
Baoxiong Jia
Yu Liu
Siyuan Huang
OCL
262
62
0
17 Oct 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
199
7
0
15 Oct 2022
Neural Attentive Circuits
Neural Information Processing Systems (NeurIPS), 2022
Nasim Rahaman
M. Weiß
Francesco Locatello
C. Pal
Yoshua Bengio
Bernhard Schölkopf
Erran L. Li
Nicolas Ballas
292
8
0
14 Oct 2022
RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer
International Conference on Information and Knowledge Management (CIKM), 2022
Mogan Gim
Donghee Choi
Kana Maruyama
Jihun Choi
Hajung Kim
Donghyeon Park
Jaewoo Kang
162
8
0
14 Oct 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
British Machine Vision Conference (BMVC), 2022
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
147
32
0
13 Oct 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos
IEEE International Conference on Computer Vision (ICCV), 2022
Ting-Li Chen
Lala Li
Saurabh Saxena
Geoffrey E. Hinton
David J. Fleet
VGen
MLLM
442
131
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
138
20
0
11 Oct 2022
Turbo Training with Token Dropout
British Machine Vision Conference (BMVC), 2022
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
214
14
0
10 Oct 2022
SCAM! Transferring humans between images with Semantic Cross Attention Modulation
European Conference on Computer Vision (ECCV), 2022
Nicolas Dufour
David Picard
Vicky Kalogeiton
203
15
0
10 Oct 2022
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
Asian Conference on Computer Vision (ACCV), 2022
A. Fragomeni
Michael Wray
Dima Damen
CLIP
ViT
145
4
0
09 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
312
2
0
08 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
390
475
0
06 Oct 2022
SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
British Machine Vision Conference (BMVC), 2022
Florian Langer
Gwangbin Bae
Ignas Budvytis
R. Cipolla
3DPC
170
15
0
03 Oct 2022
Benign Autoencoders
Semyon Malamud
Teng Andrea Xu
Antoine Didisheim
DRL
AI4CE
176
0
0
02 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
International Conference on Learning Representations (ICLR), 2022
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
396
167
0
02 Oct 2022
Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles
Shuhei Mandokoro
N. Oka
Akane Matsushima
Chie Fukada
Yuko Yoshimura
Koji Kawahara
Kazuaki Tanaka
120
1
0
01 Oct 2022
Cascaded Multi-Modal Mixing Transformers for Alzheimer's Disease Classification with Incomplete Data
NeuroImage (NeuroImage), 2022
Linfeng Liu
Siyu Liu
Lu Zhang
X. To
F. Nasrallah
Shekhar S. Chandra
MedIm
179
79
0
01 Oct 2022
Real-time Online Video Detection with Temporal Smoothing Transformers
European Conference on Computer Vision (ECCV), 2022
Yue Zhao
Philipp Krähenbühl
ViT
178
91
0
19 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
International Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
359
4
0
15 Sep 2022
Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?
Yi Wang
Zhiwen Fan
Tianlong Chen
Hehe Fan
Zinan Lin
ViT
251
10
0
15 Sep 2022
A patch-based architecture for multi-label classification from single label annotations
Warren Jouanneau
Aurélie Bugeau
Marc Palyart
Nicolas Papadakis
Laurent Vézard
169
0
0
14 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Conference on Robot Learning (CoRL), 2022
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
630
669
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
310
166
0
07 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2022
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
373
141
0
31 Aug 2022
A Circular Window-based Cascade Transformer for Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
192
6
0
30 Aug 2022
Improving Small Molecule Generation using Mutual Information Machine
Daniel A. Reidenbach
M. Livne
Rajesh Ilango
M. Gill
Johnny Israeli
278
20
0
18 Aug 2022
Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
261
109
0
16 Aug 2022
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
International Conference on Learning Representations (ICLR), 2022
Manzil Zaheer
A. S. Rawat
Seungyeon Kim
Chong You
Himanshu Jain
Andreas Veit
Rob Fergus
Surinder Kumar
VLM
163
1
0
14 Aug 2022
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
IEEE Transactions on Games (IEEE Trans. Games), 2022
Aleksandar Stanić
Yujin Tang
David R Ha
Jürgen Schmidhuber
ELM
255
15
0
05 Aug 2022
COPER: Continuous Patient State Perceiver
V. Chauhan
Anshul Thakur
Odhran O'Donoghue
David Clifton
AI4TS
OOD
258
7
0
05 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
179
10
0
04 Aug 2022
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Mahdi Saleh
Yige Wang
Nassir Navab
Benjamin Busam
F. Tombari
3DPC
235
4
0
31 Jul 2022
UAVM: Towards Unifying Audio and Visual Models
IEEE Signal Processing Letters (SPL), 2022
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
299
30
0
29 Jul 2022
Depth Field Networks for Generalizable Multi-view Scene Representation
European Conference on Computer Vision (ECCV), 2022
Vitor Campagnolo Guizilini
Igor Vasiljevic
Jiading Fang
Rares Andrei Ambrus
G. Shakhnarovich
Matthew R. Walter
Adrien Gaidon
3DV
MDE
187
18
0
28 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
193
32
0
20 Jul 2022
Residual and Attentional Architectures for Vector-Symbols
W. Olin-Ammentorp
153
3
0
18 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Neural Information Processing Systems (NeurIPS), 2022
Wei-Ning Hsu
Bowen Shi
SSL
VLM
319
52
0
14 Jul 2022
Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection
International Journal of Computer Vision (IJCV), 2022
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
220
15
0
14 Jul 2022
MM-ALT: A Multimodal Automatic Lyric Transcription System
ACM Multimedia (ACM MM), 2022
Xiangming Gu
Longshen Ou
Danielle Ong
Ye Wang
215
15
0
13 Jul 2022