2103.03206
Cited By
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 782 papers shown
Memory Efficient Neural Processes via Constant Memory Attention Block
International Conference on Machine Learning (ICML), 2023
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Yoshua Bengio
Mohamed Osama Ahmed
262
8
0
23 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Neural Information Processing Systems (NeurIPS), 2023
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
230
3
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
229
108
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
513
805
0
22 May 2023
FIT: Far-reaching Interleaved Transformers
Ting-Li Chen
Lala Li
268
16
0
22 May 2023
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM
VLM
223
44
0
20 May 2023
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention
Interspeech (Interspeech), 2023
J. Yip
Tuan Truong
Dianwen Ng
Chong Zhang
Yukun Ma
Trung Hieu Nguyen
Chongjia Ni
Shengkui Zhao
Chng Eng Siong
Bin Ma
128
2
0
20 May 2023
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2023
Chenjie Cao
Yunuo Cai
Qiaole Dong
Yikai Wang
Yanwei Fu
DiffM
207
21
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
383
149
0
18 May 2023
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He
Weixi Feng
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
William Yang Wang
Xinze Wang
DiffM
361
13
0
18 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Joey Tianyi Zhou
Heng Ji
KELM
VGen
209
40
0
18 May 2023
Soft Prompt Decoding for Multilingual Dense Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Zhiqi Huang
Hansi Zeng
Hamed Zamani
James Allan
RALM
168
16
0
15 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
187
30
0
12 May 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Neural Information Processing Systems (NeurIPS), 2023
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
236
130
0
12 May 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
Zhaoyang Zhang
Yantao Shen
Kunyu Shi
Zhaowei Cai
Jun Fang
Siqi Deng
Hao Yang
Davide Modolo
Zhuowen Tu
Stefano Soatto
VLM
154
3
0
11 May 2023
Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers
Firas Khader
Jakob Nikolas Kather
T. Han
S. Nebelung
Christiane Kuhl
Johannes Stegmaier
Daniel Truhn
MedIm
ViT
63
1
0
11 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Neural Information Processing Systems (NeurIPS), 2023
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
182
100
0
02 May 2023
BenchMD: A Benchmark for Unified Learning on Medical Images and Sensors
Kathryn Wantlin
Chenwei Wu
Shih-Cheng Huang
Oishi Banerjee
Farah Z. Dadabhoy
...
A. Adamson
Laura Heacock
G. Tison
Alex Tamkin
Pranav Rajpurkar
SSL
OOD
116
4
0
17 Apr 2023
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Guillaume Jaume
Anurag J. Vaidya
Richard J. Chen
Drew F. K. Williamson
Paul Pu Liang
Faisal Mahmood
316
97
0
13 Apr 2023
Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space
Seokju Yun
Youngmin Ro
ViT
130
2
0
13 Apr 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
International Conference on Learning Representations (ICLR), 2023
Ziteng Gao
Zhan Tong
Limin Wang
Mike Zheng Shou
143
14
0
07 Apr 2023
Attention: Marginal Probability is All You Need?
Ryan Singh
Christopher L. Buckley
133
3
0
07 Apr 2023
SLM: End-to-end Feature Selection via Sparse Learnable Masks
Yihe Dong
Sercan O. Arik
162
3
0
06 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
238
8
0
04 Apr 2023
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
IEEE International Conference on Computer Vision (ICCV), 2023
Xianpeng Liu
Ce Zheng
K. Cheng
Nan Xue
Guo-Jun Qi
Tianfu Wu
3DPC
242
10
0
03 Apr 2023
FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sudarshan S. Harithas
Gurkirat Singh
Aneesh Chavan
Sarthak Sharma
Suraj Patni
Chetan Arora
K. M. Krishna
3DPC
132
3
0
03 Apr 2023
Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department
Sabri Boughorbel
Fethi Jarray
Abdulaziz Yousuf Al-Homaid
Rashid Niaz
Khalid Alyafei
148
1
0
03 Apr 2023
Towards Flexible Multi-modal Document Models
Computer Vision and Pattern Recognition (CVPR), 2023
Naoto Inoue
Kotaro Kikuchi
E. Simo-Serra
Mayu Otani
Kota Yamaguchi
185
30
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
275
79
0
31 Mar 2023
Multi-modal learning for geospatial vegetation forecasting
Computer Vision and Pattern Recognition (CVPR), 2023
V. Benson
Claire Robin
C. Requena-Mesa
Lazaro Alonso
Nuno Carvalhais
José A. Cortés
Zhihan Gao
Nora Linscheid
M. Weynants
Markus Reichstein
209
23
0
28 Mar 2023
Object Discovery from Motion-Guided Tokens
Computer Vision and Pattern Recognition (CVPR), 2023
Zhipeng Bao
P. Tokmakov
Yu-Xiong Wang
Adrien Gaidon
M. Hebert
OCL
188
26
0
27 Mar 2023
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
ACM Transactions on Graphics (TOG), 2023
Tenglong Ao
Zeyi Zhang
Libin Liu
DiffM
VGen
239
187
0
26 Mar 2023
ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding
IEEE International Conference on Robotics and Automation (ICRA), 2023
Hongyu Sun
Yongcai Wang
Xudong Cai
Xuewei Bai
Deying Li
ViT
3DPC
255
8
0
25 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
120
1
0
24 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
119
2
0
21 Mar 2023
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Jaehoon Yoo
Semin Kim
Doyup Lee
Chiheon Kim
Seunghoon Hong
193
6
0
20 Mar 2023
Unified Visual Relationship Detection with Vision and Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Long Zhao
Liangzhe Yuan
Boqing Gong
Huayu Chen
Florian Schroff
Ming-Hsuan Yang
Hartwig Adam
Ting Liu
ObjD
242
12
0
16 Mar 2023
Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis
Neural Information Processing Systems (NeurIPS), 2023
Mehdi Azabou
Michael J. Mendelson
Nauman Ahad
Maks Sorokin
S. Thakoor
Carolina Urzay
Eva L. Dyer
164
9
0
15 Mar 2023
Making Vision Transformers Efficient from A Token Sparsification View
Computer Vision and Pattern Recognition (CVPR), 2023
Shuning Chang
Pichao Wang
Ming Lin
Fan Wang
David Junhao Zhang
Rong Jin
Mike Zheng Shou
ViT
206
37
0
15 Mar 2023
Brain Diffuser: An End-to-End Brain Image to Brain Network Pipeline
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Xuhang Chen
Baiying Lei
Chi-Man Pun
Shuqiang Wang
MedIm
DiffM
228
35
0
11 Mar 2023
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation
IEEE International Conference on Computer Vision (ICCV), 2023
Qichen Fu
Xingyu Liu
Ran Xu
Juan Carlos Niebles
Kris Kitani
ViT
216
22
0
09 Mar 2023
Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization
Sumanta Bhattacharyya
R. Manuvinakurike
Sahisnu Mazumder
Saurav Sahay
VLM
149
0
0
08 Mar 2023
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
Computer Vision and Pattern Recognition (CVPR), 2023
Brandon Clark
Alec Kerrigan
P. Kulkarni
V. Cepeda
M. Shah
115
43
0
07 Mar 2023
A Light-Weight Contrastive Approach for Aligning Human Pose Sequences
R. Collins
3DH
158
2
0
07 Mar 2023
Your representations are in the network: composable and parallel adaptation for large scale models
Neural Information Processing Systems (NeurIPS), 2023
Yonatan Dukler
Alessandro Achille
Hao Yang
Varsha Vivek
Luca Zancato
Benjamin Bowman
Avinash Ravichandran
Charless C. Fowlkes
A. Swaminathan
Stefano Soatto
261
3
0
07 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
284
33
0
04 Mar 2023
AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning
IEEE International Conference on Robotics and Automation (ICRA), 2023
Xijun Wang
Ruiqi Xian
Tianrui Guan
Celso M. de Melo
Stephen M. Nogar
Aniket Bera
Tianyi Zhou
132
15
0
02 Mar 2023
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
AAAI Conference on Artificial Intelligence (AAAI), 2023
W. Ma
J. P. Lewis
Avisek Lahiri
Thomas Leung
W. Kleijn
DiffM
276
81
0
25 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Abigail Z. Jacobs
LM&Ro
SSL
230
189
0
24 Feb 2023
Optical Transformers
Maxwell G. Anderson
Shifan Ma
Tianyu Wang
Logan G. Wright
Peter L. McMahon
111
31
0
20 Feb 2023