Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM, ViT, MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 790 papers shown
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
224
156
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
International Conference on Learning Representations (ICLR), 2022
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
233
54
0
17 Nov 2022
Token Turing Machines
Computer Vision and Pattern Recognition (CVPR), 2022
Michael S. Ryoo
K. Gopalakrishnan
Kumara Kahatapitiya
Ted Xiao
Kanishka Rao
Austin Stone
Yao Lu
Julian Ibarz
Anurag Arnab
239
28
0
16 Nov 2022
Latent Bottlenecked Attentive Neural Processes
International Conference on Learning Representations (ICLR), 2022
Leo Feng
Hossein Hajimirsadeghi
Yoshua Bengio
Mohamed Osama Ahmed
BDL
214
27
0
15 Nov 2022
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research
J. Bornschein
Alexandre Galashov
Ross Hemsley
Amal Rannen-Triki
Yutian Chen
...
Angeliki Lazaridou
Yee Whye Teh
Andrei A. Rusu
Razvan Pascanu
Marc'Aurelio Ranzato
OOD, VLM, AI4TS
325
20
0
15 Nov 2022
Efficient Speech Translation with Dynamic Latent Perceivers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ioannis Tsiamas
Gerard I. Gállego
José A. R. Fonollosa
Marta R. Costa-jussá
237
4
0
28 Oct 2022
A single-cell gene expression language model
Will Connell
Umair W Khan
Michael J. Keiser
115
11
0
25 Oct 2022
Solving Reasoning Tasks with a Slot Transformer
Ryan Faulkner
Daniel Zoran
LRM
147
1
0
20 Oct 2022
Play It Back: Iterative Attention for Audio Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Alexandros Stergiou
Dima Damen
192
5
0
20 Oct 2022
Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations
Neural Information Processing Systems (NeurIPS), 2022
Fukun Yin
Wen Liu
Zilong Huang
Pei Cheng
Tao Chen
Gang Yu
141
21
0
20 Oct 2022
Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Eli Bronstein
Mark Palatucci
Dominik Notz
Brandyn White
Alex Kuefler
...
Punit Shah
Evan Racah
Benjamin Frenkel
Shimon Whiteson
Drago Anguelov
287
68
0
18 Oct 2022
Improving Object-centric Learning with Query Optimization
International Conference on Learning Representations (ICLR), 2022
Baoxiong Jia
Yu Liu
Siyuan Huang
OCL
262
62
0
17 Oct 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
199
7
0
15 Oct 2022
Neural Attentive Circuits
Neural Information Processing Systems (NeurIPS), 2022
Nasim Rahaman
M. Weiß
Francesco Locatello
C. Pal
Yoshua Bengio
Bernhard Schölkopf
Erran L. Li
Nicolas Ballas
292
8
0
14 Oct 2022
RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer
International Conference on Information and Knowledge Management (CIKM), 2022
Mogan Gim
Donghee Choi
Kana Maruyama
Jihun Choi
Hajung Kim
Donghyeon Park
Jaewoo Kang
162
8
0
14 Oct 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
British Machine Vision Conference (BMVC), 2022
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
147
32
0
13 Oct 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos
IEEE International Conference on Computer Vision (ICCV), 2022
Ting-Li Chen
Lala Li
Saurabh Saxena
Geoffrey E. Hinton
David J. Fleet
VGen, MLLM
442
131
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
138
20
0
11 Oct 2022
Turbo Training with Token Dropout
British Machine Vision Conference (BMVC), 2022
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
214
14
0
10 Oct 2022
SCAM! Transferring humans between images with Semantic Cross Attention Modulation
European Conference on Computer Vision (ECCV), 2022
Nicolas Dufour
David Picard
Vicky Kalogeiton
203
15
0
10 Oct 2022
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
Asian Conference on Computer Vision (ACCV), 2022
A. Fragomeni
Michael Wray
Dima Damen
CLIP, ViT
145
4
0
09 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
British Machine Vision Conference (BMVC), 2022
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
312
2
0
08 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
390
475
0
06 Oct 2022
SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
British Machine Vision Conference (BMVC), 2022
Florian Langer
Gwangbin Bae
Ignas Budvytis
R. Cipolla
3DPC
170
15
0
03 Oct 2022
Benign Autoencoders
Semyon Malamud
Teng Andrea Xu
Antoine Didisheim
DRL, AI4CE
176
0
0
02 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
International Conference on Learning Representations (ICLR), 2022
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
396
167
0
02 Oct 2022
Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles
Shuhei Mandokoro
N. Oka
Akane Matsushima
Chie Fukada
Yuko Yoshimura
Koji Kawahara
Kazuaki Tanaka
120
1
0
01 Oct 2022
Cascaded Multi-Modal Mixing Transformers for Alzheimer's Disease Classification with Incomplete Data
NeuroImage (NeuroImage), 2022
Linfeng Liu
Siyu Liu
Lu Zhang
X. To
F. Nasrallah
Shekhar S. Chandra
MedIm
179
79
0
01 Oct 2022
Real-time Online Video Detection with Temporal Smoothing Transformers
European Conference on Computer Vision (ECCV), 2022
Yue Zhao
Philipp Krahenbuhl
ViT
178
91
0
19 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
International Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
359
4
0
15 Sep 2022
Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?
Yi Wang
Zhiwen Fan
Tianlong Chen
Hehe Fan
Zinan Lin
ViT
251
10
0
15 Sep 2022
A patch-based architecture for multi-label classification from single label annotations
Warren Jouanneau
Aurélie Bugeau
Marc Palyart
Nicolas Papadakis
Laurent Vézard
169
0
0
14 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Conference on Robot Learning (CoRL), 2022
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
630
669
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
310
166
0
07 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2022
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
373
141
0
31 Aug 2022
A Circular Window-based Cascade Transformer for Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
192
6
0
30 Aug 2022
Improving Small Molecule Generation using Mutual Information Machine
Daniel A. Reidenbach
M. Livne
Rajesh Ilango
M. Gill
Johnny Israeli
278
20
0
18 Aug 2022
Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
261
109
0
16 Aug 2022
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
International Conference on Learning Representations (ICLR), 2022
Manzil Zaheer
A. S. Rawat
Seungyeon Kim
Chong You
Himanshu Jain
Andreas Veit
Rob Fergus
Surinder Kumar
VLM
163
1
0
14 Aug 2022
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
IEEE Transactions on Games (IEEE Trans. Games), 2022
Aleksandar Stanić
Yujin Tang
David R Ha
Jürgen Schmidhuber
ELM
255
15
0
05 Aug 2022
COPER: Continuous Patient State Perceiver
V. Chauhan
Anshul Thakur
Odhran O'Donoghue
David Clifton
AI4TS, OOD
258
7
0
05 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
179
10
0
04 Aug 2022
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Mahdi Saleh
Yige Wang
Nassir Navab
Benjamin Busam
F. Tombari
3DPC
235
4
0
31 Jul 2022
UAVM: Towards Unifying Audio and Visual Models
IEEE Signal Processing Letters (SPL), 2022
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
299
30
0
29 Jul 2022
Depth Field Networks for Generalizable Multi-view Scene Representation
European Conference on Computer Vision (ECCV), 2022
Vitor Campagnolo Guizilini
Igor Vasiljevic
Jiading Fang
Rares Andrei Ambrus
G. Shakhnarovich
Matthew R. Walter
Adrien Gaidon
3DV, MDE
187
18
0
28 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
193
32
0
20 Jul 2022
Residual and Attentional Architectures for Vector-Symbols
W. Olin-Ammentorp
153
3
0
18 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Neural Information Processing Systems (NeurIPS), 2022
Wei-Ning Hsu
Bowen Shi
SSL, VLM
319
52
0
14 Jul 2022
Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection
International Journal of Computer Vision (IJCV), 2022
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
220
15
0
14 Jul 2022
MM-ALT: A Multimodal Automatic Lyric Transcription System
ACM Multimedia (ACM MM), 2022
Xiangming Gu
Longshen Ou
Danielle Ong
Ye Wang
215
15
0
13 Jul 2022