Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM ViT MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 790 papers shown
A Light-Weight Contrastive Approach for Aligning Human Pose Sequences
R. Collins
3DH
191
2
0
07 Mar 2023
Your representations are in the network: composable and parallel adaptation for large scale models
Neural Information Processing Systems (NeurIPS), 2023
Yonatan Dukler
Alessandro Achille
Hao Yang
Varsha Vivek
Luca Zancato
Benjamin Bowman
Avinash Ravichandran
Charless C. Fowlkes
A. Swaminathan
Stefano Soatto
297
3
0
07 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM MLLM
315
33
0
04 Mar 2023
AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning
IEEE International Conference on Robotics and Automation (ICRA), 2023
Xijun Wang
Ruiqi Xian
Tianrui Guan
Celso M. de Melo
Stephen M. Nogar
Aniket Bera
Tianyi Zhou
160
18
0
02 Mar 2023
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
AAAI Conference on Artificial Intelligence (AAAI), 2023
W. Ma
J. P. Lewis
Avisek Lahiri
Thomas Leung
W. Kleijn
DiffM
363
82
0
25 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Abigail Z. Jacobs
LM&Ro SSL
280
189
0
24 Feb 2023
Optical Transformers
Maxwell G. Anderson
Shifan Ma
Tianyu Wang
Logan G. Wright
Peter L. McMahon
150
36
0
20 Feb 2023
TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation
Yunliang Jiang
Li Yan
Xiongtao Zhang
Yong-Jin Liu
Da-Song Sun
ViT
214
5
0
16 Feb 2023
Cross-Modal Fine-Tuning: Align then Refine
International Conference on Machine Learning (ICML), 2023
Junhong Shen
Liam Li
Lucio Dery
Corey Staten
M. Khodak
Graham Neubig
Ameet Talwalkar
241
58
0
11 Feb 2023
DNArch: Learning Convolutional Neural Architectures by Backpropagation
David W. Romero
Neil Zeghidour
AI4CE
171
4
0
10 Feb 2023
Reversible Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
K. Mangalam
Haoqi Fan
Yanghao Li
Chaoxiong Wu
Bo Xiong
Christoph Feichtenhofer
Jitendra Malik
ViT
221
60
0
09 Feb 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
Mohammad Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLM RALM
203
50
0
09 Feb 2023
Efficient Attention via Control Variates
International Conference on Learning Representations (ICLR), 2023
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
286
22
0
09 Feb 2023
Efficient Joint Learning for Clinical Named Entity Recognition and Relation Extraction Using Fourier Networks: A Use Case in Adverse Drug Events
ICON (ICON), 2023
Anthony Yazdani
D. Proios
H. Rouhizadeh
Douglas Teodoro
155
8
0
08 Feb 2023
Multi-View Masked World Models for Visual Robotic Manipulation
International Conference on Machine Learning (ICML), 2023
Younggyo Seo
Junsup Kim
Stephen James
Kimin Lee
Jinwoo Shin
Pieter Abbeel
VGen
374
82
0
05 Feb 2023
3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models
ACM Transactions on Graphics (TOG), 2023
Biao Zhang
Jiapeng Tang
Matthias Niessner
Peter Wonka
DiffM
427
342
0
26 Jan 2023
Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN
International Conference on Learning Representations (ICLR), 2023
David M. Knigge
David W. Romero
Albert Gu
E. Gavves
Erik J. Bekkers
Jakub M. Tomczak
Mark Hoogendoorn
Jan-Jakob Sonke
3DV
214
28
0
25 Jan 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
231
23
0
23 Jan 2023
Multiview Compressive Coding for 3D Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2023
Chaozheng Wu
Justin Johnson
Jitendra Malik
Christoph Feichtenhofer
Georgia Gkioxari
285
91
0
19 Jan 2023
Laser: Latent Set Representations for 3D Generative Modeling
Pol Moreno
Adam R. Kosiorek
Heiko Strathmann
Daniel Zoran
Rosália G. Schneider
Bjorn Winckler
L. Markeeva
T. Weber
Danilo Jimenez Rezende
BDL 3DV DRL
241
5
0
13 Jan 2023
TarViS: A Unified Approach for Target-based Video Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
A. Athar
Alexander Hermans
Jonathon Luiten
Deva Ramanan
Bastian Leibe
VOS
362
37
0
06 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
IEEE International Conference on Computer Vision (ICCV), 2023
Jia Ning
Chen Li
Zheng Zhang
Zigang Geng
Jingdong Sun
Kun He
Han Hu
330
60
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Computer Vision and Pattern Recognition (CVPR), 2023
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
463
47
0
05 Jan 2023
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
211
14
0
29 Dec 2022
Scalable Adaptive Computation for Iterative Generation
International Conference on Machine Learning (ICML), 2022
Allan Jabri
David Fleet
Ting-Li Chen
DiffM
232
153
0
22 Dec 2022
Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Yiren Lu
Justin Fu
George Tucker
Xinlei Pan
Eli Bronstein
...
Brandyn White
Aleksandra Faust
Shimon Whiteson
Drago Anguelov
Sergey Levine
OffRL
239
139
0
21 Dec 2022
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering
Computer Vision and Pattern Recognition (CVPR), 2022
Difei Gao
Luowei Zhou
Lei Ji
Linchao Zhu
Yezhou Yang
Mike Zheng Shou
221
86
0
19 Dec 2022
Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis
Firas Khader
Gustav Mueller-Franzes
Tian Wang
T. Han
Soroosh Tayebi Arasteh
...
Keno Bressem
Christiane Kuhl
S. Nebelung
Jakob Nikolas Kather
Daniel Truhn
100
9
0
18 Dec 2022
Inductive Attention for Video Action Anticipation
Tsung-Ming Tai
G. Fiameni
Cheng-Kuang Lee
Simon See
Oswald Lanz
209
1
0
17 Dec 2022
MAViL: Masked Audio-Video Learners
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
322
73
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
321
108
0
15 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
International Conference on Machine Learning (ICML), 2022
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM SSL
352
123
0
14 Dec 2022
Structured 3D Features for Reconstructing Controllable Avatars
Computer Vision and Pattern Recognition (CVPR), 2022
Enric Corona
M. Zanfir
Thiemo Alldieck
Eduard Gabriel Bazavan
Andrei Zanfir
C. Sminchisescu
3DH
334
20
0
13 Dec 2022
Egocentric Video Task Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
262
18
0
13 Dec 2022
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Computer Vision and Pattern Recognition (CVPR), 2022
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Luke Huan
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM VLM
345
139
0
10 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
317
56
0
09 Dec 2022
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM VGen
337
69
0
09 Dec 2022
A Flexible Nadaraya-Watson Head Can Offer Explainable and Calibrated Classification
Alan Q. Wang
M. Sabuncu
236
6
0
07 Dec 2022
Framework-agnostic Semantically-aware Global Reasoning for Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Mir Rayat Imtiaz Hossain
Leonid Sigal
James J. Little
ViT
155
0
0
06 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM MLLM
336
335
0
05 Dec 2022
Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula
Conference on Robot Learning (CoRL), 2022
Eli Bronstein
S. Srinivasan
Supratik Paul
Aman Sinha
Matthew O'Kelly
Payam Nikdel
Shimon Whiteson
OffRL
249
19
0
02 Dec 2022
Survey on Self-Supervised Multimodal Representation Learning and Foundation Models
Sushil Thapa
AI4TS SSL
100
2
0
29 Nov 2022
A Light Touch Approach to Teaching Transformers Multi-view Geometry
Computer Vision and Pattern Recognition (CVPR), 2022
Brandon Smart
Joao F. Henriques
Andrew Zisserman
ViT
201
6
0
28 Nov 2022
Continuous diffusion for categorical data
Sander Dieleman
Laurent Sartran
Arman Roshannai
Nikolay Savinov
Yaroslav Ganin
...
Conor Durkan
Curtis Hawthorne
Rémi Leblond
Will Grathwohl
J. Adler
DiffM
334
144
0
28 Nov 2022
Interaction Region Visual Transformer for Egocentric Action Anticipation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Debaditya Roy
Ramanathan Rajendiran
Basura Fernando
417
27
0
25 Nov 2022
A Self-Attention Ansatz for Ab-initio Quantum Chemistry
International Conference on Learning Representations (ICLR), 2022
Ingrid von Glehn
J. Spencer
David Pfau
192
98
0
24 Nov 2022
Event Transformer+. A multi-purpose solution for efficient event data processing
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Alberto Sabater
Luis Montesano
Ana C. Murillo
ViT
224
15
0
22 Nov 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
178
10
0
21 Nov 2022
Discovering Evolution Strategies via Meta-Black-Box Optimization
International Conference on Learning Representations (ICLR), 2022
R. T. Lange
Tom Schaul
Yutian Chen
Tom Zahavy
Valenti Dallibard
Chris Xiaoxuan Lu
Satinder Singh
Sebastian Flennerhag
341
55
0
21 Nov 2022
PointResNet: Residual Network for 3D Point Cloud Segmentation and Classification
Aadesh Desai
Saagar Parikh
S. Kumari
Shanmuganathan Raman
3DPC 3DV
235
3
0
20 Nov 2022