
Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021

4 March 2021


Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 790 papers shown

A Light-Weight Contrastive Approach for Aligning Human Pose Sequences

R. Collins

3DH

191

07 Mar 2023

Your representations are in the network: composable and parallel adaptation for large scale models

Neural Information Processing Systems (NeurIPS), 2023

297

07 Mar 2023

Prismer: A Vision-Language Model with Multi-Task Experts

Linxi Fan

315

04 Mar 2023

AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

IEEE International Conference on Robotics and Automation (ICRA), 2023

Celso M. de Melo

160

02 Mar 2023

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

AAAI Conference on Artificial Intelligence (AAAI), 2023

363

25 Feb 2023

Language-Driven Representation Learning for Robotics

Dorsa Sadigh

280

189

24 Feb 2023

150

20 Feb 2023

TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation

214

16 Feb 2023

Cross-Modal Fine-Tuning: Align then Refine

International Conference on Machine Learning (ICML), 2023

Graham Neubig

241

11 Feb 2023

DNArch: Learning Convolutional Neural Architectures by Backpropagation

David W. Romero

Neil Zeghidour

AI4CE

171

10 Feb 2023

Reversible Vision Transformers

Computer Vision and Pattern Recognition (CVPR), 2022

Christoph Feichtenhofer

Jitendra Malik

ViT

221

09 Feb 2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

...

203

09 Feb 2023

Efficient Attention via Control Variates

International Conference on Learning Representations (ICLR), 2023

Lin Zheng

Jianbo Yuan

Chong-Jun Wang

Lingpeng Kong

286

09 Feb 2023

Efficient Joint Learning for Clinical Named Entity Recognition and Relation Extraction Using Fourier Networks: A Use Case in Adverse Drug Events

ICON (ICON), 2023

155

08 Feb 2023

Multi-View Masked World Models for Visual Robotic Manipulation

International Conference on Machine Learning (ICML), 2023

Pieter Abbeel

374

05 Feb 2023

3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models

ACM Transactions on Graphics (TOG), 2023

Matthias Niessner

427

342

26 Jan 2023

Modelling Long Range Dependencies in ND: From Task-Specific to a General Purpose CNN

International Conference on Learning Representations (ICLR), 2023

214

25 Jan 2023

Zorro: the masked multimodal transformer

...

231

23 Jan 2023

Multiview Compressive Coding for 3D Reconstruction

Computer Vision and Pattern Recognition (CVPR), 2023

Chaozheng Wu

Justin Johnson

Jitendra Malik

Christoph Feichtenhofer

Georgia Gkioxari

285

19 Jan 2023

Laser: Latent Set Representations for 3D Generative Modeling

Danilo Jimenez Rezende

BDL 3DV DRL

241

13 Jan 2023

TarViS: A Unified Approach for Target-based Video Segmentation

Computer Vision and Pattern Recognition (CVPR), 2023

362

06 Jan 2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

IEEE International Conference on Computer Vision (ICCV), 2023

330

05 Jan 2023

Test of Time: Instilling Video-Language Models with a Sense of Time

Computer Vision and Pattern Recognition (CVPR), 2023

Piyush Bagad

Makarand Tapaswi

Cees G. M. Snoek

463

05 Jan 2023

Transformers in Action Recognition: A Review on Temporal Modeling

Elham Shabaninia

Hossein Nezamabadi-pour

Fatemeh Shafizadegan

ViT

211

29 Dec 2022

Scalable Adaptive Computation for Iterative Generation

International Conference on Machine Learning (ICML), 2022

Allan Jabri

David Fleet

Ting-Li Chen

DiffM

232

153

22 Dec 2022

Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

Yiren Lu

...

239

139

21 Dec 2022

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

Computer Vision and Pattern Recognition (CVPR), 2022

221

19 Dec 2022

Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis

Firas Khader

Gustav Mueller-Franzes

Tian Wang

T. Han

Soroosh Tayebi Arasteh

...

Keno Bressem

Christiane Kuhl

S. Nebelung

Jakob Nikolas Kather

Daniel Truhn

100

18 Dec 2022

Inductive Attention for Video Action Anticipation

209

17 Dec 2022

MAViL: Masked Audio-Video Learners

Neural Information Processing Systems (NeurIPS), 2022

Po-Yao (Bernie) Huang

Christoph Feichtenhofer

322

15 Dec 2022

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Computer Vision and Pattern Recognition (CVPR), 2022

Yan-Bo Lin

Yi-Lin Sung

Jie Lei

Joey Tianyi Zhou

Gedas Bertasius

321

108

15 Dec 2022

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

International Conference on Machine Learning (ICML), 2022

352

123

14 Dec 2022

Structured 3D Features for Reconstructing Controllable Avatars

Computer Vision and Pattern Recognition (CVPR), 2022

Enric Corona

M. Zanfir

Thiemo Alldieck

Eduard Gabriel Bazavan

Andrei Zanfir

C. Sminchisescu

3DH

334

13 Dec 2022

Egocentric Video Task Translation

Computer Vision and Pattern Recognition (CVPR), 2022

262

13 Dec 2022

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Computer Vision and Pattern Recognition (CVPR), 2022

345

139

10 Dec 2022

Audiovisual Masked Autoencoders

IEEE International Conference on Computer Vision (ICCV), 2022

Mariana-Iuliana Georgescu

317

09 Dec 2022

VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners

337

09 Dec 2022

A Flexible Nadaraya-Watson Head Can Offer Explainable and Calibrated Classification

Alan Q. Wang

M. Sabuncu

236

07 Dec 2022

Framework-agnostic Semantically-aware Global Reasoning for Segmentation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

Mir Rayat Imtiaz Hossain

Leonid Sigal

James J. Little

ViT

155

06 Dec 2022

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

Computer Vision and Pattern Recognition (CVPR), 2022

Chunhua Shen

336

335

05 Dec 2022

Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

Conference on Robot Learning (CoRL), 2022

249

02 Dec 2022

Survey on Self-Supervised Multimodal Representation Learning and Foundation Models

Sushil Thapa

AI4TS SSL

100

29 Nov 2022

A Light Touch Approach to Teaching Transformers Multi-view Geometry

Computer Vision and Pattern Recognition (CVPR), 2022

Brandon Smart

Joao F. Henriques

Andrew Zisserman

ViT

201

28 Nov 2022

Continuous diffusion for categorical data

...

334

144

28 Nov 2022

Interaction Region Visual Transformer for Egocentric Action Anticipation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

Debaditya Roy

Ramanathan Rajendiran

Basura Fernando

417

25 Nov 2022

A Self-Attention Ansatz for Ab-initio Quantum Chemistry

International Conference on Learning Representations (ICLR), 2022

Ingrid von Glehn

J. Spencer

David Pfau

192

24 Nov 2022

Event Transformer+. A multi-purpose solution for efficient event data processing

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

224

22 Nov 2022

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

178

21 Nov 2022

Discovering Evolution Strategies via Meta-Black-Box Optimization

International Conference on Learning Representations (ICLR), 2022

341

21 Nov 2022

PointResNet: Residual Network for 3D Point Cloud Segmentation and Classification

235

20 Nov 2022