Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021

4 March 2021

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 790 papers shown

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

Yi Wang

Yu Qiao

224

156

17 Nov 2022

NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

International Conference on Learning Representations (ICLR), 2022

Hyeong-Seok Choi

Jinhyeok Yang

Juheon Lee

Hyeongju Kim

233

17 Nov 2022

Token Turing Machines

Computer Vision and Pattern Recognition (CVPR), 2022

239

16 Nov 2022

Latent Bottlenecked Attentive Neural Processes

International Conference on Learning Representations (ICLR), 2022

Leo Feng

Hossein Hajimirsadeghi

Yoshua Bengio

Mohamed Osama Ahmed

BDL

214

15 Nov 2022

NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

...

325

15 Nov 2022

Efficient Speech Translation with Dynamic Latent Perceivers

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

237

28 Oct 2022

A single-cell gene expression language model

Will Connell

Umair W Khan

Michael J. Keiser

115

25 Oct 2022

Solving Reasoning Tasks with a Slot Transformer

Ryan Faulkner

Daniel Zoran

LRM

147

20 Oct 2022

Play It Back: Iterative Attention for Audio Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Alexandros Stergiou

Dima Damen

192

20 Oct 2022

Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations

Neural Information Processing Systems (NeurIPS), 2022

Tao Chen

141

20 Oct 2022

Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

...

287

18 Oct 2022

Improving Object-centric Learning with Query Optimization

International Conference on Learning Representations (ICLR), 2022

Baoxiong Jia

Yu Liu

Siyuan Huang

OCL

262

17 Oct 2022

Linear Video Transformer with Feature Fixation

Zhen Qin

...

Yuchao Dai

199

15 Oct 2022

Neural Attentive Circuits

Neural Information Processing Systems (NeurIPS), 2022

Francesco Locatello

292

14 Oct 2022

RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer

International Conference on Information and Knowledge Management (CIKM), 2022

162

14 Oct 2022

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

British Machine Vision Conference (BMVC), 2022

Vladimir E. Iashin

Weidi Xie

Esa Rahtu

Andrew Zisserman

147

13 Oct 2022

A Generalist Framework for Panoptic Segmentation of Images and Videos

IEEE International Conference on Computer Vision (ICCV), 2022

David J. Fleet

442

131

12 Oct 2022

SaiT: Sparse Vision Transformers through Adaptive Token Pruning

138

11 Oct 2022

Turbo Training with Token Dropout

British Machine Vision Conference (BMVC), 2022

214

10 Oct 2022

SCAM! Transferring humans between images with Semantic Cross Attention Modulation

European Conference on Computer Vision (ECCV), 2022

Nicolas Dufour

David Picard

Vicky Kalogeiton

203

10 Oct 2022

ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval

Asian Conference on Computer Vision (ACCV), 2022

A. Fragomeni

Michael Wray

Dima Damen

CLIP ViT

145

09 Oct 2022

Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

British Machine Vision Conference (BMVC), 2022

Hsin-Ying Lee

Hung-Ting Su

312

08 Oct 2022

VIMA: General Robot Manipulation with Multimodal Prompts

Li Fei-Fei

Linxi Fan

390

475

06 Oct 2022

SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image

British Machine Vision Conference (BMVC), 2022

170

03 Oct 2022

176

02 Oct 2022

Contrastive Audio-Visual Masked Autoencoder

International Conference on Learning Representations (ICLR), 2022

396

167

02 Oct 2022

Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles

120

01 Oct 2022

Cascaded Multi-Modal Mixing Transformers for Alzheimer's Disease Classification with Incomplete Data

NeuroImage (NeuroImage), 2022

179

01 Oct 2022

Real-time Online Video Detection with Temporal Smoothing Transformers

European Conference on Computer Vision (ECCV), 2022

Yue Zhao

Philipp Krahenbuhl

ViT

178

19 Sep 2022

Distribution Aware Metrics for Conditional Natural Language Generation

International Conference on Language Resources and Evaluation (LREC), 2022

David M. Chan

Yiming Ni

David A. Ross

Sudheendra Vijayanarasimhan

Austin Myers

John F. Canny

359

15 Sep 2022

Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?

Hehe Fan

251

15 Sep 2022

A patch-based architecture for multi-label classification from single label annotations

169

14 Sep 2022

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

Conference on Robot Learning (CoRL), 2022

630

669

12 Sep 2022

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

ACM Computing Surveys (ACM CSUR), 2022

Paul Pu Liang

Amir Zadeh

Louis-Philippe Morency

310

166

07 Sep 2022

Efficient Methods for Natural Language Processing: A Survey

Transactions of the Association for Computational Linguistics (TACL), 2022

Marcos Vinícius Treviso

...

Niranjan Balasubramanian

Leon Derczynski

Iryna Gurevych

Roy Schwartz

373

141

31 Aug 2022

A Circular Window-based Cascade Transformer for Online Action Detection

192

30 Aug 2022

Improving Small Molecule Generation using Mutual Information Machine

278

18 Aug 2022

Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis

IEEE Transactions on Affective Computing (IEEE TAC), 2022

261

109

16 Aug 2022

Teacher Guided Training: An Efficient Framework for Knowledge Transfer

International Conference on Learning Representations (ICLR), 2022

163

14 Aug 2022

Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

IEEE Transactions on Games (IEEE Trans. Games), 2022

255

05 Aug 2022

COPER: Continuous Patient State Perceiver

258

05 Aug 2022

Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

Xufeng Zhao

C. Weber

Muhammad Burhan Hafez

S. Wermter

179

04 Aug 2022

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

Nassir Navab

235

31 Jul 2022

UAVM: Towards Unifying Audio and Visual Models

IEEE Signal Processing Letters (SPL), 2022

299

29 Jul 2022

Depth Field Networks for Generalizable Multi-view Scene Representation

European Conference on Computer Vision (ECCV), 2022

Vitor Campagnolo Guizilini

Igor Vasiljevic

Adrien Gaidon

187

28 Jul 2022

Temporal and cross-modal attention for audio-visual zero-shot learning

European Conference on Computer Vision (ECCV), 2022

Otniel-Bogdan Mercea

Thomas Hummel

A. Sophia Koepke

Zeynep Akata

193

20 Jul 2022

Residual and Attentional Architectures for Vector-Symbols

W. Olin-Ammentorp

153

18 Jul 2022

u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality

Neural Information Processing Systems (NeurIPS), 2022

Wei-Ning Hsu

Bowen Shi

SSL VLM

319

14 Jul 2022

Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection

International Journal of Computer Vision (IJCV), 2022

Jing Zhang

220

14 Jul 2022

MM-ALT: A Multimodal Automatic Lyric Transcription System

ACM Multimedia (ACM MM), 2022

215

13 Jul 2022