Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM ViT MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 782 papers shown
Title
DeGMix: Efficient Multi-Task Dense Prediction with Deformable and Gating Mixer
Yangyang Xu
Jianlong Wu
Bernard Ghanem
Guang Dai
Du Bo
Dacheng Tao
198
1
0
10 Aug 2023
Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data
Chaoyi Wu
Xiaoman Zhang
Ya Zhang
Yanfeng Wang
Weidi Xie
MedIm LM&MA
369
210
0
04 Aug 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
...
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
320
525
0
02 Aug 2023
From Sparse to Soft Mixtures of Experts
International Conference on Learning Representations (ICLR), 2023
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
375
185
0
02 Aug 2023
Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
Sankalpa Rijal
Rajan Neupane
Saroj Prasad Mainali
Shishir K. Regmi
Shanta Maharjan
181
0
0
29 Jul 2023
Towards Generalist Biomedical AI
Tao Tu
Shekoofeh Azizi
Danny Driess
M. Schaekermann
Mohamed Amin
...
Yossi Matias
K. Singhal
Peter R. Florence
Alan Karthikesalingam
Vivek Natarajan
LM&MA MedIm AI4MH
230
387
0
26 Jul 2023
OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios
Aditya Nalgunda Ganesh
Dhruval Pobbathi Badrinath
Harshith Mohan Kumar
S. Sony Priya
Surabhi Narayan
ViT
119
3
0
20 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Neural Information Processing Systems (NeurIPS), 2023
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL LRM ViT
254
4
0
17 Jul 2023
Transformers are Universal Predictors
Sourya Basu
Moulik Choraria
Lav Varshney
114
6
0
15 Jul 2023
Dual-Query Multiple Instance Learning for Dynamic Meta-Embedding based Tumor Classification
British Machine Vision Conference (BMVC), 2023
Simon Holdenried-Krafft
Peter Somers
Ivonne A. Montes-Mojarro
Diana Silimon
Cristina Tarín
F. Fend
Hendrik P. A. Lensch
MedIm
259
3
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
Fahad Shahbaz Khan
ViT
216
27
0
13 Jul 2023
PolyLM: An Open Source Polyglot Large Language Model
Xiangpeng Wei
Hao-Ran Wei
Huan Lin
Tianhao Li
Pei Zhang
...
Yu Bowen
Dayiheng Liu
Baosong Yang
Fei Huang
Jun Xie
LRM
198
70
0
12 Jul 2023
One-Versus-Others Attention: Scalable Multimodal Integration for Clinical Data
Pacific Symposium on Biocomputing (PSB), 2023
Michal Golovanevsky
Eva Schiller
Akira Nair
Ritambhara Singh
Carsten Eickhoff
231
7
0
11 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
405
216
0
05 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
230
88
0
05 Jul 2023
Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation
Conference on Robot Learning (CoRL), 2023
Théophile Gervet
Zhou Xian
N. Gkanatsios
Katerina Fragkiadaki
279
120
0
30 Jun 2023
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
Z. Chen
Mingyu Ding
Songlin Yang
Wei Zhan
Masayoshi Tomizuka
Erik Learned-Miller
Chuang Gan
MoE
103
8
0
29 Jun 2023
Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
Neural Information Processing Systems (NeurIPS), 2023
Zibo Zhao
Wen Liu
Xin Chen
Xi Zeng
Rui Wang
Pei Cheng
Bin-Bin Fu
Tao Chen
Gang Yu
Shenghua Gao
DiffM
257
162
0
29 Jun 2023
Semi-supervised Multimodal Representation Learning through a Global Workspace
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Benjamin Devillers
Léopold Maytié
R. V. Rullen
SSL
118
10
0
27 Jun 2023
BatchGFN: Generative Flow Networks for Batch Active Learning
Shreshth A. Malik
Salem Lahlou
Andrew Jesson
Moksh Jain
Nikolay Malkin
T. Deleu
Yoshua Bengio
Y. Gal
AI4CE
121
4
0
26 Jun 2023
RVT: Robotic View Transformer for 3D Object Manipulation
Conference on Robot Learning (CoRL), 2023
Ankit Goyal
Jie Xu
Yijie Guo
Valts Blukis
Yu-Wei Chao
Dieter Fox
LM&Ro
291
213
0
26 Jun 2023
AR2-D2: Training a Robot Without a Robot
Conference on Robot Learning (CoRL), 2023
Jiafei Duan
Yi Ru Wang
Mohit Shridhar
Dieter Fox
Ranjay Krishna
194
42
0
23 Jun 2023
ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration
Jiaqi Ma
Tianheng Cheng
Guoli Wang
Qian Zhang
Xinggang Wang
Guang Dai
DiffM VLM
168
66
0
23 Jun 2023
LightGlue: Local Feature Matching at Light Speed
IEEE International Conference on Computer Vision (ICCV), 2023
Philipp Lindenberger
Paul-Edouard Sarlin
Marc Pollefeys
3DV VLM
330
702
0
23 Jun 2023
Learning Unseen Modality Interaction
Neural Information Processing Systems (NeurIPS), 2023
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
223
12
0
22 Jun 2023
Constant Memory Attention Block
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Yoshua Bengio
Mohamed Osama Ahmed
210
0
0
21 Jun 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
140
5
0
21 Jun 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Neural Information Processing Systems (NeurIPS), 2023
Hugo Laurençon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
287
313
0
21 Jun 2023
Dynamic Perceiver for Efficient Visual Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Yizeng Han
Dongchen Han
Zeyu Liu
Yulin Wang
Xuran Pan
Yifan Pu
Chaorui Deng
Junlan Feng
Qing Xiao
Gao Huang
249
40
0
20 Jun 2023
Sparse Modular Activation for Efficient Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2023
Liliang Ren
Yang Liu
Shuohang Wang
Yichong Xu
Chenguang Zhu
Chengxiang Zhai
207
17
0
19 Jun 2023
Multitrack Music Transcription with a Time-Frequency Perceiver
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Weiyi Lu
Ju-Chiang Wang
Yun-Ning Hung
ViT AI4TS
127
29
0
19 Jun 2023
RedMotion: Motion Prediction via Redundancy Reduction
Royden Wagner
Omer Sahin Tas
Marvin Klemp
Carlos Fernandez Lopez
Christoph Stiller
592
10
0
19 Jun 2023
The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models
Roy Voetman
Maya Aghaei
K. Dijkstra
DiffM
154
14
0
16 Jun 2023
FedMultimodal: A Benchmark For Multimodal Federated Learning
Knowledge Discovery and Data Mining (KDD), 2023
Tiantian Feng
Digbalay Bose
Tuo Zhang
Rajat Hebbar
Anil Ramakrishna
Rahul Gupta
Mi Zhang
Salman Avestimehr
Shrikanth Narayanan
307
89
0
15 Jun 2023
High-performance deep spiking neural networks with 0.3 spikes per neuron
Nature Communications (Nat. Commun.), 2023
A. Stanojević
Stanislaw Woźniak
G. Bellec
G. Cherubini
A. Pantazi
W. Gerstner
268
46
0
14 Jun 2023
A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation
Jeremy Gwinnup
Kevin Duh
VLM
108
7
0
12 Jun 2023
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
Neural Information Processing Systems (NeurIPS), 2023
Jinwoo Kim
Tien Dat Nguyen
Ayhan Suleymanzade
Hyeokjun An
Seunghoon Hong
322
27
0
05 Jun 2023
Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents
David Kreuzer
M. Munz
ViT MedIm
130
0
0
05 Jun 2023
Systematic Visual Reasoning through Object-Centric Relational Abstraction
Neural Information Processing Systems (NeurIPS), 2023
Taylor Webb
S. S. Mondal
Jonathan D. Cohen
OCL
343
27
0
04 Jun 2023
A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics
Nature Biomedical Engineering (Nat. Biomed. Eng.), 2023
Hong-Yu Zhou
Yizhou Yu
Chengdi Wang
Shu Zhen Zhang
Yuanxu Gao
Jia Pan
Jun Shao
Guangming Lu
Kang Zhang
Weimin Li
MedIm
169
242
0
01 Jun 2023
Bytes Are All You Need: Transformers Operating Directly On File Bytes
Maxwell Horton
Sachin Mehta
Ali Farhadi
Mohammad Rastegari
VLM
184
11
0
31 May 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
231
0
0
31 May 2023
Gemtelligence: Accelerating Gemstone classification with Deep Learning
Communications Engineer (CE), 2023
Tommaso Bendinelli
Luca Biggio
D. Nyfeler
Abhigyan Ghosh
P. Tollan
M. Kirschmann
Olga Fink
108
6
0
31 May 2023
Blockwise Parallel Transformer for Large Context Models
Hao Liu
Pieter Abbeel
237
13
0
30 May 2023
NetHack is Hard to Hack
Neural Information Processing Systems (NeurIPS), 2023
Ulyana Piterbarg
Lerrel Pinto
Rob Fergus
191
8
0
30 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Neural Information Processing Systems (NeurIPS), 2023
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
306
66
0
25 May 2023
Lucy-SKG: Learning to Play Rocket League Efficiently Using Deep Reinforcement Learning
V. Moschopoulos
Pantelis Kyriakidis
A. Lazaridis
I. Vlahavas
69
1
0
25 May 2023
Concept-Centric Transformers: Enhancing Model Interpretability through Object-Centric Concept Learning within a Shared Global Workspace
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jinyung Hong
Keun Hee Park
Theodore P. Pavlic
244
9
0
25 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
P. Sadler
David Schlangen
128
3
0
24 May 2023
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziwei He
Meng Yang
Minwei Feng
Jingcheng Yin
Xiang Wang
Jingwen Leng
Zhouhan Lin
ViT
267
19
0
24 May 2023