2103.03206
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing "Perceiver: General Perception with Iterative Attention"
50 / 783 papers shown
Title
VideoGameBunny: Towards vision assistants for video games
Mohammad Reza Taesiri
Cor-Paul Bezemer
VLM
MLLM
181
7
0
21 Jul 2024
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Yu Zhang
Ruijie Yu
Kaipeng Zeng
Ding Li
Feng Zhu
Yunbo Wang
Yaohui Jin
Yanyan Xu
145
1
0
21 Jul 2024
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
Chen Shen
Chunfeng Lian
Wanqing Zhang
Fan Wang
Jianhua Zhang
...
Hongshu Mu
Hao Wu
Xinggong Liang
Jianhua Ma
Zhenyuan Wang
167
5
0
20 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
236
9
0
18 Jul 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
207
7
0
18 Jul 2024
MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking
Ting-Chih Chen
Chia-Wei Tang
Chris Thomas
222
10
0
18 Jul 2024
IoT-LM: Large Multisensory Language Models for the Internet of Things
Shentong Mo
Russ Salakhutdinov
Louis-Philippe Morency
Paul Pu Liang
MLLM
140
19
0
13 Jul 2024
Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series
Iris Dumeur
Silvia Valero
Jordi Inglada
235
7
0
11 Jul 2024
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation
Sungkyun Chang
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
254
8
0
05 Jul 2024
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen
Haofei Xu
Stefano Esposito
Siyu Tang
Andreas Geiger
AI4CE
321
48
0
05 Jul 2024
ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities
Julie Mordacq
Léo Milecki
Maria Vakalopoulou
Steve Oudot
Vicky Kalogeiton
OffRL
MedIm
133
7
0
04 Jul 2024
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
Perusha Moodley
Pramod S. Kaushik
Dhillu Thambi
Mark Trovinger
Praveen Paruchuri
Xia Hong
Benjamin Rosman
278
0
0
01 Jul 2024
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
William Berman
A. Peysakhovich
217
5
0
26 Jun 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
213
25
0
26 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
311
603
0
24 Jun 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
194
9
0
24 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
278
2
0
24 Jun 2024
Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
Xicheng Zhang
Fuchun Sun
Bin Zhao
Junchi Yan
Xiu Li
Xuelong Li
OffRL
201
10
0
22 Jun 2024
Learning Efficient and Robust Language-conditioned Manipulation using Textual-Visual Relevancy and Equivariant Language Mapping
Mingxi Jia
Haojie Huang
Zhewen Zhang
Chenghao Wang
Linfeng Zhao
Dian Wang
J. Liu
Robin Walters
Robert Platt
Stefanie Tellex
LM&Ro
335
6
0
21 Jun 2024
Multi-modal Transfer Learning between Biological Foundation Models
Juan Jose Garau-Luis
Patrick Bordes
Liam Gonzalez
Masa Roller
Bernardo P. de Almeida
...
Stefan Laurent
Jan Grzegorzewski
Maren Lang
Thomas Pierrot
Guillaume Richard
AI4CE
257
9
0
20 Jun 2024
In-Context In-Context Learning with Transformer Neural Processes
Symposium on Advances in Approximate Bayesian Inference (AABI), 2024
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
Richard E. Turner
170
4
0
19 Jun 2024
Approximately Equivariant Neural Processes
Neural Information Processing Systems (NeurIPS), 2024
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
W. Bruinsma
Richard E. Turner
BDL
166
4
0
19 Jun 2024
Recurrence over Video Frames (RoVF) for the Re-identification of Meerkats
Mitchell Rogers
Kobe Knowles
Gaël Gendron
Shahrokh Heidari
David Arturo Soriano Valdez
Mihailo Azhar
Padriac O'Leary
Simon Eyre
Michael Witbrock
Patrice Delmas
121
2
0
18 Jun 2024
Translation Equivariant Transformer Neural Processes
Matthew Ashman
Cristiana-Diana Diaconu
Junhyuck Kim
Lakee Sivaraya
Stratis Markou
James Requeima
W. Bruinsma
Richard E. Turner
215
8
0
18 Jun 2024
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Omer Sahin Tas
Royden Wagner
353
1
0
17 Jun 2024
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
Interspeech (Interspeech), 2024
Yicong Jiang
Tianzi Wang
Xurong Xie
Juan Liu
Wei Sun
Nan Yan
Hui Chen
Lan Wang
Xunying Liu
Feng Tian
121
7
0
14 Jun 2024
Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation
Conference on Robot Learning (CoRL), 2024
Teli Ma
Jiaming Zhou
Zifan Wang
Ronghe Qiu
Junwei Liang
200
17
0
14 Jun 2024
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn
Ashishkumar Gudmalwar
Nirmesh Shah
Pankaj Wasnik
R. Shah
237
12
0
13 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
Liwen Wang
Rong Jin
209
64
0
12 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
199
4
0
08 Jun 2024
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
International Conference on Learning Representations (ICLR), 2024
Matthew Fortier
Mats L. Richter
O. Sonnentag
Chris Pal
AI4CE
213
2
0
07 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Neural Information Processing Systems (NeurIPS), 2024
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
211
22
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
219
6
0
06 Jun 2024
Item-Language Model for Conversational Recommendation
Li Yang
Anushya Subbiah
Hardik Patel
Judith Yue Li
Yanwei Song
Reza Mirghaderi
Vikram Aggarwal
Qifan Wang
KELM
178
9
0
05 Jun 2024
Diffusion Features to Bridge Domain Gap for Semantic Segmentation
Yuxiang Ji
Boyong He
Chenyuan Qu
Zhuoyue Tan
Chuan Qin
Liaoni Wu
333
5
0
02 Jun 2024
Direct Cardiac Segmentation from Undersampled K-space Using Transformers
Yundi Zhang
Nil Stolt Ansó
Jiazhen Pan
Wenqi Huang
Kerstin Hammernik
Daniel Rueckert
MedIm
194
4
0
31 May 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
149
2
0
29 May 2024
VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
Jun Zheng
Fuwei Zhao
Youjiang Xu
Xin Dong
Xiaodan Liang
VGen
DiffM
216
9
0
28 May 2024
Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion
Zizhao Hu
Mohammad Rostami
195
0
0
25 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
235
6
0
24 May 2024
Planted: a dataset for planted forest identification from multi-satellite time series
L. M. Pazos-Outón
Cristina Nader Vasconcelos
Anton Raichuk
Anurag Arnab
Dan Morris
Maxim Neumann
157
8
0
24 May 2024
Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer
Neural Information Processing Systems (NeurIPS), 2024
Shuang Wu
Youtian Lin
Feihu Zhang
Yifei Zeng
Jingxi Xu
Juil Sock
Xun Cao
Yao Yao
225
136
0
23 May 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
269
2
0
23 May 2024
Attention as an RNN
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Yoshua Bengio
Greg Mori
GNN
AI4TS
248
16
0
22 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
277
26
0
22 May 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Conference on Robot Learning (CoRL), 2024
Yunfan Jiang
Chen Wang
Ruohan Zhang
Jiajun Wu
Fei-Fei Li
OnRL
256
57
0
16 May 2024
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
Eugene Vorontsov
Adam Casson
Kristen Severson
Eric Zimmermann
Yi Kan Wang
...
Peter Hamilton
William A. Moye
Eugene Vorontsov
Siqi Liu
Thomas J. Fuchs
MedIm
241
61
0
16 May 2024
Cross-sensor self-supervised training and alignment for remote sensing
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
V. Marsocci
Nicolas Audebert
245
4
0
16 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
269
11
0
15 May 2024
MedVersa: A Generalist Foundation Model for Medical Image Interpretation
Hong-Yu Zhou
Subathra Adithan
J. N. Acosta
Suvrankar Datta
E. Topol
Pranav Rajpurkar
MedIm
357
29
0
13 May 2024