2103.03206
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing "Perceiver: General Perception with Iterative Attention"
50 / 792 papers shown
VOLoc: Visual Place Recognition by Querying Compressed Lidar Map
Xudong Cai
Yongcai Wang
Zhe Huang
Yu Shao
Deying Li
201
7
0
25 Feb 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park
Edward Choi
168
2
0
23 Feb 2024
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie
Yoonho Lee
Annie S. Chen
Chelsea Finn
183
4
0
22 Feb 2024
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel
M. Armando
Salma Galaaoui
Romain Brégier
Philippe Weinzaepfel
Grégory Rogez
Thomas Lucas
3DH
268
65
0
22 Feb 2024
Semantic Image Synthesis with Unconditional Generator
Jungwoo Chae
Hyunin Cho
Sooyeon Go
Kyungmook Choi
Youngjung Uh
268
6
0
22 Feb 2024
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
Eli M. Carrami
Sahand Sharifzadeh
144
2
0
21 Feb 2024
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang
I-Hau Yeh
Hongpeng Liao
426
2,984
0
21 Feb 2024
User-LLM: Efficient LLM Contextualization with User Embeddings
Lin Ning
Luyang Liu
Jiaxing Wu
Neo Wu
D. Berlowitz
Sushant Prakash
Bradley Green
S. O’Banion
Jun Xie
273
72
0
21 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
359
124
0
19 Feb 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
329
8
0
19 Feb 2024
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators
Benedikt Alkin
Andreas Fürst
Simon Schmid
Lukas Gruber
Markus Holzleitner
Johannes Brandstetter
PINN
AI4CE
596
18
0
19 Feb 2024
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review
Thang-Anh-Quan Nguyen
Amine Bourki
Mátyás Macudzinski
Anthony Brunel
M. Bennamoun
366
16
0
17 Feb 2024
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
Tsung-Wei Ke
N. Gkanatsios
Katerina Fragkiadaki
VGen
403
244
0
16 Feb 2024
Are Semi-Dense Detector-Free Methods Good at Matching Local Features?
Matthieu Vilain
Rémi Giraud
Hugo Germain
Guillaume Bourmaud
332
2
0
13 Feb 2024
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg
A. Abdolmaleki
Jingwei Zhang
Oliver Groth
Michael Bloesch
...
Sarah Bechtle
Steven Kapturowski
Agrim Gupta
N. Heess
Martin Riedmiller
OffRL
LRM
220
35
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
516
16
0
08 Feb 2024
Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
Hamza Mahdi
Eptehal Nashnoush
Rami Saab
Arjun Balachandar
Rishit Dagli
Lucas X. Perri
H. Khosravani
377
1
0
07 Feb 2024
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary
Takashi Morita
474
8
0
31 Jan 2024
Topology-Aware Latent Diffusion for 3D Shape Generation
Jiangbei Hu
Ben Fei
Baixin Xu
Fei Hou
Weidong Yang
Shengfa Wang
Na Lei
Chen Qian
Ying He
224
9
0
31 Jan 2024
Triple Disentangled Representation Learning for Multimodal Affective Analysis
Information Fusion (Inf. Fusion), 2024
Ying Zhou
Xuefeng Liang
Han Chen
Yin Zhao
Xin Chen
Lida Yu
235
15
0
29 Jan 2024
On the generalization capacity of neural networks during generic multimodal reasoning
International Conference on Learning Representations (ICLR), 2024
Takuya Ito
Soham Dan
Mattia Rigotti
James Kozloski
Murray Campbell
LRM
252
4
0
26 Jan 2024
Jump Cut Smoothing for Talking Heads
Xiaojuan Wang
Taesung Park
Yang Zhou
Eli Shechtman
Richard Zhang
VGen
212
1
0
09 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
International Journal of Computer Vision (IJCV), 2024
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
346
9
0
08 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
286
7
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
165
5
0
08 Jan 2024
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai
Xiyang Liao
Alessandro Suglia
Antonio Vergari
MLLM
357
14
0
06 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
312
6
0
06 Jan 2024
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
David Gimeno-Gómez
Ana-Maria Bucur
Adrian Cosma
Carlos David Martínez Hinarejos
Paolo Rosso
232
25
0
05 Jan 2024
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Qiuhui Chen
Yi Hong
MedIm
428
8
0
02 Jan 2024
Saliency-Aware Regularized Graph Neural Network
Artificial Intelligence (AI), 2024
Wenjie Pei
Weina Xu
Zongze Wu
Weichao Li
Jinfan Wang
Guangming Lu
Xiangrong Wang
157
10
0
01 Jan 2024
SVFAP: Self-supervised Video Facial Affect Perceiver
IEEE Transactions on Affective Computing (TAC), 2023
Guoying Zhao
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Yinan Han
Jianhua Tao
190
26
0
31 Dec 2023
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
318
72
0
28 Dec 2023
Deformable Audio Transformer for Audio Event Detection
Wentao Zhu
162
0
0
24 Dec 2023
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
468
240
0
20 Dec 2023
Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs
Seungjun Lee
Taeil Oh
263
17
0
18 Dec 2023
Reconstruction of Fields from Sparse Sensing: Differentiable Sensor Placement Enhances Generalization
Agnese Marcato
Dan O’Malley
Hari S. Viswanathan
E. Guiltinan
Javier E. Santos
105
2
0
14 Dec 2023
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Zi-Xin Zou
Zhipeng Yu
Yuanchen Guo
Yangguang Li
Ding Liang
Yan-Pei Cao
Song-Hai Zhang
3DGS
379
271
0
14 Dec 2023
A Foundational Multimodal Vision Language AI Assistant for Human Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Kenji Ikamura
...
Ivy Liang
L. Le
Tong Ding
Anil V. Parwani
Faisal Mahmood
MedIm
LM&MA
211
30
0
13 Dec 2023
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
European Conference on Computer Vision (ECCV), 2023
Yoonwoo Jeong
Jinwoo Lee
Chiheon Kim
Minsu Cho
Doyup Lee
171
9
0
12 Dec 2023
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Federico Landini
Mireia Díez
Themos Stafylakis
Lukáš Burget
372
20
0
07 Dec 2023
UPOCR: Towards Unified Pixel-Level OCR Interface
International Conference on Machine Learning (ICML), 2023
Dezhi Peng
Zhenhua Yang
Jiaxin Zhang
Chongyu Liu
Yongxin Shi
Kai Ding
Fengjun Guo
Lianwen Jin
352
13
0
05 Dec 2023
Learning to Compose SuperWeights for Neural Parameter Allocation Search
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Piotr Teterwak
Soren Nelson
Nikoli Dryden
D. Bashkirova
Kate Saenko
Bryan A. Plummer
287
3
0
03 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM
VLM
292
19
0
03 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
282
71
0
30 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
115
1
0
29 Nov 2023
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
IEEE transactions on multimedia (IEEE TMM), 2023
Fukun Yin
Xin Chen
C. Zhang
Biao Jiang
Zibo Zhao
Jiayuan Fan
Gang Yu
Taihao Li
Tao Chen
467
41
0
29 Nov 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
182
4
0
29 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Computer Vision and Pattern Recognition (CVPR), 2023
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
208
32
0
27 Nov 2023
Unlearning via Sparse Representations
Vedant Shah
Frederik Träuble
Ashish Malik
Hugo Larochelle
Michael C. Mozer
Sanjeev Arora
Yoshua Bengio
Anirudh Goyal
MU
274
9
0
26 Nov 2023
Looped Transformers are Better at Learning Learning Algorithms
International Conference on Learning Representations (ICLR), 2023
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
460
55
0
21 Nov 2023