2103.03206
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 790 papers shown
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie
Yoonho Lee
Annie S. Chen
Chelsea Finn
155
4
0
22 Feb 2024
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel
M. Armando
Salma Galaaoui
Romain Brégier
Philippe Weinzaepfel
Grégory Rogez
Thomas Lucas
3DH
248
55
0
22 Feb 2024
Semantic Image Synthesis with Unconditional Generator
Jungwoo Chae
Hyunin Cho
Sooyeon Go
Kyungmook Choi
Youngjung Uh
249
5
0
22 Feb 2024
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
Eli M. Carrami
Sahand Sharifzadeh
138
2
0
21 Feb 2024
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang
I-Hau Yeh
Hongpeng Liao
404
2,757
0
21 Feb 2024
User-LLM: Efficient LLM Contextualization with User Embeddings
Lin Ning
Luyang Liu
Jiaxing Wu
Neo Wu
D. Berlowitz
Sushant Prakash
Bradley Green
S. O’Banion
Jun Xie
250
64
0
21 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
336
118
0
19 Feb 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
295
7
0
19 Feb 2024
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators
Benedikt Alkin
Andreas Fürst
Simon Schmid
Lukas Gruber
Markus Holzleitner
Johannes Brandstetter
PINN
AI4CE
545
18
0
19 Feb 2024
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review
Thang-Anh-Quan Nguyen
Amine Bourki
Mátyás Macudzinski
Anthony Brunel
M. Bennamoun
342
16
0
17 Feb 2024
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
Tsung-Wei Ke
N. Gkanatsios
Katerina Fragkiadaki
VGen
337
232
0
16 Feb 2024
Are Semi-Dense Detector-Free Methods Good at Matching Local Features?
Matthieu Vilain
Rémi Giraud
Hugo Germain
Guillaume Bourmaud
291
2
0
13 Feb 2024
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg
A. Abdolmaleki
Jingwei Zhang
Oliver Groth
Michael Bloesch
...
Sarah Bechtle
Steven Kapturowski
Agrim Gupta
N. Heess
Martin Riedmiller
OffRL
LRM
197
33
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
476
15
0
08 Feb 2024
Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
Hamza Mahdi
Eptehal Nashnoush
Rami Saab
Arjun Balachandar
Rishit Dagli
Lucas X. Perri
H. Khosravani
354
1
0
07 Feb 2024
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary
Takashi Morita
436
7
0
31 Jan 2024
Topology-Aware Latent Diffusion for 3D Shape Generation
Jiangbei Hu
Ben Fei
Baixin Xu
Fei Hou
Weidong Yang
Shengfa Wang
Na Lei
Chen Qian
Ying He
203
9
0
31 Jan 2024
Triple Disentangled Representation Learning for Multimodal Affective Analysis
Information Fusion (Inf. Fusion), 2024
Ying Zhou
Xuefeng Liang
Han Chen
Yin Zhao
Xin Chen
Lida Yu
192
11
0
29 Jan 2024
On the generalization capacity of neural networks during generic multimodal reasoning
International Conference on Learning Representations (ICLR), 2024
Takuya Ito
Soham Dan
Mattia Rigotti
James Kozloski
Murray Campbell
LRM
201
4
0
26 Jan 2024
Jump Cut Smoothing for Talking Heads
Xiaojuan Wang
Taesung Park
Yang Zhou
Eli Shechtman
Richard Zhang
VGen
168
1
0
09 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
International Journal of Computer Vision (IJCV), 2024
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
331
9
0
08 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
259
7
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
134
5
0
08 Jan 2024
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Yintao Tai
Xiyang Liao
Alessandro Suglia
Antonio Vergari
MLLM
301
13
0
06 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
224
6
0
06 Jan 2024
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
David Gimeno-Gómez
Ana-Maria Bucur
Adrian Cosma
Carlos David Martínez Hinarejos
Paolo Rosso
199
24
0
05 Jan 2024
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Qiuhui Chen
Yi Hong
MedIm
386
6
0
02 Jan 2024
Saliency-Aware Regularized Graph Neural Network
Artificial Intelligence (AI), 2024
Wenjie Pei
Weina Xu
Zongze Wu
Weichao Li
Jinfan Wang
Guangming Lu
Xiangrong Wang
131
10
0
01 Jan 2024
SVFAP: Self-supervised Video Facial Affect Perceiver
IEEE Transactions on Affective Computing (TAC), 2023
Guoying Zhao
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Yinan Han
Jianhua Tao
170
24
0
31 Dec 2023
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
276
68
0
28 Dec 2023
Deformable Audio Transformer for Audio Event Detection
Wentao Zhu
151
0
0
24 Dec 2023
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
450
227
0
20 Dec 2023
Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs
Seungjun Lee
Taeil Oh
246
15
0
18 Dec 2023
Reconstruction of Fields from Sparse Sensing: Differentiable Sensor Placement Enhances Generalization
Agnese Marcato
Dan O’Malley
Hari S. Viswanathan
E. Guiltinan
Javier E. Santos
91
2
0
14 Dec 2023
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Zi-Xin Zou
Zhipeng Yu
Yuanchen Guo
Yangguang Li
Ding Liang
Yan-Pei Cao
Song-Hai Zhang
3DGS
350
260
0
14 Dec 2023
A Foundational Multimodal Vision Language AI Assistant for Human Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Kenji Ikamura
...
Ivy Liang
L. Le
Tong Ding
Anil V. Parwani
Faisal Mahmood
MedIm
LM&MA
182
29
0
13 Dec 2023
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
European Conference on Computer Vision (ECCV), 2023
Yoonwoo Jeong
Jinwoo Lee
Chiheon Kim
Minsu Cho
Doyup Lee
156
9
0
12 Dec 2023
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Federico Landini
Mireia Díez
Themos Stafylakis
Lukáš Burget
308
20
0
07 Dec 2023
UPOCR: Towards Unified Pixel-Level OCR Interface
International Conference on Machine Learning (ICML), 2023
Dezhi Peng
Zhenhua Yang
Jiaxin Zhang
Chongyu Liu
Yongxin Shi
Kai Ding
Fengjun Guo
Lianwen Jin
337
13
0
05 Dec 2023
Learning to Compose SuperWeights for Neural Parameter Allocation Search
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Piotr Teterwak
Soren Nelson
Nikoli Dryden
D. Bashkirova
Kate Saenko
Bryan A. Plummer
262
3
0
03 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM
VLM
280
18
0
03 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
260
69
0
30 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
95
1
0
29 Nov 2023
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
IEEE transactions on multimedia (IEEE TMM), 2023
Fukun Yin
Xin Chen
C. Zhang
Biao Jiang
Zibo Zhao
Jiayuan Fan
Gang Yu
Taihao Li
Tao Chen
413
40
0
29 Nov 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
159
4
0
29 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Computer Vision and Pattern Recognition (CVPR), 2023
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
192
32
0
27 Nov 2023
Unlearning via Sparse Representations
Vedant Shah
Frederik Träuble
Ashish Malik
Hugo Larochelle
Michael C. Mozer
Sanjeev Arora
Yoshua Bengio
Anirudh Goyal
MU
267
9
0
26 Nov 2023
Looped Transformers are Better at Learning Learning Algorithms
International Conference on Learning Representations (ICLR), 2023
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
426
55
0
21 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
257
6
0
21 Nov 2023
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Xiaotian Han
Quanzeng You
Yongfei Liu
Wentao Chen
Huangjie Zheng
...
Yiqi Wang
Bohan Zhai
Jianbo Yuan
Heng Wang
Hongxia Yang
ReLM
LRM
ELM
383
11
0
20 Nov 2023