Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021

4 March 2021

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 792 papers shown

VOLoc: Visual Place Recognition by Querying Compressed Lidar Map

201

25 Feb 2024

Multimodal Transformer With a Low-Computational-Cost Guarantee

Sungjin Park

Edward Choi

168

23 Feb 2024

Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning

183

22 Feb 2024

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

268

22 Feb 2024

Semantic Image Synthesis with Unconditional Generator

268

22 Feb 2024

PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models

Eli M. Carrami

Sahand Sharifzadeh

144

21 Feb 2024

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang

I-Hau Yeh

Hongpeng Liao

426

2,984

21 Feb 2024

User-LLM: Efficient LLM Contextualization with User Embeddings

273

21 Feb 2024

The Revolution of Multimodal Large Language Models: A Survey

Lorenzo Baraldi

359

124

19 Feb 2024

Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers

Markus Hiller

Krista A. Ehinger

Tom Drummond

329

19 Feb 2024

Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators

Johannes Brandstetter

PINN AI4CE

596

19 Feb 2024

Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review

Thang-Anh-Quan Nguyen

366

17 Feb 2024

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

403

244

16 Feb 2024

Are Semi-Dense Detector-Free Methods Good at Matching Local Features?

332

13 Feb 2024

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Jost Tobias Springenberg

A. Abdolmaleki

Jingwei Zhang

Oliver Groth

Michael Bloesch

...

Sarah Bechtle

Martin Riedmiller

220

08 Feb 2024

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion

Shoubin Yu

Jaehong Yoon

Mohit Bansal

516

08 Feb 2024

Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data

377

07 Feb 2024

Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary

Takashi Morita

474

31 Jan 2024

Topology-Aware Latent Diffusion for 3D Shape Generation

Ben Fei

Weidong Yang

224

31 Jan 2024

Triple Disentangled Representation Learning for Multimodal Affective Analysis

Information Fusion (Inf. Fusion), 2024

235

29 Jan 2024

On the generalization capacity of neural networks during generic multimodal reasoning

International Conference on Learning Representations (ICLR), 2024

252

26 Jan 2024

Jump Cut Smoothing for Talking Heads

212

09 Jan 2024

FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild

International Journal of Computer Vision (IJCV), 2024

Zhi-Song Liu

Robin Courant

Vicky Kalogeiton

346

08 Jan 2024

Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

Wentao Zhu

286

08 Jan 2024

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

Wentao Zhu

165

08 Jan 2024

PIXAR: Auto-Regressive Language Modeling in Pixel Space

357

06 Jan 2024

CaMML: Context-Aware Multimodal Learner for Large Models

312

06 Jan 2024

Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

David Gimeno-Gómez

Ana-Maria Bucur

Adrian Cosma

Carlos David Martínez Hinarejos

Paolo Rosso

232

05 Jan 2024

AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis

IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024

Qiuhui Chen

Yi Hong

MedIm

428

02 Jan 2024

Saliency-Aware Regularized Graph Neural Network

Artificial Intelligence (AI), 2024

157

01 Jan 2024

SVFAP: Self-supervised Video Facial Affect Perceiver

IEEE Transactions on Affective Computing (TAC), 2023

190

31 Dec 2023

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

...

Chunhua Shen

318

28 Dec 2023

Deformable Audio Transformer for Audio Event Detection

Wentao Zhu

162

24 Dec 2023

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

468

240

20 Dec 2023

Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs

Seungjun Lee

Taeil Oh

263

18 Dec 2023

Reconstruction of Fields from Sparse Sensing: Differentiable Sensor Placement Enhances Generalization

105

14 Dec 2023

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

Computer Vision and Pattern Recognition (CVPR), 2023

379

271

14 Dec 2023

A Foundational Multimodal Vision Language AI Assistant for Human Pathology

Ming Y. Lu

Bowen Chen

Drew F. K. Williamson

...

211

13 Dec 2023

NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image

European Conference on Computer Vision (ECCV), 2023

171

12 Dec 2023

DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors

372

07 Dec 2023

UPOCR: Towards Unified Pixel-Level OCR Interface

International Conference on Machine Learning (ICML), 2023

Lianwen Jin

352

05 Dec 2023

Learning to Compose SuperWeights for Neural Parameter Allocation Search

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

287

03 Dec 2023

Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models

Andrés Villa

Juan Carlos León Alcázar

Alvaro Soto

Bernard Ghanem

MLLM VLM

292

03 Dec 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Ran Xu

Silvio Savarese

Caiming Xiong

Juan Carlos Niebles

VLM MLLM

282

30 Nov 2023

GeoDeformer: Geometric Deformable Transformer for Action Recognition

115

29 Nov 2023

ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

IEEE Transactions on Multimedia (IEEE TMM), 2023

467

29 Nov 2023

Contrastive Vision-Language Alignment Makes Efficient Instruction Learner

182

29 Nov 2023

ViT-Lens: Towards Omni-modal Representations

Computer Vision and Pattern Recognition (CVPR), 2023

Ying Shan

208

27 Nov 2023

Unlearning via Sparse Representations

274

26 Nov 2023

Looped Transformers are Better at Learning Learning Algorithms

International Conference on Learning Representations (ICLR), 2023

Liu Yang

Kangwook Lee

Robert D. Nowak

Dimitris Papailiopoulos

460

21 Nov 2023