Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021

4 March 2021

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 790 papers shown

Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment

259

04 Dec 2025

FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention

Zipeng Wang

Dan Xu

ViT

111

01 Dec 2025

OmniFD: A Unified Model for Versatile Face Forgery Detection

287

30 Nov 2025

From Inpainting to Layer Decomposition: Repurposing Generative Inpainting Models for Image Layer Decomposition

376

26 Nov 2025

AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens

Purvish Jajal

Nick Eliopoulos

Benjamin Shiue-Hal Chou

George K. Thiruvathukal

Yung-Hsiang Lu

James C. Davis

112

22 Nov 2025

V2X-RECT: An Efficient V2X Trajectory Prediction Framework via Redundant Interaction Filtering and Tracking Error Correction

22 Nov 2025

CAMS: Towards Compositional Zero-Shot Learning via Gated Cross-Attention and Multi-Space Disentanglement

312

20 Nov 2025

N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator

142

18 Nov 2025

MergeDNA: Context-aware Genome Modeling with Dynamic Tokenization through Token Merging

17 Nov 2025

Functional Mean Flow in Hilbert Space

185

17 Nov 2025

CLAReSNet: When Convolution Meets Latent Attention for Hyperspectral Image Classification

Asmit Bandyopadhyay

Anindita Das Bhattacharjee

Rakesh Das

115

15 Nov 2025

Spatial Reasoning in Multimodal Large Language Models: A Survey of Tasks, Benchmarks and Methods

114

14 Nov 2025

Batch Transformer Architecture: Case of Synthetic Image Generation for Emotion Expression Facial Recognition

Athens Journal of Sciences (JAS), 2025

Stanislav Selitskiy

ViT

150

13 Nov 2025

Integrating Temporal and Structural Context in Graph Transformers for Relational Deep Learning

162

06 Nov 2025

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

177

04 Nov 2025

Context Engineering 2.0: The Context of Context Engineering

390

30 Oct 2025

BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

...

168

28 Oct 2025

Perception Learning: A Formal Separation of Sensory Representation Learning from Decision Learning

Suman Sanyal

SSL

318

28 Oct 2025

Energy-Efficient Domain-Specific Artificial Intelligence Models and Agents: Pathways and Paradigms

406

24 Oct 2025

Diffusion Autoencoders with Perceivers for Long, Irregular and Multimodal Astronomical Sequences

Yunyi Shen

Alexander T. Gagliano

DiffM

125

23 Oct 2025

A Scalable, Causal, and Energy Efficient Framework for Neural Decoding with Spiking Neural Networks

Georgios Mentzelopoulos

141

23 Oct 2025

LLavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation

123

22 Oct 2025

AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory

145

17 Oct 2025

AB-UPT for Automotive and Aerospace Applications

Johannes Brandstetter

AI4CE

103

17 Oct 2025

GOPLA: Generalizable Object Placement Learning via Synthetic Augmentation of Human Arrangement

253

16 Oct 2025

PAINT: Parallel-in-time Neural Twins for Dynamical System Reconstruction

Andreas Radler

Vincent Seyfried

Stefan Pirker

Johannes Brandstetter

Thomas Lichtenegger

132

14 Oct 2025

A Review of Longitudinal Radiology Report Generation: Dataset Composition, Methods, and Performance Evaluation

123

14 Oct 2025

Inpainting the Neural Picture: Inferring Unrecorded Brain Area Dynamics from Multi-Animal Datasets

102

13 Oct 2025

Class Prototypes based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos

Computer Vision and Pattern Recognition (CVPR), 2023

158

13 Oct 2025

Placeit! A Framework for Learning Robot Object Placement Skills

120

10 Oct 2025

DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation

111

09 Oct 2025

Single layer tiny Co$^4$ outpaces GPT-2 and GPT-BERT

183

09 Oct 2025

Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling

Bianca-Mihaela Ganescu

137

09 Oct 2025

Lung Infection Severity Prediction Using Transformers with Conditional TransMix Augmentation and Cross-Attention

203

08 Oct 2025

GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations

Johannes Brandstetter

208

08 Oct 2025

PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting

123

05 Oct 2025

MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

125

30 Sep 2025

EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting

141

30 Sep 2025

Indirect Attention: Turning Context Misalignment into a Feature

114

30 Sep 2025

NeMo: Needle in a Montage for Video-Language Understanding

...

174

29 Sep 2025

UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections

180

29 Sep 2025

GeoFunFlow: Geometric Function Flow Matching for Inverse Operator Learning over Complex Geometries

148

28 Sep 2025

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

...

131

26 Sep 2025

Contrastive Mutual Information Learning: Toward Robust Representations without Positive-Pair Augmentations

Micha Livne

SSL

135

25 Sep 2025

Diffusion-Based Impedance Learning for Contact-Rich Manipulation Tasks

182

24 Sep 2025

CompLLM: Compression for Long Context Q&A

Gabriele Berton

Jayakrishnan Unnikrishnan

Son Tran

Mubarak Shah

23 Sep 2025

Learning Attribute-Aware Hash Codes for Fine-Grained Image Retrieval via Query Optimization

163

21 Sep 2025

MAST: Multi-Agent Spatial Transformer for Learning to Collaborate

181

21 Sep 2025

DAFTED: Decoupled Asymmetric Fusion of Tabular and Echocardiographic Data for Cardiac Hypertension Diagnosis

147

19 Sep 2025

Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation

...

206

16 Sep 2025