Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021

4 March 2021

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 792 papers shown

Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

749

24 May 2025

ConnectomeDiffuser: Generative AI Enables Brain Network Construction from Diffusion Tensor Imaging

IEEE Transactions on Consumer Electronics (IEEE TCE), 2025

236

23 May 2025

Exploring The Visual Feature Space for Multimodal Neural Decoding

Weihao Xia

Steven Chacko

289

21 May 2025

Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation

339

20 May 2025

PhySense: Sensor Placement Optimization for Accurate Physics Sensing

526

19 May 2025

GeoMaNO: Geometric Mamba Neural Operator for Partial Differential Equations

326

17 May 2025

EnerVerse-AC: Envisioning Embodied Environments with Action Condition

...

272

14 May 2025

Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets

...

364

12 May 2025

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

313

12 May 2025

KDC-Diff: A Latent-Aware Diffusion Model with Knowledge Retention for Memory-Efficient Image Generation

Md. Naimur Asif Borno

Md Sakib Hossain Shovon

Asmaa Soliman Al-Moisheer

Mohammad Ali Moni

303

11 May 2025

Efficient Robotic Policy Learning via Latent Space Backward Planning

316

11 May 2025

Visual Instruction Tuning with Chain of Region-of-Interest

283

11 May 2025

Anymate: A Dataset and Baselines for Learning 3D Object Rigging

513

09 May 2025

LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

ACM Conference on Recommender Systems (RecSys), 2025

...

346

07 May 2025

Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model

...

302

07 May 2025

Beyond Attention: Toward Machines with Intrinsic Higher Mental States

Ahsan Adeel

OffRL LRM

190

02 May 2025

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Computer Vision and Pattern Recognition (CVPR), 2025

413

30 Apr 2025

Direct Motion Models for Assessing Generated Videos

...

Sjoerd van Steenkiste

EGVM DiffM VGen

491

30 Apr 2025

CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation

409

27 Apr 2025

Multimodal graph representation learning for website generation based on visual sketch

309

25 Apr 2025

Token Sequence Compression for Efficient Multimodal Computing

Yasmine Omri

Parth Shroff

Thierry Tambe

280

24 Apr 2025

A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025

270

23 Apr 2025

MR. Video: "MapReduce" is the Principle for Long Video Understanding

Ziqi Pang

Yu-Xiong Wang

VLM

278

22 Apr 2025

Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes

...

310

21 Apr 2025

Cross-attention for State-based model RWKV-7

127

19 Apr 2025

Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models

...

479

16 Apr 2025

DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis

Efthymios Georgiou

Vassilis Katsouros

Yannis Avrithis

Alexandros Potamianos

404

15 Apr 2025

Evolved Hierarchical Masking for Self-Supervised Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

Zhanzhou Feng

Shiliang Zhang

375

12 Apr 2025

FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation

317

10 Apr 2025

EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture

348

09 Apr 2025

Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking

Computer Vision and Pattern Recognition (CVPR), 2025

Junxi Chen

Junhao Dong

Xiaohua Xie

361

08 Apr 2025

A Self-Supervised Framework for Space Object Behaviour Characterisation

103

08 Apr 2025

Memory-Modular Classification: Learning to Generalize with Memory Replacement

328

08 Apr 2025

SmolVLM: Redefining small and efficient multimodal models

...

503

119

07 Apr 2025

A Survey of Pathology Foundation Model: Progress and Future Directions

International Joint Conference on Artificial Intelligence (IJCAI), 2024

478

05 Apr 2025

Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval

Computer Vision and Pattern Recognition (CVPR), 2025

308

03 Apr 2025

Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization

339

03 Apr 2025

AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection

249

01 Apr 2025

Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics

213

30 Mar 2025

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

Computer Vision and Pattern Recognition (CVPR), 2025

287

26 Mar 2025

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

635

26 Mar 2025

Latent Beam Diffusion Models for Generating Visual Sequences

402

26 Mar 2025

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

432

24 Mar 2025

Hierarchy-Aware and Channel-Adaptive Semantic Communication for Bandwidth-Limited Data Fusion

IEEE Wireless Communications Letters (WCL), 2025

188

22 Mar 2025

Unleashing Vecset Diffusion Model for Fast Shape Generation

...

1.1K

20 Mar 2025

Cube: A Roblox View of 3D Intelligence

Foundation AI Team Roblox

...

287

19 Mar 2025

ACE: A Cardinality Estimator for Set-Valued Queries

Proceedings of the VLDB Endowment (PVLDB), 2025

345

19 Mar 2025

Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory

Saket Gurukar

Asim Kadav

VLM

364

17 Mar 2025

VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting

199

16 Mar 2025

FastVID: Dynamic Density Pruning for Fast Video Large Language Models

410

14 Mar 2025