Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
    VLM ViT MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 792 papers shown
Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking
Chen-Hao Chao
Wei-Fang Sun
Hanwen Liang
Chun-Yi Lee
Rahul G. Krishnan
DiffM
749
7
0
24 May 2025
ConnectomeDiffuser: Generative AI Enables Brain Network Construction from Diffusion Tensor Imaging
IEEE Transactions on Consumer Electronics (IEEE TCE), 2025
Xuhang Chen
Michael Kwok-Po Ng
Kim-Fung Tsang
Chi-Man Pun
Shuqiang Wang
DiffM MedIm
236
2
0
23 May 2025
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Steven Chacko
289
5
0
21 May 2025
Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation
Jinzhou Li
Tianhao Wu
Jiyao Zhang
Zeyuan Chen
Haotian Jin
Mingdong Wu
Yujun Shen
Yaodong Yang
Hao Dong
339
2
0
20 May 2025
PhySense: Sensor Placement Optimization for Accurate Physics Sensing
Yuezhou Ma
Haixu Wu
Hang Zhou
Huikun Weng
Chao Guo
Mingsheng Long
DiffM
526
0
0
19 May 2025
GeoMaNO: Geometric Mamba Neural Operator for Partial Differential Equations
Xi Han
Jingwei Zhang
Dimitris Samaras
Fei Hou
Hong Qin
AI4CE
326
2
0
17 May 2025
EnerVerse-AC: Envisioning Embodied Environments with Action Condition
Yuxin Jiang
Shengcong Chen
Siyuan Huang
Liliang Chen
Pengfei Zhou
...
Xindong He
Chiming Liu
Jiaming Song
Maoqing Yao
Maoqing Yao
272
16
0
14 May 2025
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
Weiyu Li
Xiao-Yong Zhang
Zheng Sun
Di Qi
Haoyang Li
...
Zeming Li
Gang Yu
Xiangyu Zhang
Daxin Jiang
Ping Tan
364
34
0
12 May 2025
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang
Zifeng Zhuang
Han Zhao
Pengxiang Ding
Hongchao Lu
Xuetao Zhang
OffRL
313
21
0
12 May 2025
KDC-Diff: A Latent-Aware Diffusion Model with Knowledge Retention for Memory-Efficient Image Generation
Md. Naimur Asif Borno
Md Sakib Hossain Shovon
Asmaa Soliman Al-Moisheer
Mohammad Ali Moni
303
0
0
11 May 2025
Efficient Robotic Policy Learning via Latent Space Backward Planning
Dongxiu Liu
Haoyi Niu
Zhihao Wang
Jinliang Zheng
Yinan Zheng
Zhonghong Ou
Jianming Hu
Jianxiong Li
Xianyuan Zhan
316
5
0
11 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
283
2
0
11 May 2025
Anymate: A Dataset and Baselines for Learning 3D Object Rigging
Yufan Deng
Yuhao Zhang
Chen Geng
Shangzhe Wu
Jiajun Wu
3DH
513
10
0
09 May 2025
LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
ACM Conference on Recommender Systems (RecSys), 2025
Zheng Chai
Qin Ren
Xijun Xiao
Heng Yang
Bo Han
...
Xiang Sun
Yaocheng Tan
Peng Xu
Yuchao Zheng
Di Wu
346
25
0
07 May 2025
Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model
Pengfei Guo
Can Zhao
Dong Yang
Yufan He
V. Nath
...
Zongwei Zhou
Benjamin D. Simon
Stephanie Harmon
Baris Turkbey
Daguang Xu
DiffM MedIm
302
6
0
07 May 2025
Beyond Attention: Toward Machines with Intrinsic Higher Mental States
Ahsan Adeel
OffRL LRM
190
1
0
02 May 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Computer Vision and Pattern Recognition (CVPR), 2025
Haifeng Huang
Xinyi Chen
Yuxiao Chen
Haoyang Li
Xiaoshen Han
Zihao Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
413
14
0
30 Apr 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM DiffM VGen
491
5
0
30 Apr 2025
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation
Xueqi Ma
Yong Liu
Tianlong Gao
Qingming Huang
Hui Huang
3DV AI4CE
409
2
0
27 Apr 2025
Multimodal graph representation learning for website generation based on visual sketch
Tung D. Vu
Chung Hoang
Truong-Son Hy
3DV
309
1
0
25 Apr 2025
Token Sequence Compression for Efficient Multimodal Computing
Yasmine Omri
Parth Shroff
Thierry Tambe
280
5
0
24 Apr 2025
A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Wenwen Li
Chia-Yu Hsu
Sizhe Wang
Zhining Gu
Yili Yang
Brendan M. Rogers
A. Liljedahl
270
5
0
23 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-Xiong Wang
VLM
278
7
0
22 Apr 2025
Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes
Huanyu Zhang
Chengzu Li
Wenshan Wu
Shaoguang Mao
Yan Xia
...
Zheng Zhang
Liang Wang
Liang Wang
Tieniu Tan
Furu Wei
LRM
310
4
0
21 Apr 2025
Cross-attention for State-based model RWKV-7
Liu Xiao
Li Zhiyuan
Lin Yueyu
OffRL
127
0
0
19 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Mengli Zhu
Weidong Zhang
...
Pengfei Li
Chong Li
Junhao Zhu
Hao Yang
Shiliang Sun
479
2
0
16 Apr 2025
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
Efthymios Georgiou
Vassilis Katsouros
Yannis Avrithis
Alexandros Potamianos
404
1
0
15 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zhanzhou Feng
Shiliang Zhang
375
1
0
12 Apr 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang
Haonan Lin
Yanning Zhou
Kaiwen Xiao
317
3
0
10 Apr 2025
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
Wenfeng Feng
Guoying Sun
Jianlong Wang
Xin Zhang
Jingjing Zhao
Yueyue Liang
Xiang Chen
Duokui Han
348
2
0
09 Apr 2025
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Computer Vision and Pattern Recognition (CVPR), 2025
Junxi Chen
Junhao Dong
Xiaohua Xie
361
5
0
08 Apr 2025
A Self-Supervised Framework for Space Object Behaviour Characterisation
Ian Groves
Andrew Campbell
James Fernandes
Diego Rodriguez
Paul Murray
Massimiliano Vasile
Victoria Nockles
103
0
0
08 Apr 2025
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang
Ahmet Iscen
Eunchan Jo
Sua Choi
Minsu Cho
Cordelia Schmid
VLM KELM OffRL
328
0
0
08 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
503
119
0
07 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA AI4CE
478
7
0
05 Apr 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
308
3
0
03 Apr 2025
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng
Hsueh-Ti Derek Liu
Yiheng Zhu
Xiaoxia Sun
Chong Shang
Kiran Bhat
Deva Ramanan
Jun-Yan Zhu
Maneesh Agrawala
Tinghui Zhou
339
2
0
03 Apr 2025
AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection
Loveneet Saini
Mirko Meuter
Hasan Tercan
Tobias Meisen
249
1
0
01 Apr 2025
Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
Jing Zhu
Mingxuan Ju
Yozen Liu
Danai Koutra
Neil Shah
Tong Zhao
213
3
0
30 Mar 2025
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Computer Vision and Pattern Recognition (CVPR), 2025
Chen Tang
Cheng Wang
Encheng Su
Xiufeng Song
Xiaohong Liu
Wei-Hong Li
Lei Bai
Wanli Ouyang
Xiangyu Yue
3DGS AI4TS
287
0
0
26 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Manwen Liao
VLM
635
5
0
26 Mar 2025
Latent Beam Diffusion Models for Generating Visual Sequences
Guilherme Fernandes
Vasco Ramos
Regev Cohen
Idan Szpektor
João Magalhães
402
2
0
26 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
432
15
0
24 Mar 2025
Hierarchy-Aware and Channel-Adaptive Semantic Communication for Bandwidth-Limited Data Fusion
IEEE Wireless Communications Letters (WCL), 2025
Lei Guo
Wei Chen
Yuxuan Sun
Bo Ai
Nikolaos Pappas
T. Quek
188
3
0
22 Mar 2025
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Haolin Liu
Fuyun Wang
...
Jinwei Huang
Yuhong Liu
Jie Jiang
Chunchao Guo
Xiangyu Yue
DiffM
1.1K
14
0
20 Mar 2025
Cube: A Roblox View of 3D Intelligence
Foundation AI Team Roblox
Kiran Bhat
Nishchaie Khanna
Karun Channa
Tinghui Zhou
...
Kyle Price
Steve Han
Yiqing Wang
A. Singh
David Baszucki
287
6
0
19 Mar 2025
ACE: A Cardinality Estimator for Set-Valued Queries
Proceedings of the VLDB Endowment (PVLDB), 2025
Yufan Sheng
Xin Cao
Kaiqi Zhao
Yixiang Fang
Jianzhong Qi
Wenjie Zhang
Christian S. Jensen
345
0
0
19 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
364
2
0
17 Mar 2025
VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting
Songen Gu
Haoxuan Song
Binjie Liu
Qian Yu
Sanyi Zhang
Haiyong Jiang
Jin Huang
Feng Tian
3DGS 3DV
199
3
0
16 Mar 2025
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen
Guoqiang Gong
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
VLM
410
16
0
14 Mar 2025