Perceiver: General Perception with Iterative Attention


4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
    VLM
    ViT
    MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 680 papers shown
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang
Zifeng Zhuang
H. Zhao
Pengxiang Ding
Hongchao Lu
Donglin Wang
OffRL
34
0
0
12 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
21
0
0
11 May 2025
Efficient Robotic Policy Learning via Latent Space Backward Planning
Dongxiu Liu
Haoyi Niu
Zhihao Wang
Jinliang Zheng
Yinan Zheng
Zhonghong Ou
Jianming Hu
Jianxiong Li
Xianyuan Zhan
13
0
0
11 May 2025
Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation
Md. Naimur Asif Borno
Md Sakib Hossain Shovon
Asmaa Soliman Al-Moisheer
Mohammad Ali Moni
24
0
0
11 May 2025
Anymate: A Dataset and Baselines for Learning 3D Object Rigging
Yufan Deng
Yuhao Zhang
Chen Geng
Shangzhe Wu
Jiajun Wu
3DH
45
0
0
09 May 2025
Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model
Pengfei Guo
Can Zhao
Dong Yang
Yufan He
V. Nath
...
Zongwei Zhou
Benjamin D. Simon
Stephanie Harmon
B. Turkbey
Daguang Xu
DiffM
MedIm
36
0
0
07 May 2025
LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
Zheng Chai
Qin Ren
Xijun Xiao
H. Yang
Bo Han
...
Xiang Sun
Yaocheng Tan
Peng Xu
Yuchao Zheng
Di Wu
33
0
0
07 May 2025
Beyond Attention: Toward Machines with Intrinsic Higher Mental States
Ahsan Adeel
OffRL
LRM
16
0
0
02 May 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang
Xinyi Chen
Y. Chen
H. Li
Xiaoshen Han
Z. Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
75
0
0
30 Apr 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
74
0
0
30 Apr 2025
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation
Xueqi Ma
Y. Liu
Tianlong Gao
Q. Huang
Hui Huang
3DV
AI4CE
37
0
0
27 Apr 2025
Multimodal graph representation learning for website generation based on visual sketch
Tung D. Vu
Chung Hoang
Truong-Son Hy
3DV
48
0
0
25 Apr 2025
Token Sequence Compression for Efficient Multimodal Computing
Yasmine Omri
Parth Shroff
Thierry Tambe
51
0
0
24 Apr 2025
A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw
Wenwen Li
Chia-Yu Hsu
Sizhe Wang
Zhining Gu
Yili Yang
Brendan M. Rogers
A. Liljedahl
50
0
0
23 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-xiong Wang
VLM
34
0
0
22 Apr 2025
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs
Huanyu Zhang
Chengzu Li
Wenshan Wu
Shaoguang Mao
Yan Xia
Ivan Vulić
Z. Zhang
Liang Wang
T. Tan
Furu Wei
LRM
34
1
0
21 Apr 2025
Cross-attention for State-based model RWKV-7
Liu Xiao
Li Zhiyuan
Lin Yueyu
OffRL
17
0
0
19 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Weidong Zhang
Mengli Zhu
...
Pengfei Li
C. Li
Junhao Zhu
Hao-Yu Yang
Shiliang Sun
26
1
0
16 Apr 2025
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
Efthymios Georgiou
V. Katsouros
Yannis Avrithis
Alexandros Potamianos
24
0
0
15 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
37
0
0
12 Apr 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang
Haonan Lin
Yanning Zhou
Kaiwen Xiao
42
0
0
10 Apr 2025
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
Wenfeng Feng
Guoying Sun
23
0
0
09 Apr 2025
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Junxi Chen
Junhao Dong
Xiaohua Xie
33
0
0
08 Apr 2025
A Self-Supervised Framework for Space Object Behaviour Characterisation
Ian Groves
Andrew Campbell
James Fernandes
Diego Rodriguez
Paul Murray
Massimiliano Vasile
Victoria Nockles
21
0
0
08 Apr 2025
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang
Ahmet Iscen
Eunchan Jo
Sua Choi
Minsu Cho
Cordelia Schmid
VLM
KELM
OffRL
22
0
0
08 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
34
4
0
07 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA
AI4CE
51
0
0
05 Apr 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
36
0
0
03 Apr 2025
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng
Hsueh-Ti Derek Liu
Yiheng Zhu
Xiaoxia Sun
Chong Shang
Kiran Bhat
Deva Ramanan
Jun-Yan Zhu
Maneesh Agrawala
Tinghui Zhou
75
0
0
03 Apr 2025
AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection
Loveneet Saini
Mirko Meuter
Hasan Tercan
Tobias Meisen
46
0
0
01 Apr 2025
Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
Jing Zhu
Mingxuan Ju
Yozen Liu
Danai Koutra
Neil Shah
Tong Zhao
41
0
0
30 Mar 2025
Latent Beam Diffusion Models for Decoding Image Sequences
Guilherme Fernandes
Vasco Ramos
Regev Cohen
Idan Szpektor
João Magalhães
76
0
0
26 Mar 2025
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Chen Tang
Xinzhu Ma
Encheng Su
Xiufeng Song
Xiaohong Liu
Wei-Hong Li
Lei Bai
Wanli Ouyang
Xiangyu Yue
3DGS
AI4TS
64
0
0
26 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Yichao Yan
VLM
42
1
0
26 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Y. Yang
Afshin Dehghan
51
1
0
24 Mar 2025
Hierarchy-Aware and Channel-Adaptive Semantic Communication for Bandwidth-Limited Data Fusion
Lei Guo
Wei Chen
Yuxuan Sun
Bo Ai
Nikolaos Pappas
T. Quek
36
0
0
22 Mar 2025
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Haolin Liu
Fuyun Wang
...
Jinwei Huang
Yuhong Liu
Jie Jiang
Chunchao Guo
Xiangyu Yue
DiffM
78
0
0
20 Mar 2025
Cube: A Roblox View of 3D Intelligence
Foundation AI Team Roblox
Kiran Bhat
Nishchaie Khanna
Karun Channa
Tinghui Zhou
...
Kyle Price
Steve Han
Yiqing Wang
A. Singh
David Baszucki
55
0
0
19 Mar 2025
ACE: A Cardinality Estimator for Set-Valued Queries
Yufan Sheng
Xin Cao
Kaiqi Zhao
Yixiang Fang
Jianzhong Qi
Wenjie Zhang
Christian S. Jensen
55
0
0
19 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
50
0
0
17 Mar 2025
VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting
Songen Gu
Haoxuan Song
Binjie Liu
Qian Yu
Sanyi Zhang
Haiyong Jiang
Jin Huang
Feng Tian
3DGS
3DV
50
0
0
16 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
52
1
0
14 Mar 2025
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen
Guoqiang Gong
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
VLM
63
0
0
14 Mar 2025
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Yufan Deng
Xun Guo
Y. Wang
Jacob Zhiyuan Fang
Angtian Wang
Shenghai Yuan
Yiding Yang
Bo Liu
Haibin Huang
Chongyang Ma
DiffM
VGen
64
0
0
13 Mar 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Y. Liu
...
Wanli Ouyang
Zhiwei Xiong
Peng Gao
Qibin Hou
Ming-Ming Cheng
110
3
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
64
0
0
13 Mar 2025
Hyper3D: Efficient 3D Representation via Hybrid Triplane and Octree Feature for Enhanced 3D Shape Variational Auto-Encoders
J. Guo
Sensen Gao
Jia-Wang Bian
Wanhu Sun
Heliang Zheng
Rongfei Jia
Mingming Gong
43
1
0
13 Mar 2025
Piece it Together: Part-Based Concepting with IP-Priors
Elad Richardson
Kfir Goldberg
Yuval Alaluf
Daniel Cohen-Or
DiffM
61
0
0
13 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
66
0
0
12 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
44
0
0
12 Mar 2025