2103.03206
Cited By
v2 (latest)
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing "Perceiver: General Perception with Iterative Attention"
50 / 782 papers shown
EnerVerse-AC: Envisioning Embodied Environments with Action Condition
Yuxin Jiang
Shengcong Chen
Siyuan Huang
Liliang Chen
Pengfei Zhou
...
Xindong He
Chiming Liu
Jiaming Song
Maoqing Yao
204
12
0
14 May 2025
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
Weiyu Li
Xiao-Yong Zhang
Zheng Sun
Di Qi
Haoyang Li
...
Zeming Li
Gang Yu
Xiangyu Zhang
Daxin Jiang
Ping Tan
338
27
0
12 May 2025
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang
Zifeng Zhuang
Han Zhao
Pengxiang Ding
Hongchao Lu
Xuetao Zhang
OffRL
255
18
0
12 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
222
2
0
11 May 2025
Efficient Robotic Policy Learning via Latent Space Backward Planning
Dongxiu Liu
Haoyi Niu
Zhihao Wang
Jinliang Zheng
Yinan Zheng
Zhonghong Ou
Jianming Hu
Jianxiong Li
Xianyuan Zhan
261
4
0
11 May 2025
KDC-Diff: A Latent-Aware Diffusion Model with Knowledge Retention for Memory-Efficient Image Generation
Md. Naimur Asif Borno
Md Sakib Hossain Shovon
Asmaa Soliman Al-Moisheer
Mohammad Ali Moni
229
0
0
11 May 2025
Anymate: A Dataset and Baselines for Learning 3D Object Rigging
Yufan Deng
Yuhao Zhang
Chen Geng
Shangzhe Wu
Jiajun Wu
3DH
441
6
0
09 May 2025
Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model
Pengfei Guo
Can Zhao
Dong Yang
Yufan He
V. Nath
...
Zongwei Zhou
Benjamin D. Simon
Stephanie Harmon
Baris Turkbey
Daguang Xu
DiffM
MedIm
247
6
0
07 May 2025
LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
ACM Conference on Recommender Systems (RecSys), 2025
Zheng Chai
Qin Ren
Xijun Xiao
Heng Yang
Bo Han
...
Xiang Sun
Yaocheng Tan
Peng Xu
Yuchao Zheng
Di Wu
253
14
0
07 May 2025
Beyond Attention: Toward Machines with Intrinsic Higher Mental States
Ahsan Adeel
OffRL
LRM
142
1
0
02 May 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Computer Vision and Pattern Recognition (CVPR), 2025
Haifeng Huang
Xinyi Chen
Yuxiao Chen
Haoyang Li
Xiaoshen Han
Zihao Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
318
11
0
30 Apr 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
419
5
0
30 Apr 2025
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation
Xueqi Ma
Yong Liu
Tianlong Gao
Qingming Huang
Hui Huang
3DV
AI4CE
328
1
0
27 Apr 2025
Multimodal graph representation learning for website generation based on visual sketch
Tung D. Vu
Chung Hoang
Truong-Son Hy
3DV
241
0
0
25 Apr 2025
Token Sequence Compression for Efficient Multimodal Computing
Yasmine Omri
Parth Shroff
Thierry Tambe
175
4
0
24 Apr 2025
A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Wenwen Li
Chia-Yu Hsu
Sizhe Wang
Zhining Gu
Yili Yang
Brendan M. Rogers
A. Liljedahl
198
3
0
23 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-Xiong Wang
VLM
193
5
0
22 Apr 2025
Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes
Huanyu Zhang
Chengzu Li
Wenshan Wu
Shaoguang Mao
Yan Xia
...
Zheng Zhang
Liang Wang
Tieniu Tan
Furu Wei
LRM
253
4
0
21 Apr 2025
Cross-attention for State-based model RWKV-7
Liu Xiao
Li Zhiyuan
Lin Yueyu
OffRL
79
0
0
19 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Mengli Zhu
Weidong Zhang
...
Pengfei Li
Chong Li
Junhao Zhu
Hao Yang
Shiliang Sun
335
2
0
16 Apr 2025
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
Efthymios Georgiou
Vassilis Katsouros
Yannis Avrithis
Alexandros Potamianos
317
1
0
15 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zhanzhou Feng
Shiliang Zhang
291
1
0
12 Apr 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang
Haonan Lin
Yanning Zhou
Kaiwen Xiao
249
2
0
10 Apr 2025
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
Wenfeng Feng
Guoying Sun
Jianlong Wang
Xin Zhang
Jingjing Zhao
Yueyue Liang
Xiang Chen
Duokui Han
251
2
0
09 Apr 2025
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Computer Vision and Pattern Recognition (CVPR), 2025
Junxi Chen
Junhao Dong
Xiaohua Xie
278
1
0
08 Apr 2025
A Self-Supervised Framework for Space Object Behaviour Characterisation
Ian Groves
Andrew Campbell
James Fernandes
Diego Rodriguez
Paul Murray
Massimiliano Vasile
Victoria Nockles
95
0
0
08 Apr 2025
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang
Ahmet Iscen
Eunchan Jo
Sua Choi
Minsu Cho
Cordelia Schmid
VLM
KELM
OffRL
273
0
0
08 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
346
100
0
07 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA
AI4CE
370
5
0
05 Apr 2025
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng
Hsueh-Ti Derek Liu
Yiheng Zhu
Xiaoxia Sun
Chong Shang
Kiran Bhat
Deva Ramanan
Jun-Yan Zhu
Maneesh Agrawala
Tinghui Zhou
297
2
0
03 Apr 2025
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Computer Vision and Pattern Recognition (CVPR), 2025
Boseung Jeong
Jicheol Park
Sungyeon Kim
Suha Kwak
221
2
0
03 Apr 2025
AttentiveGRU: Recurrent Spatio-Temporal Modeling for Advanced Radar-Based BEV Object Detection
Loveneet Saini
Mirko Meuter
Hasan Tercan
Tobias Meisen
206
1
0
01 Apr 2025
Beyond Unimodal Boundaries: Generative Recommendation with Multimodal Semantics
Jing Zhu
Mingxuan Ju
Yozen Liu
Danai Koutra
Neil Shah
Tong Zhao
165
3
0
30 Mar 2025
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Computer Vision and Pattern Recognition (CVPR), 2025
Chen Tang
Cheng Wang
Encheng Su
Xiufeng Song
Xiaohong Liu
Wei-Hong Li
Lei Bai
Wanli Ouyang
Xiangyu Yue
3DGS
AI4TS
186
0
0
26 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Manwen Liao
VLM
546
3
0
26 Mar 2025
Latent Beam Diffusion Models for Generating Visual Sequences
Guilherme Fernandes
Vasco Ramos
Regev Cohen
Idan Szpektor
João Magalhães
312
1
0
26 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
358
13
0
24 Mar 2025
Hierarchy-Aware and Channel-Adaptive Semantic Communication for Bandwidth-Limited Data Fusion
IEEE Wireless Communications Letters (WCL), 2025
Lei Guo
Wei Chen
Yuxuan Sun
Bo Ai
Nikolaos Pappas
T. Quek
134
3
0
22 Mar 2025
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai
Yunfei Zhao
Zibo Zhao
Haolin Liu
Fuyun Wang
...
Jinwei Huang
Yuhong Liu
Jie Jiang
Chunchao Guo
Xiangyu Yue
DiffM
1.0K
10
0
20 Mar 2025
ACE: A Cardinality Estimator for Set-Valued Queries
Proceedings of the VLDB Endowment (PVLDB), 2025
Yufan Sheng
Xin Cao
Kaiqi Zhao
Yixiang Fang
Jianzhong Qi
Wenjie Zhang
Christian S. Jensen
254
0
0
19 Mar 2025
Cube: A Roblox View of 3D Intelligence
Foundation AI Team Roblox
Kiran Bhat
Nishchaie Khanna
Karun Channa
Tinghui Zhou
...
Kyle Price
Steve Han
Yiqing Wang
A. Singh
David Baszucki
232
5
0
19 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
305
1
0
17 Mar 2025
VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting
Songen Gu
Haoxuan Song
Binjie Liu
Qian Yu
Sanyi Zhang
Haiyong Jiang
Jin Huang
Feng Tian
3DGS
3DV
166
3
0
16 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
268
8
0
14 Mar 2025
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen
Guoqiang Gong
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
VLM
302
12
0
14 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
358
8
0
13 Mar 2025
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Yufan Deng
Xun Guo
Yanjie Wang
Yizhi Wang
Angtian Wang
Shenghai Yuan
Yiding Yang
Bo Liu
Haibin Huang
Chongyang Ma
DiffM
VGen
261
7
0
13 Mar 2025
Hyper3D: Efficient 3D Representation via Hybrid Triplane and Octree Feature for Enhanced 3D Shape Variational Auto-Encoders
Jinpei Guo
Sensen Gao
Jia-Wang Bian
Wanhu Sun
Heliang Zheng
Rongfei Jia
Biwei Huang
260
3
0
13 Mar 2025
Piece it Together: Part-Based Concepting with IP-Priors
Elad Richardson
Kfir Goldberg
Yuval Alaluf
Daniel Cohen-Or
DiffM
190
3
0
13 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Computer Vision and Pattern Recognition (CVPR), 2025
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
937
10
0
12 Mar 2025