2103.03206
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing "Perceiver: General Perception with Iterative Attention"
50 / 788 papers shown
Hyper3D: Efficient 3D Representation via Hybrid Triplane and Octree Feature for Enhanced 3D Shape Variational Auto-Encoders
Jinpei Guo
Sensen Gao
Jia-Wang Bian
Wanhu Sun
Heliang Zheng
Rongfei Jia
Biwei Huang
300
3
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
434
10
0
13 Mar 2025
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Yufan Deng
Xun Guo
Yanjie Wang
Yizhi Wang
Angtian Wang
Shenghai Yuan
Yiding Yang
Bo Liu
Haibin Huang
Chongyang Ma
DiffM
VGen
301
7
0
13 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
412
5
0
12 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Computer Vision and Pattern Recognition (CVPR), 2025
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
985
10
0
12 Mar 2025
iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng
Jia-Feng Cai
Xiao-Ming Wu
Yi-Lin Wei
Yu-Ming Tang
Wei-Shi Zheng
CLL
230
4
0
10 Mar 2025
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Computer Vision and Pattern Recognition (CVPR), 2025
Hanzhi Chen
Boyang Sun
Anran Zhang
Marc Pollefeys
Stefan Leutenegger
LM&Ro
388
28
0
10 Mar 2025
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao
Wang Lu
Jie Ji
Ruimeng Ye
Gen Li
Xiaolong Ma
Bo Hui
OT
286
0
0
09 Mar 2025
Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter
Yanyu Zhu
Licheng Bai
Jintao Xu
Jiwei Tang
313
0
0
09 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
263
29
0
08 Mar 2025
ALVI Interface: Towards Full Hand Motion Decoding for Amputees Using sEMG
A. Kovalev
Anna Makarova
Petr Chizhov
Matvey Antonov
Gleb Duplin
...
Viacheslav Gostevskii
Vladimir Bessonov
Andrey Tsurkan
Mikhail Korobok
Aleksejs Timčenko
82
1
0
28 Feb 2025
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Conference on Machine Translation (WMT), 2025
Shaharukh Khan
Ayush Tarun
Ali Faraz
Palash Kamble
Vivek Dahiya
Praveen Kumar Pokala
Ashish Kulkarni
Chandra Khatri
Abhinav Ravi
Shubham Agarwal
862
6
0
27 Feb 2025
Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
R. Lucassen
Sander P.J. Moonemans
Tijn van de Luijtgaarden
Gerben E. Breimer
W. Blokx
M. Veta
MedIm
282
3
0
26 Feb 2025
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Ibrahim Fayad
Max Zimmer
Martin Schwartz
P. Ciais
Fabian Gieseke
Gabriel Belouze
Sarah Brood
A. D. Truchis
Alexandre d’Aspremont
AI4TS
352
0
0
24 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Pattern Recognition (Pattern Recogn.), 2022
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
366
3
0
24 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
538
8
0
21 Feb 2025
Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
427
1
0
20 Feb 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Sumin An
Junyoung Sung
Wonpyo Park
Chanjun Park
Paul Hongsuck Seo
589
0
0
10 Feb 2025
FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion
Yufan Zhou
Haoyu Shen
Huan Wang
DiffM
607
6
0
08 Feb 2025
VILP: Imitation Learning with Latent Video Planning
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Zhengtong Xu
Qiang Qiu
Yu She
VGen
253
4
0
03 Feb 2025
Imitation Game for Adversarial Disillusion with Multimodal Generative Chain-of-Thought Role-Play
Ching-Chun Chang
Fan-Yun Chen
Shih-Hong Gu
Kai Gao
Hanrui Wang
Isao Echizen
AAML
973
0
0
31 Jan 2025
Continuous 3D Perception Model with Persistent State
Computer Vision and Pattern Recognition (CVPR), 2025
Qianqian Wang
Yifei Zhang
Aleksander Hołyński
Alexei A. Efros
Angjoo Kanazawa
VGen
307
208
0
21 Jan 2025
CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
Hwan Heo
Jangyeong Kim
Seongyeong Lee
Jeong A Wi
Junyoung Choi
Sangjun Ahn
241
0
0
17 Jan 2025
Principles for Responsible AI Consciousness Research
Journal of Artificial Intelligence Research (JAIR), 2025
Patrick Butlin
Theodoros Lappas
163
9
0
13 Jan 2025
EdgeTAM: On-Device Track Anything Model
Computer Vision and Pattern Recognition (CVPR), 2025
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
284
8
0
13 Jan 2025
Natural Language Supervision for Low-light Image Enhancement
Jiahui Tang
Kaihua Zhou
Zhijian Luo
Yueen Hou
295
1
0
11 Jan 2025
Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting
Computer Vision and Pattern Recognition (CVPR), 2025
Kaouther Messaoud
Matthieu Cord
Alexandre Alahi
200
3
0
10 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Computer Vision and Pattern Recognition (CVPR), 2023
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Yuan Liu
Kaipeng Zhang
Dahua Lin
Yu Qiao
Shiyang Feng
Xiangyu Yue
MLLM
536
190
0
10 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
AAAI Conference on Artificial Intelligence (AAAI), 2025
Tian Jin
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
250
6
0
08 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
299
33
0
02 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Information Fusion (Inf. Fusion), 2024
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
415
78
0
31 Dec 2024
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Computer Vision and Pattern Recognition (CVPR), 2023
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Celine Lee
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
252
38
0
31 Dec 2024
An Ensemble Approach to Short-form Video Quality Assessment Using Multimodal LLM
Wen Wen
Yilin Wang
Neil Birkbeck
Balu Adsumilli
169
5
0
24 Dec 2024
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park
Yeonju Kim
Hyeongseop Rha
Bella Godiva
Y. Ro
140
2
0
23 Dec 2024
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Yong Liu
...
Wanli Ouyang
Zhiwei Xiong
Shiyang Feng
Qibin Hou
Ming-Ming Cheng
604
8
0
22 Dec 2024
A Full Transformer-based Framework for Automatic Pain Estimation using Videos
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2023
Stefanos Gkikas
Manolis Tsiknakis
MedIm
ViT
252
14
0
19 Dec 2024
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
Xinghang Li
Peiyan Li
Minghuan Liu
Dong Wang
Jirong Liu
Bingyi Kang
Xiao Ma
Tao Kong
Hanbo Zhang
Huaping Liu
LM&Ro
440
90
0
18 Dec 2024
A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng
Ao Tang
296
0
0
18 Dec 2024
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities
Computer Vision and Pattern Recognition (CVPR), 2024
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
282
5
0
18 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
347
5
0
13 Dec 2024
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2024
Orr Zohar
Xiaohan Wang
Yann Dubois
Nikhil Mehta
Tong Xiao
...
Xiaofang Wang
F. Xu
Ning Zhang
Serena Yeung-Levy
Xide Xia
VLM
371
0
0
13 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Liang Luo
...
Liang Li
Houcheng Su
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
272
0
0
13 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yanjie Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
255
1
0
10 Dec 2024
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu
Tengbo Yu
Haoyuan Deng
Season Si Chen
Yansong Tang
Ziwei Wang
404
9
0
09 Dec 2024
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang
Hui Chen
Jianchao Tan
Xunliang Cai
Zijia Lin
Jiawei Han
Jungong Han
Guiguang Ding
VLM
480
5
0
04 Dec 2024
SIL-RRT*: Learning Sampling Distribution through Self Imitation Learning
Xuzhe Dang
Stefan Edelkamp
301
0
0
26 Nov 2024
Solaris: A Foundation Model of the Sun
Harris Abdul Majid
Pietro Sittoni
Francesco Tudisco
154
4
0
25 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luis Vilaca
Yi Yu
Paula Viana
442
3
0
24 Nov 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Computer Vision and Pattern Recognition (CVPR), 2024
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
425
21
0
22 Nov 2024
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
Shravan Venkatraman
Jaskaran Singh Walia
J. Raheja
ViT
459
4
0
14 Nov 2024