arXiv:2103.03206
Perceiver: General Perception with Iterative Attention
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing "Perceiver: General Perception with Iterative Attention"
50 / 680 papers shown
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen
Boyang Sun
Anran Zhang
Marc Pollefeys
Stefan Leutenegger
LM&Ro
63
0
0
10 Mar 2025
iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng
Jia-Feng Cai
Xiao-Ming Wu
Yi-Lin Wei
Yu-Ming Tang
Wei-Shi Zheng
CLL
54
0
0
10 Mar 2025
Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter
Yanyu Zhu
Licheng Bai
Jintao Xu
Jiwei Tang
Hai-tao Zheng
31
0
0
09 Mar 2025
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao
Wang Lu
Jie Ji
Ruimeng Ye
Gen Li
Xiaolong Ma
Bo Hui
OT
43
0
0
09 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
85
2
0
08 Mar 2025
ALVI Interface: Towards Full Hand Motion Decoding for Amputees Using sEMG
A. Kovalev
Anna Makarova
Petr Chizhov
Matvey Antonov
Gleb Duplin
...
Viacheslav Gostevskii
Vladimir Bessonov
Andrey Tsurkan
Mikhail Korobok
Aleksejs Timčenko
34
0
0
28 Feb 2025
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Shaharukh Khan
Ayush Tarun
Ali Faraz
Palash Kamble
Vivek Dahiya
Praveen Kumar Pokala
Ashish Kulkarni
Chandra Khatri
Abhinav Ravi
Shubham Agarwal
53
0
0
27 Feb 2025
Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions
R. Lucassen
Sander P.J. Moonemans
Tijn van de Luijtgaarden
Gerben E. Breimer
W. Blokx
M. Veta
MedIm
60
1
0
26 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
87
2
0
24 Feb 2025
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Ibrahim Fayad
Max Zimmer
Martin Schwartz
P. Ciais
Fabian Gieseke
Gabriel Belouze
Sarah Brood
A. D. Truchis
Alexandre d’Aspremont
AI4TS
38
0
0
24 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
110
1
0
21 Feb 2025
FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion
Yufan Zhou
Haoyu Shen
Huan Wang
DiffM
97
0
0
17 Feb 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An
Junyoung Sung
Wonpyo Park
Chanjun Park
Paul Hongsuck Seo
90
0
0
10 Feb 2025
VILP: Imitation Learning with Latent Video Planning
Zhengtong Xu
Qiang Qiu
Yu She
VGen
61
0
0
03 Feb 2025
Imitation Game for Adversarial Disillusion with Multimodal Generative Chain-of-Thought Role-Play
Ching-Chun Chang
Fan-Yun Chen
Shih-Hong Gu
Kai Gao
Hanrui Wang
Isao Echizen
AAML
69
0
0
31 Jan 2025
CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
Hwan Heo
Jangyeong Kim
Seongyeong Lee
Jeong A Wi
Junyoung Choi
Sangjun Ahn
46
0
0
17 Jan 2025
Principles for Responsible AI Consciousness Research
Patrick Butlin
Theodoros Lappas
33
1
0
13 Jan 2025
EdgeTAM: On-Device Track Anything Model
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
58
0
0
13 Jan 2025
Natural Language Supervision for Low-light Image Enhancement
Jiahui Tang
Kaihua Zhou
Zhijian Luo
Yueen Hou
34
0
0
11 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
D. Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
102
0
10 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
S. Chen
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
42
1
0
08 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
37
28
0
02 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
56
23
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
59
17
0
31 Dec 2024
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park
Yeonju Kim
Hyeongseop Rha
Bella Godiva
Y. Ro
36
1
0
23 Dec 2024
A Full Transformer-based Framework for Automatic Pain Estimation using Videos
Stefanos Gkikas
M. Tsiknakis
MedIm
ViT
99
8
0
19 Dec 2024
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
Xinghang Li
Peiyan Li
Minghuan Liu
Dong Wang
Jirong Liu
Bingyi Kang
Xiao Ma
Tao Kong
Hanbo Zhang
Huaping Liu
LM&Ro
88
14
0
18 Dec 2024
A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng
Ao Tang
70
0
0
18 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
70
0
0
13 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Z. Wang
...
Liang Li
Siwei Liu
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
111
0
0
13 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
Y. Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
69
1
0
10 Dec 2024
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu
Tengbo Yu
Haoyuan Deng
Season Si Chen
Yansong Tang
Ziwei Wang
70
3
0
09 Dec 2024
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang
Hui Chen
Jianchao Tan
K. Zhang
Xunliang Cai
Zijia Lin
J. Han
Guiguang Ding
VLM
77
3
0
04 Dec 2024
SIL-RRT*: Learning Sampling Distribution through Self Imitation Learning
Xuzhe Dang
Stefan Edelkamp
58
0
0
26 Nov 2024
Solaris: A Foundation Model of the Sun
Harris Abdul Majid
Pietro Sittoni
Francesco Tudisco
59
0
0
25 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
68
0
0
24 Nov 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
105
3
0
22 Nov 2024
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
Shravan Venkatraman
Jaskaran Singh Walia
J. Raheja
ViT
28
0
0
14 Nov 2024
NeuralDEM -- Real-time Simulation of Industrial Particulate Flows
Benedikt Alkin
Tobias Kronlachner
Samuele Papa
Stefan Pirker
Thomas Lichtenegger
Johannes Brandstetter
PINN
AI4CE
34
1
1
14 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
26
3
0
08 Nov 2024
Wave Network: An Ultra-Small Language Model
Xin Zhang
Victor S. Sheng
39
1
0
04 Nov 2024
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal
Phillip Isola
Antonio Torralba
William T. Freeman
VLM
24
4
0
04 Nov 2024
PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views
Xin Fei
Wenzhao Zheng
Yueqi Duan
W. Zhan
M. Tomizuka
Kurt Keutzer
Jiwen Lu
3DGS
30
3
0
24 Oct 2024
PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding
Vinh Nguyen
3DV
16
0
0
22 Oct 2024
ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning
Yue Yang
Bryce Ikeda
Gedas Bertasius
D. Szafir
16
4
0
21 Oct 2024
SEA: State-Exchange Attention for High-Fidelity Physics Based Transformers
Parsa Esmati
Amirhossein Dadashzadeh
Vahid Goodarzi
Nicolas Larrosa
Nicolo Grilli
24
0
0
20 Oct 2024
Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation
Jiayu Xiong
Jing Wang
Hengjing Xiang
Jun Xue
Chen Xu
Zhouqiang Jiang
22
0
0
20 Oct 2024
AugInsert: Learning Robust Visual-Force Policies via Data Augmentation for Object Assembly Tasks
Ryan Diaz
Adam Imdieke
Vivek Veeriah
Karthik Desingh
23
0
0
19 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
32
4
0
18 Oct 2024
Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers
Yuxin Wen
Qingqing Cao
Qichen Fu
Sachin Mehta
Mahyar Najibi
VLM
25
4
0
17 Oct 2024