2103.03206
Perceiver: General Perception with Iterative Attention
International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 787 papers shown
Title
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
Eugene Vorontsov
Adam Casson
Kristen Severson
Eric Zimmermann
Yi Kan Wang
...
Peter Hamilton
William A. Moye
Eugene Vorontsov
Siqi Liu
Thomas J. Fuchs
MedIm
257
63
0
16 May 2024
Cross-sensor self-supervised training and alignment for remote sensing
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
V. Marsocci
Nicolas Audebert
277
4
0
16 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
293
12
0
15 May 2024
MedVersa: A Generalist Foundation Model for Medical Image Interpretation
Hong-Yu Zhou
Subathra Adithan
J. N. Acosta
Suvrankar Datta
E. Topol
Pranav Rajpurkar
MedIm
377
39
0
13 May 2024
Topicwise Separable Sentence Retrieval for Medical Report Generation
Junting Zhao
Yang Zhou
Zhihao Chen
Huazhu Fu
Liang Wan
MedIm
199
3
0
07 May 2024
PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
IEEE International Conference on Robotics and Automation (ICRA), 2024
Zhaoqi Leng
Pei Sun
Tong He
Drago Anguelov
Mingxing Tan
ViT
3DPC
180
5
0
05 May 2024
Adapting to Distribution Shift by Visual Domain Prompt Generation
International Conference on Learning Representations (ICLR), 2024
Zhixiang Chi
Li Gu
Tao Zhong
Huan Liu
Yuanhao Yu
Konstantinos N Plataniotis
Yang Wang
VLM
OOD
242
20
0
05 May 2024
What matters when building vision-language models?
Neural Information Processing Systems (NeurIPS), 2024
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
280
270
0
03 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
...
Huaping Liu
Gang Hua
F. Sun
Jianwei Zhang
Bin Fang
AI4CE
LM&Ro
855
24
0
28 Apr 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
369
9
0
24 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
275
33
0
22 Apr 2024
Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems
Xiaofei Wang
Yunfeng Zhao
Chao Qiu
Qinghua Hu
Victor C. M. Leung
189
9
0
20 Apr 2024
DISC: Latent Diffusion Models with Self-Distillation from Separated Conditions for Prostate Cancer Grading
M. M. Ho
Elham Ghelichkhan
Yosep Chong
Yufei Zhou
Beatrice Knudsen
Tolga Tasdizen
MedIm
DiffM
149
4
0
19 Apr 2024
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Zicheng Liu
Li Wang
Siyuan Li
Zedong Wang
Haitao Lin
Stan Z. Li
VLM
199
5
0
17 Apr 2024
UMBRAE: Unified Multimodal Brain Decoding
European Conference on Computer Vision (ECCV), 2024
Weihao Xia
Raoul de Charette
Cengiz Öztireli
Jing-Hao Xue
199
26
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
274
10
0
10 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
356
61
0
09 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
339
61
0
05 Apr 2024
PointInfinity: Resolution-Invariant Point Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Zixuan Huang
Justin Johnson
Shoubhik Debnath
James M. Rehg
Chao-Yuan Wu
152
12
0
04 Apr 2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts
European Conference on Computer Vision (ECCV), 2024
Biao Jiang
Xin Chen
C. Zhang
Fukun Yin
Zhuoyuan Li
Gang Yu
Jiayuan Fan
VGen
LRM
227
20
0
02 Apr 2024
On Difficulties of Attention Factorization through Shared Memory
Uladzislau Yorsh
Martin Holevna
Ondrej Bojar
David Herel
105
1
0
31 Mar 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
215
10
0
28 Mar 2024
A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews
M. Dia
G. Khodabandelou
Alice Othmani
231
8
0
28 Mar 2024
Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding
Run Shao
Zhaoyang Zhang
Chao Tao
Yunsheng Zhang
Chengli Peng
Haifeng Li
VLM
269
12
0
27 Mar 2024
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang
Yixin Chen
Baoxiong Jia
Puhao Li
Jinlu Zhang
Jingze Zhang
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
VGen
DiffM
239
77
0
26 Mar 2024
On permutation-invariant neural networks
Masanari Kimura
Ryotaro Shimizu
Yuki Hirakawa
Ryosuke Goto
Yuki Saito
OOD
AAML
305
16
0
26 Mar 2024
Neural Clustering based Visual Representation Learning
Guikun Chen
Xia Li
Yi Yang
Wenguan Wang
SSL
282
14
0
26 Mar 2024
Residual-based Language Models are Free Boosters for Biomedical Imaging
Zhixin Lai
Jing Wu
Suiyao Chen
Yucheng Zhou
N. Hovakimyan
MedIm
338
35
0
26 Mar 2024
Dia-LLaMA: Towards Large Language Model-driven CT Report Generation
Zhixuan Chen
Luyang Luo
Yequan Bie
Hao Chen
LM&MA
164
31
0
25 Mar 2024
FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
European Conference on Computer Vision (ECCV), 2024
Florian Langer
Jihong Ju
Georgi Dikov
Gerhard Reitmayr
Mohsen Ghafoorian
3DPC
232
5
0
22 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Helen Treharne
Jiangkang Deng
Xiatian Zhu
VOS
160
8
0
21 Mar 2024
Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks
Jinyung Hong
Eunyeong Jeon
Changhoon Kim
Keun Hee Park
Utkarsh Nath
Yezhou Yang
Pavan Turaga
Theodore P. Pavlic
CML
178
0
0
21 Mar 2024
On the Utility of 3D Hand Poses for Action Recognition
European Conference on Computer Vision (ECCV), 2024
Md Salman Shamil
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
191
10
0
14 Mar 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
International Conference on Machine Learning (ICML), 2024
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
181
11
0
14 Mar 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
GNN
204
10
0
14 Mar 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
European Conference on Computer Vision (ECCV), 2024
Guanxing Lu
Shiyi Zhang
Ziwei Wang
Changliu Liu
Jiwen Lu
Yansong Tang
306
104
0
13 Mar 2024
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation
Chen Wang
Haochen Shi
Weizhuo Wang
Ruohan Zhang
Fei-Fei Li
Karen Liu
279
192
0
12 Mar 2024
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Sahand Sharifzadeh
Christos Kaplanis
Shreya Pathak
D. Kumaran
Anastasija Ilić
Jovana Mitrović
Charles Blundell
Andrea Banino
VLM
200
17
0
12 Mar 2024
Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2024
Jianping Jiang
Xinyu Zhou
Bingxuan Wang
Xiaoming Deng
Chao Xu
Boxin Shi
255
14
0
12 Mar 2024
3DTextureTransformer: Geometry Aware Texture Generation for Arbitrary Mesh Topology
K. Dharma
Clayton T. Morrison
180
0
0
07 Mar 2024
RATSF: Empowering Customer Service Volume Management through Retrieval-Augmented Time-Series Forecasting
Tianfeng Wang
Gaojie Cui
AI4TS
232
1
0
07 Mar 2024
DNAct: Diffusion Guided Multi-Task 3D Policy Learning
Ge Yan
Yueh-hua Wu
Xiaolong Wang
VGen
355
30
0
07 Mar 2024
Adding Multimodal Capabilities to a Text-only Translation Model
Vipin Vijayan
Braeden Bowen
Scott Grigsby
Timothy Anderson
Jeremy Gwinnup
LRM
230
10
0
05 Mar 2024
Unifying Linear-Time Attention via Latent Probabilistic Modelling
Rares Dolga
Marius Cobzarenco
David Barber
150
2
0
27 Feb 2024
Parallelized Spatiotemporal Binding
Gautam Singh
Yue Wang
Jiawei Yang
Boris Ivanovic
Sungjin Ahn
Marco Pavone
Tong Che
172
2
0
26 Feb 2024
Disentangled 3D Scene Generation with Layout Learning
Dave Epstein
Ben Poole
B. Mildenhall
Alexei A. Efros
Aleksander Holynski
CoGe
OCL
3DV
173
29
0
26 Feb 2024
VOLoc: Visual Place Recognition by Querying Compressed Lidar Map
Xudong Cai
Yongcai Wang
Zhe Huang
Yu Shao
Deying Li
160
7
0
25 Feb 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park
Edward Choi
142
2
0
23 Feb 2024
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie
Yoonho Lee
Annie S. Chen
Chelsea Finn
155
4
0
22 Feb 2024
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel
M. Armando
Salma Galaaoui
Romain Brégier
Philippe Weinzaepfel
Grégory Rogez
Thomas Lucas
3DH
232
54
0
22 Feb 2024