2103.03206
Cited By
Perceiver: General Perception with Iterative Attention
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 680 papers shown
Title
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing
Menghan Xia
Yong Zhang
Haoxin Chen
Wangbo Yu
Hanyuan Liu
Xintao Wang
Tien-Tsin Wong
Ying Shan
VGen
25
198
0
18 Oct 2023
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
K. A. Noriy
Xiaosong Yang
Marcin Budka
Jian Jun Zhang
VLM
10
3
0
18 Oct 2023
Sparse Multi-Object Render-and-Compare
Florian Langer
Ignas Budvytis
Roberto Cipolla
3DPC
14
2
0
17 Oct 2023
Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model
H. Soltau
Izhak Shafran
Alex Ottenwess
Joseph R. Duffy
Rene L. Utianski
L. Barnard
John L. Stricker
D. Wiepert
David T. Jones
Hugo Botha
46
2
0
16 Oct 2023
Joint Music and Language Attention Models for Zero-shot Music Tagging
Xingjian Du
Zhesong Yu
Jiaju Lin
Bilei Zhu
Qiuqiang Kong
BDL
VLM
33
4
0
16 Oct 2023
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
Jiachen Li
Qiaozi Gao
Michael Johnston
Xiaofeng Gao
Xuehai He
Suhaila Shakiah
Hangjie Shi
R. Ghanadan
William Yang Wang
LM&Ro
19
12
0
14 Oct 2023
An Expression Tree Decoding Strategy for Mathematical Equation Generation
Wenqi Zhang
Yongliang Shen
Qingpeng Nong
Zeqi Tan
Yanna Ma
Weiming Lu
AIMat
21
4
0
14 Oct 2023
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Samira Abnar
Omid Saremi
Laurent Dinh
Shantel Wilson
Miguel Angel Bautista
...
Vimal Thilak
Etai Littwin
Jiatao Gu
Josh Susskind
Samy Bengio
19
5
0
13 Oct 2023
Learning to Act from Actionless Videos through Dense Correspondences
Po-Chen Ko
Jiayuan Mao
Yilun Du
Shao-Hua Sun
Josh Tenenbaum
20
73
0
12 Oct 2023
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Zeyu Yun
Juexiao Zhang
Bruno A. Olshausen
Yann LeCun
21
0
0
06 Oct 2023
Perceiver-based CDF Modeling for Time Series Forecasting
Cat P. Le
Chris Cannella
Ali Hasan
Yuting Ng
Vahid Tarokh
AI4TS
8
1
0
03 Oct 2023
Scaling Up Music Information Retrieval Training with Semi-Supervised Learning
Yun-Ning Hung
Ju-Chiang Wang
Minz Won
Duc Le
17
0
0
02 Oct 2023
A Framework for Inference Inspired by Human Memory Mechanisms
Xiangyu Zeng
Jie Lin
Piao Hu
Ruizheng Huang
Zhicheng Zhang
18
2
0
01 Oct 2023
A Survey on Deep Learning Techniques for Action Anticipation
Zeyun Zhong
Manuel Martin
Michael Voit
Juergen Gall
Jürgen Beyerer
24
7
0
29 Sep 2023
Vision Transformers Need Registers
Timothée Darcet
Maxime Oquab
Julien Mairal
Piotr Bojanowski
ViT
35
308
0
28 Sep 2023
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Ari Seff
Brian Cera
Dian Chen
Mason Ng
Aurick Zhou
Nigamaa Nayakanti
Khaled S. Refaat
Rami Al-Rfou
Benjamin Sapp
9
92
0
28 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Lucas D. Lingle
24
15
0
28 Sep 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Deniz Engin
Yannis Avrithis
MLLM
VLM
26
2
0
27 Sep 2023
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
Shizhe Chen
Ricardo Garcia Pinel
Cordelia Schmid
Ivan Laptev
LM&Ro
3DPC
25
33
0
27 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
18
3
0
26 Sep 2023
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Rutav Shah
Roberto Martín-Martín
Yuke Zhu
OffRL
39
54
0
25 Sep 2023
Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
Zihan Liu
Zewei Sun
Shanbo Cheng
Shujian Huang
Mingxuan Wang
18
1
0
25 Sep 2023
Learning Invariant Representations with a Nonparametric Nadaraya-Watson Head
Alan Q. Wang
Minh Nguyen
M. Sabuncu
CML
OOD
27
1
0
23 Sep 2023
Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES
Yohai-Eliel Berreby
L. Sauvage
AAML
13
2
0
22 Sep 2023
Associative Transformer
Yuwei Sun
H. Ochiai
Zhirong Wu
Stephen Lin
Ryota Kanai
ViT
41
0
0
22 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
28
168
0
20 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng-Wei Zhang
Han Hu
Dongdong Chen
Baining Guo
DiffM
VLM
38
92
0
07 Sep 2023
Text-to-feature diffusion for audio-visual few-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
VLM
19
2
0
07 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
24
2
0
06 Sep 2023
Extract-and-Adaptation Network for 3D Interacting Hand Mesh Recovery
J. Park
Daniel Sungho Jung
Gyeongsik Moon
Kyoung Mu Lee
16
6
0
05 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
19
22
0
04 Sep 2023
Multitask Deep Learning for Accurate Risk Stratification and Prediction of Next Steps for Coronary CT Angiography Patients
Juan Lu
Bennamoun
J. Stewart
J. Eshraghian
Yanbin Liu
B. Chow
Frank M. Sanfilippo
Girish Dwivedi
OOD
12
1
0
01 Sep 2023
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
Yanjie Ze
Ge Yan
Yueh-hua Wu
Annabella Macaluso
Yuying Ge
Jianglong Ye
Nicklas Hansen
Li Erran Li
X. Wang
DiffM
AI4CE
12
80
0
31 Aug 2023
LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration
Ran Liu
Sahil Khose
Jingyun Xiao
Lakshmi Sathidevi
Keerthan Ramnath
Z. Kira
Eva L. Dyer
19
3
0
28 Aug 2023
MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation
Najmeh Sadoughi
Xinyu Li
Avijit Vajpayee
D. Fan
Bing Shuai
H. Santos-Villalobos
Vimal Bhat
M. Rohith
18
3
0
22 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
25
1
0
20 Aug 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
27
5
0
18 Aug 2023
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Patrick Butlin
R. Long
Eric Elmoznino
Yoshua Bengio
Jonathan C. P. Birch
...
L. Mudrik
Megan A. K. Peters
Eric Schwitzgebel
Jonathan Simon
Rufin VanRullen
LLMAG
16
95
0
17 Aug 2023
ModelScope Text-to-Video Technical Report
Jiuniu Wang
Hangjie Yuan
Dayou Chen
Yingya Zhang
Xiang Wang
Shiwei Zhang
VGen
DiffM
16
388
0
12 Aug 2023
Zero-shot Text-driven Physically Interpretable Face Editing
Yapeng Meng
Songru Yang
Xuyang Hu
Rui Zhao
Lincheng Li
Z. Shi
Zhengxia Zou
DiffM
16
0
0
11 Aug 2023
Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction
Yangyang Xu
Yibo Yang
Bernard Ghanem
L. Zhang
Du Bo
Dacheng Tao
8
1
0
10 Aug 2023
Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data
Chaoyi Wu
Xiaoman Zhang
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
MedIm
LM&MA
24
140
0
04 Aug 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
...
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
10
399
0
02 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
Sankalpa Rijal
Rajan Neupane
Saroj Prasad Mainali
Shishir K. Regmi
Shanta Maharjan
14
0
0
29 Jul 2023
Towards Generalist Biomedical AI
Tao Tu
Shekoofeh Azizi
Danny Driess
M. Schaekermann
Mohamed Amin
...
Yossi Matias
K. Singhal
Peter R. Florence
Alan Karthikesalingam
Vivek Natarajan
LM&MA
MedIm
AI4MH
33
239
0
26 Jul 2023
OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios
Aditya Nalgunda Ganesh
Dhruval Pobbathi Badrinath
Harshith Mohan Kumar
S.Sony Priya
Surabhi Narayan
ViT
13
3
0
20 Jul 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
28
3
0
17 Jul 2023
Transformers are Universal Predictors
Sourya Basu
Moulik Choraria
L. Varshney
18
4
0
15 Jul 2023
Dual-Query Multiple Instance Learning for Dynamic Meta-Embedding based Tumor Classification
Simon Holdenried-Krafft
Peter Somers
Ivonne A. Montes-Majarro
Diana Silimon
Cristina Tarín
F. Fend
Hendrik P. A. Lensch
MedIm
14
3
0
14 Jul 2023