2103.03206
Perceiver: General Perception with Iterative Attention
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 680 papers shown
Title
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
Perusha Moodley
Pramod S. Kaushik
Dhillu Thambi
Mark Trovinger
Praveen Paruchuri
Xia Hong
Benjamin Rosman
36
0
0
01 Jul 2024
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
William Berman
A. Peysakhovich
23
4
0
26 Jun 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
24
7
0
26 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
37
278
0
24 Jun 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
29
5
0
24 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
30
0
0
24 Jun 2024
Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
Yang Zhang
Chenjia Bai
Bin Zhao
Junchi Yan
Xiu Li
Xuelong Li
OffRL
17
0
0
22 Jun 2024
Open-vocabulary Pick and Place via Patch-level Semantic Maps
Mingxi Jia
Haojie Huang
Zhewen Zhang
Chenghao Wang
Linfeng Zhao
Dian Wang
J. Liu
Robin Walters
Robert Platt
Stefanie Tellex
LM&Ro
37
5
0
21 Jun 2024
Multi-modal Transfer Learning between Biological Foundation Models
Juan Jose Garau-Luis
Patrick Bordes
Liam Gonzalez
Masa Roller
Bernardo P. de Almeida
...
Stefan Laurent
Jan Grzegorzewski
Maren Lang
Thomas Pierrot
Guillaume Richard
AI4CE
28
3
0
20 Jun 2024
In-Context In-Context Learning with Transformer Neural Processes
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
Richard E. Turner
18
3
0
19 Jun 2024
Approximately Equivariant Neural Processes
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
W. Bruinsma
Richard E. Turner
BDL
32
1
0
19 Jun 2024
Recurrence over Video Frames (RoVF) for the Re-identification of Meerkats
Mitchell Rogers
Kobe Knowles
Gael Gendron
Shahrokh Heidari
David Arturo Soriano Valdez
Mihailo Azhar
Padriac O'Leary
Simon Eyre
Michael Witbrock
Patrice Delmas
27
0
0
18 Jun 2024
Translation Equivariant Transformer Neural Processes
Matthew Ashman
Cristiana-Diana Diaconu
Junhyuck Kim
Lakee Sivaraya
Stratis Markou
James Requeima
W. Bruinsma
Richard E. Turner
33
4
0
18 Jun 2024
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Omer Sahin Tas
Royden Wagner
39
1
0
17 Jun 2024
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
Yicong Jiang
Tianzi Wang
Xurong Xie
Juan Liu
Wei Sun
Nan Yan
Hui Chen
Lan Wang
Xunying Liu
Feng Tian
16
1
0
14 Jun 2024
Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation
Teli Ma
Jiaming Zhou
Zifan Wang
Ronghe Qiu
Junwei Liang
40
8
0
14 Jun 2024
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn
Ashishkumar Gudmalwar
Nirmesh Shah
Pankaj Wasnik
R. Shah
24
5
0
13 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
L. Wang
Rong Jin
34
40
0
12 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
35
2
0
08 Jun 2024
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
Matthew Fortier
Mats L. Richter
O. Sonnentag
Chris Pal
AI4CE
18
0
0
07 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
22
8
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
35
4
0
06 Jun 2024
Item-Language Model for Conversational Recommendation
Li Yang
Anushya Subbiah
Hardik Patel
Judith Yue Li
Yanwei Song
Reza Mirghaderi
Vikram Aggarwal
KELM
24
4
0
05 Jun 2024
Diffusion Features to Bridge Domain Gap for Semantic Segmentation
Yuxiang Ji
Boyong He
Chenyuan Qu
Zhuoyue Tan
Chuan Qin
Liaoni Wu
37
2
0
02 Jun 2024
Direct Cardiac Segmentation from Undersampled K-space Using Transformers
Yundi Zhang
Nil Stolt Ansó
Jiazhen Pan
Wenqi Huang
Kerstin Hammernik
Daniel Rueckert
MedIm
42
3
0
31 May 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
39
0
0
29 May 2024
VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
Jun Zheng
Fuwei Zhao
Youjiang Xu
Xin Dong
Xiaodan Liang
VGen
DiffM
26
5
0
28 May 2024
Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion
Zizhao Hu
Mohammad Rostami
19
0
0
25 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
41
3
0
24 May 2024
Planted: a dataset for planted forest identification from multi-satellite time series
L. M. Pazos-Outón
Cristina Nader Vasconcelos
Anton Raichuk
Anurag Arnab
Dan Morris
Maxim Neumann
31
3
0
24 May 2024
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner
Weiyu Li
Jiarui Liu
Rui Chen
Yixun Liang
Xuelin Chen
Ping Tan
Xiaoxiao Long
DiffM
21
48
0
23 May 2024
Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer
Shuang Wu
Youtian Lin
Feihu Zhang
Yifei Zeng
Jingxi Xu
Philip H. S. Torr
Xun Cao
Yao Yao
23
45
0
23 May 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
22
0
0
23 May 2024
Attention as an RNN
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Yoshua Bengio
Greg Mori
GNN
AI4TS
41
8
0
22 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
30
9
0
22 May 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Yunfan Jiang
Chen Wang
Ruohan Zhang
Jiajun Wu
Fei-Fei Li
OnRL
30
25
0
16 May 2024
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
George Shaikovski
Adam Casson
Kristen Severson
Eric Zimmermann
Yi Kan Wang
...
Peter Hamilton
William A. Moye
Eugene Vorontsov
Siqi Liu
Thomas J. Fuchs
MedIm
27
22
0
16 May 2024
Cross-sensor self-supervised training and alignment for remote sensing
V. Marsocci
Nicolas Audebert
23
1
0
16 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
25
2
0
15 May 2024
A Generalist Learner for Multifaceted Medical Image Interpretation
Hong-Yu Zhou
Subathra Adithan
J. N. Acosta
E. Topol
Pranav Rajpurkar
MedIm
30
23
0
13 May 2024
Topicwise Separable Sentence Retrieval for Medical Report Generation
Junting Zhao
Yang Zhou
Zhihao Chen
Huazhu Fu
Liang Wan
MedIm
25
1
0
07 May 2024
PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
Zhaoqi Leng
Pei Sun
Tong He
Drago Anguelov
Mingxing Tan
ViT
3DPC
29
0
0
05 May 2024
Adapting to Distribution Shift by Visual Domain Prompt Generation
Zhixiang Chi
Li Gu
Tao Zhong
Huan Liu
Yuanhao Yu
Konstantinos N Plataniotis
Yang Wang
VLM
OOD
21
4
0
05 May 2024
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
30
155
0
03 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Fuchun Sun
Bin Fang
AI4CE
LM&Ro
64
12
0
28 Apr 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
27
5
0
24 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
36
20
0
22 Apr 2024
Socialized Learning: A Survey of the Paradigm Shift for Edge Intelligence in Networked Systems
Xiaofei Wang
Yunfeng Zhao
Chao Qiu
Qinghua Hu
Victor C. M. Leung
19
6
0
20 Apr 2024
DISC: Latent Diffusion Models with Self-Distillation from Separated Conditions for Prostate Cancer Grading
M. M. Ho
Elham Ghelichkhan
Yosep Chong
Yufei Zhou
Beatrice S. Knudsen
Tolga Tasdizen
MedIm
DiffM
21
0
0
19 Apr 2024
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Zicheng Liu
Li Wang
Siyuan Li
Zedong Wang
Haitao Lin
Stan Z. Li
VLM
24
4
0
17 Apr 2024