Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2107.14795
Cited By
Perceiver IO: A General Architecture for Structured Inputs & Outputs
30 July 2021
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
David Ding
Skanda Koppula
Daniel Zoran
Andrew Brock
Evan Shelhamer
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLM
VLM
GNN
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Perceiver IO: A General Architecture for Structured Inputs & Outputs"
50 / 91 papers shown
Title
Compact Recurrent Transformer with Persistent Memory
Edison Mucllari
Z. Daniels
David C. Zhang
Qiang Ye
CLL
VLM
46
0
0
02 May 2025
A Self-Supervised Framework for Space Object Behaviour Characterisation
Ian Groves
Andrew Campbell
James Fernandes
Diego Rodriguez
Paul Murray
Massimiliano Vasile
Victoria Nockles
21
0
0
08 Apr 2025
Evaluation of (Un-)Supervised Machine Learning Methods for GNSS Interference Classification with Real-World Data Discrepancies
Lucas Heublein
N. Raichur
Tobias Feigl
Tobias Brieger
Fin Heuer
Lennart Asbach
A. Rügamer
Felix Ott
47
7
0
31 Mar 2025
CASE -- Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement
Gaifan Zhang
Yi Zhou
Danushka Bollegala
61
0
0
21 Mar 2025
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
Hongjie Fang
Chenxi Wang
Yiming Wang
J. Chen
Shangning Xia
...
Xinyu Zhan
Lixin Yang
Weiming Wang
Cewu Lu
Hao-Shu Fang
80
1
0
05 Mar 2025
Principles for Responsible AI Consciousness Research
Patrick Butlin
Theodoros Lappas
33
1
0
13 Jan 2025
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu
Chenyu Wang
Zibo Zhao
Wen Liu
Yi-An Ma
Shenghua Gao
50
11
0
07 Nov 2024
Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection
Mingjin Zhang
Chi Zhang
Qiming Zhang
Yunsong Li
Xinbo Gao
Jing Zhang
VLM
30
3
0
07 Sep 2024
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang
Yuan Dong
Hanwen Jiang
Dejia Xu
Georgios Pavlakos
Qixing Huang
3DGS
62
3
0
23 Aug 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Xi Victoria Lin
Akshat Shrivastava
Liang Luo
Srinivasan Iyer
Mike Lewis
Gargi Gosh
Luke Zettlemoyer
Armen Aghajanyan
MoE
28
20
0
31 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
32
2
0
07 Jul 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
29
5
0
24 Jun 2024
MALT: Multi-scale Action Learning Transformer for Online Action Detection
Zhipeng Yang
Ruoyu Wang
Yang Tan
Liping Xie
OffRL
38
0
0
31 May 2024
Stochastic Optimal Control for Diffusion Bridges in Function Spaces
Byoungwoo Park
Jungwon Choi
Sungbin Lim
Juho Lee
45
3
0
31 May 2024
Interpretable Robotic Manipulation from Language
Boyuan Zheng
Jianlong Zhou
Fang Chen
LM&Ro
27
0
0
27 May 2024
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee
Rajarshi Roy
Mengyao Xu
Jonathan Raiman
M. Shoeybi
Bryan Catanzaro
Wei Ping
RALM
36
132
0
27 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
64
38
0
23 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
53
249
0
16 May 2024
DeVOS: Flow-Guided Deformable Transformer for Video Object Segmentation
Volodymyr Fedynyak
Yaroslav Romanus
Bohdan Hlovatskyi
Bohdan Sydor
Oles Dobosevych
Igor Babin
Roman Riazantsev
VOS
29
3
0
11 May 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
30
5
0
28 Mar 2024
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators
Benedikt Alkin
Andreas Fürst
Simon Schmid
Lukas Gruber
Markus Holzleitner
Johannes Brandstetter
PINN
AI4CE
35
8
0
19 Feb 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
20
14
0
25 Jan 2024
Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction
Olivier Moliner
Sangxia Huang
Kalle AAstrom
ViT
16
3
0
28 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
25
62
0
11 Dec 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
16
64
0
07 Nov 2023
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Zeyu Yun
Juexiao Zhang
Bruno A. Olshausen
Yann LeCun
16
0
0
06 Oct 2023
Vision Transformers Need Registers
Zilong Chen
Maxime Oquab
Julien Mairal
Huaping Liu
ViT
33
308
0
28 Sep 2023
Associative Transformer
Yuwei Sun
H. Ochiai
Zhirong Wu
Stephen Lin
Ryota Kanai
ViT
41
0
0
22 Sep 2023
Heterogeneous Forgetting Compensation for Class-Incremental Learning
Jiahua Dong
Wenqi Liang
Yang Cong
Gan Sun
CLL
16
19
0
07 Aug 2023
Robotic Vision for Human-Robot Interaction and Collaboration: A Survey and Systematic Review
Nicole L. Robinson
Brendan Tidd
Dylan Campbell
Dana Kulić
Peter Corke
33
54
0
28 Jul 2023
Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning
Ju He
Yifan Wang
Lijun Wang
Huchuan Lu
Jun-Yan He
Jinpeng Lan
Bin Luo
Yifeng Geng
Xuansong Xie
MDE
16
8
0
27 Jul 2023
CoTracker: It is Better to Track Together
Nikita Karaev
Ignacio Rocco
Benjamin Graham
Natalia Neverova
Andrea Vedaldi
Christian Rupprecht
VOT
ViT
45
243
0
14 Jul 2023
Advances and Challenges in Meta-Learning: A Technical Review
Anna Vettoruzzo
Mohamed-Rafik Bouguelia
Joaquin Vanschoren
Thorsteinn Rögnvaldsson
K. Santosh
OffRL
19
69
0
10 Jul 2023
Motion Perceiver: Real-Time Occupancy Forecasting for Embedded Systems
Bryce Ferenczi
Michael G. Burke
Tom Drummond
19
4
0
15 Jun 2023
A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics
Hong-Yu Zhou
Yizhou Yu
Chengdi Wang
Shu Zhen Zhang
Yuanxu Gao
Jia-Yu Pan
Jun Shao
Guangming Lu
Kang Zhang
Weimin Li
MedIm
11
149
0
01 Jun 2023
Adapting Language-Audio Models as Few-Shot Audio Learners
Jinhua Liang
Xubo Liu
Haohe Liu
Huy P Phan
Emmanouil Benetos
Mark D. Plumbley
Wenwu Wang
VLM
17
19
0
28 May 2023
FIT: Far-reaching Interleaved Transformers
Ting-Li Chen
Lala Li
19
12
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
13
113
0
18 May 2023
Ray-Patch: An Efficient Querying for Light Field Transformers
T. B. Martins
Javier Civera
ViT
29
0
0
16 May 2023
What is the best recipe for character-level encoder-only modelling?
Kris Cao
25
2
0
09 May 2023
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
31
4
0
06 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
23
38
0
31 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
17
28
0
23 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
17
1
0
21 Mar 2023
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
Jaehoon Yoo
Semin Kim
Doyup Lee
Chiheon Kim
Seunghoon Hong
21
3
0
20 Mar 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
8
7
0
16 Feb 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
18
2
0
26 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
Jia Ning
Chen Li
Zheng-Wei Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
25
42
0
05 Jan 2023
Manifestations of Xenophobia in AI Systems
Nenad Tomašev
J. L. Maynard
Iason Gabriel
19
9
0
15 Dec 2022
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
16
13
0
13 Dec 2022
1
2
Next