ResearchTrend.AI
Perceiver IO: A General Architecture for Structured Inputs & Outputs

30 July 2021
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
David Ding
Skanda Koppula
Daniel Zoran
Andrew Brock
Evan Shelhamer
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
    MLLM
    VLM
    GNN

Papers citing "Perceiver IO: A General Architecture for Structured Inputs & Outputs"

41 / 91 papers shown
Title
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM
MLLM
33
244
0
05 Dec 2022
Self-supervised AutoFlow
Hsin-Ping Huang
Charles Herrmann
Junhwa Hur
Erika Lu
Kyle Sargent
Austin Stone
Ming Yang
Deqing Sun
17
8
0
04 Dec 2022
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Andrei Atanov
Andrei Filatov
Teresa Yeo
Ajay Sohmshetty
Amir Zamir
OOD
23
10
0
01 Dec 2022
Unifying Flow, Stereo and Depth Estimation
Haofei Xu
Jing Zhang
Jianfei Cai
Hamid Rezatofighi
Fisher Yu
Dacheng Tao
Andreas Geiger
MDE
10
188
0
10 Nov 2022
Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task
Jannik Kossen
Cătălina Cangea
Eszter Vértes
Andrew Jaegle
Viorica Patraucean
Ira Ktena
Nenad Tomašev
Danielle Belgrave
19
8
0
09 Nov 2022
A General Purpose Neural Architecture for Geospatial Systems
Nasim Rahaman
Martin Weiss
Frederik Träuble
Francesco Locatello
Alexandre Lacoste
Yoshua Bengio
C. Pal
Li Erran Li
Bernhard Schölkopf
AI4TS
AI4CE
19
5
0
04 Nov 2022
Attention-based Neural Cellular Automata
Mattie Tesfaldet
Derek Nowrouzezahrai
C. Pal
ViT
13
16
0
02 Nov 2022
Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations
Fukun Yin
Wen Liu
Zilong Huang
Pei Cheng
Tao Chen
Gang Yu
6
18
0
20 Oct 2022
Neural Attentive Circuits
Nasim Rahaman
M. Weiß
Francesco Locatello
C. Pal
Yoshua Bengio
Bernhard Schölkopf
Erran L. Li
Nicolas Ballas
13
6
0
14 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
15
332
0
06 Oct 2022
Adapting Pretrained Text-to-Text Models for Long Text Sequences
Wenhan Xiong
Anchit Gupta
Shubham Toshniwal
Yashar Mehdad
Wen-tau Yih
RALM
VLM
49
30
0
21 Sep 2022
Topic Detection in Continuous Sign Language Videos
Álvaro Budria
Laia Tarrés
Gerard I. Gállego
Francesc Moreno-Noguer
Jordi Torres
Xavier Giró-i-Nieto
SLR
VLM
28
1
0
01 Sep 2022
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
Aleksandar Stanić
Yujin Tang
David R Ha
Jürgen Schmidhuber
ELM
16
11
0
05 Aug 2022
COPER: Continuous Patient State Perceiver
V. Chauhan
Anshul Thakur
Odhran O'Donoghue
David A. Clifton
AI4TS
OOD
19
5
0
05 Aug 2022
Depth Field Networks for Generalizable Multi-view Scene Representation
Vitor Campagnolo Guizilini
Igor Vasiljevic
Jiading Fang
Rares Ambrus
G. Shakhnarovich
Matthew R. Walter
Adrien Gaidon
3DV
MDE
13
15
0
28 Jul 2022
Recurrent Memory Transformer
Aydar Bulatov
Yuri Kuratov
Mikhail Burtsev
CLL
11
101
0
14 Jul 2022
Wayformer: Motion Forecasting via Simple & Efficient Attention Networks
Nigamaa Nayakanti
Rami Al-Rfou
Aurick Zhou
Kratarth Goel
Khaled S. Refaat
Benjamin Sapp
AI4TS
28
233
0
12 Jul 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
Dacheng Yin
Chuanxin Tang
Yanqing Liu
Xiaoqiang Wang
Zhiyuan Zhao
Yucheng Zhao
Zhiwei Xiong
Sheng Zhao
Chong Luo
10
12
0
28 Jun 2022
LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Florent Bartoccioni
Éloi Zablocki
Andrei Bursuc
Patrick Pérez
Matthieu Cord
Karteek Alahari
12
33
0
27 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
31
391
0
17 Jun 2022
GateHUB: Gated History Unit with Background Suppression for Online Action Detection
Junwen Chen
Gaurav Mittal
Ye Yu
Yu Kong
Mei Chen
30
33
0
09 Jun 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
25
782
0
12 May 2022
Event Transformer. A sparse-aware solution for efficient event data processing
Alberto Sabater
Luis Montesano
Ana C. Murillo
19
50
0
07 Apr 2022
CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
Xiuchao Sui
Shaohua Li
Xue Geng
Yan Wu
Xinxing Xu
Yong Liu
Rick Siow Mong Goh
Hongyuan Zhu
ViT
24
94
0
31 Mar 2022
Unsupervised Learning of Temporal Abstractions with Slot-based Transformers
Anand Gopalakrishnan
Kazuki Irie
Jürgen Schmidhuber
Sjoerd van Steenkiste
OffRL
19
16
0
25 Mar 2022
Transform your Smartphone into a DSLR Camera: Learning the ISP in the Wild
A. S. Tripathi
Martin Danelljan
Samarth Shukla
Radu Timofte
Luc Van Gool
15
9
0
20 Mar 2022
Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking
Boyu Chen
Peixia Li
Lei Bai
Leixian Qiao
Qiuhong Shen
Bo-wen Li
Weihao Gan
Wei Wu
Wanli Ouyang
ViT
VOT
20
182
0
10 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
39
14
0
01 Mar 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Björn Ommer
3DV
50
14,533
0
20 Dec 2021
GMFlow: Learning Optical Flow via Global Matching
Haofei Xu
Jing Zhang
Jianfei Cai
Hamid Rezatofighi
Dacheng Tao
51
338
0
26 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
33
73
0
25 Nov 2021
Conditional Object-Centric Learning from Video
Thomas Kipf
Gamaleldin F. Elsayed
Aravindh Mahendran
Austin Stone
S. Sabour
G. Heigold
Rico Jonschkowski
Alexey Dosovitskiy
Klaus Greff
OCL
39
213
0
24 Nov 2021
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Arthur Douillard
Alexandre Ramé
Guillaume Couairon
Matthieu Cord
CLL
19
292
0
22 Nov 2021
The Efficiency Misnomer
Mostafa Dehghani
Anurag Arnab
Lucas Beyer
Ashish Vaswani
Yi Tay
23
96
0
25 Oct 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Coordination Among Neural Modules Through a Shared Global Workspace
Anirudh Goyal
Aniket Didolkar
Alex Lamb
Kartikeya Badola
Nan Rosemary Ke
Nasim Rahaman
Jonathan Binas
Charles Blundell
Michael C. Mozer
Yoshua Bengio
144
98
0
01 Mar 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
220
510
0
11 Feb 2021
Meta Pseudo Labels
Hieu H. Pham
Zihang Dai
Qizhe Xie
Minh-Thang Luong
Quoc V. Le
VLM
245
648
0
23 Mar 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
223
4,424
0
23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018