Perceiver: General Perception with Iterative Attention

International Conference on Machine Learning (ICML), 2021
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM, ViT, MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 783 papers shown
VideoGameBunny: Towards vision assistants for video games
Mohammad Reza Taesiri
Cor-Paul Bezemer
VLM, MLLM
181
7
0
21 Jul 2024
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Yu Zhang
Ruijie Yu
Kaipeng Zeng
Ding Li
Feng Zhu
Yunbo Wang
Yaohui Jin
Yanyan Xu
145
1
0
21 Jul 2024
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
Chen Shen
Chunfeng Lian
Wanqing Zhang
Fan Wang
Jianhua Zhang
...
Hongshu Mu
Hao Wu
Xinggong Liang
Jianhua Ma
Zhenyuan Wang
167
5
0
20 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
236
9
0
18 Jul 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
207
7
0
18 Jul 2024
MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking
Ting-Chih Chen
Chia-Wei Tang
Chris Thomas
222
10
0
18 Jul 2024
IoT-LM: Large Multisensory Language Models for the Internet of Things
Shentong Mo
Russ Salakhutdinov
Louis-Philippe Morency
Paul Pu Liang
MLLM
140
19
0
13 Jul 2024
Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series
Iris Dumeur
Silvia Valero
Jordi Inglada
235
7
0
11 Jul 2024
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation
Sungkyun Chang
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
254
8
0
05 Jul 2024
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen
Haofei Xu
Stefano Esposito
Siyu Tang
Andreas Geiger
AI4CE
321
48
0
05 Jul 2024
ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities
Julie Mordacq
Léo Milecki
Maria Vakalopoulou
Steve Oudot
Vicky Kalogeiton
OffRL, MedIm
133
7
0
04 Jul 2024
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
Perusha Moodley
Pramod S. Kaushik
Dhillu Thambi
Mark Trovinger
Praveen Paruchuri
Xia Hong
Benjamin Rosman
278
0
0
01 Jul 2024
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
William Berman
A. Peysakhovich
217
5
0
26 Jun 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
213
25
0
26 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV, MLLM
311
603
0
24 Jun 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
194
9
0
24 Jun 2024
Towards Zero-Shot Text-To-Speech for Arabic Dialects
Khai Duy Doan
Abdul Waheed
Muhammad Abdul-Mageed
278
2
0
24 Jun 2024
Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models
Xicheng Zhang
Fuchun Sun
Bin Zhao
Junchi Yan
Xiu Li
Xuelong Li
OffRL
201
10
0
22 Jun 2024
Learning Efficient and Robust Language-conditioned Manipulation using Textual-Visual Relevancy and Equivariant Language Mapping
Mingxi Jia
Haojie Huang
Zhewen Zhang
Chenghao Wang
Linfeng Zhao
Dian Wang
J. Liu
Robin Walters
Robert Platt
Stefanie Tellex
LM&Ro
335
6
0
21 Jun 2024
Multi-modal Transfer Learning between Biological Foundation Models
Juan Jose Garau-Luis
Patrick Bordes
Liam Gonzalez
Masa Roller
Bernardo P. de Almeida
...
Stefan Laurent
Jan Grzegorzewski
Maren Lang
Thomas Pierrot
Guillaume Richard
AI4CE
257
9
0
20 Jun 2024
In-Context In-Context Learning with Transformer Neural Processes
Symposium on Advances in Approximate Bayesian Inference (AABI), 2024
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
Richard E. Turner
170
4
0
19 Jun 2024
Approximately Equivariant Neural Processes
Neural Information Processing Systems (NeurIPS), 2024
Matthew Ashman
Cristiana-Diana Diaconu
Adrian Weller
W. Bruinsma
Richard E. Turner
BDL
166
4
0
19 Jun 2024
Recurrence over Video Frames (RoVF) for the Re-identification of Meerkats
Mitchell Rogers
Kobe Knowles
Gaël Gendron
Shahrokh Heidari
David Arturo Soriano Valdez
Mihailo Azhar
Padriac O'Leary
Simon Eyre
Michael Witbrock
Patrice Delmas
121
2
0
18 Jun 2024
Translation Equivariant Transformer Neural Processes
Matthew Ashman
Cristiana-Diana Diaconu
Junhyuck Kim
Lakee Sivaraya
Stratis Markou
James Requeima
W. Bruinsma
Richard E. Turner
215
8
0
18 Jun 2024
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Omer Sahin Tas
Royden Wagner
353
1
0
17 Jun 2024
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
Interspeech, 2024
Yicong Jiang
Tianzi Wang
Xurong Xie
Juan Liu
Wei Sun
Nan Yan
Hui Chen
Lan Wang
Xunying Liu
Feng Tian
121
7
0
14 Jun 2024
Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation
Conference on Robot Learning (CoRL), 2024
Teli Ma
Jiaming Zhou
Zifan Wang
Ronghe Qiu
Junwei Liang
200
17
0
14 Jun 2024
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn
Ashishkumar Gudmalwar
Nirmesh Shah
Pankaj Wasnik
R. Shah
237
12
0
13 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
Liwen Wang
Rong Jin
209
64
0
12 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
199
4
0
08 Jun 2024
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
International Conference on Learning Representations (ICLR), 2024
Matthew Fortier
Mats L. Richter
O. Sonnentag
Chris Pal
AI4CE
213
2
0
07 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Neural Information Processing Systems (NeurIPS), 2024
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
211
22
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
219
6
0
06 Jun 2024
Item-Language Model for Conversational Recommendation
Li Yang
Anushya Subbiah
Hardik Patel
Judith Yue Li
Yanwei Song
Reza Mirghaderi
Vikram Aggarwal
Qifan Wang
KELM
178
9
0
05 Jun 2024
Diffusion Features to Bridge Domain Gap for Semantic Segmentation
Yuxiang Ji
Boyong He
Chenyuan Qu
Zhuoyue Tan
Chuan Qin
Liaoni Wu
333
5
0
02 Jun 2024
Direct Cardiac Segmentation from Undersampled K-space Using Transformers
Yundi Zhang
Nil Stolt Ansó
Jiazhen Pan
Wenqi Huang
Kerstin Hammernik
Daniel Rueckert
MedIm
194
4
0
31 May 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
149
2
0
29 May 2024
VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
Jun Zheng
Fuwei Zhao
Youjiang Xu
Xin Dong
Xiaodan Liang
VGen, DiffM
216
9
0
28 May 2024
Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion
Zizhao Hu
Mohammad Rostami
195
0
0
25 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
235
6
0
24 May 2024
Planted: a dataset for planted forest identification from multi-satellite time series
L. M. Pazos-Outón
Cristina Nader Vasconcelos
Anton Raichuk
Anurag Arnab
Dan Morris
Maxim Neumann
157
8
0
24 May 2024
Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer
Neural Information Processing Systems (NeurIPS), 2024
Shuang Wu
Youtian Lin
Feihu Zhang
Yifei Zeng
Jingxi Xu
Juil Sock
Xun Cao
Yao Yao
225
136
0
23 May 2024
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
ViT
269
2
0
23 May 2024
Attention as an RNN
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Yoshua Bengio
Greg Mori
GNN, AI4TS
248
16
0
22 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
277
26
0
22 May 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Conference on Robot Learning (CoRL), 2024
Yunfan Jiang
Chen Wang
Ruohan Zhang
Jiajun Wu
Fei-Fei Li
OnRL
256
57
0
16 May 2024
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
Eugene Vorontsov
Adam Casson
Kristen Severson
Eric Zimmermann
Yi Kan Wang
...
Peter Hamilton
William A. Moye
Eugene Vorontsov
Siqi Liu
Thomas J. Fuchs
MedIm
241
61
0
16 May 2024
Cross-sensor self-supervised training and alignment for remote sensing
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
V. Marsocci
Nicolas Audebert
245
4
0
16 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
269
11
0
15 May 2024
MedVersa: A Generalist Foundation Model for Medical Image Interpretation
Hong-Yu Zhou
Subathra Adithan
J. N. Acosta
Suvrankar Datta
E. Topol
Pranav Rajpurkar
MedIm
357
29
0
13 May 2024