Perceiver: General Perception with Iterative Attention
arXiv: 2103.03206

4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
    VLM
    ViT
    MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 680 papers shown
Title
UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia
Raoul de Charette
Cengiz Öztireli
Jing-Hao Xue
27
6
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
37
4
0
10 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
34
20
0
09 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
PointInfinity: Resolution-Invariant Point Diffusion Models
Zixuan Huang
Justin Johnson
Shoubhik Debnath
James M. Rehg
Chao-Yuan Wu
23
11
0
04 Apr 2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang
Xin Chen
C. Zhang
Fukun Yin
Zhuoyuan Li
Gang Yu
Jiayuan Fan
VGen
LRM
21
10
0
02 Apr 2024
On Difficulties of Attention Factorization through Shared Memory
Uladzislau Yorsh
Martin Holevna
Ondrej Bojar
David Herel
23
0
0
31 Mar 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews
M. Dia
G. Khodabandelou
Alice Othmani
36
6
0
28 Mar 2024
Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding
Run Shao
Zhaoyang Zhang
Chao Tao
Yunsheng Zhang
Chengli Peng
Haifeng Li
VLM
30
4
0
27 Mar 2024
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang
Yixin Chen
Baoxiong Jia
Puhao Li
Jinlu Zhang
Jingze Zhang
Tengyu Liu
Yixin Zhu
Wei Liang
Siyuan Huang
VGen
DiffM
39
36
0
26 Mar 2024
On permutation-invariant neural networks
Masanari Kimura
Ryotaro Shimizu
Yuki Hirakawa
Ryosuke Goto
Yuki Saito
OOD
AAML
33
12
0
26 Mar 2024
Neural Clustering based Visual Representation Learning
Guikun Chen
Xia Li
Yi Yang
Wenguan Wang
SSL
27
8
0
26 Mar 2024
Residual-based Language Models are Free Boosters for Biomedical Imaging
Zhixin Lai
Jing Wu
Suiyao Chen
Yucheng Zhou
N. Hovakimyan
MedIm
25
26
0
26 Mar 2024
Dia-LLaMA: Towards Large Language Model-driven CT Report Generation
Zhixuan Chen
Luyang Luo
Yequan Bie
Hao Chen
LM&MA
11
13
0
25 Mar 2024
FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
Florian Langer
Jihong Ju
Georgi Dikov
Gerhard Reitmayr
Mohsen Ghafoorian
3DPC
24
3
0
22 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiangkang Deng
Xiatian Zhu
VOS
30
5
0
21 Mar 2024
Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks
Jinyung Hong
Eunyeong Jeon
Changhoon Kim
Keun Hee Park
Utkarsh Nath
Yezhou Yang
P. Turaga
Theodore P. Pavlic
CML
14
0
0
21 Mar 2024
On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
32
5
0
14 Mar 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
16
4
0
14 Mar 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
GNN
24
8
0
14 Mar 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu
Shiyi Zhang
Ziwei Wang
Changliu Liu
Jiwen Lu
Yansong Tang
41
49
0
13 Mar 2024
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation
Chen Wang
Haochen Shi
Weizhuo Wang
Ruohan Zhang
Fei-Fei Li
Karen Liu
45
103
0
12 Mar 2024
Synth²: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Sahand Sharifzadeh
Christos Kaplanis
Shreya Pathak
D. Kumaran
Anastasija Ilić
Jovana Mitrović
Charles Blundell
Andrea Banino
VLM
29
9
0
12 Mar 2024
Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
Jianping Jiang
Xinyu Zhou
Bingxuan Wang
Xiaoming Deng
Chao Xu
Boxin Shi
46
7
0
12 Mar 2024
3DTextureTransformer: Geometry Aware Texture Generation for Arbitrary Mesh Topology
K. Dharma
Clayton T. Morrison
25
0
0
07 Mar 2024
RATSF: Empowering Customer Service Volume Management through Retrieval-Augmented Time-Series Forecasting
Tianfeng Wang
Gaojie Cui
AI4TS
39
0
0
07 Mar 2024
DNAct: Diffusion Guided Multi-Task 3D Policy Learning
Ge Yan
Yueh-hua Wu
Xiaolong Wang
VGen
27
20
0
07 Mar 2024
Adding Multimodal Capabilities to a Text-only Translation Model
Vipin Vijayan
Braeden Bowen
Scott Grigsby
Timothy Anderson
Jeremy Gwinnup
LRM
14
5
0
05 Mar 2024
Latent Attention for Linear Time Transformers
Rares Dolga
Marius Cobzarenco
David Barber
18
1
0
27 Feb 2024
Parallelized Spatiotemporal Binding
Gautam Singh
Yue Wang
Jiawei Yang
B. Ivanovic
Sungjin Ahn
Marco Pavone
Tong Che
36
1
0
26 Feb 2024
Disentangled 3D Scene Generation with Layout Learning
Dave Epstein
Ben Poole
B. Mildenhall
Alexei A. Efros
Aleksander Holynski
CoGe
OCL
3DV
37
20
0
26 Feb 2024
VOLoc: Visual Place Recognition by Querying Compressed Lidar Map
Xudong Cai
Yongcai Wang
Zhe Huang
Yu Shao
Deying Li
18
4
0
25 Feb 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park
Edward Choi
28
1
0
23 Feb 2024
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie
Yoonho Lee
Annie S. Chen
Chelsea Finn
20
3
0
22 Feb 2024
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel
M. Armando
Salma Galaaoui
Romain Brégier
Philippe Weinzaepfel
Grégory Rogez
Thomas Lucas
3DH
33
18
0
22 Feb 2024
Semantic Image Synthesis with Unconditional Generator
Jungwoo Chae
Hyunin Cho
Sooyeon Go
Kyungmook Choi
Youngjung Uh
42
4
0
22 Feb 2024
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
Eli M. Carrami
Sahand Sharifzadeh
24
2
0
21 Feb 2024
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang
I-Hau Yeh
Hongpeng Liao
41
1,031
0
21 Feb 2024
User-LLM: Efficient LLM Contextualization with User Embeddings
Lin Ning
Luyang Liu
Jiaxing Wu
Neo Wu
D. Berlowitz
Sushant Prakash
Bradley Green
S. O’Banion
Jun Xie
37
32
0
21 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
46
41
0
19 Feb 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
33
0
0
19 Feb 2024
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators
Benedikt Alkin
Andreas Fürst
Simon Schmid
Lukas Gruber
Markus Holzleitner
Johannes Brandstetter
PINN
AI4CE
35
8
0
19 Feb 2024
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review
Thang-Anh-Quan Nguyen
Amine Bourki
Mátyás Macudzinski
Anthony Brunel
M. Bennamoun
25
9
0
17 Feb 2024
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
Tsung-Wei Ke
N. Gkanatsios
Katerina Fragkiadaki
VGen
28
102
0
16 Feb 2024
Are Semi-Dense Detector-Free Methods Good at Matching Local Features?
Matthieu Vilain
Rémi Giraud
Hugo Germain
Guillaume Bourmaud
16
1
0
13 Feb 2024
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg
A. Abdolmaleki
Jingwei Zhang
Oliver Groth
Michael Bloesch
...
Sarah Bechtle
Steven Kapturowski
Roland Hafner
N. Heess
Martin Riedmiller
OffRL
LRM
19
11
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
77
4
0
08 Feb 2024
Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data
Hamza Mahdi
Eptehal Nashnoush
Rami Saab
Arjun Balachandar
Rishit Dagli
Lucas X. Perri
H. Khosravani
11
1
0
07 Feb 2024
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary
Takashi Morita
6
3
0
31 Jan 2024